SurveyGizmo, R, and MySQL on Amazon AWS

Those who have been in any sort of sociological research field should be very familiar with the survey platforms available on the web (e.g., SurveyMonkey, SurveyGizmo, or LimeSurvey). Getting your results usually involves a multi-step generate/export/import cycle. Is there a better way?

I asked that question while using R to digest a survey deployed on SurveyGizmo. With so many R packages out there, I had a hunch there was something to help me get my results from SG into R without running through the generate/export/import cycle. Enter RSurveyGizmo, a package that does exactly that.

Beyond aggregates and analytics, the survey results in SurveyGizmo should be stored elsewhere for future use. This raises more questions about ETL from the website itself to your database of choice. In this case, let’s assume we have a MySQL database running on Amazon AWS. I recommend this over an MSSQL instance because of the difficulty of using an ODBC connection on anything other than Windows (though it can be done).

Assumptions

  • SurveyGizmo account with surveys already active
  • MySQL database established on Amazon AWS
  • You know your host, port, dbname, username, and password for your MySQL database on Amazon AWS
  • R version 3.4.2

Part I: SurveyGizmo

  1. Log into your SurveyGizmo account and head over to your API access options. Find that under Account > Integrations > Manage API.
  2. If you don’t have an active API key listed, Create an API Key. You will then see the API key listed for your user account. Copy that key to a text editor, as you will need it momentarily.
  3. Go back to your SurveyGizmo home page and view the surveys you have out there. Choose one and click on it.
  4. You’ll be taken to the survey build page and the address will be something like https://app.surveygizmo.com/builder/build/id/xxxxxxx where xxxxxxx is a unique number. Copy that number to a text editor, as you will need it momentarily too.

Part II: R + SurveyGizmo

  1. Install RSurveyGizmo via devtools.
    library(devtools)
    install_github(repo="DerekYves/rsurveygizmo")
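
    If you don’t already have devtools installed, grab it from CRAN first:

    install.packages("devtools")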
  2. Construct the script to grab your survey. You will need the API key and survey number.
    library(Rsurveygizmo)
    api <- "your_api_key"  # the API key from Part I, step 2
    my.data <- pullsg(survey_number, api, completes_only=T)  # survey_number is the ID from Part I, step 4
  3. You will see loading progress and, depending on the size of your survey, will have a frame full of data in just a few moments. (Sometimes I get a JSON error, but it resolves itself in a few minutes.) SurveyGizmo does have API call limits, so please be judicious with how many times you do this. It’s generally good to run the process once you have enough data to start writing your analytics scripts, then again once the survey is closed.
  4. This is the simplest of the methods in the RSurveyGizmo package. You will want to explore the package documentation to learn all it can do for you.
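
    Before moving on, it’s worth a quick sanity check on what pullsg returned; a minimal sketch using base R (my.data is the frame from step 2):

    # how many responses and columns came back?
    dim(my.data)

    # peek at the first few column names SurveyGizmo generated
    head(colnames(my.data))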

Part III: R + MySQL

  1. Install the RMySQL package via your package loader.
  2. Construct the script to establish your connection, filling in your specific details.
    # load RMySQL
    library(RMySQL)
    
    # establish the MySQL connection
    con <- dbConnect(RMySQL::MySQL(),
     username = "user",
     password = "password",
     host = "name.something.zone.rds.amazonaws.com",
     port = 3306,
     dbname = "mydb"
    )
  3. Now con will serve as your pipeline for the RMySQL calls.
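
    Assuming the connection succeeded, a quick way to confirm it is to list the tables already in the database:

    # list tables visible through this connection
    dbListTables(con)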
  4. Two common methods are dbWriteTable and dbSendQuery. As you might expect, to write an R data frame to a table in your MySQL database, you use dbWriteTable:
    dbWriteTable(con, "table_name", dataframe.name, overwrite=TRUE)

    Using overwrite=TRUE means your table is essentially dropped and recreated, rather than appended.
    To get an existing MySQL table into a new R data frame, use dbGetQuery, which runs the query and returns the result as a data frame (dbSendQuery alone only returns a result set that you would still have to fetch):

    newframe <- dbGetQuery(con, "SELECT * FROM mydb.mytable")
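
    If you would rather append new rows to an existing table than rebuild it, dbWriteTable also takes an append argument; a minimal sketch (new.responses is a hypothetical frame with the same columns as the existing table):

    dbWriteTable(con, "table_name", new.responses, append=TRUE, overwrite=FALSE, row.names=FALSE)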
  5. Here’s a wrinkle, though. SurveyGizmo downloads come with concatenated column names that may not be very helpful. I prefer to convert all my column names to a standard format and establish a reference table with all the original questions matched up. The following script grabs all the column names from an existing data frame and creates a table with a standard “qxxx” format matched to the original question name.
    # get the question text into a vector
    Question_Text <- colnames(mydata.original)
    
    # get the length of that vector
    sq <- length(Question_Text)
    
    # generate a sequence of keys based on that length (q001, q002, ...)
    QKey <- sprintf("q%03d", seq_len(sq))
    
    # make a new data frame with the QKeys matched to the original question text
    mydata.questions <- data.frame(QKey, Question_Text)
    
    # replace the original question text with those keys
    colnames(mydata.original) <- as.character(QKey)

    Now you have two frames: mydata.original with uniform column names, and mydata.questions with those column names matched to the original text.

    Assuming you want to get those frames into your MySQL database, use the following:

    dbWriteTable(con, "mydata_questions",mydata.questions, overwrite=TRUE)
    dbWriteTable(con, "mydata_original",mydata.original, overwrite=TRUE)

R, Azure SQL Server, and Mac OS X

There are a few options out there for those of us who work (mostly or solely) on a Mac and need to access MSSQL databases through R.

RODBC and RSqlServer are the two I’ve worked with. RODBC requires some additional workarounds with a Unix ODBC driver, while RSqlServer has issues with rJava in OS X High Sierra. Of the two, I’ve found RODBC to involve the least amount of workarounds and frustration. Another wrinkle here is having an Azure SQL Server as the endpoint with SQL Authentication (not Windows Auth).

First, you will need to update your Mac with Homebrew and then add a few packages. This page discusses how, but it did not work for me out of the box; there are extra steps.

1. Install Homebrew via Terminal (see brew.sh for the install command)

2. Install unixODBC via Terminal
brew update
brew install unixodbc

3. Install freeTDS via Terminal
brew install freetds --with-unixodbc

4. Add to /usr/local/etc/odbcinst.ini (adjust the freetds version in the path to match what Homebrew installed)
[MSSQL]
Description = Microsoft SQL Server driver
Driver = /usr/local/Cellar/freetds/0.95.18/lib/libtdsodbc.so

5. Add to /usr/local/etc/freetds.conf
[MY_SQL_SERVER]
host = myazureserver.database.windows.net
port = 1433
tds version = 7.0

6. Add to /usr/local/etc/odbc.ini
[myazureserver]
Driver=/usr/local/lib/libtdsodbc.so
Trace=No
Server=myazureserver.database.windows.net
Port=1433
TDS_Version=8.0
Database=myazuredatabase

7. In Terminal, enter (with user and pass replaced by your credentials):
isql -v myazureserver user pass
You should see a success message and a new prompt. See the Connecting section on the page for what it should look like. Type quit to exit that SQL shell.

8. If that is successful, you know the system-level configuration is complete. Run the following commands in Terminal to create symbolic links (adjust the version numbers in the paths to match what Homebrew installed):
ln -vs /usr/local/Cellar/freetds/0.95.18/etc/freetds.conf ~/.freetds.conf
ln -vs /usr/local/Cellar/unixodbc/2.3.2_1/etc/odbc.ini ~/.odbc.ini
ln -vs /usr/local/Cellar/unixodbc/2.3.2_1/etc/odbcinst.ini ~/.odbcinst.ini
9. In R, run this script (after replacing user and pass with your credentials)
# install RODBC package (can comment this out once run)
install.packages("RODBC", type = "source")

# call RODBC package
library(RODBC)

# create a connection
mycon <- odbcConnect("myazureserver", uid="user", pwd="pass")

# see what it looks like:
mycon 

# Select the top 100 records from table dbo.Table and load into dataframe "rs"
rs <- sqlQuery(mycon, "SELECT TOP (100) * FROM dbo.Table")

Now you should have a data frame named “rs” with up to 100 rows of data.
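
When you’re finished with the connection, it’s good practice to close it; RODBC provides odbcClose (and odbcCloseAll, which closes every open connection):

odbcClose(mycon)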

The Courage to Step Back

On February 26, 2008, Starbucks stores across the country closed for 3.5 hours for what CEO Howard Schultz characterized as “a reaffirmation of [their] coffee leadership.” An estimated $4-6 million in sales was lost, rival coffee shops ran promotions to take advantage of the competitor’s downtime, and reactions were wildly mixed. It was an incredibly bold move in the midst of the Great Recession. Just when consumers needed signs of confidence from their trusted brands, a staple goes dark? After expanding at breakneck speed, why was Starbucks stepping back?

I had very little insight into what went into this decision before reading Schultz’s book, Onward. In it, he explains how the drive to grow had overtaken the fundamentals of the company. In an environment of increasing demands at the front line, Starbucks had fallen into bad practices, even with the best of intentions. It became necessary to take a step back. It was time to refocus, retrain, and recommit to the Starbucks Experience. As Jon Picoult notes, Schultz did not view this as a cost—it was “a smart investment in the education of his employees.” Beyond that, it was a courageous move . . . one that ultimately worked out in the company’s favor.

In any business environment, the prospect of shutting down and stepping back from production to refocus on internal housekeeping seems contrary to conventional wisdom. It may be interpreted as a sign of weakness or lack of organization. But that same race for deliverables and production can introduce corner-cutting or ad-hoc fixes that are never meant to be sustainable. In Starbucks’ case, for example, baristas were pre-steaming milk for lattes and cappuccinos. This compromised the beverage.

Business intelligence efforts are particularly susceptible to the race for deliverables. Think of an analytics group bombarded with report requirements from different business units. Their revenue depends on the justification for these reports. In some cases, the source data may be given to them without sufficient explanation or rationale, and they are asked to make sense of it on the fly. Billable hours and the drive to “just get it done” take precedence. A cycle of short-term fixes emerges, and no clear ownership of the data is established.

This is a process driven by fear. Being a martyr to productivity is not only selfish, it is irresponsible. Starbucks recognized they were slinging an inferior product and chose to refocus. Howard Schultz had the courage to stand up and step back . . . and that’s just coffee. In the business domain, what critical data products get rushed out to production and are mediocre at best?

Someone must recognize that the vicious cycle is untenable. It might seem contrary to the pressures of billable hours and deliverables, but it is ultimately a smart investment in the sustainability of processes. Taking the time up front to stop and get the house in order precludes the repeated short-term fixes that would inevitably snowball. It’s a courageous move amidst competing pressures.

Client Update: Venugopal Menon, MD

One of the most enriching manuscripts I’ve had the opportunity to edit came from Dr. Venugopal Menon, a prominent physician and native son of pre-independence India. His children encouraged him to write his memoirs, and I had the privilege of editing and organizing the collection of stories.

Those memoirs were published by Outskirts Press and the book may be found at their store, Amazon.com, or BarnesandNoble.com.

Please, Stop Calling it a Hack

Have you used a lifehack? How exactly does one hack life? Chances are, you’re not a hacker. Using a binder clip in new and mind-blowing ways does not bestow upon you a title held by the likes of Kevin Mitnick and Sandra Bullock’s character in The Net. You just used a trick, a tip, or one bullet of listicle clickbait to do something in a different way.

Ushered in by the wildly popular Lifehacker blog (which I readily admit to reading), the term hack has come to replace a variety of words meaning tip. Perhaps it’s a desire to be hipster and ironic, or to frame everything in terms of technology, or perhaps, as Nikil Saval of the Pacific Standard put it in 2014, the “cult of self-optimization”:

Life-hacking wouldn’t be popular if it didn’t tap into something deeply corroded about the way work has, without much resistance, managed to invade every corner of our lives. The idea started out as a somewhat earnest response to the problem of fragmented attention and overwork—an attempt to reclaim some leisure time and autonomy from the demands of boundaryless labor. But it has since become just another hectoring paradigm of self-improvement.

To be sure, the underlying rationale for a “hack” is productivity, and even the cupcake-eating hack is about eating smarter, not harder (and maximizing the amount of cupcake you can get in your mouth with the least amount of mess). Yes, leave it to the lifehackers to turn something as innocent and joyous as eating a cupcake into an exercise measured in input, output, and waste.

When we move from tips and tricks to hacks, we introduce the assumption of “you’re doing it wrong.” Think of every single one of these lifehack lists as the annoying IT guy in your office who makes you feel incredibly stupid when you ask a simple technology question. I’ve been eating cupcakes for over 30 years and I don’t find anything particularly wrong with how it’s done. I know the different keys on my keyring without painting them in nail polish. I ran a straw through the inverted tab of a soda can when I was a teenager, well before any clickbait list instructed me to.

So my quarrel is with both the word and the assumption. Calling something a hack doesn’t make it any more useful or chic than it was when it was a tip or a trick; in fact, it’s the etymological equivalent of a hipster flannel shirt and scarf. Likewise, it carries the pretentious assumption that it is inherently better while at the same time being fashionable before it was cool – think of George Costanza indignantly eating a Snickers bar with a knife and fork. Hacks are for the computer security world. Outside of that realm, it’s only short for hackneyed, and it most certainly is.

Finally, I’ll leave it to the folks at RightThisMinute.com to put a slightly more blunt spin on this.

How To Be a Good Copyeditor

There was a time when the title “Editor,” at least in terms of writing, had to be further clarified. Substantive Editor? Copyeditor? Proofreader? These were the days when editing killed a lot of trees and men stuck press credentials in their hats.

OK, perhaps not that long ago, but it does seem that way sometimes. With the advent of online content management, blog accessibility, and digital publishing, “editing” can be a catchall term covering an amalgam of skills. As a professional writer hiring an editor, you need to clarify what sort of expertise you’re getting. This has consequences for the student writer as well. If you’re faced with writing and revising a term paper, you’ll want to approach revision as though you’re wearing multiple hats, rather than covering all your editing bases in one pass.

I use the following graphic with my composition students. It’s a good representation of how the editing process can be segmented for maximum effectiveness.

Editing as a fluid process

There are four basic phases:

  1. Revision
  2. Substantive Editing
  3. Copyediting
  4. Proofreading

I’ll cover Revision and Substantive Editing in my next post, so let’s look at Copyediting and Proofreading for now. While the temptation may be great to do both in one pass, the distinction is necessary, especially if you operate under the “Work Smarter, Not Harder” mantra. Think of copyediting as clarifying the voice of your paper–how the sections and paragraphs come together to represent your ideas. A copyeditor wants to make sure the paper’s readability is maximized. Focus on clarity and consistency. Proofreading, on the other hand, is about clarifying the look of your paper–typos, grammar mistakes, basic stuff. That should come as the absolute last step, when you know your content and flow are solid and you just need that final inspection to be sure you don’t do something stupid, e.g., confuse there, their, and they’re.

How APA and MLA are Different, Part 3

Finishing the three-part series on APA and MLA differences, here are some key examples of references and citations in both styles.

APA (“References”)
Book: Graber, D.A. (2002). Mass media & American politics. Washington, DC: CQ Press.
Journal: Donaldson, S. (1995). Protecting the troops from Hemingway: An episode in censorship. The Hemingway Review, 15, 87-93.
Website: Park, A. (2008, May 21). How safe are vaccines? Time. Retrieved from http://www.time.com.

Sample Citations
Graber (2002) suggests that “media are most influential in areas in which the audience knows least” (p. 210).
(Adams, 1979) or (Adams, 1979, p. 42)
(Lennon & McCartney, 1968) or (Lennon & McCartney, 1968, p. 999)
(Hexum, Martinez, & Sexton, 1994) or (Hexum, Martinez, & Sexton, 1994, p. 123)

MLA (“Works Cited”)
Book: Graber, Doris A. Mass Media & American Politics. Washington, DC: CQ Press, 2002. Print.
Journal: Donaldson, Scott. “Protecting the Troops from Hemingway: An Episode in Censorship.” The Hemingway Review 15 (1995): 87-93. Print.
Website: Park, Alice. “How Safe Are Vaccines?” Time. Time Magazine, 21 May 2008. Web. 18 March 2011.

Sample Citations
Doris Graber suggests that “media are most influential in areas in which the audience knows least” (210).
(Adams 42)
(Lennon and McCartney 999)
(Hexum, Martinez, and Sexton 123)