SurveyGizmo, R, and MySQL on Amazon AWS

Those who have worked in any sort of sociological research should be very familiar with the survey platforms available on the web these days (e.g., SurveyMonkey, SurveyGizmo, or LimeSurvey). Getting your results usually involves a multi-step generate/export/import cycle. Is there a better way?

I asked that question while using R to digest a survey deployed on SurveyGizmo. With so many R packages out there, I had a hunch there was something to help me get my results from SG into R without having to run through the generate/export/import cycle. Enter RSurveyGizmo, a package that does exactly that.

Beyond aggregates and analytics, the survey results in SurveyGizmo should be stored elsewhere for future use. This raises more questions about ETL from the website itself to your database of choice. In this case, let’s assume we have a MySQL database running on Amazon AWS. I recommend this over an MSSQL instance because using an ODBC connection on anything other than Windows is difficult (though it can be done).

Assumptions

  • SurveyGizmo account with surveys already active
  • MySQL database established on Amazon AWS
  • You know your host, port, dbname, username, and password for your MySQL database on Amazon AWS
  • R version 3.4.2

Part I: SurveyGizmo

  1. Log into your SurveyGizmo account and head over to your API access options. Find that under Account > Integrations > Manage API.
  2. If you don’t have an active API key listed, click Create an API Key. You will then see the API key listed for your user account. Copy that key to a text editor, as you will need it momentarily.
  3. Go back to your SurveyGizmo home page and view the surveys you have out there. Choose one and click on it.
  4. You’ll be taken to the survey build page and the address will be something like https://app.surveygizmo.com/builder/build/id/xxxxxxx where xxxxxxx is a unique number. Copy that number to a text editor, as you will need it momentarily too.

Part II: R + SurveyGizmo

  1. Install RSurveyGizmo via devtools.
    library(devtools)
    install_github(repo="DerekYves/rsurveygizmo")
  2. Construct the script to grab your survey. You will need the API key and survey number.
    # load the package, then pull the survey using the API key and survey number from Part I
    library(Rsurveygizmo)
    api <- "your_api_key"
    my.data <- pullsg(survey_number, api, completes_only=TRUE)
  3. You will see loading progress and, depending on the size of your survey, will have a frame full of data in just a few moments. (Sometimes I get a JSON error, but it resolves itself in a few minutes.) SurveyGizmo does have API call limits, so please be judicious with how many times you do this. It’s generally good to run the process once you have enough data to start writing your analytics scripts, then again once the survey is closed.
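    To avoid burning through API calls while you iterate on your scripts, one option is to cache the pulled frame locally with saveRDS and reload it with readRDS in later sessions. A minimal sketch (the file name here is arbitrary):

    # cache the pulled survey so repeated script runs don't hit the API again
    saveRDS(my.data, "my_survey.rds")

    # in a later session, reload the cached copy instead of calling pullsg()
    my.data <- readRDS("my_survey.rds")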
  4. This is the simplest of the methods in the RSurveyGizmo package. You will want to explore the package documentation to learn all it can do for you.

Part III: R + MySQL

  1. Install the RMySQL package via your package loader.
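    If you prefer the console, install.packages() does the same thing:

    # install RMySQL from CRAN (its DBI dependency installs automatically)
    install.packages("RMySQL")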
  2. Construct the script to establish your connection, filling in your specific details.
    # load RMySQL
    library(RMySQL)
    
    # establish the MySQL connection
    con <- dbConnect(RMySQL::MySQL(),
                     username = "user",
                     password = "password",
                     host = "name.something.zone.rds.amazonaws.com",
                     port = 3306,
                     dbname = "mydb")
  3. Now con will serve as your pipeline for the RMySQL calls.
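    A quick sanity check at this point is to list the tables visible through that connection (dbListTables is part of the DBI interface that RMySQL implements):

    # confirm the connection works by listing the tables in mydb
    dbListTables(con)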
  4. Two common methods are dbWriteTable and dbGetQuery. As you might expect, to write an R data frame to a table in your MySQL database, you use dbWriteTable:
    dbWriteTable(con, "table_name", dataframe.name, overwrite=TRUE)

    Using overwrite=TRUE means your table is essentially dropped and recreated, rather than appended.
    To pull an existing MySQL table into a new R data frame, you’d use dbGetQuery (dbSendQuery only returns a result set, which you would still have to fetch):

    newframe <- dbGetQuery(con, "SELECT * FROM mydb.mytable")
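
    If you’re loading results in batches rather than replacing the whole table each time, dbWriteTable can also append. A minimal sketch, where new_rows is a hypothetical data frame holding only rows you haven’t written yet:

    # add rows to an existing table instead of dropping and recreating it
    dbWriteTable(con, "table_name", new_rows, append=TRUE, row.names=FALSE)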
  5. Here’s a wrinkle, though. SurveyGizmo downloads come with long, concatenated column names (essentially the full question text) that aren’t very convenient to work with. I prefer to convert all my column names to a standard format and establish a reference table that matches them up with the original questions. The following script grabs all the column names from an existing data frame and creates a table mapping a standard “qxxx” key to the original question text.
    # get question text into vector
    Question_Text <- colnames(mydata.original)
    
    # get length of that vector
    sq <- length(Question_Text)
    
    # generate sequence based on that length
    QKey <- sprintf("q%03d", seq_len(sq))
    
    # make a new data frame with the QKeys matched to the original question text
    mydata.questions <- data.frame(QKey, Question_Text)
    
    # replace the original question text with those keys
    colnames(mydata.original) <- QKey

    Now you have two frames: mydata.original with uniform column names, and mydata.questions with those column names matched to the original text.

    Assuming you want to get those frames into your MySQL database, use the following:

    dbWriteTable(con, "mydata_questions",mydata.questions, overwrite=TRUE)
    dbWriteTable(con, "mydata_original",mydata.original, overwrite=TRUE)