R, Azure SQL Server, and Mac OS X

There are a few options out there for those of us who work (mostly or solely) on a Mac and need to access MSSQL databases through R.

 

RODBC and RSqlServer are the two I’ve worked with. RODBC requires some additional workarounds with a Unix ODBC driver; RSqlServer has issues with rJava in OS X High Sierra. I’ve found RODBC to require the least amount of workarounds and frustration. Another wrinkle here is having an Azure SQL Server as the endpoint with SQL Authentication (not Windows Auth).

 

First, you will need to update your Mac with Homebrew then add a few packages. This page discusses how but did not work for me out of the box. There are extra steps.

 

1. Install HomeBrew via Terminal

 

2. Install unixODBC via Terminal
brew update
brew install unixodbc

 

3. Install freeTDS via Terminal

brew install freetds --with-unixodbc

 

4. Add to /usr/local/etc/odbcinst.ini
[MSSQL]
 
Description   = Microsoft SQL Server driver
 
Driver        = /usr/local/Cellar/freetds/0.95.18/lib/libtdsodbc.so

 

5. Add to /usr/local/etc/freetds.conf
[MY_SQL_SERVER] 
host = myazureserver.database.windows.net
 
port = 1433
 
tds version = 7.0

 

6. Add to /usr/local/etc/odbc.ini
[myazureserver]
Driver=/usr/local/lib/libtdsodbc.so
Trace=No
Server=myazureserver.database.windows.net
Port=1433
TDS_Version=8.0
Database=myazuredatabase

 

7. In Terminal, enter (with user and pass replaced by your credentials):
isql -v myazureserver user pass
You should see a success message and a new prompt. See the Connecting section on the page for what it should look like. Type quit to exit that SQL shell.

 

8. If that is successful, you know the system-level configuration is complete. Run the following commands in Terminal to create symbolic links:
ln -vs /usr/local/Cellar/freetds/0.95.18/etc/freetds.conf ~/.freetds.conf
ln -vs /usr/local/Cellar/unixodbc/2.3.2_1/etc/odbc.ini ~/.odbc.ini
ln -vs /usr/local/Cellar/unixodbc/2.3.2_1/etc/odbcinst.ini ~/.odbcinst.ini
9. In R, run this script (after replacing user and pass with your credentials)
# install RODBC package (can comment this out once run)
install.packages("RODBC", type = "source")

# call RODBC package
library(RODBC)

# create a connection
mycon <- odbcConnect("myazureserver", uid="user", pwd="pass")

# see what it looks like:
mycon 

# Select the top 100 records from table dbo.Table and load into dataframe "rs"
rs <- sqlQuery(mycon, "SELECT TOP (100) * FROM dbo.Table")

 

Now you should have a data frame named “rs” with 100 rows of data.

The Courage to Step Back

On February 26, 2008, Starbucks stores across the country closed for 3.5 hours for what CEO Howard Schultz characterized as “a reaffirmation of [their] coffee leadership.” An estimated $4-6 million in sales were lost, rival coffee stores offered promotions taking advantage of the competitor’s down time, and reactions were wildly mixed. This was an incredibly bold move in the midst of the Great Recession. Just when consumers needed signs of confidence from their trusted brands, a staple goes dark? After expanding at a breakneck speed, why was Starbucks stepping back?

I had very little insight into what went into this decision before reading Schultz’s book, Onward. In it, he explains how the drive to grow had overtaken the fundamentals of the company. In an environment of increasing demands at the front line, Starbucks had fallen into bad practices, even with the best of intentions. It became necessary to take a step back. It was time to refocus, retrain, and recommit to the Starbucks Experience. As Jon Picoult notes, Schultz did not view this as a cost—it was “a smart investment in the education of his employees.” Beyond that, it was a courageous move . . . one that ultimately worked out in the company’s favor.

In any business environment, the prospect of shutting down and stepping back from production to refocus on internal housekeeping seems contrary to conventional wisdom. It may be interpreted as a sign of weakness or lack of organization. But that same race for deliverables and production can introduce corner-cutting or ad-hoc fixes that are never meant to be sustainable. In Starbucks’ case, for example, baristas were pre-steaming milk for lattes and cappuccinos. This compromised the beverage.

Business intelligence efforts are particularly susceptible to the race for deliverables. Think of an analytics group bombarded with report requirements from different business units. Their revenue depends on the justification for these reports. In some cases, the source data may be given to them without sufficient explanation or rationale, and they are asked to make sense of it on the fly. Billable hours and the drive to “just get it done” takes precedence. A cycle of short-term fixes emerges and no clear ownership of the data is established.

This is a process driven by fear. Being a martyr to productivity is not only selfish, it is irresponsible. Starbucks recognized they were slinging an inferior product and chose to refocus. Howard Schultz had the courage to stand up and step back . . . and that’s just coffee. In the business domain, what critical data products get rushed out to production and are mediocre at best?

Someone must recognize that the vicious cycle is untenable. It might seem contrary to the pressures of billable hours and deliverables, but ultimately is a smart investment in the sustainability of processes. Taking the time up front to stop and get the house in order precludes the repeated short-term fixes that would inevitably snowball. It’s a courageous move amidst competing pressures.