big data – Jonathan Fowler

Thick Data and Big Data

Posted on December 11, 2019 by JonathanLeave a comment

In March 1968, Robert F. Kennedy said, of the Gross Domestic Product index: “It measures neither our wit nor our courage, neither our wisdom nor our learning, neither our compassion nor our devotion to our country, it measures everything in short, except that which makes life worthwhile.”

“What is measurable is not always what is valuable.” Wang (2016b) paraphrased Kennedy, originally referencing GDP and its inability to measure the qualitative human condition. With the exponential increase in attention to Big Data as of late, the focus on speed and scale have left out things that are “sticky” or “difficult to quantify” (Wang, 2016b). This disparity reflects the traditional gap between qualitative and quantitative research. In fact, Wang found referring to the qualitative efforts in traditional terms (e.g., ethnography) was met with enough skepticism and pushback that a new term friendly to data jargon had to emerge—and thus the term thick data was born.

https://miro.medium.com/max/1442/1*B4UOLidQEam25fJkNeZH8A.png — Courtesy Tricia Wang

At first glance, thick data is not attractive in the traditional sense of big data. It is inefficient, does not scale up, and is usually not reproducible. However, when combined with big data, it fills the gaps that the quantitative measures leave open. While big data can identify patterns, it cannot explain why those patterns exist. If big data can go broad, thick data can go deep. Thick data relies on human learning and complements the findings from machine learning that big data cannot provide adequate context for. It shows the social context of specified patterns and is able to handle irreproducible complexity. It is the qualitative complement to quantitative data, the color and nuance to a black-and-white picture.

Forces against the adoption of thick data typically stem from bias against qualitative data. Again, it is messy…inefficient, sticky, complicated, and nuanced. Most of the big data world values what can be quantified and the relationships that can be mapped. As (Wang, 2016a) notes, quantifying is addictive, and it can be easy to throw out data that doesn’t fit a numerical value. It isn’t a zero-sum game, however. Both big data and thick data complement each other. But “silo culture”—the same phenomenon that disrupts data integration and wreaks havoc across enterprise data environments—threatens the symbiosis between these two (Riskope, 2017). While thick data is not an innovation in the same sense of cutting-edge artificial intelligence or new developments in IoT technology, it is an innovation in how we think about the world around us and what is important when studying that world.

References

Riskope. (2017). Big data or thick data: Two faces of a coin. Retrieved from https://www.riskope.com/2017/05/24/big-data-or-thick-data-two-faces-of-a-coin/

Wang, T. (2016a). The human insights missing from big data. Retrieved from https://www.ted.com/talks/tricia_wang_the_human_insights_missing_from_big_data

Wang, T. (2016b). Why big data needs thick data. Retrieved from https://medium.com/ethnography-matters/why-big-data-needs-thick-data-b4b3e75e3d7

Where Clinical, Genomic, and Big Data Collide

Posted on August 7, 2019 by JonathanLeave a comment

One of the early proving grounds of big data is healthcare, and the constant cycle of insights catching up to volume hasn’t changed since the early days of the electronic patient record. Early healthcare data typically involved structured metrics such as ICD9 codes and other billing data, which yielded very little clinical detail. The introduction of new data points, both structured and unstructured, has opened the door to many new analytics possibilities. While the possibilities are there, “few viable automated processes” exist that can “extract meaning from data that is diverse, complex, and often unstructured” (Barlow, 2014, p. 18). Indeed, the gap continues to widen between the “rapid technological process in data acquisition and the comparatively slow functional characterization of biomedical information (Cirillo & Valencia, 2019, p. 161).

With so much available, a hospital or healthcare provider may find it difficult to determine a place to start, and either ignore the possibilities altogether or engage in initiatives that are not impactful to clinical quality or costs. There are five broad areas in which value can be delivered: clinical operations, payment & pricing, R&D, new business models, and public health; data are gathered from four broad sources including clinical, pharmaceutical, administrative, and consumer (Barlow, 2014, p. 21).

As of late, genomics have entered the conversation as both a consumer product (e.g., 23AndMe or Ancestry, known as personal genomic testing) and clinical practice. It is one thing to prescribe a medication based on a patient’s chart history, but an entirely different patient experience when a prescription is tailored to a patient’s particular metabolism, genetic predispositions, and risks (Barlow, 2014, p. 19). The wealth of patient-generated health data from a growing number of consumer devices has already contributed to the rise of “Personalized Medicine” (Cirillo & Valencia, 2019, p. 162) and the introduction of genomic data will move the needle even further. One can’t get much more personalized than a genetic footprint.

One debate around personal genomic testing is the value it provides when given directly to consumers without the benefit of clinician involvement. While the benefits of such testing include lifestyle changes that mitigate future disease risk, consumers are also prone to misinterpretation that may lead to unnecessary medical treatment (Meisel et al., 2015, p. 1). Beyond future risk, a recent study found the interest around personal genomic testing had a great deal to do with family or individual history of a particular affliction (Meisel et al., 2015). Consumers are mindful of explaining current risks and phenomena, not just predicting them.

References

Barlow, R. D. (2014). Great expectations for big data. Health Management Technology, 35(3), 18-21.

Cirillo, D., & Valencia, A. (2019). Big data analytics for personalized medicine. Current Opinion in Biotechnology, 58, 161-167.

Meisel, S. F., Carere, D. A., Wardle, J., Kalia, S. S., Moreno, T. A., Mountain, J. L., . . . Green, R. C. (2015). Explaining, not just predicting, drives interest in personal genomics. Genome Medicine, 7(1), 74.

Big Data: Human vs Material Agency

Posted on August 7, 2019 by JonathanLeave a comment

Lehrer, Wieneke, Vom Brocke, Jung, and Seidel (2018) studied four companies and their use of big data analytics in the business. Common to all companies in the case study was a two-layer service innovation process: first, automated customer-oriented actions based on trigger actions and preferences; and second, the combination of human and material agencies to produce customer-oriented interactions. The latter is of particular interest, as popular opinion sometimes tends to totalize big data as a replacement for human interaction. As illustrated in this study, the material agency (technology) exists to supplement the human agency.

One particular illustration is Company A, “the Swiss subsidiary of a multinational insurance firm that offers private individuals and corporate customers a broad range of personal, property, liability, and motor vehicle insurance” (Lehrer et al., 2018). Through a recent implementation of big data analytics tools and methodologies, the company has created new ways of more efficient interaction and supplemented employees’ customer service with better insights. In the latter case, the material agency guides employees’ own interactions with customers. That is, “the employees’ skill sets, experiences, and customer contact strategies [interact] with the material features of BDA to create new practices” (Lehrer et al., 2018, p. 438). This may include a number of sales- and service-oriented cues, such as social media or online shopping data points pointing to a major life event. On the other front, consider how the stream of data from various customer devices (e.g., home security system, automobile ODBC data trackers, smartphone location data) provides a wealth of data points that can be utilized by various machine learning methods to understand what typical behavior looks like for a customer and then know when anomalies show up. Personally, my home security system now knows it is an unusual occurrence for me to go outside a particular geographic region without arming the system. When that does occur, I receive an alert reminding me to arm it.

Reference

Lehrer, C., Wieneke, A., Vom Brocke, J. A. N., Jung, R., & Seidel, S. (2018). How big data analytics enables service innovation: Materiality, affordance, and the individualization of service. Journal of Management Information Systems, 35(2), 424-460. doi:10.1080/07421222.2018.1451953

What Makes Big Data “Big?”

Posted on August 7, 2019 by JonathanLeave a comment

I’ve never been a fan of buzzwords. The latests source of my discomfort is the term thought leader, which is one of those ubiquitous but necessary phrases in almost every professional space. That hasn’t kept me from poking fun at it, though, as I believe we should be able to laugh at ourselves and not take things too seriously.

Big Data is a buzzword. But it’s also my career.

What is the difference between regular, conventional, garden-variety data and Big Data? There’s a lot we could say here, but they key differences that come to mind for me are use, size, scope, and storage. I immediately think of two specific datasets I’ve used for teaching purposes: LendingClub and Stattleship.

LendingClub posts their loan history (anonymized, of course) for public consumption so that any audience may feed it into an engine or tool of their choice for analysis. I’ve used this dataset before to demonstrate predictive modeling and how financial institutions use it to aid decision-making in loan approvals. Stattleship is a sports data service with an API that allows access to a myriad of major league sports data. They also provide a custom wrapper to be used in R, and I’ve used these tools to teach R.

One of the primary differences between big data and conventional data is use case. Take these two datasets, for example. The architects of these sets understand that a variety of users will be downloading the data for various reasons, and there is no specific use case intended for either set. The possibilities are endless. With smaller troves of data, we typically have an intended use attached, and the data is specific to that use. Not so with big data.

These datasets illustrate two other key factors in big data: size and scope. Again, the datasets are not at all meant to answer one specific question or have a narrow focus. Sizing is often at least in gigabytes or terabytes—and in many cases tipping over into petabytes. The freedom to explore multiple lines of inquiry is inherent in big data sets without any sort of restriction on scope.

Finally, the storage and maintenance of big data is another key difference that sets it apart from conventional datasets. The trend of moving database operations offsite and using Database-as-a-Service models have enabled the growth of big data, as has the development of distributed computing and storage. Smaller conventional datasets do not require such an infrastructure and are not quite as impactful on a company’s bottom line.