XML and Standardization

XML is a true double-edged sword in the data analytics world, with advantages and disadvantages not unlike those of relational databases or NoSQL. The trade-offs inherent in XML apply just as readily to healthcare. Consider, for example, the flexibility of creating tags on the fly: it is both an advantage (ease of use, compatibility, expandability, et cetera) and a disadvantage (lack of standardization, potential incompatibility with user interfaces, et cetera), in healthcare as much as anywhere else. In an electronic health record (EHR), different providers and points of care may add to the record without conforming to one another's standards; data from a rheumatologist can be added to the patient record with the same ease as data from a general practitioner or psychologist. The portability of the XML format means the record can be exchanged among providers or networks as long as the recipient can read XML. This versatility comes at a price, however: without standardization, all tags and fields in a given record must be discovered before it can be queried, which can be quite time-consuming.
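
A small sketch of that discovery cost, using Python's standard library and an invented EHR fragment: before any query can be written, the record must be walked to learn which tags it actually contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical EHR fragment: each provider has added its own tags,
# with no shared schema across specialties.
ehr_xml = """
<patient id="12345">
  <generalPractice><bloodPressure>120/80</bloodPressure></generalPractice>
  <rheumatology><rheumatoidFactor unit="IU/mL">14</rheumatoidFactor></rheumatology>
  <psychology><phq9Score>4</phq9Score></psychology>
</patient>
"""

root = ET.fromstring(ehr_xml)

# Because there is no agreed schema, we must first walk the tree to
# discover which tags exist before we can query anything specific.
discovered = {elem.tag for elem in root.iter()}
print(sorted(discovered))
```

Every new provider's additions grow this tag set, so the discovery step repeats each time the record changes hands.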

Considering an analogy to a different industry, think of a consumer packaged goods (CPG) manufacturer. The CPG has its own internal master data schemas in relational databases and reserves XML for its reseller data interface, so that wholesalers and the retail network can share sales data back to the CPG in a common format. While all participants use a handful of core attributes (e.g., manufacturer SKU and long description), each wholesaler and retailer has its own set of proprietary attributes. XML allows the different participants to feed data back to the CPG without conforming to a schema imposed across the entire retail network, and allows the CPG to glean the requisite data shared amongst all participants. However, the process requires setting up the known tags for each new participant, so that the CPG knows ahead of time which tags are relevant to whom.
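
A sketch of that pattern, with made-up feeds and tag names: each participant's XML carries the agreed core attributes plus proprietary extras, and the CPG extracts only the shared core.

```python
import xml.etree.ElementTree as ET

# Two hypothetical reseller feeds: both carry the core attributes
# (sku, longDescription) but each adds proprietary tags of its own.
feeds = [
    "<sale><sku>CPG-001</sku><longDescription>Shampoo 12oz</longDescription>"
    "<storeRegion>NE</storeRegion></sale>",
    "<sale><sku>CPG-001</sku><longDescription>Shampoo 12oz</longDescription>"
    "<loyaltyTier>gold</loyaltyTier></sale>",
]

CORE_TAGS = {"sku", "longDescription"}

def extract_core(feed_xml):
    """Pull only the agreed-upon core attributes, ignoring proprietary tags."""
    root = ET.fromstring(feed_xml)
    return {child.tag: child.text for child in root if child.tag in CORE_TAGS}

records = [extract_core(f) for f in feeds]
```

The proprietary tags pass through harmlessly; the onboarding cost described above is the work of deciding, per participant, which of those extras the CPG also wants to capture.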


The Role of Data Brokers in Healthcare

In courses I’ve led before, we looked at the disjointed data privacy regulations in the United States and at current events in data privacy (e.g., Facebook, Cambridge Analytica, personal genomics testing, etc.). The underlying issue recurs in any setting: giving a single entity a large amount of data inevitably raises questions of ethics, privacy, security, and motivation.

Where healthcare data brokers are concerned, the stated goals differ by type of data. For data patients interact with directly, the goal is to give patients “more control over the data” (Klugman, 2018) and perhaps bypass the clunky patient portals set up by providers. Data that is not personally identifiable can serve much less altruistic goals, such as feeding a multi-billion-dollar market (Patientory, 2018) or contributing to health insurance discrimination (Butler, 2018). I am not naïve enough to think that all exercises in healthcare should be altruistic, and the concept of insurance has a certain modicum of discrimination at its core; however, weaponizing the data to aid in unfair practices is beyond the pale.


From a data engineering perspective, a broker in the truest sense of the word may act as a clearinghouse between providers with disparate systems, enabling seamless transfer of patient data without putting the ETL burden on either side. Whereas XML formatting and other portability developments have allowed providers using different EHR systems to port patient data, a data brokerage would be an independent party working on the patient's behalf and handling the technical details of integrating that data across all providers and interested parties. Beyond holding the data, the broker would be responsible for ensuring each provider and biller has access to the same single source of truth on that particular patient.
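
One way such a broker might reconcile updates into that single source of truth is a last-write-wins merge keyed by patient and field; the record layout, field names, and timestamps below are all invented for illustration.

```python
from datetime import datetime

# Hypothetical updates arriving at the broker from different providers.
updates = [
    {"patient_id": "p1", "field": "allergies", "value": "penicillin",
     "source": "clinic_a", "ts": datetime(2024, 1, 5)},
    {"patient_id": "p1", "field": "allergies", "value": "penicillin, latex",
     "source": "hospital_b", "ts": datetime(2024, 3, 2)},
]

def build_source_of_truth(updates):
    """Last-write-wins merge: every provider then reads the same current record."""
    record = {}
    for u in sorted(updates, key=lambda u: u["ts"]):
        record.setdefault(u["patient_id"], {})[u["field"]] = u["value"]
    return record

truth = build_source_of_truth(updates)
```

Real brokers would need far richer conflict resolution than a timestamp, but the shape of the job, absorbing heterogeneous feeds and serving one canonical record, is the same.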

This would, of course, require a data warehouse of sorts to hold the single source, and it puts the questions of security, privacy, transparency, and ethics on the broker. The broker has to make money to survive, so a business model must emerge; it would not be immune to market forces. The aggregation of so much patient data in one place would be too great a temptation not to monetize as de-identified commodities, so a secondary market would emerge and lead to the same issues cited above. Call me pessimistic, but the best predictor of future actions is past behavior, and thus far the companies holding massive amounts of data about our lives either can't keep it secure from breaches or are perfectly happy selling it while turning a blind eye to what is done with it.

References

Butler, M. (2018). Data brokers and health insurer partnerships could result in insurance discrimination. Retrieved from https://journal.ahima.org/2018/07/24/data-brokers-and-health-insurer-partnerships-could-result-in-insurance-discrimination/

Klugman, C. (2018). Hospitals selling patient records to data brokers: A violation of patient trust and autonomy. Retrieved from http://www.bioethics.net/2018/12/hospitals-selling-patient-records-to-data-brokers-a-violation-of-patient-trust-and-autonomy/

Patientory. (2018). Data brokers have access to your information, do you? Retrieved from https://medium.com/@patientory/data-brokers-have-access-to-your-health-information-do-you-562b0584e17e

Data-in-Motion or Data-at-Rest?

Reading the available material on data-in-motion reminds me of when I first read about data lakes over data warehouses, or NoSQL over SQL: the urgency of the new and the outright danger of the old are both overblown. Put simply, data-in-motion provides real-time insights. Most of our analytics efforts across data science apply to stored data, be it years, weeks, or hours old. Analyzing data-in-motion means extracting insights as the data rolls in, without storing it first. These are two halves of a dual effort that together tell the whole story: historical data shows what happened and can train models for future detection, while real-time data shows what is happening right now.
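
As a minimal sketch of the in-motion half, the running statistics below are updated one reading at a time (Welford's online algorithm) and flag outliers without ever storing the stream; the readings and threshold are illustrative.

```python
# Data-in-motion sketch: maintain a running mean/variance and flag
# anomalous readings as they arrive, never retaining the stream itself.
def stream_monitor(readings, threshold=2.0):
    count, mean, m2 = 0, 0.0, 0.0   # Welford's online variance accumulators
    alerts = []
    for x in readings:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        if count > 2:
            std = (m2 / (count - 1)) ** 0.5
            if std > 0 and abs(x - mean) > threshold * std:
                alerts.append((count, x))   # (position in stream, value)
    return mean, alerts

mean, alerts = stream_monitor([10, 11, 10, 12, 11, 50, 10])
```

The complementary at-rest analysis would train the threshold itself from history; neither half replaces the other.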

Churchward (2018) argues a fair point here: once data is stored, it is by definition no longer real-time. Taking that argument to its logical extreme, however, by asking whether we would want to make decisions on three-month-old data, is a stretch. While matters such as security and intrusion detection demand real-time detection, categorically dismissing data-at-rest analytics is reckless; it vilifies practices that are the foundation of any comprehensive analytics strategy. Both data-at-rest and data-in-motion are valuable drivers of any business intelligence effort that seeks to paint a total picture of a phenomenon.

There are, of course, less frantic cases to be made for data-in-motion. Patel (2018) illustrates a critical situation on an oil drilling rig, in which information even a few minutes old can be life-threatening. That piece, written to showcase Spotfire X, may conflate monitoring with analytics: the dashboard shown on the website and the written scenario paint more a picture of monitoring and dashboarding than the sort of analytics we would deploy Spark or Kafka for. I don't need a lot of processing power to tell me that a temperature sensor's readings are increasing.
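
The rig scenario reads more like the following kind of check: a sliding-window trend alarm that any modest device can run. The window size and readings are made up.

```python
from collections import deque

# Dashboard-grade monitoring, not analytics: alarm when the last few
# readings are strictly increasing. No Spark or Kafka required.
WINDOW = 4

def rising(w):
    vals = list(w)
    return len(vals) == WINDOW and all(b > a for a, b in zip(vals, vals[1:]))

window = deque(maxlen=WINDOW)
alarms = []
for i, temp in enumerate([71.0, 70.8, 71.2, 72.0, 73.5, 75.1]):
    window.append(temp)      # oldest reading drops off automatically
    if rising(window):
        alarms.append(i)     # index of the reading that triggered the alarm
```

That an operator should see this within seconds is a monitoring requirement; it says nothing about needing a streaming analytics cluster.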

Performing real-time analytics on data-in-motion is an intensive task, requiring considerable computing resources. Scalable solutions such as Spark and Kafka are available but may eventually hit a wall. Providers such as Logtrust (2017) differentiate themselves as real-time analytics vendors by pointing out the potential shortfalls of those solutions and offering a single platform for both data-in-motion and data-at-rest.

References

Churchward, G. (2018). Why “true” real-time data matters: The risks of processing data-at-rest rather than data-in-motion. Retrieved from https://insidebigdata.com/2018/03/22/true-real-time-data-matters-risks-processing-data-rest-rather-data-motion/

Logtrust. (2017). Real-time IoT big data-in-motion analytics.

Patel, M. (2018). A new era of analytics: Connect and visually analyze data in motion. Retrieved from https://www.tibco.com/blog/2018/12/17/a-new-era-of-analytics-connect-and-visually-analyze-data-in-motion/

Challenges of Health Informatics in the Cloud

Alghatani and Rezgui (2019) present a framework for remote patient monitoring via cloud architecture. The primary intention is to consolidate disparate data sources and break down walls between data silos, improving cost-effectiveness, response time, and quality of care. The cloud architecture comprises the database itself, user interface(s), and artificial intelligence, and serves four primary groups: patients, hospitals, insurance companies, and controllers (system stewards).

The authors outline a number of advantages. Telemedicine can be a great thing but has a number of barriers to overcome, not least cost, culture, political environment, and infrastructure. The cloud architecture seeks to mitigate the cost and infrastructure issues: IT resources can be extended dynamically based on need, and the decentralized nature of the system allows for better scalability, flexibility, and reliability.

There are a number of challenges to be considered. The authors highlight seven:

  1. Security
  2. Data management
  3. Governance
  4. Control
  5. Reliability
  6. Availability
  7. Business continuity

An extensive discussion of data collection challenges is presented, outlining a number of possible methods for collection and synchronization. The architecture must assume that no device will maintain constant contact with the cloud, so consistency models must be taken into consideration. One option is for each device to maintain local storage and upload to the cloud once a stable connection is available. Another is a whisper network of its own, much like the early Amazon Kindle devices. A third and final option, also the authors' proposal, is to use fog computing as a layer between the devices and the cloud.
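
The first option, local buffering with deferred upload, can be sketched as follows; the connectivity flag, reading format, and class names are stand-ins rather than anything from the paper.

```python
# Sketch of "store locally, sync when connected": the device buffers
# readings and flushes the backlog once a stable connection exists.
class DeviceBuffer:
    def __init__(self):
        self.pending = []    # readings awaiting upload (local storage)
        self.uploaded = []   # stand-in for the cloud-side copy

    def record(self, reading, connected):
        self.pending.append(reading)
        if connected:
            self.flush()

    def flush(self):
        # Upload in arrival order so the cloud copy stays consistent.
        self.uploaded.extend(self.pending)
        self.pending.clear()

dev = DeviceBuffer()
dev.record({"hr": 72}, connected=False)
dev.record({"hr": 75}, connected=False)
dev.record({"hr": 74}, connected=True)   # link restored: backlog flushes
```

The consistency question the authors raise is visible even here: until the flush, the cloud's view of the patient lags the device's by the whole backlog.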

Privacy is always an issue, and cloud architecture muddies the waters a bit: there is no locked-down on-premises server holding the personally identifiable information. In my experience, banks and hospitals have typically been the slowest to adopt cloud computing. As Alghatani and Rezgui (2019) note, governance and control are concerns here. The Health Insurance Portability and Accountability Act (HIPAA) requires confidentiality for all individually identifiable health information; in 2013, these protections were extended to genetic information under the Genetic Information Nondiscrimination Act (GINA). While the rules prohibit the use of genetic information for underwriting purposes, there is no restriction on sharing or using genetic information that has been de-identified (National Human Genome Research Institute, 2015). De-identification is not foolproof, however; there are cases in which the data can be re-identified (Rosenbaum, 2018).
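
A toy illustration of that re-identification risk: counting how many de-identified rows share a combination of quasi-identifiers (here ZIP code, birth year, and sex) exposes which records remain unique. The data is fabricated.

```python
from collections import Counter

# De-identified rows still carry quasi-identifiers. Any combination
# held by only one row (k = 1) can potentially be linked back to a
# person using an outside dataset such as a voter roll.
rows = [
    ("30301", 1980, "F"), ("30301", 1980, "F"),
    ("30302", 1975, "M"), ("30303", 1990, "F"),
]

counts = Counter(rows)
at_risk = [r for r in rows if counts[r] == 1]   # uniquely identifiable rows
```

This is the intuition behind k-anonymity: stripping direct identifiers is not enough if the remaining columns still single people out.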

References

Alghatani, K., & Rezgui, A. (2019). A cloud-based intelligent remote patient monitoring architecture. Paper presented at the International Conference on Health Informatics & Medical Systems, HIMS’19, Las Vegas, NV.

National Human Genome Research Institute. (2015). Privacy in genomics. Retrieved from https://www.genome.gov/about-genomics/policy-issues/Privacy

Rosenbaum, E. (2018). Five biggest risks of sharing your DNA with consumer genetic-testing companies. Retrieved from https://www.cnbc.com/2018/06/16/5-biggest-risks-of-sharing-dna-with-consumer-genetic-testing-companies.html