XML and Standardization

XML is a true double-edged sword in the data analytics world, with both advantages and disadvantages, much like relational databases or NoSQL. The general advantages and disadvantages of XML apply just as much in the healthcare field. Consider the flexibility of creating tags on the fly: in general it is both an advantage (ease of use, compatibility, extensibility, et cetera) and a disadvantage (lack of standardization, potential incompatibility with user interfaces, et cetera), and the same holds in healthcare settings. In an electronic health record (EHR), for example, different providers and points of care may add to the record without conforming to the standards of other providers; that is, a rheumatologist can add data to the patient record as easily as a general practitioner or psychologist. Because XML is portable, the record can be exchanged amongst providers or networks as long as the recipient can read XML. However, this versatility comes at a price: without standardization, every tag and field in a given record must be known before it can be queried, and discovering them can be quite time-consuming.
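A minimal sketch of this tradeoff, using Python's standard-library `xml.etree.ElementTree`. The record layout and tag names (`patientRecord`, `entry`, `provider`, `jointSwellingCount`, and so on) are hypothetical and not drawn from any EHR standard. Each provider adds its own tags freely, but a query returns results only when the analyst already knows the exact tag name, which is why an inventory of tags has to come first.

```python
import xml.etree.ElementTree as ET

# Hypothetical EHR fragment: each provider appends its own tags with no shared schema.
EHR_XML = """<patientRecord id="12345">
  <entry provider="general_practitioner">
    <bloodPressure>128/82</bloodPressure>
  </entry>
  <entry provider="rheumatologist">
    <jointSwellingCount>4</jointSwellingCount>
  </entry>
  <entry provider="psychologist">
    <moodScore>6</moodScore>
  </entry>
</patientRecord>"""


def tags_by_provider(xml_text):
    """Inventory the element names each provider used: the 'known tags' an analyst needs first."""
    root = ET.fromstring(xml_text)
    found = {}
    for entry in root.iter("entry"):
        found.setdefault(entry.get("provider"), set()).update(child.tag for child in entry)
    return found


def query(xml_text, tag):
    """Return the text of every element with exactly this tag name."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(tag)]


if __name__ == "__main__":
    print(tags_by_provider(EHR_XML))
    print(query(EHR_XML, "jointSwellingCount"))  # ['4']
    # An unknown or misspelled tag is not an error; it silently matches nothing.
    print(query(EHR_XML, "jointCount"))  # []
```

The silent empty result for `jointCount` is the practical cost of schema-free tags: nothing warns the analyst that the tag name was wrong.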

For an analogy from a different industry, consider a consumer packaged goods (CPG) manufacturer. The CPG keeps its internal master data schemas in relational databases and reserves XML for its reseller data interface, so that its wholesalers and retail network can share sales data back to the CPG in a common format. While all participants use a handful of core attributes (e.g., manufacturer SKU and long description), each wholesaler and retailer also has its own set of proprietary attributes. XML allows the participants to feed data back to the CPG without conforming to a schema imposed across the entire retail network, while still letting the CPG extract the data common to all participants. However, the process requires registering the known tags for each new participant so that the CPG knows ahead of time which tags are relevant to that participant.
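The per-participant tag setup can be sketched as a simple registry, again with Python's `xml.etree.ElementTree`. Everything here is hypothetical: the participant names, the feed layout, the proprietary tags, and `unitsSold`, which is an assumed core attribute added so the feed carries sales data (only SKU and long description come from the text). The parser keeps the core tags plus whatever is registered for the sender and ignores anything else.

```python
import xml.etree.ElementTree as ET

# Core attributes shared by all participants. sku and longDescription come from the
# text; unitsSold is an assumed addition so the feed carries sales data.
CORE_TAGS = ("sku", "longDescription", "unitsSold")

# Hypothetical registry of proprietary tags, filled in as each participant is onboarded.
KNOWN_TAGS = {
    "WholesalerA": ("caseQty", "warehouseCode"),
    "RetailerB": ("storeNumber", "promoFlag"),
}


def parse_feed(participant, xml_text):
    """Extract the core tags plus the tags registered for this participant; ignore the rest."""
    if participant not in KNOWN_TAGS:
        raise KeyError(f"no tag registry for {participant!r}; onboard the participant first")
    wanted = CORE_TAGS + KNOWN_TAGS[participant]
    rows = []
    for sale in ET.fromstring(xml_text).iter("sale"):
        # findtext returns None if a registered tag is missing from this record.
        rows.append({tag: sale.findtext(tag) for tag in wanted})
    return rows


FEED = """<salesFeed participant="RetailerB">
  <sale>
    <sku>CPG-0042</sku>
    <longDescription>Sparkling water, 12 x 355 ml</longDescription>
    <unitsSold>30</unitsSold>
    <storeNumber>117</storeNumber>
    <promoFlag>Y</promoFlag>
    <loyaltyTier>gold</loyaltyTier>
  </sale>
</salesFeed>"""

if __name__ == "__main__":
    for row in parse_feed("RetailerB", FEED):
        print(row)
```

In this sketch `loyaltyTier` is dropped because it was never registered for `RetailerB`, and an unregistered sender raises `KeyError` rather than being parsed blindly. Both behaviors mirror the "set up the known tags first" step described above.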

NXD and RDBMS Solutions

Comparing native XML database (NXD) and relational database management system (RDBMS) solutions is close to comparing apples and oranges: both are spherical fruit, but they have very different flavors, applications, and characteristics. RDBMS technology has been around much longer and is far more established than NXD; as a result, there is less collective knowledge around NXD and its implementations. RDBMS solutions are practically ubiquitous and have many implementations, both open-source and proprietary. Transactional tables are typically normalized, while analytical implementations commonly organize data in a fact/dimension model such as a star schema.
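To make the fact/dimension model concrete, here is a minimal star schema in SQLite through Python's standard `sqlite3` module. The table names, columns, and rows are invented for illustration. The query joins the fact table to both dimension tables and aggregates, the kind of multi-table operation an RDBMS handles natively.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, sku TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales  (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    units_sold  INTEGER
);
INSERT INTO dim_product VALUES (1, 'CPG-0042', 'Beverages'), (2, 'CPG-0077', 'Snacks');
INSERT INTO dim_store   VALUES (10, 'North'), (20, 'South');
INSERT INTO fact_sales  VALUES (1, 10, 30), (1, 20, 12), (2, 10, 5);
""")

# Typical analytical query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
    SELECT p.category, s.region, SUM(f.units_sold) AS units
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_store   s ON s.store_key   = f.store_key
    GROUP BY p.category, s.region
    ORDER BY p.category, s.region
""").fetchall()
print(rows)  # [('Beverages', 'North', 30), ('Beverages', 'South', 12), ('Snacks', 'North', 5)]
```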

NXD solutions, on the other hand, store data as documents grouped into containers, with each document organized as a simple tree structure. Complex joins and queries that are routine in an RDBMS are typically more difficult in an NXD (Pavlovic-Lazetic, 2007). One area in which NXD shows promise is Web-enabled data warehousing (Salem, Boussaïd, & Darmont, 2013): bringing multiple sources of structured and unstructured data together in an Active XML Repository addresses data heterogeneity, distribution, and interoperability issues.

A typical business RDBMS implementation is a data warehouse, in which structured data from various systems of record is brought into a common area and reconciled. These systems of record may include proprietary relational databases, non-relational mainframe databases, data exported to delimited files, et cetera. A data dictionary may be maintained, and a central data governance board may draw up reconciliation policies. The data warehouse's output gives users in different divisions, working with different systems of record, a common organization-wide data taxonomy.

One possible NXD solution involves an Internet of Things (IoT) data environment. Imagine a number of environmental sensors (e.g., temperature, humidity, pressure) whose readings are taken at regular intervals and pushed to a central web location. Following XML's tree structure, the readings from each sensor, or from each central controller handling multiple sensors, could be stored as one XML document. This data does not require complex joins and is therefore much better suited to an NXD solution.
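A minimal sketch of the one-document-per-controller idea, using Python's `xml.etree.ElementTree`. The element and attribute names (`controller`, `reading`, `sensor`, `type`, `time`) and the sample values are hypothetical. A native XML database would express lookups in XQuery or XPath; ElementTree's limited XPath subset stands in for that here. A question such as "latest temperature for this controller" is a walk down one tree, with no join involved.

```python
import xml.etree.ElementTree as ET


def controller_document(controller_id, readings):
    """Build one XML document per controller from (sensor_id, kind, timestamp, value) tuples."""
    root = ET.Element("controller", id=controller_id)
    for sensor_id, kind, timestamp, value in readings:
        reading = ET.SubElement(root, "reading", sensor=sensor_id, type=kind, time=timestamp)
        reading.text = str(value)
    return ET.tostring(root, encoding="unicode")


def latest_by_type(xml_text, kind):
    """Return the most recent value of one reading type, or None if there is none.

    Assumes timestamps share one ISO 8601 format and time zone, so they sort as strings.
    """
    root = ET.fromstring(xml_text)
    matches = root.findall(f"reading[@type='{kind}']")
    if not matches:
        return None
    return float(max(matches, key=lambda r: r.get("time")).text)


doc = controller_document("ctrl-01", [
    ("t1", "temperature", "2024-05-01T10:00:00Z", 21.5),
    ("h1", "humidity", "2024-05-01T10:00:00Z", 40.0),
    ("t1", "temperature", "2024-05-01T10:05:00Z", 21.9),
])
print(doc)
print(latest_by_type(doc, "temperature"))  # 21.9
```

Adding a new sensor type only means new `reading` elements with a new `type` value; no table definition has to change, which is the flexibility the IoT scenario relies on.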

References

Pavlovic-Lazetic, G. (2007). Native XML databases vs. relational databases in dealing with XML documents. Kragujevac Journal of Mathematics, 30, 181-199.

Salem, R., Boussaïd, O., & Darmont, J. (2013). Active XML-based web data integration. Information Systems Frontiers, 15(3), 371-398.