Semantic Interoperability and Self-Describing Data

This is the first post in a series that describes the use of self-describing data objects as a foundation for semantic interoperability.

Misunderstandings make good comedy. Trevor Noah has a comedy bit about the word “napkin”, which apparently means diaper in South Africa; the humor comes from his confusion when he is offered napkins with the first tacos he buys in the US. Misunderstandings can also be very expensive. In 1999, NASA lost the Mars Climate Orbiter because one of the collaborating organizations used English (US customary) units while its partner used metric units.

When misunderstandings happen between machines, we call it “a lack of interoperability”. In healthcare, these misunderstandings are very expensive: the lack of interoperability is estimated to cost the U.S. over $30 billion a year.

Interoperability, "…a characteristic of a product or system to work with other products or systems", happens on multiple levels. The following classification was introduced by HIMSS (the Healthcare Information and Management Systems Society) for the healthcare domain, but it applies to interoperability in general:

Interoperability at the “foundational level” is about the ability to communicate: one system sends a message to another and the other system can receive it, with no consideration of whether the recipient understands the message. For computers, this level of interoperability corresponds to connectivity.

The “structural level” of interoperability is about data structure. One system can send a message to another, and the other system can receive and process it because the message is in an expected format. The receiving system can extract data fields and display them to a user who can interpret what the message means. For computers, this corresponds to using standard formats like JSON or XML with common names for data elements. Such structural concerns can be addressed with “schemas”, which are machine-readable descriptions of data. Semantic web technologies offer solutions for cross-domain structural interoperability problems. For instance, what one system calls “firstName” another might call “name”; to work with both, one can map each of them to a common identifier such as “givenName”. Many standard web ontologies define such identifiers and the relationships between them; for instance, the vocabulary at https://schema.org is used extensively by web applications.
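To make the mapping idea concrete, here is a minimal Python sketch. The source field names and the `FIELD_MAPPINGS` table are hypothetical; only the target names, `givenName` and `familyName`, are taken from the schema.org `Person` vocabulary. The small `PERSON_SCHEMA` check is an assumed stand-in for a real schema language such as JSON Schema, included only to show what a structural check does.

```python
# Hypothetical per-system field names mapped to shared identifiers.
# Target names are schema.org Person properties; source names are assumptions.
FIELD_MAPPINGS = {
    "system_a": {"firstName": "givenName", "lastName": "familyName"},
    "system_b": {"name": "givenName", "surname": "familyName"},
}

# A minimal stand-in for a schema: required fields and their expected types.
PERSON_SCHEMA = {"givenName": str, "familyName": str}


def to_common_fields(record: dict, source: str) -> dict:
    """Rename a record's keys to the shared identifiers for its source system.

    Keys without a mapping keep their original name, so no data is dropped.
    """
    mapping = FIELD_MAPPINGS[source]
    return {mapping.get(key, key): value for key, value in record.items()}


def conforms(record: dict, schema: dict) -> bool:
    """Structural check: every schema field is present with the expected type."""
    return all(isinstance(record.get(name), typ) for name, typ in schema.items())


a = to_common_fields({"firstName": "Ada", "lastName": "Lovelace"}, "system_a")
b = to_common_fields({"name": "Ada", "surname": "Lovelace"}, "system_b")
print(a == b)                      # True
print(conforms(a, PERSON_SCHEMA))  # True
```

Note that this only solves the structural problem: both records now use the same field names, but nothing yet says how their values should be interpreted.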

And finally, the “semantic level” of interoperability is about the structure of data as well as data values. Here, one system sends a message to another system, which can not only receive and process the information but also understand what the sender means, because the message is in an expected format and the data values are understandable. In many cases, that implies some codification of values through ontologies, coding systems, standard units, and similar mechanisms. But not everything can be coded. Suppose one system writes birth dates as “2001-03-02” whereas another writes them as “03-02-2001”. Knowing that the field is a birth date does not really help here; the receiving system must know how to interpret the value. One can usually guess the date format, but not always. Does “03-02-2001” mean the second of March or the third of February? A similar but more complicated situation applies to measured values (such as laboratory test values), where the same value may be represented using different units.
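The birth-date and unit problems can be sketched in Python as follows. The `{"value": ..., "format"/"unit": ...}` layout, the `interpret_date` and `glucose_to_mg_per_dl` helpers, and the choice of blood glucose as the laboratory value are all assumptions for illustration. The point is that the same string or number only becomes unambiguous once its format or unit travels with it.

```python
from datetime import date, datetime

raw = "03-02-2001"

# Without a declared format, the same string yields two different dates.
month_first = datetime.strptime(raw, "%m-%d-%Y").date()
day_first = datetime.strptime(raw, "%d-%m-%Y").date()
print(month_first, day_first)  # 2001-03-02 2001-02-03


def interpret_date(entry: dict) -> date:
    """Parse a date value using the format declared alongside it (hypothetical layout)."""
    return datetime.strptime(entry["value"], entry["format"]).date()


birth_date = {"value": "03-02-2001", "format": "%d-%m-%Y"}
print(interpret_date(birth_date))  # 2001-02-03

# Glucose has a molar mass of about 180.156 g/mol, so 1 mmol/L is about 18.0156 mg/dL.
MG_PER_DL_PER_MMOL_PER_L = 18.0156


def glucose_to_mg_per_dl(measurement: dict) -> float:
    """Normalize a glucose result to mg/dL using its declared unit."""
    value, unit = measurement["value"], measurement["unit"]
    if unit == "mg/dL":
        return value
    if unit == "mmol/L":
        return value * MG_PER_DL_PER_MMOL_PER_L
    raise ValueError(f"unsupported unit: {unit}")


# Two results that look different but describe roughly the same concentration.
lab_a = {"value": 5.0, "unit": "mmol/L"}
lab_b = {"value": 90.0, "unit": "mg/dL"}
print(round(glucose_to_mg_per_dl(lab_a), 1), glucose_to_mg_per_dl(lab_b))
```

The unit strings are written in the style of UCUM codes, but the sketch does not implement UCUM; a production system would more likely rely on a unit library or terminology service than on a hand-written conversion table.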

Semantic interoperability is necessary for enabling computers to interpret and process data without human help. In many domains, and especially in healthcare, it remains an elusive goal. Although there are many standards for how data values should be represented, one thing is clear: a single common standard that covers every possible use case in a given domain is not feasible. Instead, multiple standards arise to address different aspects of information or different applications of similar use cases, and regional, regulatory, and cultural differences cause further variations in the meaning of data.

Self-describing data objects can achieve semantic interoperability between systems even when there are multiple standards and varying applications of those standards. A self-describing data object contains not only the data but also “machine-readable instructions” on how to interpret that data. These instructions can take the form of “schemas” that describe the structure of the data, additional layers of “semantics” that capture the meaning of the underlying data in different domains, and layers of “metadata” that capture context, such as how the data elements were collected, for what purpose, and how reliable they are.
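A minimal Python sketch of such an object follows. The class name, the layout of the `schema`, `semantics`, and `metadata` dictionaries, and the example values are all hypothetical and are not the architecture this series builds later. The sketch only shows how the layers can travel with the data: the schema supports a structural check, the semantic layer maps fields to shared identifiers (here schema.org URLs) and declares how to parse values, and the metadata records context that this sketch carries along but does not act on.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SelfDescribingObject:
    """Data bundled with machine-readable instructions for interpreting it.

    The layer layout is a hypothetical illustration, not an existing standard.
    """

    data: dict
    schema: dict  # structure: field name -> expected Python type
    semantics: dict = field(default_factory=dict)  # meaning: field name -> shared id, format
    metadata: dict = field(default_factory=dict)  # context: source, purpose, reliability

    def validate(self) -> bool:
        """Structural check: every schema field is present with the expected type."""
        return all(isinstance(self.data.get(name), typ) for name, typ in self.schema.items())

    def interpret(self) -> dict:
        """Rename fields to shared identifiers and parse values with declared formats."""
        result = {}
        for name, value in self.data.items():
            layer = self.semantics.get(name, {})
            if "date_format" in layer:
                value = datetime.strptime(value, layer["date_format"]).date()
            result[layer.get("id", name)] = value
        return result


record = SelfDescribingObject(
    data={"firstName": "Ada", "dob": "03-02-2001"},
    schema={"firstName": str, "dob": str},
    semantics={
        "firstName": {"id": "https://schema.org/givenName"},
        "dob": {"id": "https://schema.org/birthDate", "date_format": "%d-%m-%Y"},
    },
    metadata={"source": "registration form", "purpose": "patient intake"},
)

print(record.validate())  # True
print(record.interpret())
```

Because the date format lives in the semantic layer rather than in the receiving code, a sender that switched to month-first dates would only change its declared `date_format`, and the receiver's `interpret` method would keep working unchanged.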

In the next few posts, we will develop a self-describing data architecture that incorporates multiple standards with added semantics and contextual metadata to achieve semantic interoperability.