JSON-LD vs. Layered JSON Schemas

TL;DR: A layered JSON schema can be used to encode linked data just like JSON-LD, with the added benefit of providing structural validation.

What is a “layered JSON schema”, you might ask.

A JSON schema defines the structure of a JSON document, but it doesn’t provide a way of managing metadata or semantic information. A layered JSON schema annotates a JSON schema using overlays to add metadata and semantic information. So if you have different uses for the same schema, you can use different layers to annotate it to fit those use cases.

Overlays can add annotations to a schema that describe how to translate a document into something else, such as a knowledge graph. To add JSON-LD-like features to JSON schemas, we will annotate a JSON schema to mark each field either as a node in the knowledge graph (an RDF IRI node, a value node, or a blank node) or as an edge between nodes (an RDF predicate).

First, a few words about JSON-LD: JSON-LD is JSON for Linked Data. It works by mapping JSON keys (field names) to standard concepts in an ontology using a @context. These concepts are represented using URIs, so they can link related concepts across different websites on the Internet. JSON-LD is also used to achieve interoperability between different representations of data.

In the following example, two different JSON objects are translated into the same knowledge graph using two different contexts mapping JSON field names to standard concepts from the schema.org ontology. As you can see, the context only provides a mapping for field names. It does not provide any guidance or validation on the structure of the document.

JSON-LD Processing Pipeline

In a JSON-LD document, you cannot specify whether a JSON property should become an RDF node or an edge (predicate). That is determined by the structure of the JSON document and by whether the object has an @id.
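
To make the last two points concrete, here is a deliberately simplified Python model of what a JSON-LD context does. It is a sketch, not a JSON-LD processor: it ignores @type, @vocab, arrays, compaction, and nearly all of the JSON-LD algorithms, and a real library such as PyLD should be used in practice. The rules it assumes are: a key aliased to @id names the node, an object without an identifier becomes a blank node (labeled _:b0, _:b1, ... by our own choice), and every other mapped key becomes a predicate. The first document and its context are hypothetical stand-ins for the left side of the image; the second is the document used in the rest of this article.

```python
import itertools


def jsonld_like_triples(doc, context):
    """Toy model of applying a JSON-LD context (not a real JSON-LD processor)."""
    blank_ids = itertools.count()
    triples = set()

    def walk(obj):
        # "@id", or a key the context aliases to "@id", names the node.
        id_keys = [k for k in obj if k == "@id" or context.get(k) == "@id"]
        # Without an identifier, the object becomes a blank node.
        subject = obj[id_keys[0]] if id_keys else f"_:b{next(blank_ids)}"
        for key, value in obj.items():
            predicate = context.get(key)
            if key in id_keys or predicate is None:
                continue  # identifiers and unmapped keys produce no edge
            # Nested objects become nodes; anything else is a literal value.
            target = walk(value) if isinstance(value, dict) else value
            triples.add((subject, predicate, target))
        return subject

    walk(doc)
    return triples


SDO = "http://schema.org/"

# Hypothetical document and context (field names chosen for illustration).
doc_a = {"@id": "http://linkedin.com/jane-doe",
         "address": {"locality": "Denver", "region": "CO"}}
ctx_a = {"address": SDO + "address",
         "locality": SDO + "addressLocality",
         "region": SDO + "addressRegion"}

# The article's document, with a context that aliases "id" to "@id".
doc_b = {"id": "http://linkedin.com/jane-doe",
         "contact": {"city": "Denver", "state": "CO"}}
ctx_b = {"id": "@id",
         "contact": SDO + "address",
         "city": SDO + "addressLocality",
         "state": SDO + "addressRegion"}

for triple in sorted(jsonld_like_triples(doc_b, ctx_b)):
    print(triple)
```

Both documents yield the same three triples. Nothing in the context constrains shape, though: with ctx_b, a document like {"id": "x", "contact": "Denver, CO"} is accepted just as readily and simply produces a literal where the address node used to be.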

Now let’s do the same thing using a JSON schema. Take the following JSON document as an example (it is the same object from the right side of the above image). We will translate this object to a knowledge graph using a layered JSON schema:

{
  "id": "http://linkedin.com/jane-doe",
  "contact": {
    "city": "Denver",
    "state": "CO"
  }
}

A JSON schema is a standard way of defining the structure of this object. The following schema can be used by any JSON schema validator to ensure that a JSON object follows the specified structure:

{
  "type": "object",
  "properties": {
    "id": {
      "type": "string"
    },
    "contact": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string"
        },
        "state": {
          "type": "string"
        }
      }
    }
  }
}
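
Any standard validator can perform this check (for example, the validate function of the Python jsonschema package). To show what the check amounts to for this particular schema, here is a minimal sketch of a validator that understands only the type keyword (for "object" and "string") and properties. It is an illustration under those assumptions, not a conforming JSON schema implementation.

```python
# Minimal sketch: supports only "type" ("object" or "string") and "properties".
TYPE_CHECKS = {"object": dict, "string": str}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the instance is valid."""
    expected = schema.get("type")
    if expected in TYPE_CHECKS and not isinstance(instance, TYPE_CHECKS[expected]):
        return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    errors = []
    if isinstance(instance, dict):
        # Properties are optional unless listed in "required", which we do not model.
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "contact": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "state": {"type": "string"},
            },
        },
    },
}

document = {"id": "http://linkedin.com/jane-doe",
            "contact": {"city": "Denver", "state": "CO"}}

print(validate(document, SCHEMA))  # []
print(validate({"id": "http://linkedin.com/jane-doe", "contact": "Denver, CO"}, SCHEMA))
# ['$.contact: expected object, got str']
```

A document whose contact is a plain string is rejected here, which is exactly the kind of structural check a JSON-LD context cannot express.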

The idea is to annotate this JSON schema to indicate which data fields should be translated into knowledge graph nodes and which ones should be used as edge labels between nodes (RDF predicates). We use a schema overlay to add that information. An overlay for this schema simply follows the same structure, but adds information to the schema fields:

{
  "properties": {
    "id": {
      "x-ls": {
        "rdfIRI": "."
      }
    },
    "contact": {
      "x-ls": {
        "rdfPredicate": "http://schema.org/address"
      },
      "properties": {
        "city": {
          "x-ls": {
            "rdfPredicate": "http://schema.org/addressLocality"
          }
        },
        "state": {
          "x-ls": {
            "rdfPredicate": "http://schema.org/addressRegion"
          }
        }
      }
    }
  }
}
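
Because an overlay mirrors the structure of the schema, applying it can be thought of as a recursive merge. The following Python sketch assumes a simple rule of our own choosing (the function name apply_overlay is likewise hypothetical): nested objects are merged key by key, and for any other value the overlay's value wins. Actual layered schema tooling may compose layers differently.

```python
import copy


def apply_overlay(schema, overlay):
    """Return a new schema with the overlay merged in; inputs are not modified.

    Assumed rule: nested objects are merged recursively, and for any other
    value the overlay's value replaces the schema's value.
    """
    result = copy.deepcopy(schema)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "contact": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "state": {"type": "string"}},
        },
    },
}

OVERLAY = {
    "properties": {
        "id": {"x-ls": {"rdfIRI": "."}},
        "contact": {
            "x-ls": {"rdfPredicate": "http://schema.org/address"},
            "properties": {
                "city": {"x-ls": {"rdfPredicate": "http://schema.org/addressLocality"}},
                "state": {"x-ls": {"rdfPredicate": "http://schema.org/addressRegion"}},
            },
        },
    },
}

composed = apply_overlay(SCHEMA, OVERLAY)
print(composed["properties"]["id"])
```

Keeping the base schema untouched means the same schema can be combined with a different overlay for a different use case.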

In this overlay, each JSON field is annotated with an x-ls object. The x- prefix is a common convention for marking extension keywords (OpenAPI, for example, uses it for specification extensions), and ls stands for “Layered Schema”.

The rdfIRI: "." annotation indicates that the value of the id field will be used to create an RDF IRI node. The remaining fields carry rdfPredicate annotations, which indicate that the JSON property should be translated into an RDF predicate (an edge in the graph).

To process a document, we use a “schema bundle”: a file that combines the schema with the overlays used in a particular case. The processing pipeline then looks like this:

Layered Schema Processing Pipeline

This yields a fairly straightforward algorithm for translating a JSON document into a knowledge graph based on schema annotations: if the schema says an attribute should be a node, we create a node; if it says the attribute should be an edge, we create an edge with the given label. If the schema ends up connecting two edges without a node in between, we simply add a blank node.
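
The algorithm can be sketched in Python as a recursive walk over the document and the annotated schema (the schema with the overlay already applied). This is a minimal illustration, not the proof-of-concept implementation: it handles only the rdfIRI: "." and rdfPredicate annotations used above, treats scalar values as literal value nodes, and labels blank nodes _:b0, _:b1, and so on, which is our own naming choice. The function name to_triples is hypothetical.

```python
import itertools

# The schema with the overlay already applied (annotations inlined under "x-ls").
ANNOTATED_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "x-ls": {"rdfIRI": "."}},
        "contact": {
            "type": "object",
            "x-ls": {"rdfPredicate": "http://schema.org/address"},
            "properties": {
                "city": {"type": "string",
                         "x-ls": {"rdfPredicate": "http://schema.org/addressLocality"}},
                "state": {"type": "string",
                          "x-ls": {"rdfPredicate": "http://schema.org/addressRegion"}},
            },
        },
    },
}


def to_triples(doc, schema):
    """Translate a JSON object into (subject, predicate, object) triples."""
    blank_ids = itertools.count()
    triples = []

    def node_for(obj, obj_schema):
        props = obj_schema.get("properties", {})
        subject = None
        # A property annotated rdfIRI: "." supplies this object's IRI.
        for name, prop_schema in props.items():
            if prop_schema.get("x-ls", {}).get("rdfIRI") == "." and name in obj:
                subject = obj[name]
        if subject is None:
            # No IRI applies, so this object becomes a blank node (e.g. where
            # the contact edge meets the city and state edges).
            subject = f"_:b{next(blank_ids)}"
        for name, prop_schema in props.items():
            predicate = prop_schema.get("x-ls", {}).get("rdfPredicate")
            if predicate is None or name not in obj:
                continue  # unannotated or absent fields produce no edge
            value = obj[name]
            # Objects become nodes (recursively); scalars become value nodes.
            target = node_for(value, prop_schema) if isinstance(value, dict) else value
            triples.append((subject, predicate, target))
        return subject

    node_for(doc, schema)
    return triples


document = {"id": "http://linkedin.com/jane-doe",
            "contact": {"city": "Denver", "state": "CO"}}

for triple in to_triples(document, ANNOTATED_SCHEMA):
    print(triple)
```

Note that the graph's shape comes entirely from the annotations: the id field produces no edge at all, and the blank node appears only because contact is an edge whose target object has no IRI.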

A proof-of-concept implementation of this algorithm is here.

What is the benefit of this? It uses JSON schemas, which are widely used to specify data structures, instead of a JSON-LD context. You can determine whether a JSON document is valid, and you can also document the expected structure of data. Using schema overlays, you can create mappings to different ontologies for different use cases. Implementations of a schema may vary across versions and because of different practical and legal concerns, so the behavior of a schema can be tweaked to account for such variations, enhancing semantic interoperability.

You can also extend the annotations to transform JSON objects into other kinds of data, including RDF knowledge graphs, labeled property graphs, or different JSON objects.