JSON-LD vs. Layered JSON Schemas
TL;DR: A layered JSON schema can be used to encode linked data just like JSON-LD, with the added benefit of providing structural validation.
What is a “layered JSON schema”, you might ask.
A JSON schema defines the structure of a JSON document, but it doesn’t provide a way of managing metadata or semantic information. A layered JSON schema annotates a JSON schema using overlays to add metadata and semantic information. So if you have different uses for the same schema, you can use different layers to annotate it to fit those use cases.
Overlays can add annotations to a schema to describe how to translate a document into something else, like a knowledge graph. To add JSON-LD-like features to JSON schemas, we will annotate a JSON schema to denote a field as a node in the knowledge graph (an RDF IRI node, a value node, or a blank node) or as an edge between nodes (an RDF predicate).
First, a few words about JSON-LD: JSON-LD is JSON for Linked Data. It works by mapping JSON keys (field names) to standard concepts in an ontology using a @context. These concepts are represented using URIs, so they can link related concepts across different websites on the Internet. JSON-LD is also used to achieve interoperability between different representations of data.
In the following example, two different JSON objects are translated into the same knowledge graph using two different contexts mapping JSON field names to standard concepts from the schema.org ontology. As you can see, the context only provides a mapping for field names. It does not provide any guidance or validation on the structure of the document.
In a JSON-LD document, you cannot specify whether a JSON property should be an RDF node or an edge (predicate). The structure of the JSON document and whether or not the object has an @id determine that.
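To make the role of the context concrete, here is a deliberately simplified, hypothetical sketch of what a JSON-LD processor does with it. It is not the JSON-LD expansion algorithm (it ignores @type, @vocab, @list, compact IRIs, and much more); it only shows that keys are looked up in a context, that a key aliased to @id names the node, and that a nested object without an @id becomes a blank node. The second document shape, its keys, and the function name `expand` are made up for illustration.

```python
from itertools import count

SCHEMA = "http://schema.org/"


def expand(obj, context, counter):
    """Toy, hypothetical JSON-LD-style expansion. Returns (node, triples)."""
    # The key that the context aliases to "@id" (if any) names this node.
    node = None
    for key, value in obj.items():
        if context.get(key) == "@id":
            node = value
    if node is None:
        node = f"_:b{next(counter)}"  # no @id: the object becomes a blank node
    triples = []
    for key, value in obj.items():
        term = context.get(key)
        if term is None or term == "@id":
            continue  # unmapped keys are dropped; the @id alias was consumed above
        if isinstance(value, dict):
            child, child_triples = expand(value, context, counter)
            triples.append((node, term, child))  # the key becomes an edge
            triples.extend(child_triples)
        else:
            triples.append((node, term, f'"{value}"'))  # literal value
    return node, triples


# Context for the object used later in this article.
context_a = {
    "id": "@id",
    "contact": SCHEMA + "address",
    "city": SCHEMA + "addressLocality",
    "state": SCHEMA + "addressRegion",
}
doc_a = {"id": "http://linkedin.com/jane-doe",
         "contact": {"city": "Denver", "state": "CO"}}

# Hypothetical second shape for the same data, with its own context.
context_b = {
    "profile": "@id",
    "address": SCHEMA + "address",
    "locality": SCHEMA + "addressLocality",
    "region": SCHEMA + "addressRegion",
}
doc_b = {"profile": "http://linkedin.com/jane-doe",
         "address": {"locality": "Denver", "region": "CO"}}

_, triples_a = expand(doc_a, context_a, count())
_, triples_b = expand(doc_b, context_b, count())
print(sorted(triples_a) == sorted(triples_b))  # True
```

Removing the id field from doc_a turns the top-level object into a blank node as well: the context only renames keys, while the shape of the document and the presence of an @id decide what becomes a node.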
Now let’s do the same thing using a JSON schema. Take the following JSON document as an example (it is the same object from the right side of the above image). We will translate this object to a knowledge graph using a layered JSON schema:
{
"id": "http://linkedin.com/jane-doe",
"contact": {
"city": "Denver",
"state": "CO"
}
}
A JSON schema is a standard way of defining the structure of this object. The following schema can be used by any JSON schema validator to ensure that a JSON object follows the specified structure:
{
"type": "object",
"properties": {
"id": {
"type": "string"
},
"contact": {
"type": "object",
"properties": {
"city": {
"type": "string"
},
"state": {
"type": "string"
}
}
}
}
}
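Any off-the-shelf validator (for example, `jsonschema.validate` from the Python jsonschema package) can check a document against this schema. To show what such a check involves, the hypothetical sketch below hand-rolls a validator that understands only the two keywords used here, "type" (limited to "object" and "string") and "properties". It is an illustration, not a conforming JSON Schema implementation.

```python
def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the instance is valid.

    Hypothetical helper: supports only "type" ("object"/"string") and "properties".
    """
    expected = schema.get("type")
    if expected == "object" and not isinstance(instance, dict):
        return [f"{path}: expected an object"]
    if expected == "string" and not isinstance(instance, str):
        return [f"{path}: expected a string"]
    errors = []
    if isinstance(instance, dict):
        # As in JSON Schema, listed properties are optional unless "required" says otherwise.
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "contact": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "state": {"type": "string"},
            },
        },
    },
}

doc = {"id": "http://linkedin.com/jane-doe",
       "contact": {"city": "Denver", "state": "CO"}}

print(validate(doc, schema))                                   # []
print(validate({"id": "x", "contact": {"city": 7}}, schema))   # ['$.contact.city: expected a string']
```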
The idea is to annotate this JSON schema to tell which data fields should be translated into knowledge graph nodes and which ones should be used as edge labels between nodes (RDF predicates). We use a schema overlay to add that information. An overlay for this schema simply follows the same structure, but adds information to the schema fields:
{
"properties": {
"id": {
"x-ls": {
"rdfIRI": "."
}
},
"contact": {
"x-ls": {
"rdfPredicate": "http://schema.org/address"
},
"properties": {
"city": {
"x-ls": {
"rdfPredicate": "http://schema.org/addressLocality"
}
},
"state": {
"x-ls": {
"rdfPredicate": "http://schema.org/addressRegion"
}
}
}
}
}
}
In this overlay, each JSON field is annotated with an x-ls object. This follows the widely used convention of prefixing schema extension keywords with x-: x- means it is an extension, and ls means “Layered Schema”.
The rdfIRI: "." annotation indicates that the value of the id field will be used to create an RDF IRI node. The remaining fields are annotated with rdfPredicate, which indicates that the JSON property should be translated into an RDF predicate (an edge in the graph).
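Reading an overlay amounts to walking its properties tree and picking up each x-ls object along the way. The sketch below does that for the overlay shown above, written as a Python dict; the function name `annotations` and the role labels it prints are illustrative, not part of any layered schema tool.

```python
def annotations(overlay, path=()):
    """Yield (field path, x-ls annotation) for every annotated field, depth first."""
    for name, sub in overlay.get("properties", {}).items():
        field_path = path + (name,)
        if "x-ls" in sub:
            yield field_path, sub["x-ls"]
        yield from annotations(sub, field_path)  # descend into nested properties


overlay = {
    "properties": {
        "id": {"x-ls": {"rdfIRI": "."}},
        "contact": {
            "x-ls": {"rdfPredicate": "http://schema.org/address"},
            "properties": {
                "city": {"x-ls": {"rdfPredicate": "http://schema.org/addressLocality"}},
                "state": {"x-ls": {"rdfPredicate": "http://schema.org/addressRegion"}},
            },
        },
    }
}

for field_path, ann in annotations(overlay):
    if ann.get("rdfIRI") == ".":
        role = "IRI node (value of this field)"
    elif "rdfPredicate" in ann:
        role = "edge labeled " + ann["rdfPredicate"]
    else:
        role = "unknown annotation"
    print(".".join(field_path), "->", role)
```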
We use a “schema bundle”, a file that combines the schema with the overlays used in this case.
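Conceptually, building the bundle means laying each overlay over the base schema, matching fields by their position in the properties tree. The sketch below assumes a simple recursive merge in which the overlay adds keys and wins on conflicts; the function name `compose` is hypothetical, and the actual layered schema tooling may compose layers differently.

```python
import copy


def compose(schema, overlay):
    """Return a new schema with the overlay recursively merged into it."""
    result = copy.deepcopy(schema)  # never mutate the base schema
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = compose(result[key], value)  # descend into matching objects
        else:
            result[key] = copy.deepcopy(value)  # new key, or the overlay wins
    return result


schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "contact": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "state": {"type": "string"}},
        },
    },
}

overlay = {
    "properties": {
        "id": {"x-ls": {"rdfIRI": "."}},
        "contact": {
            "x-ls": {"rdfPredicate": "http://schema.org/address"},
            "properties": {
                "city": {"x-ls": {"rdfPredicate": "http://schema.org/addressLocality"}},
                "state": {"x-ls": {"rdfPredicate": "http://schema.org/addressRegion"}},
            },
        },
    }
}

bundle = compose(schema, overlay)
print(bundle["properties"]["contact"]["properties"]["city"])
# {'type': 'string', 'x-ls': {'rdfPredicate': 'http://schema.org/addressLocality'}}
```

Because the base schema is copied rather than mutated, the same schema can be composed with a different overlay, for instance one that maps to another ontology, without interference.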
This provides a fairly straightforward algorithm for translating a JSON document into a knowledge graph based on schema annotations: if the schema says a field should be a node, we create a node; if the schema says it should be an edge, we create an edge with the given label. If the schema ends up connecting two edges without a node in between, we simply add a blank node.
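Here is a minimal sketch of this algorithm in Python. It assumes the bundle is the base schema with the overlay already merged in (written out literally here), treats rdfIRI: "." as "this field's value is the IRI of the enclosing object", and writes literal values in quotes. The function name `to_triples` and the `_:bN` blank-node labels are illustrative choices, not taken from the proof-of-concept implementation.

```python
from itertools import count


def to_triples(doc, schema, counter):
    """Translate a JSON object into (subject, predicate, object) triples
    using the x-ls annotations of a bundled schema. Returns (node, triples)."""
    props = schema.get("properties", {})
    node = None
    # A field annotated with rdfIRI "." supplies the IRI of this object's node.
    for name, sub in props.items():
        if sub.get("x-ls", {}).get("rdfIRI") == "." and name in doc:
            node = doc[name]
    # No IRI field: the object sits between two edges, so use a blank node.
    if node is None:
        node = f"_:b{next(counter)}"
    triples = []
    for name, sub in props.items():
        predicate = sub.get("x-ls", {}).get("rdfPredicate")
        if predicate is None or name not in doc:
            continue  # unannotated (or absent) fields produce no edge
        value = doc[name]
        if isinstance(value, dict):
            child, child_triples = to_triples(value, sub, counter)
            triples.append((node, predicate, child))  # edge to the nested node
            triples.extend(child_triples)
        else:
            triples.append((node, predicate, f'"{value}"'))  # edge to a value node
    return node, triples


bundle = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "x-ls": {"rdfIRI": "."}},
        "contact": {
            "type": "object",
            "x-ls": {"rdfPredicate": "http://schema.org/address"},
            "properties": {
                "city": {"type": "string",
                         "x-ls": {"rdfPredicate": "http://schema.org/addressLocality"}},
                "state": {"type": "string",
                          "x-ls": {"rdfPredicate": "http://schema.org/addressRegion"}},
            },
        },
    },
}

doc = {"id": "http://linkedin.com/jane-doe",
       "contact": {"city": "Denver", "state": "CO"}}

_, triples = to_triples(doc, bundle, count())
for triple in triples:
    print(triple)
```

The schema, not the document, decides what each field becomes: an overlay that maps contact, city, and state to another vocabulary changes the predicates without touching the translation code.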
A proof-of-concept implementation of this algorithm is here.
What is the benefit of this? It uses JSON schemas, which are widely used to specify data structures, instead of a JSON-LD context. You can determine whether a JSON document is valid, and you can also document the expected structure of the data. Using schema overlays, you can create mappings to different ontologies for different use cases. Implementations of a schema may vary across versions and in response to different practical and legal concerns, so the behavior of the schema can be tweaked with overlays to account for such variations, enhancing semantic interoperability.
You can also extend the annotations to transform JSON objects into other types of data, including RDF knowledge graphs, labeled property graphs, or different JSON objects.