XClose
Menu

Semantic file formats

The dream of a semantic web

So how can we fulfill the dream of a file-format which is self-documenting: universally unambiguous and interpretable?

(Of course, it might not be true, but we don't have capacity to discuss how to model reliability and contested testimony.)

By using URIs to define a controlled vocabulary, we can be unambiguous.

But the number of different concepts to be labelled is huge: so we need a distributed solution: a global structure of people defining ontologies, (with methods for resolving duplications and inconsistencies.)

Humanity has a technology that can do this: the world wide web. We've seen how many different actors are defining ontologies.

We also need a shared semantic structure for our file formats. XML allows everyone to define their own schema. Our universal file format requires a restriction to a basic language, which allows us to say the things we need:

The Triple

We can then use these defined terms to specify facts, using a URI for the subject, verb, and object of our sentence.

In [1]:
%%writefile reaction.ttl 

<http://dbpedia.org/ontology/water>
    <http://purl.obolibrary.org/obo/PATO_0001681>
        "18.01528"^^<http://purl.obolibrary.org/obo/UO_0000088>
            .
Writing reaction.ttl

This is an unambiguous statement, consisting of a subject, a verb, and an object, each of which is either a URI or a literal value. Here, the object is a literal with a type.

RDF file formats

We have used the RDF semantic format, in its "Turtle" syntactic form:

subject verb object .
subject2 verb2 object2 .

We can parse it:

In [2]:
from rdflib import Graph

graph = Graph()
graph.parse("reaction.ttl", format="ttl")

len(graph) # prints 2

for statement in graph:
    print(statement)
(rdflib.term.URIRef('http://dbpedia.org/ontology/water'), rdflib.term.URIRef('http://purl.obolibrary.org/obo/PATO_0001681'), rdflib.term.Literal('18.01528', datatype=rdflib.term.URIRef('http://purl.obolibrary.org/obo/UO_0000088')))

The equivalent in RDF-XML is:

In [3]:
print(graph.serialize(format='xml').decode())
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:ns1="http://purl.obolibrary.org/obo/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="http://dbpedia.org/ontology/water">
    <ns1:PATO_0001681 rdf:datatype="http://purl.obolibrary.org/obo/UO_0000088">18.01528</ns1:PATO_0001681>
  </rdf:Description>
</rdf:RDF>

We can also use namespace prefixes in Turtle:

In [4]:
print(graph.serialize(format='ttl').decode())
@prefix ns1: <http://purl.obolibrary.org/obo/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://dbpedia.org/ontology/water> ns1:PATO_0001681 "18.01528"^^ns1:UO_0000088 .


Normal forms and Triples

How do we encode the sentence "water has two hydrogen atoms" in RDF?

See Defining N-ary Relations on the Semantic Web for the definitive story.

I'm not going to search carefully here for existing ontologies for the relationships we need: later we will understand how to define these as being the same as or subclasses of concepts in other ontologies. That's part of the value of a distributed approach: we can define what we need, and because the Semantic Web tools make rigorous the concepts of rdfs:sameAs and subclassOf rdfs:subclassOf this will be OK.

However, there's a problem. We can do:

In [5]:
%%writefile reaction.ttl 

@prefix disr: <http://www.hep.ucl.ac.uk/cdt-dis/ontologies/reactions/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .

dbo:water obo:PATO_0001681 "18.01528"^^obo:UO_0000088 ;
          disr:containsElement obo:CHEBI_33260 .
Overwriting reaction.ttl

We've introduced the semicolon in Turtle to say two statements about the same entity. The equivalent RDF-XML is:

In [6]:
graph = Graph()
graph.parse("reaction.ttl", format="ttl")
print(graph.serialize(format='xml').decode())
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:disr="http://www.hep.ucl.ac.uk/cdt-dis/ontologies/reactions/"
   xmlns:obo="http://purl.obolibrary.org/obo/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="http://dbpedia.org/ontology/water">
    <disr:containsElement rdf:resource="http://purl.obolibrary.org/obo/CHEBI_33260"/>
    <obo:PATO_0001681 rdf:datatype="http://purl.obolibrary.org/obo/UO_0000088">18.01528</obo:PATO_0001681>
  </rdf:Description>
</rdf:RDF>

However, we can't express "hasTwo" in this way without making an infinite number of properties!

RDF doesn't have a concept of adverbs. Why not?

It turns out there's a fundamental relationship between the RDF triple and a RELATION in the relational database model.

  • The subject corresponds to the relational primary key.
  • The verb (RDF "property") corresponds to the relational column name.
  • The object corresponds to the value in the corresponding column.

We already found out that to model the relationship of atoms to molecules we needed a join table, and the number of atoms was metadata on the join.

So, we need an entity type (RDF class) which describes an ElementInMolecule.

Fortunately, we don't have to create a universal URI for every single relatioship, thanks to RDF's concept of an anonymous entity. (Uniquely defined only by its relationships.) Imagine if we had to make a URN for oxygen-in-water, hydrogen-in-water etc!

In [7]:
%%writefile reaction.ttl 

@prefix disr: <http://www.hep.ucl.ac.uk/cdt-dis/ontologies/reactions/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix xs: <http://www.w3.org/2001/XMLSchema> .

dbo:water obo:PATO_0001681 "18.01528"^^obo:UO_0000088 ;
          disr:containsElement obo:CHEBI_33260 ;
          disr:hasElementQuantity [ 
              disr:countedElement obo:CHEBI_33260 ; 
              disr:countOfElement "2"^^xs:integer ]  .
Overwriting reaction.ttl

Here we have used [ ] to indicate an anonymous entity, with no subject. We then define two predicates on that subject, using properties corresponding to our column names in the join table.

Another turtle syntax for an anonymous "blank node" is this:

In [8]:
%%writefile reaction.ttl 

@prefix disr: <http://www.hep.ucl.ac.uk/cdt-dis/ontologies/reactions/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix xs: <http://www.w3.org/2001/XMLSchema> .

dbo:water obo:PATO_0001681 "18.01528"^^obo:UO_0000088 ;
          disr:containsElement obo:CHEBI_33260 ;
          disr:hasElementQuantity _:a .
                
_:a disr:countedElement obo:CHEBI_33260 ; 
              disr:countOfElement "2"^^xs:integer  .
Overwriting reaction.ttl

Serialising to RDF

Here's code to write our model to Turtle:

In [9]:
%%writefile chemistry_turtle_template.mko

@prefix disr: <http://www.hep.ucl.ac.uk/cdt-dis/ontologies/reactions/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix xs: <http://www.w3.org/2001/XMLSchema> .
        
[ 
%for reaction in reactions:
        disr:hasReaction [
          %for molecule in reaction.reactants.molecules:
          disr:hasReactant [
              % for element in molecule.elements:
              disr:hasElementQuantity [
                  disr:countedElement [
                      a obo:CHEBI_33259;
                      disr:symbol "${element.symbol}"^^xs:string
                  ] ;
                  disr:countOfElement "${molecule.elements[element]}"^^xs:integer
              ];
              % endfor
              a obo:CHEBI_23367
          ] ;
          %endfor
          %for molecule in reaction.products.molecules:
          disr:hasProduct [
              % for element in molecule.elements:
              disr:hasElementQuantity [
                  disr:countedElement [
                      a obo:CHEBI_33259;
                      disr:symbol "${element.symbol}"^^xs:string
                  ] ;
                  disr:countOfElement "${molecule.elements[element]}"^^xs:integer
              ] ;
              % endfor
              a obo:CHEBI_23367
          ] ;
          %endfor
          a disr:reaction
      ] ;
      %endfor
    a disr:system
  ].
Writing chemistry_turtle_template.mko

"a" in Turtle is an always available abbreviation for http://www.w3.org/1999/02/22-rdf-syntax-ns#type

I've skipped serialising the stoichiometries : to do that correctly I also need to create a relationship class for molecule-in-reaction.

And we've not attempted to relate our elements to their formal definitions, since our model isn't recording this at the moment. We could add this statement later.

In [10]:
import mako

from parsereactions import parser

from IPython.display import display, Math
system=parser.parse(open('system.tex').read())
display(Math(str(system)))
$\displaystyle C_6H_{12}O_6 + 6O_2 \rightarrow 6CO_2 + 6H_2O\\ 2H_2 + O_2 \rightarrow 2H_2O$
In [11]:
from mako.template import Template

mytemplate = Template(filename='chemistry_turtle_template.mko')
with open('system.ttl','w') as ttlfile:
    ttlfile.write((mytemplate.render( **vars(system))))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-7ae131fb6883> in <module>
      3 mytemplate = Template(filename='chemistry_turtle_template.mko')
      4 with open('system.ttl','w') as ttlfile:
----> 5     ttlfile.write((mytemplate.render( **vars(system))))

~/virtualenv/python3.6.3/lib/python3.6/site-packages/mako/template.py in render(self, *args, **data)
    460 
    461         """
--> 462         return runtime._render(self, self.callable_, args, data)
    463 
    464     def render_unicode(self, *args, **data):

~/virtualenv/python3.6.3/lib/python3.6/site-packages/mako/runtime.py in _render(template, callable_, args, data, as_unicode)
    836 
    837     _render_context(template, callable_, context, *args,
--> 838                     **_kwargs_for_callable(callable_, data))
    839     return context._pop_buffer().getvalue()
    840 

~/virtualenv/python3.6.3/lib/python3.6/site-packages/mako/runtime.py in _render_context(tmpl, callable_, context, *args, **kwargs)
    871         # if main render method, call from the base of the inheritance stack
    872         (inherit, lclcontext) = _populate_self_namespace(context, tmpl)
--> 873         _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
    874     else:
    875         # otherwise, call the actual rendering method specified

~/virtualenv/python3.6.3/lib/python3.6/site-packages/mako/runtime.py in _exec_template(callable_, context, args, kwargs)
    897             _render_error(template, context, e)
    898     else:
--> 899         callable_(context, *args, **kwargs)
    900 
    901 

chemistry_turtle_template_mko in render_body(context, **pageargs)

AttributeError: 'str' object has no attribute 'symbol'
In [12]:
!cat system.ttl
In [13]:
graph = Graph()
graph.parse("system.ttl", format="ttl")
Out[13]:
<Graph identifier=Nad039e8139cc48b286dd98a83f7218eb (<class 'rdflib.graph.Graph'>)>

We can see why the group of triples is called a graph: each node is an entity and each arc a property relating entities.

Note that this format is very very verbose. It is not designed to be a nice human-readable format.

Instead, the purpose is to maximise the capability of machines to reason with found data.

Formalising our ontology: RDFS

Our http://www.hep.ucl.ac.uk/cdt-dis/ontologies/reactions/ namespace now contains the following properties:

  • disr:hasReaction
  • disr:hasReactant
  • disr:hasProduct
  • disr:containsElement
  • disr:countedElement
  • disr:hasElementQuantity
  • disr:countOfElement
  • disr:symbol

And two classes:

  • disr:system
  • disr:reaction

We would now like to find a way to formally specify some of the relationships between these.

The type (http://www.w3.org/1999/02/22-rdf-syntax-ns#type or a) of the subject of hasReaction must be disr:system.

RDFS will allow us to specify which URNs define classes and which properties, and the domain and range (valid subjects and objects) of our properties.

For example:

In [14]:
%%writefile cdt_dis_ontology.ttl


@prefix disr: <http://www.hep.ucl.ac.uk/cdt-dis/ontologies/reactions/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix xs: <http://www.w3.org/2001/XMLSchema> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

disr:system a rdfs:Class .
disr:reaction a rdfs:Class .
disr:hasReaction a rdf:Property .
disr:hasReaction rdfs:domain disr:system .
disr:hasReaction rdfs:range disr:reaction .          
Writing cdt_dis_ontology.ttl

This will allow us to make our file format briefer: given this schema, if

_:a hasReaction _:b

then we can infer that

_:a a disr:system . _:b a disr:reaction .

without explicitly stating it.

Obviously there's a lot more to do to define our other classes, including defining a class for our anonymous element-in-molecule nodes.

This can get very interesting:

In [15]:
%%writefile cdt_dis_ontology.ttl


@prefix disr: <http://www.hep.ucl.ac.uk/cdt-dis/ontologies/reactions/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix xs: <http://www.w3.org/2001/XMLSchema> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

disr:system a rdfs:Class .
disr:reaction a rdfs:Class .
disr:hasReaction a rdf:Property .
disr:hasReaction rdfs:domain disr:system .
disr:hasReaction rdfs:range disr:reaction .     

disr:hasParticipant a rdf:Property .
disr:hasReactant rdfs:subPropertyOf disr:hasParticipant .
disr:hasProduct rdfs:subPropertyOf disr:hasParticipant
Overwriting cdt_dis_ontology.ttl

OWL extends RDFS even further.

Inferring additional rules from existing rules and schema is very powerful: an interesting branch of AI. (Unfortunately the python tool for doing this automatically is currently not updated to python 3 so I'm not going to demo it. Instead, we'll see in a moment how to apply inferences to our graph to introduce new properties.)

SPARQL

So, once I've got a bunch of triples, how do I learn anything at all from them? The language is so verbose it seems useless!

SPARQL is a very powerful language for asking questions of knowledge bases defined in RDF triples:

In [16]:
results=graph.query(
    """SELECT DISTINCT ?asymbol ?bsymbol
       WHERE {
          ?molecule disr:hasElementQuantity ?a .
          ?a disr:countedElement ?elementa .
          ?elementa disr:symbol ?asymbol .
          ?molecule disr:hasElementQuantity ?b .
          ?b disr:countedElement ?elementb .
          ?elementb disr:symbol ?bsymbol
       }""")

for row in results:
    print("Elements %s and %s are found in the same molecule" % row)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-16-6b95c6482ce4> in <module>
      8           ?b disr:countedElement ?elementb .
      9           ?elementb disr:symbol ?bsymbol
---> 10        }""")
     11 
     12 for row in results:

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/graph.py in query(self, query_object, processor, result, initNs, initBindings, use_store_provided, **kwargs)
   1087 
   1088         return result(processor.query(
-> 1089             query_object, initBindings, initNs, **kwargs))
   1090 
   1091     def update(self, update_object, processor='sparql',

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/processor.py in query(self, strOrQuery, initBindings, initNs, base, DEBUG)
     73         if not isinstance(strOrQuery, Query):
     74             parsetree = parseQuery(strOrQuery)
---> 75             query = translateQuery(parsetree, base, initNs)
     76         else:
     77             query = strOrQuery

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in translateQuery(q, base, initNs)
    762     # absolutize/resolve prefixes
    763     q[1] = traverse(
--> 764         q[1], visitPost=functools.partial(translatePName, prologue=prologue))
    765 
    766     P, PV = translate(q[1])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in traverse(tree, visitPre, visitPost, complete)
    382     """
    383     try:
--> 384         r = _traverse(tree, visitPre, visitPost)
    385         if complete is not None:
    386             return complete

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    345             e[k] = _traverse(val, visitPre, visitPost)
    346 
--> 347     _e = visitPost(e)
    348     if _e is not None:
    349         return _e

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in translatePName(p, prologue)
    140     if isinstance(p, CompValue):
    141         if p.name == 'pname':
--> 142             return prologue.absolutize(p)
    143         if p.name == 'literal':
    144             return Literal(p.string, lang=p.lang,

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/sparql.py in absolutize(self, iri)
    372         if isinstance(iri, CompValue):
    373             if iri.name == 'pname':
--> 374                 return self.resolvePName(iri.prefix, iri.localname)
    375             if iri.name == 'literal':
    376                 return Literal(

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/sparql.py in resolvePName(self, prefix, localname)
    355         ns = self.namespace_manager.store.namespace(prefix or "")
    356         if ns is None:
--> 357             raise Exception('Unknown namespace prefix : %s' % prefix)
    358         return URIRef(ns + (localname or ""))
    359 

Exception: Unknown namespace prefix : disr

We can see how this works: you make a number of statements in triple-form, but with some quantities as dummy-variables. SPARQL finds all possible subgraphs of the triple graph which are compatible with the statements in your query.

We can also use SPARQL to specify inference rules:

In [17]:
graph.update(
    """INSERT { ?elementa disr:inMoleculeWith ?elementb }
       WHERE {
          ?molecule disr:hasElementQuantity ?a .
          ?a disr:countedElement ?elementa .
          ?elementa disr:symbol ?asymbol .
          ?molecule disr:hasElementQuantity ?b .
          ?b disr:countedElement ?elementb .
          ?elementb disr:symbol ?bsymbol
       }"""
)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-17-b7d7154080f5> in <module>
      8           ?b disr:countedElement ?elementb .
      9           ?elementb disr:symbol ?bsymbol
---> 10        }"""
     11 )

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/graph.py in update(self, update_object, processor, initNs, initBindings, use_store_provided, **kwargs)
   1110             processor = plugin.get(processor, query.UpdateProcessor)(self)
   1111 
-> 1112         return processor.update(update_object, initBindings, initNs, **kwargs)
   1113 
   1114 

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/processor.py in update(self, strOrQuery, initBindings, initNs)
     52     def update(self, strOrQuery, initBindings={}, initNs={}):
     53         if isinstance(strOrQuery, str):
---> 54             strOrQuery=translateUpdate(parseUpdate(strOrQuery), initNs=initNs)
     55 
     56         return evalUpdate(self.graph, strOrQuery, initBindings)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in translateUpdate(q, base, initNs)
    739         # absolutize/resolve prefixes
    740         u = traverse(
--> 741             u, visitPost=functools.partial(translatePName, prologue=prologue))
    742         u = _traverse(u, _simplifyFilters)
    743 

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in traverse(tree, visitPre, visitPost, complete)
    382     """
    383     try:
--> 384         r = _traverse(tree, visitPre, visitPost)
    385         if complete is not None:
    386             return complete

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    345             e[k] = _traverse(val, visitPre, visitPost)
    346 
--> 347     _e = visitPost(e)
    348     if _e is not None:
    349         return _e

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in translatePName(p, prologue)
    140     if isinstance(p, CompValue):
    141         if p.name == 'pname':
--> 142             return prologue.absolutize(p)
    143         if p.name == 'literal':
    144             return Literal(p.string, lang=p.lang,

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/sparql.py in absolutize(self, iri)
    372         if isinstance(iri, CompValue):
    373             if iri.name == 'pname':
--> 374                 return self.resolvePName(iri.prefix, iri.localname)
    375             if iri.name == 'literal':
    376                 return Literal(

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/sparql.py in resolvePName(self, prefix, localname)
    355         ns = self.namespace_manager.store.namespace(prefix or "")
    356         if ns is None:
--> 357             raise Exception('Unknown namespace prefix : %s' % prefix)
    358         return URIRef(ns + (localname or ""))
    359 

Exception: Unknown namespace prefix : disr
In [18]:
graph.query("""
    SELECT DISTINCT ?asymbol ?bsymbol
    WHERE {
          ?moleculea disr:inMoleculeWith ?moleculeb .
          ?elementa disr:symbol ?asymbol .
          ?elementb disr:symbol ?bsymbol
       }""")

for row in results:
    print("Elements %s and %s are found in the same molecule" % row)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-18-5db801aea27e> in <module>
      5           ?elementa disr:symbol ?asymbol .
      6           ?elementb disr:symbol ?bsymbol
----> 7        }""")
      8 
      9 for row in results:

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/graph.py in query(self, query_object, processor, result, initNs, initBindings, use_store_provided, **kwargs)
   1087 
   1088         return result(processor.query(
-> 1089             query_object, initBindings, initNs, **kwargs))
   1090 
   1091     def update(self, update_object, processor='sparql',

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/processor.py in query(self, strOrQuery, initBindings, initNs, base, DEBUG)
     73         if not isinstance(strOrQuery, Query):
     74             parsetree = parseQuery(strOrQuery)
---> 75             query = translateQuery(parsetree, base, initNs)
     76         else:
     77             query = strOrQuery

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in translateQuery(q, base, initNs)
    762     # absolutize/resolve prefixes
    763     q[1] = traverse(
--> 764         q[1], visitPost=functools.partial(translatePName, prologue=prologue))
    765 
    766     P, PV = translate(q[1])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in traverse(tree, visitPre, visitPost, complete)
    382     """
    383     try:
--> 384         r = _traverse(tree, visitPre, visitPost)
    385         if complete is not None:
    386             return complete

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in <listcomp>(.0)
    337 
    338     if isinstance(e, (list, ParseResults)):
--> 339         return [_traverse(x, visitPre, visitPost) for x in e]
    340     elif isinstance(e, tuple):
    341         return tuple([_traverse(x, visitPre, visitPost) for x in e])

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    343     elif isinstance(e, CompValue):
    344         for k, val in e.items():
--> 345             e[k] = _traverse(val, visitPre, visitPost)
    346 
    347     _e = visitPost(e)

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in _traverse(e, visitPre, visitPost)
    345             e[k] = _traverse(val, visitPre, visitPost)
    346 
--> 347     _e = visitPost(e)
    348     if _e is not None:
    349         return _e

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/algebra.py in translatePName(p, prologue)
    140     if isinstance(p, CompValue):
    141         if p.name == 'pname':
--> 142             return prologue.absolutize(p)
    143         if p.name == 'literal':
    144             return Literal(p.string, lang=p.lang,

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/sparql.py in absolutize(self, iri)
    372         if isinstance(iri, CompValue):
    373             if iri.name == 'pname':
--> 374                 return self.resolvePName(iri.prefix, iri.localname)
    375             if iri.name == 'literal':
    376                 return Literal(

~/virtualenv/python3.6.3/lib/python3.6/site-packages/rdflib/plugins/sparql/sparql.py in resolvePName(self, prefix, localname)
    355         ns = self.namespace_manager.store.namespace(prefix or "")
    356         if ns is None:
--> 357             raise Exception('Unknown namespace prefix : %s' % prefix)
    358         return URIRef(ns + (localname or ""))
    359 

Exception: Unknown namespace prefix : disr

Exercise for reader: express "If x is the subject of a hasReaction relationship, then x must be a system" in SPARQL.

Exercise for reader: search for a SPARQL endpoint knowledge base in your domain.

Connect to it using Python RDFLib's SPARQL endpoint wrapper and ask it a question.

In [ ]: