David R Newman

<< Back to PhD Homepage

How to Write Good OWL Ontologies

In my experience of reviewing other ontologies, reading papers on OWL and writing my own ontologies, I have found some good design practices and some really quite bad ones. Therefore, in this blog I intend to set out good design practices for OWL ontologies and why I believe these to be better than some of the practices that I have found.

Contents

Entries

<< Previous Entry | Next Entry >>

23/11/2006 REVIEW:- Putting OWL in Order: Patterns for Sequences in OWL

In my blog "OWL doesn't like RDF:Bag and RDF:Seq", I mentioned a paper that implemented lists in OWL, so that they could be reasoned over with a DL reasoner. This paper was entitled Putting OWL in Order: Patterns for Sequences in OWL and it written by researchers as the University of Manchester.

The Vision of OWLLists

Their vision was to basically recreate RDF lists but using the set theory properties provided by OWL to make it possible to reason with them in OWL DL. This is not possible with RDF lists because OWL serialisation uses RDF, meaning that certain parts of RDF cannot be reasoned over. One interesting thing that I discovered whilst reading this report was that RDF:Seq is actually legal in OWL DL but the reasoner has no access to the logical semantics, which means that it has no way of traversing the list in an ordered fashion.

The properties of an OWL list are basically replacements for the properties of an RDF list but they are better defined. I.e. each property has a domain and two of them have a range and concepts such as functionality and transivity can be ascribed to these properties where necessary.

RDF PropertyOWL PropertyDomainRangeType(s)
firsthasContentsOWLList Object, Functional
resthasNextOWLListOWLListFunctional, Object
-isFollowedByOWLListOWLListObject, Transitive
-hasListPropertyOWLList Object

As you can see there is two additional properties not found in RDF list, hasListProperty is just an abstract superclass for the other three properties, isFollowedBy is defined to aid pattern matching of lists this will be explained later.

One problem with lists is how to terminate them, RDF lists solve this problem by having an something called RDF:nil which is basically an instance of the class RDF:List defined within the RDF namespace that represents the empty list. This could cause a problem because it means that if a list is not explicitly defined as empty (RDF:nil), even if it is empty there is no way of being certain of this. OWLLists go about solving this problem a similar way but because they have the ability to describe things using the set theory, provided within OWL, an empty list can defined as a class with restrictions on what properties any instance of the class must have, making it possible to reason whether a list is empty. Before we can consider what the class of EmptyList is, we must know what the class of OWLList is. The paper defines an OWLList as an OWL:Thing that has zero or more isFollowedBy properties that all have a value that is an OWLList.

EmptyList is initially defined as an OWLList that has a max cardinality of zero for its hasContents property. This on its own is not sufficient to describe the class EmptyList because it still allows instances that have one or more isFollowedBy properties, this is slightly more difficult to resolve than you may think. Firstly isFollowedBy is transitive, this makes it illegal to use the max cardinality = 0, like what is used for hasContents. Instead the complement of the restriction that at least one non-null isFollowedBy property must exist, i.e. no isFollowedBy property must exist. If this is defined as an equivalent class to the class which has the restriction on hasContents, using equivalence axioms it is possible to infer that once you know an OWLList does not have either a hasContents or a isFollowedBy property you can be both sure it is an EmptyList instance but it also has no value for the other property. Below is the RDF/XML for the EmptyList Class. Click here to see the RDF/XML for all the components of OWLLists.

<owl:Class rdf:about="#EmptyList">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#OWLList"/>
        <owl:Restriction>
          <owl:onProperty>
            <owl:ObjectProperty rdf:about="#hasContents"/>
          </owl:onProperty>
          <owl:maxCardinality 
          rdf:datatype="http://www.w3.org/2001/XMLSchema#int">0
          </owl:maxCardinality>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#OWLList"/>
        <owl:Class>
          <owl:complementOf>
            <owl:Restriction>
              <owl:onProperty>
                <owl:ObjectProperty rdf:about="#isFollowedBy"/>
              </owl:onProperty>
              <owl:someValuesFrom 
              rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
            </owl:Restriction>
          </owl:complementOf>
        </owl:Class>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</owl:Class>

When OWLLists get Large

When a user defines an OWLList, they must nest the list one inside each other, something like this:

<ol:OWLList rdf:about="http://www.example.com/ol/owllist1">
  <ol:hasContents rdf:resource="http://www.example.com/ol/item1"/>
  <ol:hasNext>
    <ol:OWLList rdf:about="http://www.example.com/ol/owllist2">
      <ol:hasContents rdf:resource="http://www.example.com/ol/item2"/>
      <ol:hasNext>
        ...
        ...
      </ol:hasNext>
    </ol:OWLList>
  </ol:hasNext>
</ol:OWLList>

It is clear that a list with 100+ elements will be very big, as some of the applications for OWLLists want to be able to define lists with over a million items this could be a major problem. The amount of triples that would be required for a million item list should not be too bad, as I believe only 3 triples per item should be required, (3 million in total). The problem with storage arises when you consider the isFollowedBy property because it is transitive and the super property of hasNext. This means that the first item of a million triple list could have 999,999 isFollowedBy triples (that is a total of 999,999 x 500,000 = 499,999,500,000, for the whole list) that has to either be stored by the triplestore or these triples will have to be inferenced at query time. Neither of these ways is very efficient. All current triplestores struggle to be efficient with such a number of triples and would have a very long import time. At query time, even if only a fraction of the triples had to be infered by a reasoner, the query time would still be far longer than would be tolerable using current DL reasoners.

Unwanted Additional List Branches

isFollowedBy creates a another problem because it is transitive and the super property of hasNext. hasNext being functional means it cannot create lists that have more than one branch. Unfortunately because isFollowedBy is transitive it cannot be functional, therefore, isFollowedBy could be used to create additional branches. The paper fully acknowledges this and suggests that this could be resolved by making isFollowedBy the transitive closure of hasNext. By doing this it would restrict the isFollowedBy relations that could be defined so that a user could no accidently or maliciously define an isFollowedBy relation that adds an additional branch to a list. The reason that isFollowedBy has not been defined as a transitive closure of hasNext is because no current DL reasoners can handle bi-implication of transitivity. I.e. a DL reasoner can determine isFollowedBy relations from hasNext relations, e.g. "a hasNext b . b hasNext c . c hasNext d" implies "a isFollowedBy d" but it cannot take "a isFollowedBy d" and work out using the hasNext relations whether it is a legal.

To work out whether an isFollowedBy relation is legal is not trivial, unless the transitive closure set (i.e. all the legal isFollowedBy relations) for the list has been generated. As already discussed doing this at either import or query time would not be very efficient. Even working out whether one isFolloweBy relation is legal just by following the hasNext relations would not be very fast with a long list, unless the the subject and object happenend to be very close to each other in the list.

Circular Lists

A further problem that the paper cites is that circular lists cannot be prevented. The following two hasNext statements can be defined "a hasNext b" and "b hasNext a", without breaking the rule that hasNext must be functional or that isFollowedBy must be its transitive closure. The only way to prevent circular lists is to not only make isFollowedBy a transitive closure but also make it antisymmetric. Doing this ensures that any relation that can be in the transitive closure set, can only be the the inverse of another relation, (i.e. subject and object switched), if the subject and object themselves are the same thing. This would acheive the intended goal by making one of the transformations of "a hasNext b" and b hasNext a" into "a hasNext a" and "b hasNext a", breaking the rule that hasNext is functional and thus preventing circular lists. The only problem with this is that OWL DL cannot currently define antisymmetry.

How Long is the List

List lengths cannot easily be defined, unless all the OWLList instances that make up one complete list are all defined as members of the same group, which would signicantly add to the number of triples that would be required to store a list. In theory this information is available through the hasNext relations and therefore storing this twice is uneconomical and could lead to inconsistency. Another way of trying to set a specific length for a list is by placing cardinality constraints on the isFollowedBy property, although this is not possible in OWL DL because it is transitive. The paper does not explain in general how this would be acheived anyway. Presumably there would be a restriction set on the transitive closure set for a list, that no set of isFollowedBy relations with the same subject may have a cardinality the same or more than a certain value and one set of isFollowedBy relations with the same subject must have a cardinality that is one less than that certain value.

Pattern Matching in OWLLists

One use for the isFollowedBy is to allow compilations of regular expressions for lists, e.g. you can define the restriction for all the lists that must start with an instance of class A and end with an instance of class B. Unfortuntely the paper cites a few cases where it is not possible to express certain regular expressions. It is impossible to express a pattern that starts "0 or more" or "1 or more" of a certain type of class without expressing what must directly follow this pattern, i.e. you can have (A*) and (A+, B, ...) but you cannot have (A*, ...). This is because these statements require a recursive call, so the directly following pattern or type of instance must be defined to give a terminating condition for the recursion. A* and A+ do not require a recursive call if they are the end of a list, they have an implicit empty list terminating condition, that allows the recursive call to be avoided.

Regular expression matching is also impossible if you want to find a list that is a one sublist followed directly by another, such as ((A, B, C),(D, E, F)), this must be concatenated down to (A, B, C, D, E, F) before any list can be pattern matched. The reason for this is that only way to define the pattern with two separate sublists is "List_starts_ABC AND isFollowedBy SOME List_starts_DEF", which will allow 0 or more elements between the sub list. You may think that this could be solved by defining the pattern as "List_starts_ABC AND hasNext SOME List_starts_DEF", this would not work because the hasNext would refer to the first OWLList in the List_start_ABC, making the OWLList with the element that is an instance of class A have to hasNext relations, making it non-functional and therefore the definition illegal. See the expanded Manchester Syntax for this here.

What Classes are in my List

One further problem the paper cites that there is no way to define a class that subsumes all the classes of list instances in an OWLList. I.e. if L = (A, B, C, A, C), there is no way to infer a class M that subsumes classes A, B and C. This is the case for the same reason that it is difficult to set the length of a list, as the list is physically list instances nested one inside another, there is not a over-arching instance, that can be used as a membership group for all the list instances in the list, there is no easy way of deferencing all the contents of the individual OWLLists and therefore a restriction cannot be defined to express M as the union of all classes that have instances that are members of a certain list. I am not absolutely certain that even with this overarching instance for a list it would be possible to define this class M.

Conclusion

One of the main purposes of this review was for me to gain a greater understanding of how set theory for OWL DL currently works and how it could be improved to increase its expressivity. By trying to explain in greater depth why the paper cites OWLLists as having the limitation they do, this has not only helped me improve my understanding but has given me a reference to look back on if in future I have a complex set theory problem in OWL DL. Also it may help those who have read the paper understand why it states the limitations it does because the first couple of times I read the paper I really only had a surface understanding of why these limitations were true.

<< Back to PhD Homepage

Page written by David R Newman (drn[at]ecs.soton.ac.uk). Last updated June 27 2018 14:44:18.