RDF: Ontologies and Metadata

Ontologies and Metadata

A Draft Discussion of issues raised by the Semantic Web Technologies Workshop, 22-23 November 2000.

Author: Libby Miller
Date: 2000-11-30
Latest version: http://ilrt.org/discovery/2000/11/lux/

Abstract

A discussion of what ontologies might mean in the context of the semantic web. This is not a full and complete description of the workshop: a link to the presentations will be made when they are available.

Status of this Document

This is a draft! Comments welcome.

Introduction

On 22-23 November I attended a 'semantic web technologies workshop' [SWTW] in Luxembourg. The workshop was organised under the auspices of the EC's Information Society Technologies program.

The theme of the conference might be summarised as 'making content machine understandable' on the web. The invited talks included presentations about ontologies, the wireless web, multimedia, agents, and business opportunities on the semantic web. There were also 25 short presentations on a variety of subjects: some very relevant and interesting, some half-baked project proposals.

Although the presentations had many different themes, I am going to look at two basic focal points of the workshop: ontologies and metadata.

Before attending the workshop I had only the vaguest idea of what an ontology was. I knew that ontologies were used in logic programming, and that was about it. The presentations themselves did not provide an easy introduction to ontologies, or really explain why they were so important: this document has therefore turned into a kind of exploration of what ontologies might mean to non-logic programmers - to people who develop subject gateways, or people like me who think of themselves as trying to contribute to the semantic web, but who are not logic programmers.

About Ontologies

As I understand it, ontologies are rather like classification schemes. They are ways of defining the relationships between objects in the world. But ontologies also have more to them than that: a classification scheme is usually a way of organising objects by placing them under subject categories, but an ontology also defines how you are going to divide the objects up. This might not be by subject.

An example: the description of a hierarchy of employees in a business:

employee subclassOf person
director subclassOf employee
project manager subclassOf employee
    worksOn project
    reportsTo director
lackey subclassOf employee
    worksOn project
    reportsTo project manager

An ontology will define the way these things in the world interact (can a project manager be a lackey?) and cardinality constraints (can a project manager work on more than one project?), and so on.
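As a rough illustration, the subclassOf links in the hierarchy above could be held in a small table and walked to answer "is every lackey a person?". This is only a sketch: the dictionary layout and the function name is_a are assumptions made for the example and do not come from any ontology language.

```python
# Hypothetical encoding of the employee hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "employee": "person",
    "director": "employee",
    "project manager": "employee",
    "lackey": "employee",
}


def is_a(cls, ancestor):
    """Return True if cls is ancestor, or a direct or indirect subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # becomes None once there are no more parents
    return False


print(is_a("lackey", "person"))    # True: lackey -> employee -> person
print(is_a("lackey", "director"))  # False: the two are siblings
```

A real ontology language would also record the worksOn and reportsTo properties and their constraints; the table here covers only the class hierarchy.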

Ontologies have two main functions in logic programming:

  1. they provide a way of viewing the world, and hence of organising information.
  2. they are required for interoperability, to define a shared vocabulary and meanings for terms with respect to other terms.

For the first case, suppose you are a logic engine that is given the following information:

Libby worksOn IMeshtk
Libby worksOn Harmony

and you want to work out if Libby can work on two projects. You would need to know if IMeshtk and Harmony are projects, and if Libby is a person, and if Libby is a project manager or a lackey or a director. You would also need to know from the definition of a project whether a person can work on two projects, and whether this differs if the person is a lackey or a project manager. This is the sort of information an ontology would need to contain.
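The check described above can be sketched in a few lines. The two worksOn facts come from the example; everything else (the type facts, the MAX_PROJECTS limits and the function names) is invented for illustration, since the text does not say what the ontology's actual rules are.

```python
# Facts as (subject, property, object) triples.
FACTS = [
    ("Libby", "worksOn", "IMeshtk"),
    ("Libby", "worksOn", "Harmony"),
    # The next three facts are assumptions added so the check can run.
    ("Libby", "type", "project manager"),
    ("IMeshtk", "type", "project"),
    ("Harmony", "type", "project"),
]

# Hypothetical ontology rule: maximum number of projects per role.
# None (or a role missing from the table) means "no limit".
MAX_PROJECTS = {"project manager": None, "lackey": 1}


def objects(facts, subject, prop):
    """All objects o such that (subject, prop, o) is among the facts."""
    return [o for s, p, o in facts if s == subject and p == prop]


def may_work_on_all(facts, person):
    """True if none of the person's roles limits them below their project count."""
    projects = [o for o in objects(facts, person, "worksOn")
                if "project" in objects(facts, o, "type")]
    for role in objects(facts, person, "type"):
        limit = MAX_PROJECTS.get(role)
        if limit is not None and len(projects) > limit:
            return False
    return True


print(may_work_on_all(FACTS, "Libby"))
```

With the assumed limits, the answer changes if Libby is recorded as a lackey rather than a project manager, which is exactly the kind of distinction the ontology has to supply.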

For the second case, imagine that you have an inferencing engine that already has information about project managers and so on in one company, but then is given information about a different corporate hierarchy which could be subtly or dramatically different, in terms of the names of its parts and the way they are defined and relate together. In this case reasoning about data with respect to the new way of looking at the world within the old framework would require a 'cross-walk' mapping items and connections in one ontology to the other. Similarly, if you want to combine ontologies to talk about different aspects of objects, then you need to describe how the ontologies relate to each other. This can be a very difficult and time consuming problem.
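In its simplest form a cross-walk is just a lookup table from one vocabulary to the other. The sketch below assumes a second company whose role names ("team lead", "engineer", "board member") are made up for the example, and it only handles one-to-one renaming; real cross-walks also have to deal with terms that split, merge or have no counterpart.

```python
# Hypothetical cross-walk: terms of the second company -> terms of the first.
CROSSWALK = {
    "team lead": "project manager",
    "engineer": "lackey",
    "board member": "director",
}


def translate(triple, crosswalk):
    """Rewrite every term that has a mapping; leave unmapped terms unchanged."""
    return tuple(crosswalk.get(term, term) for term in triple)


def unmapped_terms(triples, crosswalk, known_terms):
    """Terms that are neither mapped nor already understood by the engine."""
    seen = {term for triple in triples for term in triple}
    return seen - set(crosswalk) - set(known_terms)


print(translate(("Ann", "type", "team lead"), CROSSWALK))
# ('Ann', 'type', 'project manager')
```

The helper unmapped_terms hints at where the effort goes: every term it reports is one that somebody has to map by hand.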

Ontologies are often very complicated, and are difficult to write, maintain and compare. The problem of building an ontology, say for an organisation, is the same as the problem of building a model of the important elements of that organisation. There will be different ways of looking at the organisation, and there will be different priorities for different people. Then, as you get more information, your view of the organisation may change, or the organisation might be restructured, requiring that you rewrite the ontology. The problem is rather like deciding on the structure of a relational database and then perhaps having to reorganise it after you have added lots of data.

Ontologies and Subject Gateways

What surprised me was how similar many of these problems are to those faced by the library community in its web manifestation in defining and augmenting systems to organise resources, for example lists of internet sites in subject gateways. Librarians created DDC, a huge organising system for subject-based classification of resources. With DDC, librarians classify resources according to their subject; however, subject is only part of the commonly used metadata (data about data) used to describe resources. Books, for example, also have a title, an author, a publication date and so on. While a classification scheme enables you to relate books by their subject, the relationships between the properties we see as important about books or resources need to be defined in another way.

For example, depending on how we wanted to describe a book we could say

book hasPart chapter
chapter hasElement page
page hasElement paragraph
...

or, we might find the following more useful for the purposes of finding what we need from a book

book hasTitle text
    hasDescription text
    hasSubject descriptor
    hasDatePublished date

So (to me at any rate) a schema like this looks like at least the beginnings of an ontology, although we might expect an ontology used in logic programming to be more formally defined, in terms of the sorts of things a title can consist of, how many titles are allowed per book, and so on.

But this sort of thing, perhaps informally defined, is exactly what subject gateways use in the classification of resources. To a lesser extent, any search engine will also use properties of web pages such as title, description, date and url.

One piece of knowledge that subject gateways and the logic programming experts share is that complex classification systems/ontologies are hard to create and manage, and even harder to share. One approach is to put effort into making the creation, management and sharing of ontologies easier: several of the presentations in Luxembourg were about systems which provided tools for the creation (Ian Horrocks, OIL) and viewing (Mikael Nilsson, Conzilla) of ontologies, and which offered the possibility of sharing them (Dieter Fensel, On-to-knowledge proposal).

Although this approach may help, it doesn't solve the problem that different organisations or users of ontologies will tend to have different needs, maybe differing only subtly, but still making reuse of ontologies difficult. Consider the Dublin Core, which is a set of metadata elements for describing documents on the web, including their title, description, identifier and so on. The Dublin Core Metadata Element Set Reference Description [DC] is a textual description of how one should use the elements to describe metadata.

Dublin Core has a very flat, very general structure, and its elements look like they should also be useful for describing lots of things that are similar to web documents, for example images. RDFPic (Bert Bos, [PIC]) is a very nice Java tool for embedding classification data inside JPEGs, and uses Dublin Core, but you have to interpret several of the elements to fit what they really mean in the context of a picture. For example, dc:creator is interpreted as the photographer and dc:coverage is the place where the photograph was taken. Both of these are compatible with the definitions given in [DC] but are also slightly different from what you would use them for when describing a document.

One solution is simply to define a new ontology whenever you need it. However, this can mean that there is a proliferation of ontologies which may not be related to each other.

RDF and metadata

Metadata is just data about data. RDF (Resource Description Framework) can be used to describe metadata. Here's an example of the Dublin Core Metadata about this page generated by DC-Dot [DOT] in RDF

RDF: Ontologies and Metadata
Libby Miller
Ontologies and Metadata; RDF
A Draft Discussion of issues raised by the Semantic Web Technologies Workshop, 22-23 November 2000;
2000-12-05
text
text/html
15112 bytes
en
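To show roughly what such a record looks like as RDF/XML, here is a minimal sketch that builds one with Python's standard xml.etree.ElementTree module. It is not DC-Dot's actual output: the pairing of values with Dublin Core element names (title, creator, date, format, language), the rdf:about address and the function name dc_record are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_record(about, fields):
    """Build an rdf:RDF element holding one rdf:Description with dc:* children."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for name, value in fields:
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return root


# Assumed mapping of some of the values above to Dublin Core elements.
record = dc_record(
    "http://ilrt.org/discovery/2000/11/lux/",
    [
        ("title", "RDF: Ontologies and Metadata"),
        ("creator", "Libby Miller"),
        ("date", "2000-12-05"),
        ("format", "text/html"),
        ("language", "en"),
    ],
)
print(ET.tostring(record, encoding="unicode"))
```

Each dc: element is a property of the page named in rdf:about, which is what makes the record "data about data".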

Within RDF there is a mechanism for including different ways of classifying things in the same documents, using XML namespaces. This means that several different ways of classifying the world can be combined. The easiest example of how this is done is with RSS 1.0 [RSS], which is a way of describing web resources in a very simple way (using title, description and link) but which can be extended by adding modules under different namespaces, so that you can describe the same links in different ways, adding further information to the description (Eric Van der Vlist, RSS1.0 [RSS-Eric]). An example (taken from [RSS]) is below. The black text is the plain rss channel, essentially just a list of links. The blue text is the dublin core module, which is declared using a namespace
xmlns:dc="http://purl.org/dc/elements/1.1/"
and then used to add information about the resource http://c.moreover.com/click/here.pl?r123 such as description, publisher and subject.

xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns="http://purl.org/rss/1.0/">

Meerkat
http://meerkat.oreillynet.com
Meerkat: An Open Wire Service
The O'Reilly Network
Rael Dornfest (mailto:rael@oreilly.com)
Copyright © 2000 O'Reilly & Associates, Inc.
2000-01-01T12:00+00:00
2
2000-01-01T12:00+00:00

Meerkat Powered!
http://meerkat.oreillynet.com/icons/meerkat-powered.jpg
http://meerkat.oreillynet.com

XML: A Disruptive Technology
http://c.moreover.com/click/here.pl?r123
XML is placing increasingly heavy loads on the existing technical infrastructure of the Internet.
The O'Reilly Network
Simon St.Laurent (mailto:simonstl@simonstl.com)
Copyright © 2000 O'Reilly & Associates, Inc.
XML

Search Meerkat
Search Meerkat's RSS Database...
s
http://meerkat.oreillynet.com/
regex

Using XML namespaces in RDF we can use several different ways of looking at the world when describing the same resource, for example a webpage. For machine understandable content, however, we need a way of defining how these ontological structures described using namespaces relate together.

In the example

libby foaf:mbox libby.miller@bristol.ac.uk
libby rdf:type wn:person

RDF Schema (RDFS) allows you to create a schema for the namespace http://xmlns.com/foaf/0.1/ (abbreviated to foaf) and to use this to say that the subject of foaf:mbox should always be something of type wn:person, for example. The ontology creation language OIL [OIL] extends RDF Schema and allows you to be much more specific about what sort of thing a person is, the properties a thing needs to have to be a wn:person and so on.
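One way to read such a schema statement is as a rule a robot can apply: whenever something has a foaf:mbox, conclude that it is a wn:person. The sketch below takes that reading, treating the constraint as a statement about the subject of foaf:mbox. The DOMAIN table and the function name infer_types are assumptions for illustration and are not RDF Schema syntax.

```python
# Hypothetical schema statement: anything that has a foaf:mbox is a wn:person.
DOMAIN = {"foaf:mbox": "wn:person"}


def infer_types(triples):
    """Return the triples plus an rdf:type triple for each subject whose
    property has a declared domain."""
    inferred = set(triples)
    for subject, prop, _obj in triples:
        if prop in DOMAIN:
            inferred.add((subject, "rdf:type", DOMAIN[prop]))
    return inferred


data = {("libby", "foaf:mbox", "libby.miller@bristol.ac.uk")}
print(("libby", "rdf:type", "wn:person") in infer_types(data))  # True
```

Here the rdf:type triple from the example no longer has to be stated by hand; it follows from the schema.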

Suppose we said (using RDF or OIL or something similar) that foaf:person is a subclass of wn:person. Then we have created a one-to-one cross-walk between a part of these ontologies, which is one way of relating them. Several of the presentations at the Luxembourg workshop talked about methods of creating cross-walks between ontologies (e.g. Jerome Euzenat; Atanas Kiryakov, OntoMap). But even one cross-walk between complex ontologies is extremely time consuming if done by hand. Another problem with this incremental approach is that it can't help a robot which does not understand either of our schemas; nor can it provide a mapping where one has not been created.

However, if it is an RSS 1.0 robot, it will understand the underlying RSS core framework of titles and urls and so it will be able to do something with the data it finds, even if it is unable to interpret the information under certain namespaces (modules).
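A minimal sketch of such a robot, using Python's standard xml.etree.ElementTree module: it keeps only the elements in the RSS 1.0 core namespace and silently skips everything else. The sample item is cut down from the example values, and the "ex" module with its namespace address is made up to stand in for a module the robot has never seen.

```python
import xml.etree.ElementTree as ET

RSS_NS = "http://purl.org/rss/1.0/"

# A cut-down item. The "ex" module and its namespace URI are invented.
SAMPLE = """<item xmlns="http://purl.org/rss/1.0/"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:ex="http://example.org/unknown-module/">
  <title>XML: A Disruptive Technology</title>
  <link>http://c.moreover.com/click/here.pl?r123</link>
  <dc:subject>XML</dc:subject>
  <ex:mood>optimistic</ex:mood>
</item>"""


def core_fields(item_xml):
    """Return the child elements that belong to the RSS core namespace."""
    item = ET.fromstring(item_xml)
    prefix = "{" + RSS_NS + "}"  # ElementTree writes tags as {namespace}name
    return {
        child.tag[len(prefix):]: child.text
        for child in item
        if child.tag.startswith(prefix)
    }


print(core_fields(SAMPLE))
```

The robot loses the subject and the unknown module's data, but it can still list the title and link, which is the point of the core framework.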

In a similar way, a very simple base classification schema of things into, say, people, documents and organisations could provide a short-circuit to the cross-walk problem. People would have to define their schemas or data in terms of these simple classifications, but this is different from trying to create a complete universal ontology that can be mapped to all other ontologies without loss of information. Instead, this would be a way of simplifying the data found if the robot did not understand the schema that the data was actually written in.
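The fallback could work something like the sketch below: a robot uses a type directly when it knows the schema, and otherwise falls back to whichever base class the schema's authors declared for it. The class names, the contents of both tables and the function name interpret are all hypothetical.

```python
# Types this particular robot fully understands (an assumption).
KNOWN_TO_ROBOT = {"rss:item"}

# Hypothetical declarations published by schema authors alongside their
# terms: specific class -> one of the simple base classes.
BASE_OF = {
    "wn:person": "person",
    "ex:projectManager": "person",
    "ex:annualReport": "document",
    "ex:university": "organisation",
}


def interpret(rdf_type):
    """Use the specific type if understood, else its base class, else give up."""
    if rdf_type in KNOWN_TO_ROBOT:
        return rdf_type
    return BASE_OF.get(rdf_type, "unknown")


print(interpret("ex:projectManager"))  # person
print(interpret("ex:widget"))          # unknown
```

Information is lost on the way down (a project manager becomes just a person), but the robot can still do something useful with the data.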

Closed and Open worlds

Where interoperability is not important, for example within a closed world system such as the learning environments described by Peter Fankhauser and Luca Bottori in Luxembourg, metadata about objects can be used with a single ontology (or several with a known cross-mapping) to do things like describe the qualities of a student, or in different areas describe the characteristics of a device (Johan Hjelm), or describe the characteristics of a document or a multimedia object (Harold Boley and Jos de Roo). It can be used for classification by hand or autoclassification of results, and for combining objects for curricula or presentations (Wolfgang Klas; Lynda Hartman) to create complex multimedia objects.

But if the management of ontologies is devolved to the people using the web, then mapping between ontologies becomes a very difficult and very labour-intensive process. It might be possible to create ad hoc cross-walks between ontologies as they are needed, and maybe partially automate this process. There is also the simpler idea of 'dumbing down' used by RSS 1.0 and proposed in the ABC document [ABC].

For logical inferencing the problem is even more acute. Again, if inferencing is occurring in a closed world, a single ontology is both necessary and sufficient for it to function. But if it is an open world, there will be large gaps in the information the inferencing engine has, because some ontologies will be unknown, which will render large chunks of information useless.

Essentially, someone, somewhere needs to tell the system that each object in a new ontology maps to something it knows about. One generalized solution for open systems like the web is a very simple mapping between objects in the system and a very simple schema or ontology of things, at the level of documents, people and events. However, it is an open question how one would decide what this schema should consist of.

Conclusions

It seems to me, as someone new to logic programming, that the methodology of creating complex ontologies and cross-walks between them is not appropriate for the web because it isn't a scalable strategy. The logic programming methodologies might be appropriate for closed systems and B2B applications, but they are not necessarily appropriate for a mass-market, open and chaotic system like the web.

I have been working on ways to make RDF usable and accessible, by using RSS 1.0 and a simple query language for RDF (I actually did a presentation at the workshop [LIB], although I didn't really think it fitted in too well with the general logic programming theme). From this point of view I think that the semantic web need not start as a web of reasoning robots; lots of gains can be made with much simpler systems, which can also perhaps make provision for inferencing systems at a later date.

References

[OIL] OIL
http://www.ontoknowledge.org/oil

[DC] Dublin Core Metadata Element Set, Version 1.1: Reference Description
http://purl.org/dc/documents/rec-dces-19990702.htm

[DOT] DC-DOT
http://www.ukoln.ac.uk/metadata/dcdot/

[RSS] RDF Site Summary 1.0 Specification, Release Candidate 1
http://www.egroups.com/files/rss-dev/specification.html

[RSS-Eric] RSS 1.0 and its Taxonomy Module: bringing metadata back into RSS
http://www.egroups.com/files/rss-dev/Presentations/Rss-Luxembourg.zip

[SWTW] Semantic Web Technologies Workshop
http://www.cordis.lu/ist/ka3/iaf/swt_workshop.htm

[ABC] ABC Strawman
http://www.ilrt.bris.ac.uk/discovery/harmony/docs/abc/abc_draft.html

[LIB] Querying RSS1.0 with SQUISH
0.html
Based on: RDF: Extending and Querying RSS channels
http://ilrt.org/discovery/2000/11/rss-query/