Mash-ups are so last year…

Posted in Semantic Web on June 14, 2009 by john225

Mash-ups are cool – ever since Ordnance Survey, Google, Yahoo! and Microsoft launched their various mapping APIs we’ve seen quite a few of them. This weekend I’ve been experimenting with creating a map mesh-up. I’m not sure if there is any strict definition of a mesh-up, but Kingsley Idehen gave a pretty good account of mesh-up versus mash-up in this blog entry. I’ll leave it up to you, the reader, to decide if what I have done is truly a mesh-up, but I like to think I did the best I could given the current semantic web infrastructure.

Given my day job I thought it would be cool to do some kind of map mesh-up around regions in the UK (however being a typical researcher I’ve only done four locations so far just to prove the concept). The new version of Ordnance Survey’s mapping API (OS OpenSpace) provides easy API calls to let you display the boundaries of administrative regions in Great Britain (except for civil parishes and communities). This made OS OpenSpace a no brainer for this mesh-up (and of course the superior cartography is an added bonus :) ). In order to process the RDF I used the ARC PHP library.

I’ll now explain how I did each of the various mesh-ups, starting with the most straightforward one – the basic map with region information (e.g. Southampton). This basic map mesh-up was made using the Ordnance Survey RDF for administrative units in Great Britain. This is hosted as linked data on the rkbexplorer site and has a SPARQL endpoint. The RDF data contains topological relations and name information for the administrative regions in Great Britain. For example, take a look at Southampton. For a given region the ARC library was used to issue SPARQL queries to find the bordering regions, contained regions and containing regions, along with the area of the region. The results of these queries were then displayed in the map information pop-out. So to find the bordering regions for Southampton the query is very straightforward:

SELECT ?border
WHERE
{
<http://os.rkbexplorer.com/id/osr7000000000037256> admingeo:borders ?border .
}
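
As a rough illustration (this is not the PHP/ARC code used for the mesh-up), here is a minimal Python sketch of how a query like this could be turned into an HTTP request for a SPARQL endpoint. The endpoint URL and the admingeo namespace IRI are placeholders assumed for the example, and the function names are hypothetical; the real values would come from the dataset's own documentation.

```python
from urllib.parse import urlencode

# Assumed placeholders: neither value is taken from the post.
ADMINGEO_NS = "http://example.org/admingeo#"   # hypothetical namespace IRI
ENDPOINT = "http://example.org/sparql"         # hypothetical endpoint URL


def borders_query(region_uri, admingeo_ns=ADMINGEO_NS):
    """Build the 'bordering regions' query for one region URI."""
    return (
        f"PREFIX admingeo: <{admingeo_ns}>\n"
        "SELECT ?border\n"
        "WHERE\n"
        "{\n"
        f"  <{region_uri}> admingeo:borders ?border .\n"
        "}"
    )


def sparql_get_url(endpoint, query):
    """Encode a query as a GET URL using the SPARQL protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


southampton = "http://os.rkbexplorer.com/id/osr7000000000037256"
url = sparql_get_url(ENDPOINT, borders_query(southampton))
print(url)
```

Fetching that URL from a real endpoint would return the result bindings; swapping the region URI is all that is needed to ask the same question about another region.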

The family tree mesh-up was done in a similar way. I documented in a previous blog entry how I had started converting my family tree into RDF. Since that entry I have made the data available as linked data (this was done using Paget, for example: http://www.johngoodwin.me.uk/family/I0002). The data was stored on the Talis Platform and again ARC was used to do a SPARQL query. You may notice that for the Birmingham family tree map I list members of my family who were born in Birmingham and died in Birmingham. I also list relatives who were born in areas bordering Birmingham. I was able to do this because my family tree data was connected to the Ordnance Survey boundaries RDF. So from the OS data I could find all areas bordering Birmingham, and then return all family members born in these areas from my family tree data. Because the data was linked over the web it was easy to do this in a very simple SPARQL query:

SELECT ?s ?name
WHERE
{

?place admingeo:borders <http://os.rkbexplorer.com/id/osr7000000000000018>.

?s dbpedia-owl:birthplace ?place .

?s foaf:name ?name
}
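
To make the join in that query concrete, here is a small Python sketch that mimics its three triple patterns over an in-memory list of triples. Only the Birmingham URI comes from the post; the neighbouring place, the people, their names and the short property labels are all invented for illustration, and a real triple store would of course do this for you.

```python
# Toy triples: the Birmingham URI is from the post, everything else is invented.
BIRMINGHAM = "http://os.rkbexplorer.com/id/osr7000000000000018"
NEIGHBOUR = "http://example.org/place/neighbour"   # hypothetical bordering area

os_data = [
    (NEIGHBOUR, "borders", BIRMINGHAM),
]
family_data = [
    ("http://example.org/family/I1", "birthplace", NEIGHBOUR),
    ("http://example.org/family/I1", "name", "Ann Example"),
    ("http://example.org/family/I2", "birthplace", BIRMINGHAM),
    ("http://example.org/family/I2", "name", "Bob Example"),
]


def born_in_bordering_areas(triples, target):
    """Mimic the query's three triple patterns as a join over one merged graph."""
    places = {s for s, p, o in triples if p == "borders" and o == target}
    people = {s for s, p, o in triples if p == "birthplace" and o in places}
    return sorted((s, o) for s, p, o in triples if p == "name" and s in people)


# The two datasets can simply be merged because they share place URIs.
merged = os_data + family_data
for person, name in born_in_bordering_areas(merged, BIRMINGHAM):
    print(person, name)
```

Only the person born in the bordering area is returned; the one born in Birmingham itself does not match the first pattern.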

The BBC mesh-ups are arguably more interesting. The BBC recently announced a SPARQL endpoint for its RDF data. Examples of the queries you can run are given here. The observant amongst you will notice that the BBC data does provide location information, but the URIs for the locations are currently taken from DBpedia and not from the Ordnance Survey data. To get round this I used a new service called sameas.org, which helps you find co-references between different data sets. You can use it to look up other URIs that represent the same thing as your chosen URI. For example http://os.rkbexplorer.com/id/osr7000000000037256 has the equivalent URIs given here.

However, I didn’t want to hard-code the equivalent URIs in my code. I’ll explain what I did using the Southampton example. First I issued a call to sameas.org to look up co-references for the Ordnance Survey Southampton URI. I retrieved the URIs as an RDF file and used the ARC library to parse that file for equivalent resources from DBpedia. I then issued a SPARQL query using the DBpedia URIs to return the artist/programme information from the BBC SPARQL endpoint. So in a nutshell:

  1. take Ordnance Survey URI
  2. issue a look-up for that URI to sameas.org
  3. return URIs in an RDF file
  4. parse the RDF file using ARC for dbpedia URIs
  5. issue query to BBC endpoint using the dbpedia URIs.
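
Steps 3 to 5 above could be sketched in Python roughly as follows (the mesh-up itself used PHP and ARC). The sample RDF/XML is a hand-written stand-in, so the real sameas.org response layout may well differ, and the property IRI passed to the query builder is a placeholder rather than a real BBC vocabulary term.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

# Hypothetical stand-in for a co-reference lookup result; real layout may differ.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:owl="{OWL_NS}">
  <rdf:Description rdf:about="http://os.rkbexplorer.com/id/osr7000000000037256">
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Southampton"/>
    <owl:sameAs rdf:resource="http://example.org/other/southampton"/>
  </rdf:Description>
</rdf:RDF>"""


def dbpedia_equivalents(rdf_xml):
    """Step 4: keep only the owl:sameAs targets hosted on dbpedia.org."""
    root = ET.fromstring(rdf_xml)
    targets = (el.get(f"{{{RDF_NS}}}resource") for el in root.iter(f"{{{OWL_NS}}}sameAs"))
    return [uri for uri in targets if uri and urlsplit(uri).hostname == "dbpedia.org"]


def bbc_query(dbpedia_uri, link_property):
    """Step 5: ask for anything linked to the DBpedia URI by the given property."""
    return f"SELECT ?thing WHERE {{ ?thing <{link_property}> <{dbpedia_uri}> . }}"


for uri in dbpedia_equivalents(SAMPLE_RDF):
    # The property IRI below is a placeholder, not a real BBC term.
    print(bbc_query(uri, "http://example.org/vocab#basedNear"))
```

Because the equivalent URIs are looked up at run time, nothing about DBpedia needs to be hard-coded per region.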

The revyu mesh-up was done in a similar way.

I hope this all made sense. Comments and questions welcome – though please no comments on my HTML/web design being very 1995. It’s all about the RDF for me  :)

The mesh-up is here http://www.johngoodwin.me.uk/boundaries/meshup.html


Genealogy and the Semantic Web 2

Posted in Semantic Web on April 18, 2009 by john225

I’ve been busy converting my parents’ hard work on their family tree into RDF. I blogged about initial attempts here. It’s far from finished, but at around 500,000 triples already it looks like it’s going to be a lot of RDF!

You can view the RDF (as it is) here, but seeing as RDF is for machines a more human-friendly version can be browsed here. So far I’ve been concentrating on linking places of death and birth to various other datasets including geonames, DBpedia, Freebase and Ordnance Survey (though there are still a fair few places to link).

To be done:

1) Finish connecting all the places.

2) Sort date formats out.

3) Turn into linked data with dereferenceable URIs and content negotiation.

A more detailed write up when it’s all finished…


Linked Ontology Web

Posted in Semantic Web on April 1, 2009 by john225

I’ve been thinking a bit at work about how we should publish OWL ontologies on the semantic web, and if this can be done in a way analogous to the linked data web. I want to quickly blog my thoughts before I head to the pub :)

There are currently a number of great tutorials on how to publish RDF as linked data. Without going into too much detail, every URI in the published RDF is dereferenceable, which very roughly speaking means that it returns some information when you visit it. A URI such as http://os.rkbexplorer.com/id/osr7000000000037256 will return some RDF/XML if the client requests RDF, or some HTML if you are visiting from, say, a web browser. There are a number of ways to modularise the data, but typically the information returned for a URI will be the result of a SPARQL DESCRIBE query, or will be the triples where the URI appears as either the subject or the object. Apologies for the quick and dirty description of linked data, but more information can be found in the tutorials linked above.
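
The content negotiation part can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the server-side choice below ignores q-values and other media types, and the function names are made up for the example.

```python
from urllib.request import Request


def rdf_request(uri):
    """Prepare (but do not send) a request that asks for RDF/XML rather than HTML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


def choose_representation(accept_header):
    """Very rough server-side choice: RDF/XML if the client asks for it, else HTML."""
    return "rdf/xml" if "application/rdf+xml" in accept_header else "html"


req = rdf_request("http://os.rkbexplorer.com/id/osr7000000000037256")
print(req.get_header("Accept"))
print(choose_representation(req.get_header("Accept")))
print(choose_representation("text/html,application/xhtml+xml"))
```

The same URI is used in both cases; only the Accept header decides which representation comes back.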

RDF vocabularies and ontologies are typically just published on the web as dumps of RDF/XML, and only in some cases are the classes and properties dereferenceable. In other words the whole file is simply uploaded in bulk. There are guidelines for publishing RDF vocabularies here.

It seems to me that this will be inadequate for publishing larger and more complex ontologies on the web. Do we need a way to publish large ontologies on the web in a linked data stylee? I think it would certainly be useful.

The dereferenceable URI bit is easy enough and can be done as per linked data. For example, the HTML page for the URI http://www.ontology.com/River could provide a description of the class River using some controlled natural language. The question then is what to put in the OWL/RDF file that is retrieved for the class River from that URI. What is the class or OWL equivalent of a “SPARQL DESCRIBE”?

The problem to me seems to be similar to the problem of ontology modularisation discussed here. Suppose I am building an ontology about animals and I need to use the concepts Farm and Zoo from a building ontology. When I import the class Zoo, how can I be sure that I include all the relevant axioms to describe a Zoo, and only the relevant axioms to describe a Zoo? I’ll not describe the hows here as they have been discussed in this tutorial (and numerous supporting papers). The point is that there are tools (try one online) for extracting the correct axioms from an ontology for describing a given class. Should these tools be used in the linked data community as a means to enable us to publish detailed ontologies on the linked data web? So to be clear:

1) We publish the full OWL file on the web (in an analogous way to a dump of RDF data in the linked data web) – this would be, say, the complete buildings ontology.

2) We make each URI dereferenceable and use content negotiation to retrieve either RDF/XML or HTML as required, as we do for linked data.

3) When we dereference a class URI (e.g. www.ontology.com/Zoo) the axioms contained in the RDF file returned for “Zoo” are determined by the ontology modularisation tools described here rather than some perhaps more naive approach (whereas for linked data this would be, say, a SPARQL DESCRIBE on the URI).
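
To show why the choice of axioms matters, here is a toy Python sketch of two naive approaches. It is emphatically not the modularisation technique from the tutorial: the axioms, the class names and both functions are invented for illustration, and each axiom is reduced to a piece of text plus the set of names it mentions.

```python
# Toy ontology: (readable axiom, set of class/property names it mentions).
# Every axiom and name here is invented for illustration.
AXIOMS = [
    ("Zoo SubClassOf Building", {"Zoo", "Building"}),
    ("Zoo SubClassOf houses some Animal", {"Zoo", "houses", "Animal"}),
    ("Farm SubClassOf Building", {"Farm", "Building"}),
    ("Building SubClassOf Structure", {"Building", "Structure"}),
]


def naive_description(term, axioms):
    """Only the axioms that mention the term directly (a DESCRIBE-like answer)."""
    return [text for text, signature in axioms if term in signature]


def signature_closure(term, axioms):
    """Keep pulling in axioms that mention any name seen so far."""
    seen, chosen = {term}, []
    changed = True
    while changed:
        changed = False
        for text, signature in axioms:
            if text not in chosen and signature & seen:
                chosen.append(text)
                seen |= signature
                changed = True
    return chosen


print(naive_description("Zoo", AXIOMS))
print(signature_closure("Zoo", AXIOMS))
```

In this toy case the first function misses the axiom saying a Building is a Structure, which arguably matters for describing a Zoo, while the second drags in the unrelated Farm axiom. A proper module extraction tool is meant to land between these two extremes.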

I’d love to know if there are any links to similar work and to know what people think about this proposal.


The Guardian Open Platform and Data Store

Posted in Semantic Web on March 10, 2009 by john225

Today the Guardian launched their Open Platform. According to their website “The Open Platform is the suite of services that make it possible for our partners to build applications with the Guardian.” The Open Platform contains two products: the Content API and the Data Store. The Content API provides a REST (-ish apparently) mechanism to query a vast number of documents and other content from the Guardian. The Data Store is “a collection of important and high quality data sets curated by Guardian journalists”. This is all very cool!

The Data Store currently provides a large number of datasets on subjects as diverse as military spending, carbon emissions and university rankings. For now the data is provided as spreadsheets uploaded to Google Docs, allowing easy access. Again all very cool.

This work will obviously result in a lot of cool applications and mash-ups. However, the semantic web geek in me can’t help thinking that mash-ups are so last week :) It seems obvious (?) that the next step for the Guardian Data Store is to provide the data in RDF and host it as linked data. These datasets would be a fantastic addition to the linked data web, allowing mesh-ups where the data from various linked data sources can be fused in different ways.

Time to convince the Guardian that this is the next logical step for this, already great, piece of work.


The Non Golden Rules of Geo

Posted in Semantic Web on February 26, 2009 by john225

Nice blog post from Yahoo! explaining why geography (especially GB geography) is so complicated. The post highlights the six non golden rules of geo:

  1. Any attempt to codify a series of geo rules into a formal, one size fits all, taxonomy will fail due to Rule 2.
  2. Geo is bizarre, odd, eclectic and utterly human.
  3. People will in the main agree with Rule 1 with the exception of the rules governing their own region, area or country, which they will think are perfectly logical.
  4. People will, in the main, think that postal, administrative and colloquial hierarchies are one and the same thing and will overlap.
  5. Taking Rule 4 into account, they will then attempt to codify a one size fits all geo taxonomy.
  6. There is no Rule 6, see Rule 1.

I think this could well explain the headache I am getting trying to write an ontology of GB geographies.


SPARQL your way to a Stupor

Posted in Semantic Web on February 15, 2009 by john225

A few years back two intrepid explorers set out to survey all of the pubs in Southampton. Their adventure is documented here.

A while back I decided to turn their page into linked data and you can see the result here. Mapping is provided by OS OpenSpace as the cartography is far nicer than that of Google or Yahoo! (IMHO of course :) ). As of today the site is linked up to Revyu (in both the RDF and HTML) where applicable. You can also now browse the RDF using the OpenLink Data Explorer, Zitgist or Tabulator. No SPARQL endpoint as yet, but maybe one day.


Sindice and Twine

Posted in Semantic Web on February 4, 2009 by john225

I originally posted this on Twine, but thought it was worth repeating here. Twine is a new service that provides a way for users to track, find, and share what interests them with like-minded people. Twine uses semantic web technology under the bonnet and exposes all of its data as linked data using RDF.

This RDF (along with other exposed RDF) is crawled and indexed by various semantic web search engines such as Sindice.

Not only have Sindice been busy indexing lots of RDF all over the semantic web – including Twine – but they have also provided a relatively simple query engine (not SPARQL). For the semantic web geeks/command line fans amongst you this provides a way to query Twine that is arguably quicker than Twine itself (though obviously not as easy or up to date).

For example, entering the following in the advanced query box:

* <http://www.radarnetworks.com/core#wasCreatedBy> <http://www.twine.com/user/gothwin>

will find all articles I created. The more complex query:

(* <http://www.radarnetworks.com/core#wasCreatedBy> <http://www.twine.com/user/gothwin> AND * <http://www.radarnetworks.com/2007/09/12/basic#tag> "semantic web")

finds articles I created and tagged with “semantic web”.

Have a play :)


Is OWL really that hard?

Posted in Semantic Web on January 26, 2009 by john225

I’ve been working with semantic web technologies for around seven or eight years now, and one thing that still puzzles me to some extent is why a lot of IT professionals and computer scientists find OWL (the Web Ontology Language) so hard. I do not have a computer science background (I’m a mathematician by training) but during my career I’ve had to get to grips with a number of technologies from Java to SQL and RDBMS. I’ve dabbled with OO design, UML, GIS etc.

Personally I’ve found OWL no harder to pick up than any of these, and to be honest I think OWL is considerably easier to work with than, say, developing in C++ or implementing a complex RDBMS. So I am genuinely curious – what makes people think OWL is scary (esp. compared to some of the technologies I’ve listed here)?


Web 3.0 and Social Networks

Posted in Semantic Web on January 25, 2009 by john225

It is probably fair to say that FOAF is where the social web meets the semantic web. FOAF, which has been around for a while now, basically creates a machine readable graph of the sort of information you might include on sites like facebook, myspace etc. Your FOAF file can include links to people you know, your interests and other personal information. It is probably also fair to say that FOAF files were, until now, the sole property of the geek. However, this has changed, and a number of social networking sites such as livejournal, identi.ca and friend feed build FOAF files from your profile information (are there others?). At least now you don’t need to know how to edit RDF in order to have your own FOAF file. Despite that, these profiles are limited by the features offered on the respective sites.

Recently though QDOS launched a new service that makes FOAF profiles extremely easy to build. This service allows users to create a FOAF profile generated from information contained in their last.fm, livejournal and flickr profiles, as well as importing existing FOAF files. You are then given the option to manually enter other information. Furthermore, you can create a public and a private view of your FOAF file. I would not recommend including information like your address, phone number or date of birth in a public FOAF file. So what are you waiting for – go build yourself a FOAF file and join the linked data web. My FOAF profile can be found here (my original one is maintained here).

For any linked data geeks one other interesting thing about the QDOS FOAF builder is that it has started linking music data from last.fm to the new music linked data service from the BBC. Hopefully this will be just the beginning and we’ll see links to other linked data services from DBpedia, geonames and Ordnance Survey.


Genealogy and the Semantic Web

Posted in Semantic Web on January 21, 2009 by john225

My folks have a keen interest in genealogy, and have built up quite an impressive family tree over the past few years. I always thought that genealogy would be a potentially cool application for the semantic web (imagine several independently constructed family trees being connected via their common nodes). It seems I wasn’t the first person to think this, as this recent blog post from Dan Brickley suggests. Dan has written a quick and horrid Perl script (his words) to convert the common family tree GED format to RDF/XML. A dump of my family tree in RDF/XML can be found here.

The family trees contain information about births and deaths of people. All of these events are then connected to places. I think one weakness of the current conversion script is that it represents the place information as a string, for example:

<foaf:Person rdf:about="I1884.xml#I1884">
<foaf:name>William Parsonage</foaf:name>
<foaf:givenname>William</foaf:givenname>
<foaf:family_name>Parsonage</foaf:family_name>
<bio:event>
<bio:Birth>
<bio:date>30 AUG 1721</bio:date>
<bio:place>Birmingham</bio:place>
</bio:Birth>
</bio:event>
</foaf:Person>

It would be far more interesting if instead the place was actually connected to a URI representing the place on the semantic web. In this case there are a number of URIs for Birmingham on the semantic web, for example http://os.rkbexplorer.com/id/osr7000000000000018 from Ordnance Survey or http://sws.geonames.org/2655603/about.rdf from Geonames. The RDF/XML could then be modified as follows:

<foaf:Person rdf:about="I1884.xml#I1884">
<foaf:name>William Parsonage</foaf:name>
<foaf:givenname>William</foaf:givenname>
<foaf:family_name>Parsonage</foaf:family_name>
<bio:event>
<bio:Birth>
<bio:date>30 AUG 1721</bio:date>
<bio:place rdf:resource="http://os.rkbexplorer.com/id/osr7000000000000018"/>
</bio:Birth>
</bio:event>
</foaf:Person>
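
One way this rewrite could be automated is sketched below in Python. It is only a sketch under stated assumptions: the bio: namespace IRI is my assumption for that prefix, the sample document is a trimmed copy of the fragment above wrapped in an rdf:RDF element, and the lookup table is hand-made (a real one would have to cope with different places sharing a name).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
BIO_NS = "http://purl.org/vocab/bio/0.1/"   # assumed namespace for the bio: prefix

# Keep readable prefixes when the tree is serialised again.
ET.register_namespace("foaf", FOAF_NS)
ET.register_namespace("bio", BIO_NS)

# Hand-made lookup table from place name to place URI.
PLACE_URIS = {"Birmingham": "http://os.rkbexplorer.com/id/osr7000000000000018"}

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:foaf="{FOAF_NS}" xmlns:bio="{BIO_NS}">
  <foaf:Person rdf:about="I1884.xml#I1884">
    <foaf:name>William Parsonage</foaf:name>
    <bio:event>
      <bio:Birth>
        <bio:date>30 AUG 1721</bio:date>
        <bio:place>Birmingham</bio:place>
      </bio:Birth>
    </bio:event>
  </foaf:Person>
</rdf:RDF>"""


def link_places(rdf_xml, place_uris):
    """Swap bio:place string literals for rdf:resource links where a URI is known."""
    root = ET.fromstring(rdf_xml)
    for place in root.iter(f"{{{BIO_NS}}}place"):
        name = (place.text or "").strip()
        if name in place_uris:
            place.text = None
            place.set(f"{{{RDF_NS}}}resource", place_uris[name])
    return root


linked = link_places(SAMPLE, PLACE_URIS)
print(ET.tostring(linked, encoding="unicode"))
```

Places that are not in the lookup table are left as plain strings, so the conversion can be done a few places at a time.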

If I get chance at the weekend I’ll see how much work it will take to add this information to the family tree RDF/XML. It will also be interesting to see if the combination of these two datasets provides extra information and insights to budding genealogists.
