Genealogy and Linked Data (revisited)

Posted in Semantic Web on November 4, 2009 by john225

I now have a new, improved version of my family tree up as linked data here. To produce it I took the original family tree that my parents created and converted it with a Perl script that turns GEDCOM into RDF. I then manually cleaned up the RDF to get the URIs into the form I wanted.

This resulted in an RDF file giving information about parent/child, sibling and spouse relations for my family members. The vocabularies (or ontologies) used for this were FOAF, BIO and RELATIONSHIP.

I was interested in displaying more than just parent/child, sibling and spouse relationships, and decided a simple extension would be to add grandparent/grandchild and ancestor/descendant information. To compute this information I used the Protege 4 OWL 2 editor. For the grandparent information I used an OWL 2 feature called “property chains”. The property chain for computing grandchild relationships from child ones was straightforward:

childOf o childOf -> grandChildOf

(or for those who prefer rules: childOf(?x,?y) , childOf(?y,?z) -> grandChildOf(?x,?z) )

This simply states that “the child of the child of someone is a grandchild of that someone”.
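As a rough illustration of what the property chain expresses (this is not how the reasoner in Protege works internally), the chain is just a relational composition of childOf with itself. The sketch below uses made-up people and a hypothetical helper called compose:

```python
# Hypothetical sample data: (child, parent) pairs standing in for childOf triples.
child_of = {("alice", "bob"), ("bob", "carol"), ("dave", "bob")}


def compose(first, second):
    """Relational composition: (x, z) whenever first has (x, y) and second has (y, z)."""
    return {(x, z) for (x, y1) in first for (y2, z) in second if y1 == y2}


# childOf o childOf -> grandChildOf
grand_child_of = compose(child_of, child_of)
print(sorted(grand_child_of))
```

Here alice and dave are both children of bob, and bob is a child of carol, so both come out as grandchildren of carol.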

The ancestor information was even more straightforward to compute. Here we just make the property parentOf a subproperty of ancestorOf and then make ancestorOf a transitive property.
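To see what those two axioms amount to, here is a minimal sketch that computes the same result by hand: every parent pair is also an ancestor pair (the subproperty axiom), and the ancestor pairs are then closed under transitivity. The names and the transitive_closure helper are hypothetical, and a real OWL reasoner does far more than this loop:

```python
def transitive_closure(pairs):
    """Repeatedly add (x, z) for every (x, y) and (y, z) until nothing new appears."""
    closure = set(pairs)
    while True:
        new_pairs = {(x, z) for (x, y1) in closure for (y2, z) in closure if y1 == y2}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


# Hypothetical sample data: (parent, child) pairs standing in for parentOf triples.
parent_of = {("carol", "bob"), ("bob", "alice"), ("alice", "erin")}

# parentOf is a subproperty of ancestorOf, so start from parent_of and close it.
ancestor_of = transitive_closure(parent_of)
print(sorted(ancestor_of))
```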

Given the two axioms above we can then let the OWL reasoner in Protege 4 do all the hard work and compute the implicit relationships based on the explicitly stated ones. Anyone interested in using OWL to compute more family relations should read this paper by Robert Stevens.

So I now have some RDF containing parent/child, ancestor/descendant, sibling and spouse relationships. Also in this data are notions of family groups and information about birth [1] and death events. These events contain information about dates and places (given as text) of birth/death. Having this information as literals is not very interesting, as it means I then have to go and use Google (or similar) to find additional information about the dates/places. To get round this (and create some links in my linked data) I decided to connect the places of birth/death to the corresponding resource in DBpedia (an RDF version of Wikipedia) and do similarly for the dates [2]. An example of this can be seen here http://www.johngoodwin.me.uk/family/event1917. This means I can now find additional information about a person’s place of birth/death by following the links in the data if I should choose to do so. To link birth/death events to dates/places I used the Event Ontology.

In order to host the data as linked data I used the Talis Platform and the Paget (2) PHP library.

There is a SPARQL endpoint for the data here. We can use this to query for my uncles as follows:

PREFIX rel: <http://purl.org/vocab/relationship/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX family: <http://www.johngoodwin.me.uk/family/>

select ?uncle ?name
where
{
family:I0243 rel:childOf ?parent .      # finds my parents
?parent rel:siblingOf ?uncle .          # finds my parents' siblings
?uncle foaf:gender "male" .             # keeps the male siblings
?uncle foaf:name ?name .                # returns their names
}
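For anyone who finds the graph pattern easier to follow as ordinary code, the sketch below walks the same four steps over a tiny in-memory set of triples. The people, the short identifiers and the find_uncles helper are all made up for illustration; the real data lives behind the SPARQL endpoint:

```python
REL = "http://purl.org/vocab/relationship/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical triples; subjects are shortened to plain strings for readability.
triples = {
    ("me", REL + "childOf", "mum"),
    ("me", REL + "childOf", "dad"),
    ("mum", REL + "siblingOf", "uncle_tom"),
    ("mum", REL + "siblingOf", "aunt_sue"),
    ("uncle_tom", FOAF + "gender", "male"),
    ("aunt_sue", FOAF + "gender", "female"),
    ("uncle_tom", FOAF + "name", "Tom"),
    ("aunt_sue", FOAF + "name", "Sue"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def find_uncles(person):
    """Mirror the query: parents, then their siblings, then keep the male ones."""
    results = set()
    for parent in objects(person, REL + "childOf"):
        for sibling in objects(parent, REL + "siblingOf"):
            if "male" in objects(sibling, FOAF + "gender"):
                for name in objects(sibling, FOAF + "name"):
                    results.add((sibling, name))
    return results


print(find_uncles("me"))
```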

My next plan is to build some mash-ups using this data. Such a mash-up could use resources on the web of linked data to find famous people born in the same place/year as various family members, identify BBC programmes that are about said places etc. etc.

Now all I need to do is find a long-lost relative who is also into genealogy and linked data so I can connect some nodes… what are the chances?

[1] – for obvious privacy reasons no birth information is given for people still living.

[2] – this was a fairly tedious manualish process – but some scripting helped.

 

 


/location /location /location – exploring Ordnance Survey Linked Data

Posted in Uncategorized on October 25, 2009 by john225

Ordnance Survey now have some linked data available here. This data includes information about the local authority and voting regions of Great Britain. Included in this data are the names (and official names as set out by Statutory Instrument where applicable), census code and area in hectares of each region. Also included are topological relationships between the administrative areas. These allow users to do qualitative spatial queries on the data. So for example, the data contains information about which regions are contained by other regions. Bordering information is given between regions of the same type (e.g. between constituencies). There is one exception to this, where additional bordering information is given between counties, unitary authorities, districts and metropolitan districts [1].

So what can you do with the data? First you can simply explore it in your browser. For example look at the URI for The City of Southampton:  http://data.ordnancesurvey.co.uk/id/7000000000037256. As you can see this contains a list of the regions Southampton borders, contains and overlaps [2].

It is possible to perform free text searches on the data here. The results are returned as an RSS feed. Try it out – type the name of the region you are looking for in the first search box. Typing in Southampton gives three results: the unitary authority The City of Southampton and two Westminster constituencies, Southampton, Test and Southampton, Itchen.

The interesting queries, however, are done at the SPARQL endpoint located here.  I’ll give a handful of SPARQL queries to get you going. You will need to add this at the top of each query:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/>
PREFIX spatialrelations: <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/>
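If you are sending these queries from a script rather than pasting them into the endpoint's form, a small helper can prepend the prefixes for you. This is only a sketch: with_prefixes is a hypothetical name, and it does nothing more than string concatenation.

```python
# The prefix declarations listed above, keyed by prefix name.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "admingeo": "http://data.ordnancesurvey.co.uk/ontology/admingeo/",
    "spatialrelations": "http://data.ordnancesurvey.co.uk/ontology/spatialrelations/",
}


def with_prefixes(query_body, prefixes=PREFIXES):
    """Return the query text with one PREFIX line per entry placed at the top."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    return header + "\n\n" + query_body.strip() + "\n"


print(with_prefixes("select distinct ?type where { ?a rdf:type ?type . }"))
```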

So first of all I can ask for a list of the types of the things in the data:

select distinct ?type
where
{
?a rdf:type ?type .
}

Seeing that the data mentions Unitary Authorities, I can ask for a list of all unitary authorities and their official names:

select ?a ?name
where
{
?a rdf:type admingeo:UnitaryAuthority .
?a admingeo:hasOfficialName ?name .
}

I can now issue a topological query: find me all Westminster constituencies contained by the unitary authority Southampton:

select ?a ?name
where
{
<http://data.ordnancesurvey.co.uk/id/7000000000037256> spatialrelations:contains ?a .
?a rdf:type admingeo:WestminsterConstituency .
?a foaf:name ?name .
}

or find me the regions (and their names) that contain the district of Winchester:

select ?a ?name
where
{
?a spatialrelations:contains <http://data.ordnancesurvey.co.uk/id/7000000000017754> .
?a foaf:name ?name .
}

This query finds me the regions (and their name and type) that border Winchester:

select ?a ?name ?type
where
{
<http://data.ordnancesurvey.co.uk/id/7000000000017754> spatialrelations:borders ?a .
?a rdf:type ?type .
?a foaf:name ?name .
}

One final note for people wanting to do mashups with this data. If you wish to see the boundary on a map then the area code and unit ID attributes can be used in the OS OpenSpace API to display the boundary.

So for example, for Southampton (http://data.ordnancesurvey.co.uk/id/7000000000037256) the area code is UTA (for unitary authority) and the unit ID is 37256. These values can be used as follows:

/* Here we set up our variable called 'boundaryLayer' with the strategies that we require.
In this case, it is its ID and type, i.e. Unitary Authority */
boundaryLayer = new OpenSpace.Layer.Boundary("Boundaries", {
strategies: [new OpenSpace.Strategy.BBOX()],
admin_unit_ids: ["37256"],
area_code: ["UTA"]
});
// then we add the boundary to the map
osMap.addLayer(boundaryLayer);
// this effectively refreshes the map, so that the boundary is visible
osMap.setCenter(osMap.getCenter());

to display the Southampton boundary using the OS OpenSpace API. See http://openspace.ordnancesurvey.co.uk/openspace/support.html for more details. An example of the output can be seen here.
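In the Southampton example the unit ID (37256) looks like the numeric tail of the identifier in the URI (7000000000037256). Assuming that pattern holds for other regions, which is a guess from this one example and not something the data documentation is quoted as saying here, a mash-up could derive the unit ID like this. The function name is hypothetical:

```python
def unit_id_from_uri(uri):
    """Guess the OpenSpace unit ID from an Ordnance Survey linked data URI.

    Assumption: the last path segment is a '7' followed by the zero-padded
    unit ID, as in the Southampton example (7000000000037256 -> 37256).
    """
    identifier = uri.rstrip("/").rsplit("/", 1)[-1]
    if not (identifier.isdigit() and identifier.startswith("7")):
        raise ValueError(f"unexpected identifier: {identifier!r}")
    # int() drops the leading zeros left after removing the '7' prefix.
    return str(int(identifier[1:]))


print(unit_id_from_uri("http://data.ordnancesurvey.co.uk/id/7000000000037256"))
```

The area code (UTA in this case) still has to come from the data itself, since it depends on the type of region.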

Happy SPARQLing…

[1] – if you are (rightly) confused about the geography of Great Britain then there is a handy glossary here.

[2] – the regions that contain Southampton will be added shortly.


Mash-ups are so last year…

Posted in Semantic Web on June 14, 2009 by john225

Mash-ups are cool – ever since Ordnance Survey, Google, Yahoo! and Microsoft launched their various mapping APIs we’ve seen quite a few of them. This weekend I’ve been experimenting with creating a map mesh-up. I’m not sure if there is any strict definition of a mesh-up, but Kingsley Idehen gave a pretty good account of mesh-up versus mash-up in this blog entry. I’ll leave it up to you, the reader, to decide if what I have done is truly a mesh-up, but I like to think I did the best I could given the current semantic web infrastructure.

Given my day job I thought it would be cool to do some kind of map mesh-up around regions in the UK (however being a typical researcher I’ve only done four locations so far just to prove the concept). The new version of Ordnance Survey’s mapping API (OS OpenSpace) provides easy API calls to let you display the boundaries of administrative regions in Great Britain (except for civil parishes and communities). This made OS OpenSpace a no brainer for this mesh-up (and of course the superior cartography is an added bonus :) ). In order to process the RDF I used the ARC PHP library.

I’ll now explain how I did each of the various mesh-ups, starting with the most straightforward one – the basic map with region information (e.g. Southampton). This basic map mesh-up was made using the Ordnance Survey RDF for administrative units in Great Britain. This is hosted as linked data on the rkbexplorer site and has a SPARQL endpoint. This RDF data contains topological relations and name information for the administrative regions in Great Britain. For example, take a look at Southampton. For a given region the ARC library was used to issue SPARQL queries to find the bordering regions, contained regions and containing regions, along with the area of the region. The results of these queries were then displayed in the map information pop-out. So to find the bordering regions for Southampton the query is very straightforward:

SELECT ?border
WHERE
{
<http://os.rkbexplorer.com/id/osr7000000000037256> admingeo:borders ?border .
}

The family tree mesh-up was done in a similar way. I documented in a previous blog entry how I had started converting my family tree into RDF. In fact since my last blog entry I now have that data available as linked data (this was done using Paget, for example: http://www.johngoodwin.me.uk/family/I0002). The data was stored on the Talis Platform and again ARC was used to do a SPARQL query. You may notice for the Birmingham family tree map I list members of my family that were born in Birmingham and died in Birmingham. I also list relatives that were born in areas bordering Birmingham. I was able to do this because my family tree data was connected to the Ordnance Survey boundaries RDF. So from the OS data I could find all areas bordering Birmingham, and then return all family members born in these areas from my family tree data. Because the data was linked over the web it was easy to do this in a very simple SPARQL query:

SELECT ?s ?name
WHERE
{

?place admingeo:borders <http://os.rkbexplorer.com/id/osr7000000000000018>.

?s dbpedia-owl:birthplace ?place .

?s foaf:name ?name
}
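The query is really a join across two datasets: bordering areas come from the Ordnance Survey data, birthplaces and names from the family tree. The sketch below spells that join out over small in-memory tables. All the place keys, person IDs and names are invented for illustration:

```python
# Hypothetical stand-in for the Ordnance Survey data: (place, borders) pairs.
borders = {("solihull", "birmingham"), ("sandwell", "birmingham"), ("leeds", "bradford")}

# Hypothetical stand-in for the family tree data: (person, birthplace) pairs and names.
birthplace = {("I0001", "solihull"), ("I0002", "leeds"), ("I0003", "sandwell")}
names = {"I0001": "Ann Example", "I0002": "Bob Example", "I0003": "Cat Example"}


def born_in_bordering(region):
    """People born in any place that borders the given region."""
    neighbours = {place for (place, other) in borders if other == region}
    return sorted(
        (person, names[person]) for (person, place) in birthplace if place in neighbours
    )


print(born_in_bordering("birmingham"))
```

The join only works because both datasets use the same identifier for each place, which is exactly what linking the family tree to the OS URIs provides.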

The BBC mesh-ups are arguably more interesting. The BBC recently announced a SPARQL endpoint for its RDF data. Examples of the queries you can run are given here. The observant amongst you will notice that the BBC data does provide location information, but the URIs for the locations are currently taken from DBpedia and not from the Ordnance Survey data. To get round this I used a new service called sameas.org, which helps you find co-references between different data sets. You can use it to look up other sources that represent your chosen URI. For example http://os.rkbexplorer.com/id/osr7000000000037256 has the equivalent URIs given here.

However, I didn’t want to hard code the equivalent URIs in my code. I’ll explain what I did using the Southampton example. First I issued a call to sameas.org to look up coreferences for the Ordnance Survey Southampton URI. I returned the URIs as an RDF file and used the ARC library to parse the RDF file for equivalent resources from dbpedia. I then issued a SPARQL query using the dbpedia URIs to return the artist/programme information from the BBC SPARQL endpoint.  So in a nutshell:

  1. take Ordnance Survey URI
  2. issue a look-up for that URI to sameas.org
  3. return URIs in an RDF file
  4. parse the RDF file using ARC for dbpedia URIs
  5. issue query to BBC endpoint using the dbpedia URIs.
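The five steps can be sketched as a small pipeline. To avoid guessing at the sameas.org or BBC interfaces, the look-up and the query are passed in as functions, and the ones used here are stand-ins that return canned data. Only the filtering step (picking out the DBpedia URIs) is real logic, and the function names are hypothetical:

```python
from urllib.parse import urlparse


def dbpedia_uris(equivalent_uris):
    """Step 4: keep only the equivalent URIs hosted on dbpedia.org."""
    return [uri for uri in equivalent_uris if urlparse(uri).hostname == "dbpedia.org"]


def mesh_up(os_uri, lookup_equivalents, query_bbc):
    """Steps 1-5: look up co-references, filter for DBpedia, query with each one."""
    equivalents = lookup_equivalents(os_uri)   # steps 2 and 3
    results = []
    for uri in dbpedia_uris(equivalents):      # step 4
        results.extend(query_bbc(uri))         # step 5
    return results


# Stand-ins for the real services, returning made-up data.
def fake_lookup(uri):
    return ["http://dbpedia.org/resource/Southampton", "http://example.org/place/southampton"]


def fake_bbc_query(dbpedia_uri):
    return [("some programme", dbpedia_uri)]


print(mesh_up("http://os.rkbexplorer.com/id/osr7000000000037256", fake_lookup, fake_bbc_query))
```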

The Revyu mesh-up was done in a similar way.

I hope this all made sense. Comments and questions welcome – though please no comments on my HTML/web design being very 1995. It’s all about the RDF for me  :)

The mesh-up is here http://www.johngoodwin.me.uk/boundaries/meshup.html


Genealogy and the Semantic Web 2

Posted in Semantic Web on April 18, 2009 by john225

I’ve been busy converting my parents’ hard work on their family tree into RDF. I blogged about initial attempts here. It’s far from finished, but at around 500,000 triples already it looks like it’s going to be a lot of RDF!

You can view the RDF (as it is) here, but seeing as RDF is for machines, a more human-friendly version can be browsed here. So far I’ve been concentrating on linking places of death and birth to various other datasets, including GeoNames, DBpedia, Freebase and Ordnance Survey (though there are still a fair few places to link).

To be done:

1) Finish connecting all the places.

2) Sort date formats out.

3) Turn into linked data with dereferenceable URIs and content negotiation.

A more detailed write up when it’s all finished…


Linked Ontology Web

Posted in Semantic Web on April 1, 2009 by john225

I’ve been thinking a bit at work about how we should publish OWL ontologies on the semantic web, and if this can be done in a way analogous to the linked data web. I want to quickly blog my thoughts before I head to the pub :)

There are currently a number of great tutorials on how to publish RDF as linked data. Without going into too much detail, every URI in the published RDF is dereferenceable, which very roughly speaking means that it returns some information when you visit it. A URI such as http://os.rkbexplorer.com/id/osr7000000000037256 will return some RDF/XML if the client requests RDF, or some HTML if you are visiting from, say, a web browser. There are a number of ways to modularise the data, but typically the information returned for a URI will be the result of a SPARQL describe query, or will be the triples where the URI appears as either the subject or the object. Apologies for the quick and dirty description of linked data, but more information can be found in the previously linked tutorials.
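The "subject or object" option is simple enough to show in a few lines. This is a sketch over made-up triples with a hypothetical describe function; what a real store returns for a SPARQL describe is up to the implementation.

```python
# Hypothetical triples about two regions and a person.
triples = [
    ("ex:Southampton", "ex:borders", "ex:Eastleigh"),
    ("ex:Hampshire", "ex:contains", "ex:Eastleigh"),
    ("ex:Southampton", "rdfs:label", "Southampton"),
    ("ex:Jane", "ex:bornIn", "ex:Southampton"),
]


def describe(uri, data):
    """Return every triple in which the URI is the subject or the object."""
    return [t for t in data if t[0] == uri or t[2] == uri]


for triple in describe("ex:Southampton", triples):
    print(triple)
```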

RDF vocabularies and ontologies are typically just published on the web as dumps of RDF/XML, and only in some cases are the classes and properties dereferenceable. In other words the whole file is simply uploaded in bulk. There are guidelines for publishing RDF vocabularies here.

It seems to me that this will be inadequate for publishing larger and more complex ontologies on the web. Do we need a way to publish large ontologies on the web in a linked data stylee? I think it would certainly be useful.

The dereferenceable URI bit is easy enough and can be done as per linked data. For example, the HTML page for the URI http://www.ontology.com/River could provide a description of the class River using some controlled natural language. The question then is what to put in the OWL/RDF file that is retrieved for the class River from that URI? What is the class or OWL equivalent of a “SPARQL describe”?

The problem seems to me to be similar to the problem of ontology modularisation discussed here. Suppose I am building an ontology about animals and I need to use the concepts Farm and Zoo from a buildings ontology. When I import the class Zoo, how can I be sure that I include all the relevant axioms to describe a Zoo, and only the relevant axioms to describe a Zoo? I’ll not describe the hows here as it has been discussed in this tutorial (and numerous supporting papers). The point is that there are tools (try one online) for extracting the correct axioms from an ontology for describing a given class. Should these tools be used in the linked data community as a means to enable us to publish detailed ontologies on the linked data web? So to be clear:

1) We publish the full OWL file on the web (in an analogous way to a dump of RDF data in the linked data web) – this would be, say, the complete buildings ontology.

2) We make the URIs dereferenceable and use content negotiation to retrieve either RDF/XML or HTML as required, as we do for linked data.

3) When we dereference a class URI (e.g. www.ontology.com/Zoo) the axioms contained in the RDF file returned for “Zoo” are determined by the ontology modularisation tools described here rather than some perhaps more naive approach (whereas for linked data this would be, say, a SPARQL describe on the URI).
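For step 2, the content negotiation part boils down to reading the client's Accept header and picking a representation. Below is a minimal sketch, assuming only two representations are on offer and ignoring wildcards such as */*; the choose_representation function is hypothetical and not taken from any particular server or library:

```python
# The two representations on offer, keyed by media type.
SUPPORTED = {"application/rdf+xml": "rdf", "text/html": "html"}


def choose_representation(accept_header, default="html"):
    """Pick 'rdf' or 'html' from an Accept header, preferring the highest q value."""
    best, best_q = default, 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # a media type with no q parameter counts as q=1
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in SUPPORTED and q > best_q:
            best, best_q = SUPPORTED[media_type], q
    return best


print(choose_representation("application/rdf+xml"))
print(choose_representation("text/html,application/xhtml+xml;q=0.9"))
```

A server would then return either the extracted module for the class as RDF/XML or the human-readable page, depending on the answer.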

I’d love to know if there are any links to similar work and to know what people think about this proposal.


The Guardian Open Platform and Data Store

Posted in Semantic Web on March 10, 2009 by john225

Today the Guardian launched their Open Platform. According to their website “The Open Platform is the suite of services that make it possible for our partners to build applications with the Guardian.” The Open Platform contains two products: the Content API and the Data Store. The Content API provides a REST (-ish apparently) mechanism to query a vast number of documents and other content from the Guardian. The Data Store is “a collection of important and high quality data sets curated by Guardian journalists”. This is all very cool!

Currently the Data Store provides a large number of datasets on subjects as diverse as military spending, carbon emissions and university rankings. Currently the data is provided as spreadsheets that have been uploaded to Google Docs allowing easy access. Again all very cool.

This work will obviously result in a lot of cool applications and mash-ups. However, the semantic web geek in me can’t help thinking that mash-ups are so last week :) It seems obvious (?) that the next step for the Guardian Data Store is to provide the data in RDF and host it as linked data. These datasets would be a fantastic addition to the linked data web, allowing mesh-ups where the data from various linked data sources can be fused in different ways.

Time to convince the Guardian that this is the next logical step for this, already great, piece of work.


The Non Golden Rules of Geo

Posted in Semantic Web on February 26, 2009 by john225

Nice blog post from Yahoo! explaining why geography (especially GB geography) is so complicated. The post highlights the six non golden rules of geo:

  1. Any attempt to codify a series of geo rules into a formal, one size fits all, taxonomy will fail due to Rule 2.
  2. Geo is bizarre, odd, eclectic and utterly human.
  3. People will in the main agree with Rule 1 with the exception of the rules governing their own region, area or country, which they will think are perfectly logical.
  4. People will, in the main, think that postal, administrative and colloquial hierarchies are one and the same thing and will overlap.
  5. Taking Rule 4 into account, they will then attempt to codify a one size fits all geo taxonomy.
  6. There is no Rule 6, see Rule 1.

I think this could well explain the headache I am getting trying to write an ontology of GB geographies.


SPARQL your way to a Stupor

Posted in Semantic Web on February 15, 2009 by john225

A few years back two intrepid explorers set out to survey all of the pubs in Southampton. Their adventure is documented here.

A while back I decided to turn their page into linked data and you can see the result here. Mapping is provided by OS OpenSpace as the cartography is far nicer than that of Google or Yahoo! (IMHO of course :) ). As of today the site is linked up to Revyu (in both the RDF and HTML) where applicable. You can also now browse the RDF using the OpenLink Data Explorer, Zitgist or Tabulator. No SPARQL endpoint as yet, but maybe one day.


Sindice and Twine

Posted in Semantic Web on February 4, 2009 by john225

I originally posted this on Twine, but thought it was worth repeating here. Twine is a new service that provides a way for users to track, find, and share what interests them with like-minded people. Twine uses semantic web technology under the bonnet and exposes all of its data as linked data using RDF.

This RDF (along with other exposed RDF) is crawled and indexed by various semantic web search engines such as Sindice.

Not only have Sindice been busy indexing lots of RDF all over the semantic web – including Twine – but they have also provided a relatively simple query engine (not SPARQL). For the semantic web geeks/command line fans amongst you this provides a way to query Twine that is arguably quicker than Twine itself (though obviously not as easy or up to date).

For example, entering the following in the advanced query box:

* <http://www.radarnetworks.com/core#wasCreatedBy> <http://www.twine.com/user/gothwin>

will find all articles I created. The more complex query:

(* <http://www.radarnetworks.com/core#wasCreatedBy> <http://www.twine.com/user/gothwin> AND * <http://www.radarnetworks.com/2007/09/12/basic#tag> "semantic web")

finds articles I created and tagged with “semantic web”.
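If you want to generate queries like these from a script, the sketch below builds the strings shown above. It copies only the syntax visible in the two examples (a * wildcard subject, URIs in angle brackets, literals in double quotes, AND inside parentheses); anything beyond that about Sindice's query language is not assumed, and the helper names are hypothetical:

```python
def pattern(predicate, value, literal=False):
    """One triple pattern with a wildcard subject, in the style of the examples above."""
    obj = f'"{value}"' if literal else f"<{value}>"
    return f"* <{predicate}> {obj}"


def all_of(*patterns):
    """Combine several patterns with AND, wrapped in parentheses."""
    return "(" + " AND ".join(patterns) + ")"


created_by_me = pattern(
    "http://www.radarnetworks.com/core#wasCreatedBy",
    "http://www.twine.com/user/gothwin",
)
tagged_semweb = pattern(
    "http://www.radarnetworks.com/2007/09/12/basic#tag", "semantic web", literal=True
)

print(created_by_me)
print(all_of(created_by_me, tagged_semweb))
```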

Have a play :)


Is OWL really that hard?

Posted in Semantic Web on January 26, 2009 by john225

I’ve been working with semantic web technologies for around seven or eight years now, and one thing that still puzzles me to some extent is why a lot of IT professionals and computer scientists find OWL (the Web Ontology Language) so hard. I do not have a computer science background (I’m a mathematician by training) but during my career I’ve had to get to grips with a number of technologies from Java to SQL and RDBMS. I’ve dabbled with OO design, UML, GIS etc.

Personally I’ve found OWL no harder to pick up than any of these, and to be honest I think OWL is considerably easier to work with than, say, developing in C++ or implementing a complex RDBMS. So I am genuinely curious – what makes people think OWL is scary (esp. compared to some of the technologies I’ve listed here)?
