
Archive for the ‘Semantic Web’ Category

Quick Play with Cayley Graph DB and Ordnance Survey Linked Data

June 29, 2014

Earlier this month Google announced the release of the open source graph database/triplestore Cayley. This weekend I thought I would have a quick look at it, and try some simple queries using the Ordnance Survey Linked Data.

Cayley is written in Go, so first I had to download and install that. I then downloaded Cayley from here. As an initial experiment I decided to use the Boundary-Line Linked Data, and you can grab the data as N-Triples here. I only wanted a subset of this data: I didn’t need all of the triples storing the complex boundary geometries for my initial test, so I discarded the files of the form *-geom.nt and the files of the form county.nt, dbu.nt etc. (these are the ones containing the boundaries). Finally I put the remainder of the data into one file so it was ready to load into Cayley.
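The filtering and merging step can be scripted. Below is a minimal Python sketch, assuming all the downloaded .nt files sit in one directory. The function name `merge_ntriples` and the `BOUNDARY_FILES` set are hypothetical, and because the list of boundary files above ends with “etc.”, only the two named files are included; you would need to add the rest yourself.

```python
from pathlib import Path

# Hypothetical skip list: only county.nt and dbu.nt are named in the post,
# the rest of the boundary files ("etc.") must be added by hand.
BOUNDARY_FILES = {"county.nt", "dbu.nt"}


def merge_ntriples(src_dir: str, out_file: str) -> list[str]:
    """Concatenate the .nt files in src_dir into out_file, skipping
    geometry files (*-geom.nt) and the listed boundary files.
    Returns the names of the files that were merged, in order."""
    merged = []
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.nt")):
            if path.name.endswith("-geom.nt") or path.name in BOUNDARY_FILES:
                continue
            if path.resolve() == Path(out_file).resolve():
                continue  # never merge the output file into itself
            text = path.read_text(encoding="utf-8")
            out.write(text)
            # N-Triples is line-based, so stop files running together
            if text and not text.endswith("\n"):
                out.write("\n")
            merged.append(path.name)
    return merged
```

Calling `merge_ntriples("./boundaryline", "./boundaryline/all.nt")` would then produce the single all.nt file used below.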

It is very easy to load data into Cayley – see the getting started section of the Cayley pages here. I decided I wanted to try the web interface, so loading the data (in a file called all.nt) was a simple case of typing:

./cayley http --dbpath=./boundaryline/all.nt

Once you’ve done this, point your web browser at http://localhost:64210/ and you should see something like:

[Screenshot: the Cayley web interface]

One of the things that will first strike people used to RDF/triplestores is that Cayley does not have a SPARQL interface, and instead uses a query language based on Gremlin. I am new to Gremlin, but it seems it has already been used to explore linked data – see this blog post from Dan Brickley from a few years ago.

The main purpose of this blog post is to give a few simple examples of queries you can perform on the Ordnance Survey data in Cayley. If you have Cayley running then you can find the query language documented here.

At the simplest level the query language seems to be an easy way to traverse the graph by starting at a node/vertex and following incoming or outgoing links. So to find all the regions that touch Southampton, it is a simple case of starting at the Southampton node, following the outbound touches links and returning the results:

g.V("http://data.ordnancesurvey.co.uk/id/7000000000037256").Out("http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches").All()

Giving:

[Screenshot: IDs of the regions that touch Southampton]

If you want to return the names and not the IDs:

g.V("http://data.ordnancesurvey.co.uk/id/7000000000037256").Out("http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches").Out("http://www.w3.org/2000/01/rdf-schema#label").All()

[Screenshot: labels of the regions that touch Southampton]

You can also filter, so to see just the counties bordering Southampton:

g.V("http://data.ordnancesurvey.co.uk/id/7000000000037256").Out("http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches").Has("http://www.w3.org/1999/02/22-rdf-syntax-ns#type","http://data.ordnancesurvey.co.uk/ontology/admingeo/County").Out("http://www.w3.org/2000/01/rdf-schema#label").All()

[Screenshot: labels of the counties that touch Southampton]
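To make the traversal semantics concrete, here is a toy Python imitation of the V/Out/Has/All chain over an in-memory list of triples. It is only a sketch of what each step means, not Cayley's implementation or API: the `Traversal` class and `V` helper are hypothetical, and the methods keep Cayley's capitalised names purely so they line up with the queries above.

```python
TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
TOUCHES = "http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches"
COUNTY = "http://data.ordnancesurvey.co.uk/ontology/admingeo/County"


class Traversal:
    """Hypothetical toy path: the current set of nodes plus the triples."""

    def __init__(self, triples, nodes):
        self.triples = triples
        # de-duplicate the current nodes while keeping their order
        self.nodes = list(dict.fromkeys(nodes))

    def Out(self, predicate):
        """Follow outgoing edges labelled `predicate` from every current node."""
        return Traversal(self.triples, [
            o for n in self.nodes
            for (s, p, o) in self.triples
            if s == n and p == predicate
        ])

    def Has(self, predicate, obj):
        """Keep only the current nodes that have the edge (node, predicate, obj)."""
        return Traversal(self.triples,
                         [n for n in self.nodes if (n, predicate, obj) in self.triples])

    def All(self):
        """Return the current nodes as a list."""
        return list(self.nodes)


def V(triples, *start):
    """Start a traversal at the given node IRIs."""
    return Traversal(triples, start)
```

With a list of (subject, predicate, object) tuples loaded, `V(triples, start).Out(TOUCHES).Has(TYPE, COUNTY).Out(LABEL).All()` mirrors the county filter query above.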

The Ordnance Survey linked data also has the spatial predicates ‘contains’ and ‘within’ as well as ‘touches’, and analogous queries can be written with those. For example, to find everything Southampton contains:

g.V("http://data.ordnancesurvey.co.uk/id/7000000000037256").Out("http://data.ordnancesurvey.co.uk/ontology/spatialrelations/contains").Out("http://www.w3.org/2000/01/rdf-schema#label").All()

So after this very quick initial experiment it seems that Cayley is very good at providing an easy way of doing quick, simple queries. One query I wanted to do was find everything in, say, Hampshire – the full transitive closure. This is very easy to do in SPARQL, but in Cayley (at first glance) you’d have to write some extra code (not exactly rocket science, but a bit of a faff compared to SPARQL). I rarely touch JavaScript these days, so for me personally this will never replace a triplestore with a SPARQL endpoint, but for JS developers this tool will be a great way to get started with and explore linked data/RDF. I might well brush up on my JavaScript and provide more complicated examples in a later blog post…
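As a sketch of the “extra code” the transitive closure needs, here is a breadth-first search over contains links in Python. It runs over an in-memory list of triples rather than through Cayley's query API, so `transitive_contains` is a hypothetical helper, not a Cayley function. In SPARQL the same idea is a one-or-more (`+`) property path on the contains predicate.

```python
from collections import deque

CONTAINS = "http://data.ordnancesurvey.co.uk/ontology/spatialrelations/contains"


def transitive_contains(triples, start):
    """Return every node reachable from `start` by following one or more
    contains edges (the transitive closure), in breadth-first order."""
    children = {}
    for s, p, o in triples:
        if p == CONTAINS:
            children.setdefault(s, []).append(o)
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guards against repeats and cycles
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Each node is visited at most once, so the search terminates even if the data happens to contain a cycle.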

Visualising the Location Graph – example with Gephi and Ordnance Survey linked data

March 28, 2014

This is arguably a simpler follow-up to my previous blog post, and here I want to look at visualising Ordnance Survey linked data in Gephi. Gephi isn’t really a GIS, but it can be used to visualise the adjacency graph, where regions are represented as nodes and links represent adjacency relationships.

The approach here will be very similar to the approach in my previous blog post. The main difference is that you will need to use the Ordnance Survey SPARQL endpoint and not the DBpedia one. So this time in the Gephi semantic web importer enter the following endpoint URL:

http://data.ordnancesurvey.co.uk/datasets/os-linked-data/apis/sparql

The Ordnance Survey endpoint returns Turtle by default, and Gephi does not seem to like this, so I wanted to force the output to be XML. I figured this could be done using a ‘REST parameter name’ (output) with the value xml. This did not seem to work, so instead I had to do a bit of a hack: in the ‘query tag…’ box, change the value from ‘query’ to ‘output=xml&query’. You should see something like this in the Semantic Web Importer now:
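The hack presumably works because the importer builds the request from the endpoint URL, the query tag, an equals sign and the encoded query, so the tag ‘output=xml&query’ slips an extra output=xml parameter in front of the query. A minimal Python sketch of the resulting GET URL, assuming that behaviour (the `request_url` helper is hypothetical and only builds the URL; it does not send anything):

```python
from urllib.parse import quote, urlencode

ENDPOINT = "http://data.ordnancesurvey.co.uk/datasets/os-linked-data/apis/sparql"


def request_url(query: str) -> str:
    """Build a GET URL with output=xml placed before the query parameter."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    params = urlencode({"output": "xml", "query": query}, quote_via=quote)
    return ENDPOINT + "?" + params
```

Pasting the output of `request_url("select * where {?s ?p ?o} limit 1")` into a browser is a quick way to check whether the endpoint really answers with XML.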

[Screenshot: Semantic Web Importer settings for the Ordnance Survey endpoint]

Now click on the query tab. If we want, for example, to view the adjacency graph for Westminster constituencies, we can enter the following query:

prefix gephi:<http://gephi.org/>
construct {
?s gephi:label ?label .
?s gephi:lat ?lat .
?s gephi:long ?long .
?s <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> ?o .}
where
{
?s a <http://data.ordnancesurvey.co.uk/ontology/admingeo/WestminsterConstituency> .
?o a <http://data.ordnancesurvey.co.uk/ontology/admingeo/WestminsterConstituency> .
?s <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> ?o .
?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?s <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
?s <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
}

and click ‘run’. To visualise the output you will need to follow the exact same steps mentioned here (remember to recast the lat and long variables to decimal).
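Because ‘touches’ is a symmetric relationship, the data may state each adjacency in both directions, and Gephi would then import two directed edges per pair. If you want one undirected edge per pair of neighbouring regions, the collapsing step looks like this minimal Python sketch. The function names are hypothetical, and the input is assumed to be plain (region, neighbour) pairs taken from the query results.

```python
def adjacency(touches_pairs):
    """Build an undirected adjacency map from (region, neighbour) pairs;
    (a, b) and (b, a) end up as the same undirected link."""
    adj = {}
    for a, b in touches_pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def edge_list(adj):
    """List each undirected edge exactly once as a sorted (source, target) pair."""
    return sorted({tuple(sorted((a, b))) for a, nbrs in adj.items() for b in nbrs})
```

Using sets means a pair stated once or twice in the data gives the same result.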

If we want to view adjacency of London Boroughs then we can do this with a similar query:

prefix gephi:<http://gephi.org/>
construct {
?s gephi:label ?label .
?s gephi:lat ?lat .
?s gephi:long ?long .
?s <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> ?o .}
where
{
?s a <http://data.ordnancesurvey.co.uk/ontology/admingeo/LondonBorough> .
?o a <http://data.ordnancesurvey.co.uk/ontology/admingeo/LondonBorough> .
?s <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> ?o .
?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?s <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
?s <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
}

When visualising you might want to change the scale parameter to 10000.0. You should see something like this:

[Screenshot: adjacency graph of London Boroughs]

So far so good. Now imagine we want to bring in some other data – recall my previous blog post here. We can use SPARQL federation to bring in data from other endpoints. Suppose we would like the size of each node to represent the ‘IMD rank’ of each London Borough; we can do this by bringing in data from the Open Data Communities site:

prefix gephi:<http://gephi.org/>
construct {
?s gephi:label ?label .
?s gephi:lat ?lat .
?s gephi:long ?long .
?s gephi:imd-rank ?imdrank .
?s <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> ?o .}
where
{
?s a <http://data.ordnancesurvey.co.uk/ontology/admingeo/LondonBorough> .
?o a <http://data.ordnancesurvey.co.uk/ontology/admingeo/LondonBorough> .
?s <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> ?o .
?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?s <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
?s <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
SERVICE <http://opendatacommunities.org/sparql> {
?x <http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?s .
?x <http://opendatacommunities.org/def/IMD#IMD-rank> ?imdrank . }
}

You will need to recast the imdrank as an integer for what follows (do this using the same approach used to recast the lat/long variables). You can now use Gephi to resize the nodes according to IMD rank. We do this using the ranking tab:
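The recast matters because, as strings, "10" sorts before "9", so ranks would be ordered lexically rather than numerically. The sketch below shows the kind of mapping the ranking tab applies, assuming a linear interpolation between a minimum and a maximum node size. The `node_sizes` function and its default sizes of 10 and 50 are hypothetical, not Gephi's code.

```python
def node_sizes(values, min_size=10.0, max_size=50.0):
    """Map numeric values (e.g. recast IMD ranks) linearly onto [min_size, max_size]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    sizes = {}
    for node, value in values.items():
        # if every value is equal, avoid dividing by zero and use the minimum size
        fraction = 0.0 if span == 0 else (value - lo) / span
        sizes[node] = min_size + fraction * (max_size - min_size)
    return sizes
```

The smallest value gets `min_size`, the largest gets `max_size`, and everything else is spaced proportionally in between.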

[Screenshot: resizing nodes by IMD rank in the Ranking tab]

You should now see your London Boroughs resized according to their IMD rank:

[Screenshot: London Boroughs resized by IMD rank]

Turning the lights off and adding some labels, we get:

[Screenshot: the resized graph with a dark background and labels]

Tell Me About Hampshire – Linking Government Data using SPARQL federation 2

March 23, 2014

Yesterday I blogged about how to do some federated SPARQL queries across various government websites, and this post is a continuation of that with a different example. Here I give an example query which basically says ‘tell me stuff about Hampshire’. I do this by linking up data from Ordnance Survey, the Office for National Statistics, the Department for Communities and Local Government and Hampshire County Council. This query is really just for illustrative purposes, but I want to ask: ‘for all districts in Hampshire, find me the index of multiple deprivation rank, the change order and operative date for that district, the website for the local authority of that district, along with the addresses of parcels of land where it is planned to build new dwellings’. To achieve this I need to take data from several sources and use SPARQL federation. Here is the query that answers my question. First I query Ordnance Survey linked data to find districts in Hampshire, and I then pass these districts to three other linked data services to retrieve the relevant information. To try this example, head over to the Ordnance Survey SPARQL endpoint and copy/paste the following:

select ?districtname ?imdrank ?changeorder ?opdate ?councilwebsite ?siteaddress
where
{?district <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within>
   <http://data.ordnancesurvey.co.uk/id/7000000000017765> .
  ?district a <http://data.ordnancesurvey.co.uk/ontology/admingeo/District> .
  ?district <http://www.w3.org/2000/01/rdf-schema#label> ?districtname .
 SERVICE <http://opendatacommunities.org/sparql> {
 ?s <http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?district .
?s <http://opendatacommunities.org/def/IMD#IMD-rank> ?imdrank .
 ?authority <http://opendatacommunities.org/def/local-government/governs> ?district .
 ?authority <http://xmlns.com/foaf/0.1/page> ?councilwebsite .
 }
 ?district <http://www.w3.org/2002/07/owl#sameAs> ?onsdist .
 SERVICE <http://statistics.data.gov.uk/sparql> {
 ?onsdist <http://statistics.data.gov.uk/def/boundary-change/originatingChangeOrder>
          ?changeorder .
 ?onsdist <http://statistics.data.gov.uk/def/boundary-change/operativedate>
          ?opdate .
 }
 SERVICE <http://linkeddata.hants.gov.uk/sparql> {
   ?landsupsite <http://data.ordnancesurvey.co.uk/ontology/admingeo/district> ?district .
   ?landsupsite a <http://linkeddata.hants.gov.uk/def/land-supply/LandSupplySite> .
   ?landsupsite
<http://www.ordnancesurvey.co.uk/ontology/BuildingsAndPlaces/v1.1/BuildingsAndPlaces.owl#hasAddress>
   ?siteaddress .
   }
}
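If you would rather run the query from a script than paste it into the explorer, the results need flattening into rows. A minimal Python sketch, assuming the endpoint can return the standard SPARQL 1.1 JSON results format (typically requested with an `Accept: application/sparql-results+json` header; not verified against this endpoint). The `rows` function is hypothetical.

```python
import json


def rows(results_json: str) -> list[dict]:
    """Flatten a SPARQL 1.1 JSON results document into one dict per
    solution, mapping each variable to its value (None if unbound)."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in doc["results"]["bindings"]
    ]
```

Each row then has the keys districtname, imdrank, changeorder, opdate, councilwebsite and siteaddress, matching the select clause.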

Happy SPARQLing…

Federating SPARQL Queries Across Government Linked Data

March 22, 2014

SPARQL 1.1 introduces the idea of federated SPARQL queries – this enables you to execute part of your SPARQL query against a remote SPARQL endpoint. I thought I’d provide some examples of using this feature in government linked open data.

The Environment Agency has published a number of its open data offerings as linked data, which you can explore here. One of these datasets is the Bathing Water Quality Data, and you can explore this via their SPARQL endpoint. I won’t go into this data in too much detail as it is not my area of expertise. The Environment Agency has created 5-star open data by linking their data to both Ordnance Survey and Office for National Statistics linked data. Look at the linked data for the Eastoke bathing water site and you’ll see it linked to Havant and Hampshire in the Ordnance Survey data. A relatively straightforward SPARQL query will get you a list of bathing waters, their names and the districts they are in:

select ?x ?name ?district
where {
?x a <http://environment.data.gov.uk/def/bathing-water/BathingWater> .
?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
?x <http://statistics.data.gov.uk/def/administrative-geography/district> ?district .}

Now suppose we just want a list of bathing waters in South East England – how would we do that? This is where SPARQL federation comes in. The information about which European Region each district is in is held in the Ordnance Survey linked data. If you hop over to the Ordnance Survey SPARQL endpoint explorer you can run the following query to find all districts in South East England along with their names (please see a previous blog post for information about simple spatial queries):

select ?district ?districtname
where
{?district <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within>+
   <http://data.ordnancesurvey.co.uk/id/7000000000041421> .
  ?district <http://www.w3.org/2000/01/rdf-schema#label> ?districtname .}

Using the SERVICE keyword we can bring these two queries together to find all bathing waters in South East England, and the districts they are in:

select ?x ?name ?districtname
where {
?x a <http://environment.data.gov.uk/def/bathing-water/BathingWater> .
?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
?x <http://statistics.data.gov.uk/def/administrative-geography/district> ?district .
SERVICE <http://data.ordnancesurvey.co.uk/datasets/boundary-line/apis/sparql>
{ ?district <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within>+
   <http://data.ordnancesurvey.co.uk/id/7000000000041421> .
   ?district <http://www.w3.org/2000/01/rdf-schema#label> ?districtname .}
}
order by ?districtname
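Conceptually, the solutions from the SERVICE block are joined with the local bathing-water solutions on their shared variable, ?district. The Python sketch below illustrates that join rule over lists of dictionaries: two solutions are compatible if they agree on every variable they share. It is not how any particular engine executes a federated query (engines may, for instance, send bindings to the remote endpoint), and the `join` function and sample values are hypothetical.

```python
def join(lhs, rhs):
    """Join two lists of solution mappings (dicts): keep every pair that
    agrees on all shared variables and merge it into one solution."""
    out = []
    for a in lhs:
        for b in rhs:
            shared = a.keys() & b.keys()
            if all(a[v] == b[v] for v in shared):
                out.append({**a, **b})
    return out
```

Bathing waters whose district is not returned by the SERVICE block (because it is outside South East England) simply find no compatible partner and drop out.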

Now suppose we want to know the sediment types of the bathing waters in Havant. We can find these with the following query:

select ?x ?name ?sediment
where {
?x a <http://environment.data.gov.uk/def/bathing-water/BathingWater> .
?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
?x <http://statistics.data.gov.uk/def/administrative-geography/district> <http://data.ordnancesurvey.co.uk/id/7000000000017297> .
?x <http://environment.data.gov.uk/def/bathing-water/sedimentTypesPresent> ?sediment .
}

We can again use SPARQL federation to do something more interesting. The following query returns the sediment types of bathing waters in Havant together with the sediment types of bathing waters in regions that touch Havant:

select ?x ?name ?sediment
where {
{
?x a <http://environment.data.gov.uk/def/bathing-water/BathingWater> .
?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
?x <http://statistics.data.gov.uk/def/administrative-geography/district> <http://data.ordnancesurvey.co.uk/id/7000000000017297> .
?x <http://environment.data.gov.uk/def/bathing-water/sedimentTypesPresent> ?sediment .
}
UNION
{
SERVICE <http://data.ordnancesurvey.co.uk/datasets/boundary-line/apis/sparql>
{ ?district <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches>
   <http://data.ordnancesurvey.co.uk/id/7000000000017297> .
}
?x a <http://environment.data.gov.uk/def/bathing-water/BathingWater> .
?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
?x <http://statistics.data.gov.uk/def/administrative-geography/district> ?district .
?x <http://environment.data.gov.uk/def/bathing-water/sedimentTypesPresent> ?sediment .
}
}

Another great government open data resource is the Open Data Communities site, which has a SPARQL endpoint here. A federated SPARQL query analogous to those above can be used, for example, to find the Index of Multiple Deprivation Environment rank for Havant and the surrounding districts. This works as follows:

select ?s ?imdrank
where
{
{
?s <http://purl.org/linked-data/sdmx/2009/dimension#refArea> <http://data.ordnancesurvey.co.uk/id/7000000000017297> .
?s <http://opendatacommunities.org/def/IMD#IMD-environment-rank> ?imdrank .
}
UNION
{
SERVICE <http://data.ordnancesurvey.co.uk/datasets/boundary-line/apis/sparql>
{ ?district <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches>
<http://data.ordnancesurvey.co.uk/id/7000000000017297> .
}
?s <http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?district .
?s <http://opendatacommunities.org/def/IMD#IMD-environment-rank> ?imdrank .
}
}

I will now leave it as an exercise to the reader to figure out how these all combine so you can ask for ‘all bathing waters in Havant and surrounding areas, and the IMD environment ranks of the areas containing those bathing waters’ – it is possible!

Please note that federated SPARQL can be slow…happy SPARQLing.

Categories: linked data, Semantic Web

Ordnance Survey Linked Data: The Search API

September 24, 2013

Please note that in some of the examples below I have been having trouble with WordPress ‘correcting’ quote marks in my text. If you find the queries don’t work, you may need to manually replace the copied quote marks with new ones typed on your keyboard. Hope that makes sense.

One of the biggest improvements to the new Ordnance Survey Linked Data site is the much improved search functionality. You can either search over a specific dataset (e.g. the Code-Point(R) Open linked data) or over all the combined datasets. I will first give some examples of using the Boundary-Line(TM) search API.

The Boundary-Line search API explorer can be found here. The simplest use of this search API is to enter some text for the name of an administrative area or the GSS code (the ONS identifier for a statistical region) into the search box. To get started enter Southampton into the query box. You will see that the search results are returned in JSON (RSS and Atom are additional options). Results contain the URI of the entities that match your queries along with a number of useful attributes.

Note that the Request box shows the actual GET request that is being made, and you can use this GET request in your applications. Now try searching for a GSS code: enter E06000045 into the query box. You should see results for the City of Southampton returned. So far, so straightforward. The search function also supports wildcards; for example, type the following into the query box:

label:Southa*

It is also possible to narrow search results by type. Recall that the search for Southampton returned both Westminster constituencies and a unitary authority with Southampton in their name. To just find the Westminster constituencies search for the following:

label:Southampton AND type:"http://data.ordnancesurvey.co.uk/ontology/admingeo/WestminsterConstituency"

The search API also allows you to perform a number of simple spatial queries. The first of these are bounding box queries. For the Boundary-Line data you can specify a bounding box, and find all the administrative regions whose centroids lie within that bounding box. The bounding box can be expressed in eastings and northings. For example try the following:

easting:[371000 TO 374000] AND northing:[161000 TO 164500]

in the query box.

The answers can be narrowed down further by specifying the type of object that should be returned. For example to just get the civil parishes in this bounding box try the following:

easting:[371000 TO 374000] AND northing:[161000 TO 164500] AND type:"http://data.ordnancesurvey.co.uk/ontology/admingeo/CivilParish"
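When these search strings are generated by a program rather than typed, a small helper keeps the syntax consistent. A minimal Python sketch that reproduces the two bounding-box queries above; `bbox_query` is a hypothetical name, and it uses only the syntax shown in this post.

```python
ADMINGEO = "http://data.ordnancesurvey.co.uk/ontology/admingeo/"


def bbox_query(e_min, e_max, n_min, n_max, type_name=None):
    """Build a bounding-box search string, optionally restricted to one admingeo type."""
    query = f"easting:[{e_min} TO {e_max}] AND northing:[{n_min} TO {n_max}]"
    if type_name is not None:
        # quote the full type URI, as in the hand-written queries
        query += f' AND type:"{ADMINGEO}{type_name}"'
    return query
```

For example, `bbox_query(371000, 374000, 161000, 164500, "CivilParish")` gives the civil parish query above.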

Another type of simple spatial query we can do in the search API is ‘find me all features of a given type within a certain radius of a given point’. Here the point can be specified in either lat/long or easting/northing. To find all of the civil parishes within a 50 km radius of the point with easting 442339 and northing 112882, put:

type:"http://data.ordnancesurvey.co.uk/ontology/admingeo/CivilParish"

into the query box, enter the appropriate values in the easting and northing boxes, and put 50 in the radius search box. If, for example, you want to perform this query again but find both civil parishes and districts, enter the following into the query box:

type:"http://data.ordnancesurvey.co.uk/ontology/admingeo/CivilParish" OR type:"http://data.ordnancesurvey.co.uk/ontology/admingeo/District"

and try the query again.
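Eastings and northings on the British National Grid are in metres, so with easting/northing input the radius test amounts to a straight-line distance check. A minimal Python sketch, assuming the API measures the distance from the given point to each feature's centroid (as with the bounding-box queries; the post does not say). `within_radius` is hypothetical and does not handle the lat/long case.

```python
import math


def within_radius(easting, northing, centre_e, centre_n, radius_km):
    """True if (easting, northing) lies within radius_km of the centre.
    Grid coordinates are metres, so the radius is converted to metres."""
    return math.hypot(easting - centre_e, northing - centre_n) <= radius_km * 1000
```

With the example above, a centroid 50 km due north of (442339, 112882) is included and one a metre further out is not.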

These are just some simple examples of the search API. The full documentation is here.

Interview in ODBMS.org

September 1, 2013

I did a recent interview with my friend Prof. Roberto V. Zicari over at odbms.org. You can read it here. (Please forgive the few typos that crept into some of my answers).

Categories: linked data, Semantic Web

Experiments with schema.org

August 26, 2013

Directly quoting from schema.org:

This site provides a collection of schemas, i.e., html tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages.

Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.

My favourite band, New Model Army, are touring later this year so I thought creating a website for their UK tour dates would be a good way to experiment with schema.org markup. First off we have a webpage for the UK leg of their winter tour:

http://www.johngoodwin.me.uk/event/newmodelarmy-uk-wintertour-2013

which is a kind of Music Event, and this is related to a number of sub events such as:

http://www.johngoodwin.me.uk/event/newmodelarmy-uk-Aberdeen-20131112

which is also a kind of Music Event. Each of these events is related to a venue via the ‘location’ predicate:

http://www.johngoodwin.me.uk/venue/The-Garage-Aberdeen

I have related each of the venues to locations in the Ordnance Survey Linked Data using the contained in predicate.

For each of the events I have included further markup such as the start date, end date, ticket sites (via the offers predicate) and also links to other pages about the event via sameAs. Similarly I have linked pages about each venue to other pages about that venue via sameAs.

All of the markup is done using RDFa. You can view the machine readable content using the Google Structured Data Testing tool or this RDFa Parser. Here are some examples:

New Model Army UK Winter Tour 2013: rich snippets, RDFa distiller

Aberdeen tour date: rich snippets, RDFa distiller

The Garage Aberdeen (venue): rich snippets, RDFa distiller

Mapping is provided by OS OpenSpace.

Extra: someone asked me what the experiment was. Maybe it is not much of an experiment really, but simply put, I’m curious to find out what happens should these pages get picked up by Google et al., and to see what they do with them.
