Posts Tagged ‘linked data’

Ordnance Survey Linked Data – A Simple Spatial Query

August 2, 2013 3 comments

In this post I thought I would give an example of some very simple spatial queries using the Ordnance Survey Linked Data. When we first created the Ordnance Survey linked data not many RDF triplestores had spatial indexes; in other words, there was no easy way to say ‘find me all the parishes in Hampshire’ using a query based on the geometries of these regions. This functionality is fairly standard in GIS systems and in a number of spatially enabled relational databases, and is increasingly being implemented in RDF triplestores and other NoSQL technologies. To get round this issue it was decided to precompute various topological relationships between the administrative areas described in the Boundary-Line(TM) linked data. What you will see in the data are explicit spatial relationships such as touches, within and contains that relate the different administrative regions.

Now the administrative geography of this country is complicated, and I’m no geographer, so a complete description of it will be left for a later blog post. For now I will say that Boundary-Line contains different geographies, some based on national voting and some on local authorities. The spatial relationships are only included where relevant: for example, you won’t find explicit spatial relationships between Westminster Constituencies and Counties, but you will find them between Counties and Districts.

In the Ordnance Survey linked data you will find three types of spatial relationship (touches, within and contains):

  • touches means that two regions share a point on their boundary, but share no common points on their interior. They are adjacent/bordering. Touches relationships are typically only recorded between regions of the same type, i.e. which parish touches which parish. You won’t find a list of parishes touching counties. However, at some levels it gets a bit more complicated due to single tier local authorities (unitary authorities) and those based on a double tier (county/district). Counties and unitary authorities tessellate the country at some level, as do districts and unitary authorities.
  • contains and within are fairly self-explanatory, I hope. Contains and within relationships are only stated between regions in the same geography, and only between entities that directly contain or are within each other. What does this last part mean? In the local authorities geography, counties contain districts and districts, in turn, contain parishes. You will only find explicit ‘contains’ statements between counties and districts, and between districts and parishes; you won’t find them between counties and parishes.
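Because only direct relationships are stated, anything further down the hierarchy has to be reached by chaining statements together. The following is a minimal Python 3 sketch of that chaining over toy data; the region names are placeholders rather than real Boundary-Line identifiers, and everything_within is a hypothetical helper written for this illustration.

```python
from collections import defaultdict

# Toy data: (child, parent) pairs standing in for direct 'within' statements.
# The names are placeholders, not real Boundary-Line identifiers.
DIRECT_WITHIN = [
    ("district-1", "county"),
    ("district-2", "county"),
    ("parish-1a", "district-1"),
    ("parish-1b", "district-1"),
    ("parish-2a", "district-2"),
]


def everything_within(region, direct_within):
    """Return every region reachable by chaining direct 'within' statements."""
    children = defaultdict(list)
    for child, parent in direct_within:
        children[parent].append(child)

    found = set()
    to_visit = [region]
    while to_visit:
        current = to_visit.pop()
        for child in children[current]:
            if child not in found:
                found.add(child)
                to_visit.append(child)
    return found


# One hop only: what the explicit statements give you directly.
directly_within = {child for child, parent in DIRECT_WITHIN if parent == "county"}
# Any number of hops: districts and the parishes inside them.
all_within = everything_within("county", DIRECT_WITHIN)

print(sorted(directly_within))
print(sorted(all_within))
```

The one-hop set holds only the two districts, while the chained set also picks up the parishes, even though no statement links a parish to the county directly.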

So now for some examples. Suppose I want to find the names of all the regions contained immediately in Hampshire. First you need to find which URI identifies Hampshire. Go to the Boundary-Line search API and search for Hampshire. You should then see that the county of Hampshire has the following URI:

http://data.ordnancesurvey.co.uk/id/7000000000017765

You can now use this in your query. Go to the SPARQL endpoint and enter the following:

select ?x ?name
where
{
  ?x <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within> <http://data.ordnancesurvey.co.uk/id/7000000000017765> .
  ?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
}

You will see a list of everything immediately within Hampshire, and these will all be of type district. Suppose you now want to get everything within Hampshire, at any depth. This can be done easily by adding a ‘+’ at the end of the within predicate, which tells SPARQL to follow the predicate one or more times:

select ?x ?name
where
{
  ?x <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within>+ <http://data.ordnancesurvey.co.uk/id/7000000000017765> .
  ?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
}

You now have a list of everything within Hampshire, including districts, wards and parishes. Now suppose you just want the parishes. You can do this by adding an extra line to the query so that ?x only matches things of type civil parish:

select ?x ?name
where
{
  ?x <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within>+ <http://data.ordnancesurvey.co.uk/id/7000000000017765> .
  ?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
  ?x a <http://data.ordnancesurvey.co.uk/ontology/admingeo/CivilParish> .
}
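The three queries so far differ only in the optional ‘+’ and the optional type line, so they can be generated from one template. Below is a small Python 3 sketch of that idea; within_query is a hypothetical helper name, and the URIs are the ones used in the queries above.

```python
WITHIN = "http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
HAMPSHIRE = "http://data.ordnancesurvey.co.uk/id/7000000000017765"
CIVIL_PARISH = "http://data.ordnancesurvey.co.uk/ontology/admingeo/CivilParish"


def within_query(region_uri, transitive=False, type_uri=None):
    """Build a 'what is within this region' query (hypothetical helper)."""
    # A trailing '+' makes the predicate a one-or-more property path.
    path = "<%s>%s" % (WITHIN, "+" if transitive else "")
    lines = [
        "select ?x ?name",
        "where",
        "{",
        "  ?x %s <%s> ." % (path, region_uri),
        "  ?x <%s> ?name ." % LABEL,
    ]
    if type_uri is not None:
        # Optional filter on the type of the matched region.
        lines.append("  ?x a <%s> ." % type_uri)
    lines.append("}")
    return "\n".join(lines)


print(within_query(HAMPSHIRE, transitive=True, type_uri=CIVIL_PARISH))
```

Calling within_query(HAMPSHIRE) with no other arguments gives the first query again; the printed call reproduces the parish query.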

Touches works in a similar way. Suppose you want the names of the unitary authorities that touch Hampshire. Issue the following:

select ?x ?name
where
{
  ?x <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> <http://data.ordnancesurvey.co.uk/id/7000000000017765> .
  ?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
  ?x a <http://data.ordnancesurvey.co.uk/ontology/admingeo/UnitaryAuthority> .
}

Say you want to find parishes that touch Hampshire. This is where it gets complicated, and the following is maybe for advanced SPARQL wizards only. First find all the things that touch Hampshire (this will include other counties, unitary authorities and districts), then find all the parishes directly within those regions, and finally keep only those parishes that touch a parish within Hampshire:

select distinct ?y ?name
where
{
  ?x <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> <http://data.ordnancesurvey.co.uk/id/7000000000017765> .
  ?y <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within> ?x .
  ?y a <http://data.ordnancesurvey.co.uk/ontology/admingeo/CivilParish> .
  ?z <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within>+ <http://data.ordnancesurvey.co.uk/id/7000000000017765> .
  ?z a <http://data.ordnancesurvey.co.uk/ontology/admingeo/CivilParish> .
  ?z <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> ?y .
  ?y <http://www.w3.org/2000/01/rdf-schema#label> ?name .
}

Congratulations – you now have a list of all the parishes that touch Hampshire.

Hopefully some of these queries are useful – happy SPARQLing.
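If you would rather send these queries from a script than paste them into the web form, something like the following Python 3 sketch may work. Treat it as a sketch under assumptions: the endpoint address is guessed from the pattern of the other Boundary-Line API URLs, and the ‘query’ and ‘output’ parameter names are assumed rather than taken from the site's documentation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: endpoint address guessed from the pattern of the other
# Boundary-Line API URLs; check the site for the real one.
ENDPOINT = "http://data.ordnancesurvey.co.uk/datasets/boundary-line/apis/sparql"


def build_request_url(query, output="json", endpoint=ENDPOINT):
    """Return a GET URL carrying the query.

    ASSUMPTION: the endpoint accepts 'query' and 'output' parameters.
    """
    return endpoint + "?" + urlencode({"query": query, "output": output})


def run_query(query, output="json"):
    """Send the query and return the raw response body as text."""
    with urlopen(build_request_url(query, output)) as response:
        return response.read().decode("utf-8")


QUERY = """select ?x ?name
where
{
  ?x <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within> <http://data.ordnancesurvey.co.uk/id/7000000000017765> .
  ?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
}"""

# Building the URL needs no network access; call run_query(QUERY) to fetch.
url = build_request_url(QUERY)
print(url)
```

Only the URL is built when the script runs, so you can inspect it before deciding to call run_query.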

Ordnance Survey Linked Data – Simple SPARQL example

August 1, 2013 Leave a comment

Yesterday I received a request asking how to extract some simple data from the Ordnance Survey linked data using a SPARQL query. This post is not intended as a SPARQL tutorial – you can find plenty of those here.

A user wanted to know how to retrieve the name, unit-id, GSS Code, lat and long of all the unitary authorities, districts and metropolitan districts in England, Scotland and Wales as a CSV file.

To extract this information for all of the districts, go to the Ordnance Survey’s Boundary-Line(TM) linked data SPARQL endpoint explorer and select CSV in the response format drop-down menu. Now enter the following query in the query window:

select ?name ?lat ?long ?gss ?unit_id
where
{
  ?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
  ?x <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
  ?x <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
  ?x <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> ?gss .
  ?x <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasUnitID> ?unit_id .
  ?x a <http://data.ordnancesurvey.co.uk/ontology/admingeo/District> .
}

This query selects the various attributes from the data, and the final line of the query makes sure that all of the entities selected from the data are of type District.

Scroll down the page and you should see the query response. To get the values for the districts, unitary authorities and metropolitan districts we need to use a SPARQL union to gather together all of the results as follows:

select ?name ?lat ?long ?gss ?unit_id
where
{
  {
    ?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
    ?x <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
    ?x <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
    ?x <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> ?gss .
    ?x <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasUnitID> ?unit_id .
    ?x a <http://data.ordnancesurvey.co.uk/ontology/admingeo/District> .
  }
  UNION
  {
    ?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
    ?x <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
    ?x <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
    ?x <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> ?gss .
    ?x <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasUnitID> ?unit_id .
    ?x a <http://data.ordnancesurvey.co.uk/ontology/admingeo/MetropolitanDistrict> .
  }
  UNION
  {
    ?x <http://www.w3.org/2000/01/rdf-schema#label> ?name .
    ?x <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
    ?x <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
    ?x <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> ?gss .
    ?x <http://data.ordnancesurvey.co.uk/ontology/admingeo/hasUnitID> ?unit_id .
    ?x a <http://data.ordnancesurvey.co.uk/ontology/admingeo/UnitaryAuthority> .
  }
}
order by ?name

The ‘order by’ at the end of the query sorts the results alphabetically by name.
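The three branches of the union differ only in their final type line, so the query text can be generated rather than typed out by hand. Here is a minimal Python 3 sketch; union_query is a hypothetical helper, and the predicate URIs are copied from the query above.

```python
ADMINGEO = "http://data.ordnancesurvey.co.uk/ontology/admingeo/"

# (variable, predicate URI) pairs copied from the query above.
PROPERTIES = [
    ("name", "http://www.w3.org/2000/01/rdf-schema#label"),
    ("lat", "http://www.w3.org/2003/01/geo/wgs84_pos#lat"),
    ("long", "http://www.w3.org/2003/01/geo/wgs84_pos#long"),
    ("gss", ADMINGEO + "gssCode"),
    ("unit_id", ADMINGEO + "hasUnitID"),
]


def union_query(type_names):
    """Build one UNION branch per admingeo type (hypothetical helper)."""
    branches = []
    for type_name in type_names:
        lines = ["?x <%s> ?%s ." % (pred, var) for var, pred in PROPERTIES]
        lines.append("?x a <%s%s> ." % (ADMINGEO, type_name))
        branches.append("{\n" + "\n".join(lines) + "\n}")
    variables = " ".join("?" + var for var, _ in PROPERTIES)
    body = "\nUNION\n".join(branches)
    return "select %s\nwhere\n{\n%s\n}\norder by ?name" % (variables, body)


query = union_query(["District", "MetropolitanDistrict", "UnitaryAuthority"])
print(query)
```

Adding another type of region is then a matter of adding its name to the list.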

To save the query results as a CSV file, again make sure that the response format is set to CSV and this time, before hitting the query button, make sure the ‘show raw response’ option is selected. Now hit the query button and you should be given the option to save your query result as a CSV file.
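Once the file is saved it can be read back with nothing more than Python's standard library. The sketch below (Python 3) assumes the CSV has a header row named after the variables in the select clause; the two data rows are invented placeholders, not values from the dataset.

```python
import csv
import io

# Placeholder text standing in for a saved CSV response. The header row
# follows the query's select clause (an assumption); the rows are invented.
SAMPLE_CSV = """name,lat,long,gss,unit_id
Example District,51.0,-1.0,E00000001,1
Another District,52.5,-2.25,E00000002,2
"""


def read_results(csv_text):
    """Parse CSV query results into dicts with numeric lat/long."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["lat"] = float(row["lat"])
        row["long"] = float(row["long"])
        rows.append(row)
    return rows


results = read_results(SAMPLE_CSV)
for row in results:
    print(row["name"], row["lat"], row["long"])
```

To read a real file, pass its contents in place of SAMPLE_CSV, for example open("results.csv", newline="").read(), where the file name is whatever you chose when saving.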

How are you using Ordnance Survey Linked Data?

June 5, 2013 1 comment

I might have mentioned (a few times) that the new look Ordnance Survey linked data site is now live. Two questions I ask from time to time are:

1) Are you using the data, and if so what for (if you don’t mind saying)?

2) Even if you aren’t actively using the data are you linking to it?

Please comment below if you have anything you’d like to share. Thank you in advance!

New Ordnance Survey Linked Data Site not just for Data Geeks

June 3, 2013 1 comment

Ordnance Survey’s new linked data site went live today. You can read the official press release here. One of the major improvements is the look and feel of the site, which should make it useful to people who don’t care about ‘scary things’ like APIs, linked data or RDF.

One key additional feature of the new site is map views (!) of entities in the data. This means the site could be useful if you want to share your postcode with friends or colleagues as a means of locating your house or place of work. Every postcode in Great Britain has a webpage in the OS linked data of the form:

http://data.ordnancesurvey.co.uk/id/postcodeunit/POSTCODE

Examples of this would be the OS HQ postcode:

http://data.ordnancesurvey.co.uk/id/postcodeunit/SO160AS

or the postcode for the University of Southampton:

http://data.ordnancesurvey.co.uk/id/postcodeunit/SO171BJ

Click on either of these links and you’ll see a map of the postcode, which you can view at various levels of zoom. You’ll also see useful information about the postcode such as its lat/long coordinate. More interestingly, you’ll notice that it provides information about the ward, district/unitary authority, county (where applicable) and country your postcode is located in. So for the University of Southampton postcode we can see it’s located in the ward Portswood, the district Southampton and the country England.
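The URI pattern is regular enough to build from a postcode however it was typed. A minimal Python 3 sketch, assuming (as the two examples above suggest) that the postcode goes in upper case with no spaces; postcode_uri is a hypothetical helper name.

```python
POSTCODE_BASE = "http://data.ordnancesurvey.co.uk/id/postcodeunit/"


def postcode_uri(postcode):
    """Return the linked data URI for a postcode, however it was typed."""
    # Drop all whitespace and upper-case, matching the example URIs.
    compact = "".join(postcode.split()).upper()
    return POSTCODE_BASE + compact


print(postcode_uri("so16 0as"))
```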

Another interesting addition to the site is links to a few useful external sites such as They Work For You, Fix My Street, NHS Choices and Police UK. This hopefully makes the linked data site a useful location-based hub for information about what’s going on in your particular postcode area.

Why not give it a try with your postcode…:)

GeoSPARQL and Ordnance Survey Linked Data

April 26, 2013 3 comments

The Ordnance Survey Linked Data contains lots of qualitative spatial information, that is, topological relationships between different regions. We have information about what each region contains, is within and touches (e.g. Cambridgeshire touches Norfolk). These relationships were encoded using an Ordnance Survey vocabulary as there was nothing suitable at the time. Since then a new standard called GeoSPARQL has emerged from the OGC. In the long term we would probably like to migrate the OS data over to the GeoSPARQL standard, but to stop third party applications that use the data from breaking we decided not to do so in this release. However, mappings from the OS vocabulary to the GeoSPARQL vocabulary have been made via ‘owl:equivalentProperty’, so each of the spatial relationships now has a link to its equivalent in GeoSPARQL. Please see: contains, within, touches, equals, disjoint and partially overlaps for more details on which properties they are related to in GeoSPARQL.

 

Ordnance Survey Linked Data and the Reconciliation API

April 25, 2013 5 comments

The new Ordnance Survey Linked Data has a reconciliation API that allows users to turn text into URIs by matching against the Ordnance Survey linked data using a tool called Open Refine.

I’m not an expert on Open Refine but had a quick try of the tool today using some open data about libraries (available here). Instructions on installing Open Refine can be found here.

To use Open Refine, load your data into the tool and create a new project. On loading the library data into Open Refine you should see something like this:

[Screenshot: the library data loaded into Open Refine]

We can use Open Refine to turn the labels in both the ‘county’ column and the postcode column into URIs. For the county column click the down arrow next to the column name and select reconcile -> start reconciling. Now click ‘Add Standard Service’ and add the following URL: http://data.ordnancesurvey.co.uk/datasets/boundary-line/apis/reconciliation

As the ‘county’ column will contain a mixture of types, select the ‘reconcile against no particular type’ option and click ‘start reconciling’. You should now see that most of the text labels have turned into hyperlinks (note that OS linked data does not include Northern Ireland data, which accounts for the missing values).

You can do the same for the postcode column, but this time use the API at: http://data.ordnancesurvey.co.uk/datasets/code-point-open/apis/reconciliation

Your data should now look something like:

[Screenshot: the county and postcode columns after reconciliation]

You have now successfully replaced the text in these columns with links to the OS linked data.

Another useful thing to try is a simple bit of geocoding based on postcodes. Again go to the postcode column and select ‘Edit Column -> Add Column by fetching URLs’. Where asked, type in a column name (e.g. PC JSON) and in the Expression box type:

'http://data.ordnancesurvey.co.uk/datasets/code-point-open/apis/search?output=json&query=' + escape(value, 'url')

You should now see a column appear full of JSON results:

[Screenshot: the new PC JSON column filled with JSON results]
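For anyone working outside Open Refine, the same search URL can be put together in a couple of lines of Python 3. This is only a sketch: search_url is a hypothetical helper, and it assumes that quote_plus encodes the postcode closely enough to what the expression's escape step does.

```python
from urllib.parse import quote_plus

SEARCH_API = "http://data.ordnancesurvey.co.uk/datasets/code-point-open/apis/search"


def search_url(postcode):
    """Build a search URL mirroring the Open Refine expression."""
    # quote_plus percent-encodes the value and turns spaces into '+'.
    return SEARCH_API + "?output=json&query=" + quote_plus(postcode)


print(search_url("SO16 0AS"))
```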

On the PC JSON column select “Edit Column -> Add Column Based on this column”. Again add a column name of your choice. I wanted to extract the value of the easting and northing and add it as a column so I called my new column ‘easting,northing’. In the expression box enter the following to get the value of the easting and northing:

with(value.parseJson(), pair, pair.results[0].easting + ',' + pair.results[0].northing)

and you should now see something like:

[Screenshot: the new ‘easting,northing’ column]
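The same extraction is easy to do in Python 3 if you have the JSON text to hand. In this sketch only the ‘results’, ‘easting’ and ‘northing’ keys come from the expression above; the sample response and its numbers are invented, and easting_northing is a hypothetical helper.

```python
import json

# Invented response: only the 'results', 'easting' and 'northing' keys are
# taken from the expression above; the numbers are placeholders.
SAMPLE_RESPONSE = '{"results": [{"easting": 100000, "northing": 200000}]}'


def easting_northing(json_text):
    """Return 'easting,northing' for the first result, or None if empty."""
    results = json.loads(json_text).get("results", [])
    if not results:
        return None
    first = results[0]
    return "%s,%s" % (first["easting"], first["northing"])


print(easting_northing(SAMPLE_RESPONSE))
```

Returning None for an empty result list avoids an index error on postcodes the search does not recognise.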

Congratulations…you have now geocoded your library spreadsheet via a postcode and the OS linked data.

For more info on how to use Open Refine for reconciliation watch this YouTube video.

Announcing new beta Ordnance Survey Linked Data Site

April 25, 2013 1 comment

Ordnance Survey has released a new beta linked data site. You can read the official press release here.

I thought I’d write a quick (unofficial) guide to some of the changes. The most obvious one, hopefully apparent as you navigate round the site, is the much improved look and feel, including maps (!) showing where particular resources are located. Try this and this for example. Maps can be viewed at different levels of zoom.

Another improvement is the addition of new APIs. The first of these is an improved search function. Supported fields for search and some examples can be found here. The search API now includes a spatial search element.

The SPARQL API is improved. Output is now available in additional formats (such as CSV) as well as the usual SPARQL-XML and SPARQL-JSON. Example SPARQL queries are also included to get users started.

Another interesting addition is a new reconciliation API. This allows developers to use the Ordnance Survey linked data with the Open Refine tool. This would allow a user to match a list of postcodes or place names in a spreadsheet to URIs in the Ordnance Survey linked data.

In the new release the Ordnance Survey linked data has been split into distinct datasets. You can use the APIs described above with the complete dataset or, if preferred, just work on the Code-Point Open or Boundary-Line datasets.

For details on where to send feedback on the new site please see the official press release here.

Update: I blogged a bit more about some of the new APIs here.

Introducing RAGLD

December 21, 2011 1 comment

RAGLD (Rapid Assembly of Geo-centred Linked Data) is a project developing a software component library to support the rapid assembly of geo-centred linked data applications.

The advent of new standards and initiatives for data publication in the context of the World Wide Web (in particular the move to linked data formats) has resulted in the availability of rich sources of information about the changing economic, geographic and socio-cultural landscape of the United Kingdom, and many other countries around the world. In order to exploit the latent potential of these linked data assets, we need to provide access to tools and technologies that enable data consumers to easily select, filter, manipulate, visualize, transform and communicate data in ways that are suited to specific decision-making processes.

In this project, we will enable organizations to extract maximum value from the UK’s growing portfolio of linked data assets. In particular, we will develop a suite of software components that enables diverse organizations to rapidly assemble ‘goal-oriented’ linked data applications and data processing pipelines in order to enhance their awareness and understanding of the UK’s geographic, economic and socio-cultural landscape.

A specific goal for the project will be to support comparative and multi-perspective region-based analysis of UK linked data assets (this refers to an ability to manipulate data with respect to various geographic region overlays), and as part of this activity we will incorporate the results of recent experimental efforts which seek to extend the kind of geo-centred regional overlays that can be used for both analytic and navigational purposes. The technical outcomes of this project will lead to significant improvements in our ability to exploit large-scale linked datasets for the purposes of strategic decision-making.

RAGLD is a collaborative research initiative between the Ordnance Survey, Seme4 Ltd and the University of Southampton, and is funded in part by the Technology Strategy Board’s “Harnessing Large and Diverse Sources of Data” programme. Commencing October 2011, the project runs for 18 months.

If you’d like to input into the requirements phase of the project I’d be very grateful if you could fill in one of these questionnaires. Many thanks in advance.

Making things with Ordnance Survey Linked Data…

November 3, 2011 7 comments

Following the example of “Making things with BBC data” I thought I’d ask the same question for Ordnance Survey linked data. Please leave a comment if you’ve used Ordnance Survey linked data for anything from a quick hack to a full-blown project, or even if you just link to it in your data. Thanks!

 

 

How can I use the Ordnance Survey Linked Data: a python rdflib example

January 18, 2011 4 comments

In this blog post I talked about the potential of (Ordnance Survey) linked data. Partly motivated by this challenge I decided to write up how I did the mash up of data.gov.uk data and Ordnance Survey linked data. This post is a slightly different take on a previous post.

For this mashup I used Python 2.7 and rdflib 3.0.0.

First off you need to install rdflib. Full instructions on doing this can be found here. If you use easy_install you can install rdflib by typing:

easy_install -U "rdflib>=3.0.0"

You will also need to install rdfextras (see here). This can also be done using easy_install:

easy_install rdfextras

You are now good to go. The next thing I needed was the BIS funding data. This can be downloaded here. The original BIS data gives location for various organisations via a URI based on the organisation’s postcode. For example:

<http://education.data.gov.uk/id/institution/UniversityOfWalesSwansea>
<http://research.data.gov.uk/def/project/location>
<http://education.data.gov.uk/id/institution/BabrahamBioscienceTechnolgiesLtd/SA28PP> .

I edited the data to point to URIs for postcodes in the Ordnance Survey linked data (note these weren’t available when the BIS data was created). Now we have:

<http://education.data.gov.uk/id/institution/UniversityOfWalesSwansea>
<http://research.data.gov.uk/def/project/location>
<http://data.ordnancesurvey.co.uk/id/postcodeunit/SA28PP> .

This triple basically states the location of the University of Wales Swansea in terms of its postcode.

So the edited RDF data now contains location information for research institutions in terms of a postcode URI, and it also contains information about the research projects worked on by those institutions and how much funding those projects received. Using rdflib it is very straightforward to load this data into Python and use it programmatically. Here’s how:

These first few lines load the necessary libraries and plugins:

import logging
import rdflib

# Configure how we want rdflib logger to log messages
_logger = logging.getLogger("rdflib")
_logger.setLevel(logging.DEBUG)
_hdlr = logging.StreamHandler()
_hdlr.setFormatter(logging.Formatter('%(name)s %(levelname)s: %(message)s'))
_logger.addHandler(_hdlr)

from rdflib import Graph
from rdflib import URIRef, Literal, BNode, Namespace, ConjunctiveGraph
from rdflib import RDF
from rdflib import RDFS
rdflib.plugin.register('sparql', rdflib.query.Processor,'rdfextras.sparql.processor', 'Processor')
rdflib.plugin.register('sparql', rdflib.query.Result, 'rdfextras.sparql.query', 'SPARQLQueryResult')

 

We now create a Graph in which to store the RDF:

store = Graph()

The data can be easily loaded from the web or from the hard drive. In this case I have the files stored locally:

store.parse("file:/C:/Projects/RDFPythonPlay/data/businessdatagovuk.nt", format="nt")
store.parse("file:/C:/Projects/RDFPythonPlay/data/educationdatagovuk.nt", format="nt")
store.parse("file:/C:/Projects/RDFPythonPlay/data/patentsdatagovuk.nt", format="nt")
store.parse("file:/C:/Projects/RDFPythonPlay/data/researchdatagovuk.nt", format="nt")

Recall from here that I am interested in seeing which parties are funding in which local authority areas. The data as it stands will not let me do this. However, the OS postcode linked data provides information about the local authority areas that a postcode is contained in. All I now have to do is ‘follow my nose’ and load in the postcode data. I can do this by going through the triples containing links between organisations and postcodes via the location property. First I set up a few namespace bindings:

# Bind a few prefix, namespace pairs.
store.bind("PROJECT", "http://research.data.gov.uk/def/project/")
store.bind("FOAF", "http://xmlns.com/foaf/0.1/")

# Create a namespace object for the project and FOAF namespaces.
PROJECT = Namespace("http://research.data.gov.uk/def/project/")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")

I can now iterate over the triples in the store and find those whose subject is of type foaf:Organization and which use the location property. An example of such a triple would be the one we had above:

<http://education.data.gov.uk/id/institution/UniversityOfWalesSwansea>
<http://research.data.gov.uk/def/project/location>
<http://data.ordnancesurvey.co.uk/id/postcodeunit/SA28PP> .

I can then lookup the data behind the postcode URI and load this into the store. This is all done by the following code:

# For each foaf:Organization in the store, get its postcode URI and
# load the RDF published at that URI into the store.
for organization in store.subjects(RDF.type, FOAF["Organization"]):
    for postcode in store.objects(organization, PROJECT["location"]):
        try:
            print postcode
            store.parse(postcode)
        except Exception:
            # Most likely the postcode URI did not resolve.
            print '404 not found'

Now the data in the store will contain a link from organisation to postcode, and a link from postcode to local authority area, so we can traverse the graph from organisation to local authority area. A simple SPARQL query retrieves a list of projects together with the local authority areas the participating organisations are based in. The SPARQL query to do this is:

select distinct ?label ?districtlabel
where
{
?organisation <http://research.data.gov.uk/def/project/project> ?project .
?project <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?organisation <http://research.data.gov.uk/def/project/location> ?x .
?x <http://data.ordnancesurvey.co.uk/ontology/postcode/district> ?district .
?district <http://www.w3.org/2000/01/rdf-schema#label> ?districtlabel . }

We can now add that into our Python code as follows and print out the query answers:


query = """select distinct ?label ?districtlabel
where
{
  ?organisation <http://research.data.gov.uk/def/project/project> ?project .
  ?project <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  ?organisation <http://research.data.gov.uk/def/project/location> ?x .
  ?x <http://data.ordnancesurvey.co.uk/ontology/postcode/district> ?district .
  ?district <http://www.w3.org/2000/01/rdf-schema#label> ?districtlabel .
}"""

answers = store.query(query).serialize('python')

# Each answer is a (project label, district label) pair.
for (label, districtlabel) in answers:
    print "%s was funded in %s" % (label, districtlabel)
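If the chain of hops in that query is hard to picture, the following toy sketch does the same join over plain tuples, with no rdflib involved. It is written for Python 3, every identifier in it is a placeholder rather than real data, and the predicate names are shortened for readability.

```python
# Toy triples; every identifier is a placeholder, not real data.
TRIPLES = [
    ("org:A", "project", "proj:1"),
    ("proj:1", "label", "Project One"),
    ("org:A", "location", "postcode:X"),
    ("postcode:X", "district", "district:D"),
    ("district:D", "label", "District D"),
]


def objects(triples, subject, predicate):
    """Yield every object of triples matching subject and predicate."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            yield o


def projects_by_district(triples):
    """Follow organisation -> postcode -> district, as the query does."""
    pairs = set()
    for org, pred, project in triples:
        if pred != "project":
            continue
        for label in objects(triples, project, "label"):
            for postcode in objects(triples, org, "location"):
                for district in objects(triples, postcode, "district"):
                    for district_label in objects(triples, district, "label"):
                        pairs.add((label, district_label))
    return sorted(pairs)


for label, district_label in projects_by_district(TRIPLES):
    print("%s was funded in %s" % (label, district_label))
```

The postcode-to-district triple plays the part of the data pulled in by de-referencing the postcode URI: without it the join finds nothing.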

To summarise, this post shows how you just need rdflib and Python to build a simple linked data mashup, with no separate triplestore required! RDF is loaded into a Graph. Triples in this Graph reference postcode URIs. These URIs are de-referenced and the RDF behind them is loaded into the Graph. We have now enhanced the data in the Graph with local authority area information. So as well as knowing the postcode of the organisations taking part in certain projects we now also know which local authority area they are in. Job done! We can now analyse funding data at the level of postcode, local authority area and (as an exercise for the reader) European region.

[Python note – WordPress keeps messing with my indentation and I’m too tired to fix. I hope that doesn’t detract from your enjoyment of this blog post :)]