Genealogy and the Semantic Web

My folks have a keen interest in genealogy, and have built up quite an impressive family tree over the past few years. I have always thought that genealogy could be a cool application for the semantic web (imagine several independently constructed family trees being connected via their common nodes). It seems I wasn’t the first person to think this, as this recent blog post from Dan Brickley suggests. Dan has written a quick and horrid Perl script (his words) to convert the common GEDCOM (GED) family tree format to RDF/XML. A dump of my family tree in RDF/XML can be found here.

The family tree contains information about people’s births and deaths, and each of these events is connected to a place. I think one weakness of the current conversion script is that it represents the place as a plain string, for example:

<foaf:Person rdf:about="I1884.xml#I1884">
  <foaf:name>William Parsonage</foaf:name>
  <foaf:givenname>William</foaf:givenname>
  <foaf:family_name>Parsonage</foaf:family_name>
  <bio:event>
    <bio:Birth>
      <bio:date>30 AUG 1721</bio:date>
      <bio:place>Birmingham</bio:place>
    </bio:Birth>
  </bio:event>
</foaf:Person>

It would be far more interesting if the place were instead connected to a URI representing that place on the semantic web. In this case there are a number of URIs for Birmingham, such as http://os.rkbexplorer.com/id/osr7000000000000018 from Ordnance Survey or http://sws.geonames.org/2655603/about.rdf from Geonames. The RDF/XML could then be modified as follows:

<foaf:Person rdf:about="I1884.xml#I1884">
  <foaf:name>William Parsonage</foaf:name>
  <foaf:givenname>William</foaf:givenname>
  <foaf:family_name>Parsonage</foaf:family_name>
  <bio:event>
    <bio:Birth>
      <bio:date>30 AUG 1721</bio:date>
      <bio:place rdf:resource="http://os.rkbexplorer.com/id/osr7000000000000018"/>
    </bio:Birth>
  </bio:event>
</foaf:Person>
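
As a rough idea of how that substitution might be automated, here is a minimal Python sketch using only the standard library. It is not part of Dan's script. The function name link_places, the PLACE_URIS lookup table and the SAMPLE document are all made up for illustration, and the namespace URIs (in particular the one for bio:) are my assumption, since the snippets above do not show the namespace declarations. Places without an entry in the table are left as plain strings.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
BIO = "http://purl.org/vocab/bio/0.1/"  # assumed namespace for the bio: prefix

# Keep readable prefixes when the tree is serialised again.
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)
ET.register_namespace("bio", BIO)

# Hypothetical hand-made lookup table: place string -> place URI.
PLACE_URIS = {
    "Birmingham": "http://os.rkbexplorer.com/id/osr7000000000000018",
}


def link_places(rdf_xml, place_uris):
    """Replace literal bio:place values with rdf:resource links where known."""
    root = ET.fromstring(rdf_xml)
    for place in root.iter("{%s}place" % BIO):
        name = (place.text or "").strip()
        uri = place_uris.get(name)
        if uri is None:
            continue  # unknown place: leave the string untouched
        place.text = None  # drop the literal
        place.set("{%s}resource" % RDF, uri)  # point at the URI instead
    return ET.tostring(root, encoding="unicode")


# Illustrative input, wrapped in rdf:RDF so the prefixes are declared.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:bio="http://purl.org/vocab/bio/0.1/">
  <foaf:Person rdf:about="I1884.xml#I1884">
    <foaf:name>William Parsonage</foaf:name>
    <bio:event>
      <bio:Birth>
        <bio:date>30 AUG 1721</bio:date>
        <bio:place>Birmingham</bio:place>
      </bio:Birth>
    </bio:event>
  </foaf:Person>
</rdf:RDF>"""

if __name__ == "__main__":
    print(link_places(SAMPLE, PLACE_URIS))
```

An exact string match is obviously naive: place names in a real family tree come with spelling variants and ambiguity (there is more than one Birmingham), so the table would need checking by hand or replacing with a proper gazetteer lookup.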

If I get a chance at the weekend I’ll see how much work it would take to add this information to the family tree RDF/XML. It will also be interesting to see whether the combination of these two datasets provides extra information and insights to budding genealogists.

6 Responses to “Genealogy and the Semantic Web”

  2. Thank you for the post. You and Dan are both right on target in understanding the importance of integrating genealogy with the Semantic Web. Imagine what we’d discover! Of course, the downside is that I’ll become even more obsessed with genealogy…

    As part of this process, reification for documenting sources will be extremely important. We must be able to say that a particular source URI documents a fact (birth date, marriage location, father’s name, etc.). Once we can do this with URIs for everything, there will be much less of a focus on names and spelling variations, and we’ll be able to collaborate much more easily.

  3. Thanks for your comment/interest Brian. I hope you look more at this over the weekend. I think the semantic web and linked data have the potential to offer a great deal to the world of genealogy.

  4. No question that the Semantic Web will be valuable for genealogy. I see at least two essential solutions that must be developed:
    1. Sources–In Semantic Web parlance, Tim Berners-Lee calls it provenance; in genealogy the typical term is sources. Very little web-posted genealogy has any source citation, and what is given is often secondary sourcing rather than primary.
    2. Duplication has to be collapsed. The same information is typically found dozens or hundreds of times on various websites, often with slight variations such as different forms for names, dates and places. Sorting through the duplication has gotten to be onerous and prohibitive.
    AI technology will likely offer solutions in both these arenas.