
Genealogy and the Semantic Web


My folks have a keen interest in genealogy and have built up quite an impressive family tree over the past few years. I always thought that genealogy would be a potentially cool application for the semantic web (imagine several independently constructed family trees being connected via their common nodes). It seems I wasn’t the first person to think this, as this recent blog post from Dan Brickley suggests. Dan has written a quick and horrid Perl script (his words) to convert the common GEDCOM (.ged) family tree format to RDF/XML. A dump of my family tree in RDF/XML can be found here.
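Dan’s script isn’t reproduced here. To give a rough idea of what such a conversion involves, the following Python sketch parses a hand-written GEDCOM fragment that reconstructs the William Parsonage record (GEDCOM lines have the shape "level [@xref@] TAG [value]") and emits triples shaped like the post’s RDF/XML example. The function names, the GEDCOM fragment and the use of prefixed names such as "foaf:name" instead of full URIs are all assumptions made for illustration; only INDI records with NAME, BIRT and DEAT lines are handled.

```python
# Minimal sketch of a GEDCOM -> RDF-style triple conversion.
# Assumptions (not taken from Dan Brickley's script): only INDI records with
# NAME, BIRT and DEAT lines are handled, and triples are plain Python tuples
# using prefixed names such as "foaf:name" instead of full URIs.
from typing import List, Optional, Tuple

Row = Tuple[int, str, str, str]  # (level, xref, tag, value)
Triple = Tuple[str, str, str]

# GEDCOM event tags mapped to the bio: classes used in the post's RDF/XML.
EVENT_CLASSES = {"BIRT": "bio:Birth", "DEAT": "bio:Death"}

# Hypothetical GEDCOM fragment reconstructed from the post's example.
SAMPLE = """
0 @I1884@ INDI
1 NAME William /Parsonage/
1 BIRT
2 DATE 30 AUG 1721
2 PLAC Birmingham
"""


def parse_gedcom(text: str) -> List[Row]:
    """Split each GEDCOM line into (level, xref, tag, value)."""
    rows: List[Row] = []
    for line in text.strip().splitlines():
        parts = line.strip().split(" ", 2)
        level = int(parts[0])
        if len(parts) > 1 and parts[1].startswith("@"):
            # Record header such as "0 @I1884@ INDI": cross-reference, then tag.
            xref = parts[1].strip("@")
            tag = parts[2] if len(parts) > 2 else ""
            rows.append((level, xref, tag, ""))
        else:
            value = parts[2] if len(parts) > 2 else ""
            rows.append((level, "", parts[1], value))
    return rows


def to_triples(rows: List[Row]) -> List[Triple]:
    """Emit triples shaped like the foaf:/bio: RDF/XML shown in the post."""
    triples: List[Triple] = []
    person: Optional[str] = None
    event: Optional[str] = None
    event_count = 0
    for level, xref, tag, value in rows:
        if level == 0:
            # Start of a new record; only individuals (INDI) are converted.
            person = f"{xref}.xml#{xref}" if tag == "INDI" else None
            event = None
            if person:
                triples.append((person, "rdf:type", "foaf:Person"))
        elif person is None:
            continue
        elif level == 1 and tag == "NAME":
            # GEDCOM wraps the surname in slashes: "William /Parsonage/".
            given, _, rest = value.partition("/")
            family = rest.split("/")[0].strip()
            full_name = " ".join(value.replace("/", " ").split())
            triples.append((person, "foaf:name", full_name))
            if given.strip():
                triples.append((person, "foaf:givenname", given.strip()))
            if family:
                triples.append((person, "foaf:family_name", family))
            event = None
        elif level == 1 and tag in EVENT_CLASSES:
            # Each event becomes a blank node hanging off bio:event.
            event_count += 1
            event = f"_:event{event_count}"
            triples.append((person, "bio:event", event))
            triples.append((event, "rdf:type", EVENT_CLASSES[tag]))
        elif level == 1:
            event = None  # any other level-1 tag ends the current event
        elif level == 2 and event and tag == "DATE":
            triples.append((event, "bio:date", value))
        elif level == 2 and event and tag == "PLAC":
            # The place is kept as a plain string, as in the post's example.
            triples.append((event, "bio:place", value))
    return triples


triples = to_triples(parse_gedcom(SAMPLE))

if __name__ == "__main__":
    for t in triples:
        print(t)
```

Note that the place comes out as a plain string literal, which is exactly the weakness discussed next.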

The family trees contain information about people’s births and deaths, and each of these events is connected to a place. I think one weakness of the current conversion script is that it represents the place as a plain string, for example:

<foaf:Person rdf:about="I1884.xml#I1884">
  <foaf:name>William Parsonage</foaf:name>
  <foaf:givenname>William</foaf:givenname>
  <foaf:family_name>Parsonage</foaf:family_name>
  <bio:event>
    <bio:Birth>
      <bio:date>30 AUG 1721</bio:date>
      <bio:place>Birmingham</bio:place>
    </bio:Birth>
  </bio:event>
</foaf:Person>

It would be far more interesting if the place were instead linked to a URI identifying it on the semantic web. In this case there are several URIs for Birmingham, for example http://os.rkbexplorer.com/id/osr7000000000000018 from Ordnance Survey or http://sws.geonames.org/2655603/about.rdf from Geonames. The RDF/XML could then be modified as follows:

<foaf:Person rdf:about="I1884.xml#I1884">
  <foaf:name>William Parsonage</foaf:name>
  <foaf:givenname>William</foaf:givenname>
  <foaf:family_name>Parsonage</foaf:family_name>
  <bio:event>
    <bio:Birth>
      <bio:date>30 AUG 1721</bio:date>
      <bio:place rdf:resource="http://os.rkbexplorer.com/id/osr7000000000000018"/>
    </bio:Birth>
  </bio:event>
</foaf:Person>

If I get a chance at the weekend, I’ll see how much work it will take to add this information to the family tree RDF/XML. It will also be interesting to see whether combining these two datasets provides extra information and insights for budding genealogists.
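One simple way to start on that work is a lookup table from place strings to place URIs, applied to each bio:place value. The Python sketch below assumes triples are (subject, predicate, object) tuples with URI objects written in angle brackets; PLACE_URIS and link_places are hypothetical names, and the table holds only the Ordnance Survey URI for Birmingham mentioned above. It is an illustration, not a real gazetteer service.

```python
# Minimal sketch: replace plain-string bio:place values with place URIs.
# Assumptions: triples are (subject, predicate, object) tuples, URI objects are
# written in angle brackets to tell them apart from literals, and PLACE_URIS is
# a hypothetical, hand-built lookup table rather than a real gazetteer service.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

PLACE_URIS: Dict[str, str] = {
    # Ordnance Survey URI for Birmingham, as given in the post.
    "Birmingham": "http://os.rkbexplorer.com/id/osr7000000000000018",
}


def link_places(
    triples: List[Triple], gazetteer: Dict[str, str]
) -> Tuple[List[Triple], List[str]]:
    """Return the rewritten triples plus the place strings that had no match."""
    linked: List[Triple] = []
    unmatched: List[str] = []
    for s, p, o in triples:
        if p == "bio:place" and not o.startswith("<"):
            uri = gazetteer.get(o.strip())
            if uri is not None:
                linked.append((s, p, f"<{uri}>"))
                continue
            # Keep the literal so no information is lost, but report it.
            unmatched.append(o)
        linked.append((s, p, o))
    return linked, unmatched


# Hypothetical input: the Birmingham birth from the post plus an unmatched place.
events = [
    ("_:event1", "bio:date", "30 AUG 1721"),
    ("_:event1", "bio:place", "Birmingham"),
    ("_:event2", "bio:place", "Unknown Parish"),
]

linked, unmatched = link_places(events, PLACE_URIS)

if __name__ == "__main__":
    for t in linked:
        print(t)
    print("unmatched:", unmatched)
```

An exact-string lookup is the simplest possible approach. A bare "Birmingham" is ambiguous (there is also Birmingham, Alabama), so a real tool would probably need the fuller place hierarchy that GEDCOM PLAC values often carry, or a manual review of the unmatched and doubtful cases.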

  1. January 22, 2009 at 3:48 pm | #1

    Thank you for the post. You and Dan are both right on target in understanding the importance of integrating genealogy with the Semantic Web. Imagine what we’d discover! Of course, the downside is that I’ll become even more obsessed with genealogy…

    As part of this process, reification for documenting sources will be extremely important. We must be able to say that a particular source URI documents a fact (birth date, marriage location, father’s name, etc). Once we can do this with URIs for everything, there will be much less of a focus on names and spelling variations, and we’ll be able to collaborate so much easier.

  2. john225
    January 22, 2009 at 4:25 pm | #2

    Thanks for your comment/interest, Brian. I hope to look more at this over the weekend. I think the semantic web and linked data have the potential to offer a great deal to the world of genealogy.

  3. January 22, 2009 at 11:45 pm | #3

    Seen http://genealogy.alexander.user.dev.freebaseapps.com/ ?

    Freebase does RDF output and provides a semi-slick UI for recording who your parents are.

    Here’s mine for instance:
    http://genealogy.alexander.user.dev.freebaseapps.com/?person=%2Fguid%2F9202a8c04000641f8000000007629191

    • john225
      January 23, 2009 at 7:04 am | #4

      Thanks for that Daniel. I’ll take a closer look.

  4. January 25, 2009 at 6:01 am | #5

    There is no question that the Semantic Web will be valuable for genealogy. I see at least two essential solutions that must be developed:
    1. Sources–In Semantic Web parlance, Tim Berners-Lee calls it provenance; in genealogy the typical term is sources. Very little web-posted genealogy has any source citation, and what is given is often secondary sourcing rather than primary.
    2. Duplication has to be collapsed. The same information is typically found dozens or hundreds of times on various websites, often with slight variations such as different forms for names, dates and places. Sorting through the duplication has gotten to be onerous and prohibitive.
    AI technology will likely offer solutions in both these arenas.

  5. January 1, 2012 at 9:12 pm | #6

    I just found this tool and think it will be a great asset to my web application. I set up a Genealogy Application (or website) using MediaWiki and the Semantic MediaWiki bundle. It is at: http://my-family-lineage.com/w/
    John, your idea about sources is very important and I’ll add that to my site using templates and forms.
    Currently the form does not support sources. I wonder whether we need a source for each piece of information on a page about a person. If so, I would need to add a source field, so that every item of information is followed by a field relating it to its source. What Semantic Web vocabulary would be used for citing sources?
    Bruce

  6. January 1, 2012 at 9:15 pm | #7

    Daniel O’Connor :
    Seen http://genealogy.alexander.user.dev.freebaseapps.com/ ?
    Freebase does RDF output and provides semi-slick UI to author who your parents are.
    Here’s mine for instance:
    http://genealogy.alexander.user.dev.freebaseapps.com/?person=%2Fguid%2F9202a8c04000641f8000000007629191

    So, with Freebase, would one just start by adding information about oneself and then add parents?
    I’m wondering how easy this is, as I have hardly used Freebase at all… I’ve been meaning to. Any advice is appreciated. I don’t suppose Freebase has a mechanism for importing a GEDCOM file?
    Bruce

  7. January 13, 2012 at 9:23 am | #8

    @Bruce – unfortunately, no – there is no GEDCOM import. I guess if I were going to tackle it, I’d grab http://pear.php.net/package/Genealogy_Gedcom and start translating from GEDCOM to Freebase, and also in reverse.

    Overall, I mostly wanted to illustrate that generic semantic-web-style tools handle 95% of what a family tree needs with ease.

    I’m also very, very interested in projects like http://trove.nla.gov.au/ – I’ve found Australian family history in its unstructured data. What could I automatically search for and extract, given a good NLP library plus an RDF crawler? I.e., something like Zemanta for family history?

  8. January 13, 2012 at 9:27 am | #9

    @Bruce – nice work! Though I’m loath to slip away from Freebase, and you are getting a bit of spam, I think you are on exactly the right track.

    How flexible are your tools for dealing with names? For example, in my family history there are a lot of individuals with nicknames, which prove to be poor IDs.

  9. February 21, 2012 at 12:09 am | #10

    Thanks for leading the way. I don’t see a reference yet to a very clear and significant paper on the importance of this new direction:

    The Coming Web of Genealogical Data, by Josh Hansen, FamilySearch
    http://fht.byu.edu/prev_workshops/workshop12/papers/3.1%20Josh%20Hansen%20-%20FHT%202012%20Workshop%20Paper%20-%20The%20Coming%20Web%20of%20Genealogical%20Data.pdf

    The trend is picking up with the beginnings of what I hope is a huge collaboration around GEDCOM X, including an open source implementation: http://www.gedcomx.org
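The reification idea raised in the first comment, saying that a particular source URI documents a fact, can be sketched with standard RDF reification. The fact becomes an rdf:Statement node with rdf:subject, rdf:predicate and rdf:object, and one more property points at the source. In the Python sketch below, the choice of dcterms:source as the citation property is an assumption (one possible option, not an agreed genealogy convention), the reify function name is hypothetical, and the example.org source URI is a placeholder.

```python
# Minimal sketch of standard RDF reification for attaching a source to a fact.
# Assumptions: dcterms:source is used as the citation property (one possible
# choice, not an agreed genealogy convention), and the source URI is a
# hypothetical example.org placeholder.
from itertools import count
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Counter for fresh blank-node labels such as "_:stmt1", "_:stmt2", ...
_statement_ids = count(1)


def reify(fact: Triple, source_uri: str) -> List[Triple]:
    """Describe `fact` as an rdf:Statement and say which source documents it."""
    subject, predicate, obj = fact
    node = f"_:stmt{next(_statement_ids)}"
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
        (node, "dcterms:source", source_uri),
    ]


birth_date = ("_:event1", "bio:date", "30 AUG 1721")
citation = reify(birth_date, "http://example.org/sources/parish-register")

if __name__ == "__main__":
    for t in citation:
        print(t)
```

Reifying a triple does not by itself assert it, so the fact should still be stated as an ordinary triple alongside its citation. Named graphs are a common alternative when many facts share one source.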
