Thursday, January 31, 2008

How Shall I Integrate Thee? Let Me Count the Ways...

Leigh Dodds has a nice post, "How Shall I Integrate Thee? Let Me Count the Ways...", about different ways to integrate data:
  • The one where we share identifiers
  • The one where we're describing the same thing
  • The one where we're speaking different languages
  • The one where we're using different units
  • The one where we're speaking at different levels of abstraction

Apart from the suggestion that Leigh has been watching way too much Friends, there's much food for thought here. I suspect that "The one where we're describing the same thing" is the one I'll be making most use of.

In "Rethinking LSIDs versus HTTP URI" I argued that most applications will use HTTP URIs. That makes the URIs accessible, but not terribly useful as identifiers, because I think it is unlikely that people will reuse each other's HTTP URIs ("The one where we share identifiers"). A good example is Connotea, which has its own URIs for each paper its users bookmark. I won't use these URIs as identifiers in my database (if only because a user who resolves them gets taken to Connotea's web site, not mine). However, I will store any PubMed and DOI identifiers, so that somebody aggregating information from Connotea (say, to retrieve user tags) and my database (say, to get links to sequences and specimens) can work out that the Connotea URI and my URI are talking about the same thing.
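To make "describing the same thing" a little more concrete, here is a minimal Python sketch of how an aggregator might pair up records from two sources by their shared DOI or PubMed identifiers. Everything in it is an assumption made for illustration: the record layout, the `example.org` URIs, the identifier values, and the function names `shared_keys` and `match_same_thing` are hypothetical, and none of it reflects Connotea's actual data or API.

```python
# Hypothetical records: each source mints its own URI for a paper,
# but also stores whatever shared identifiers (DOI, PubMed ID) it knows.
connotea_records = [
    {"uri": "http://connotea.example.org/article/1", "doi": "10.1000/ABC123", "pmid": None},
    {"uri": "http://connotea.example.org/article/2", "doi": None, "pmid": "12345"},
]
my_records = [
    {"uri": "http://mydb.example.org/ref/42", "doi": "10.1000/abc123", "pmid": None},
    {"uri": "http://mydb.example.org/ref/43", "doi": None, "pmid": "99999"},
]


def shared_keys(record):
    """Return the set of normalised (scheme, value) identifiers for a record."""
    keys = set()
    doi = record.get("doi")
    pmid = record.get("pmid")
    if doi:
        # DOIs are case-insensitive, so compare them in lower case.
        keys.add(("doi", doi.strip().lower()))
    if pmid:
        keys.add(("pmid", str(pmid).strip()))
    return keys


def match_same_thing(source_a, source_b):
    """Pair URIs from two sources whose records share a DOI or PubMed ID."""
    # Index the first source by each shared identifier it carries.
    index = {}
    for rec in source_a:
        for key in shared_keys(rec):
            index.setdefault(key, set()).add(rec["uri"])

    # Look up every record of the second source in that index.
    pairs = set()
    for rec in source_b:
        for key in shared_keys(rec):
            for uri_a in index.get(key, ()):
                pairs.add((uri_a, rec["uri"]))
    return sorted(pairs)


if __name__ == "__main__":
    for uri_a, uri_b in match_same_thing(connotea_records, my_records):
        print(uri_a, "describes the same paper as", uri_b)
```

Neither source has to adopt the other's URIs here; the match depends only on both sides recording the same external identifiers.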
