Back in the dim and distant past, when the Web was born, there was no Google, no search, no place to start surfing from. Shortly after its beginning, small aggregations of links and catalogues started to appear, and sites like the WWW Virtual Library became good jumping off points. The rest may well be history, but it’s history well worth comparing with the ongoing evolution of the Web of Data.
Nowadays, using the principles of Linked Data, we can chase RDF predicates around the web, happily jumping between data islands. Just as users and web crawlers alike follow links in HTML to read and index the visual Web, this data linking allows all sorts of applications, from data browsers to semantic search engines.
But what can the history of the Web tell us about how to cope with SPARQL? SPARQL datasets can play nicely with the Web of Data by ensuring that our Linked Data URLs can point into SPARQL datasets, for example by crafting a SPARQL DESCRIBE query into a URL. But how do we find these SPARQL endpoints in the first place in order to link to them? Currently, we go to, ahem, Google, or the ESW's list of “currently alive endpoints” and copy/paste the results.
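To make the "DESCRIBE query in a URL" idea concrete, here is a minimal sketch in Python. The endpoint and resource URIs are made-up placeholders, and the only thing taken from the SPARQL protocol itself is that the query text travels in the `query` parameter of a GET request.

```python
from urllib.parse import urlencode


def describe_url(endpoint: str, resource_uri: str) -> str:
    """Return a GET URL asking `endpoint` to DESCRIBE `resource_uri`."""
    query = f"DESCRIBE <{resource_uri}>"
    # The SPARQL protocol carries the query text in the 'query' parameter;
    # urlencode percent-escapes it so it is safe inside a URL.
    return f"{endpoint}?{urlencode({'query': query})}"


# Both URIs below are hypothetical, chosen only for illustration.
url = describe_url("http://example.org/sparql", "http://example.org/id/artist/42")
print(url)
```

A URL like this can be published as an ordinary link, which is what lets Linked Data point into a SPARQL dataset.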
Some would argue that having lots of SPARQL endpoints around the place is a bit of a dead end anyway, and that eventually all the web of data, however it's presented, will be sucked into a Google-like data warehouse (to be queried using SPARQL?). Perhaps the key word here is “eventually”. Like the rest of the planet, I use Google constantly (thanks Google), but I also use more specific entry points like Wikipedia, company websites and wikis, RSS feeds, etc. when I know they'll be more up to date or more authoritative, or when the content is password protected. SPARQL is a well-designed solution for those wishing to create quite specific data-driven Web applications while at the same time making their data more widely available for more generic use.
Over the past few years, I’ve worked on and off with a bunch of people who want to do this kind of thing in their organisation. They want to free their data so that others can use it, but they also want to keep control over it and use it effectively in their own applications. A few years back they saw RDF as a way to approach these goals, and now that SPARQL has been accepted by the W3C and has good vendor support, they’re moving to that. All good stuff.
DNS Service Discovery and SPARQL
But that’s just half the story. They also saw that if they were to couple this with Apple’s Bonjour networking protocol (DNS-SD) then they would be able to find each other and share data on a network without having to worry about setting anything up beforehand. Everything would Just Work. Even over wide area networks.
We went as far as registering a SPARQL service type at dns-sd.org, so now anyone can publish their SPARQL endpoints and have them automagically discovered.
Here's how it works. I'll use Apple’s open source implementation of their DNS-SD protocol stack, called mDNSResponder, although you can use Avahi or even just plain old DNS tools and servers, with some caveats (see later episodes). If you’re running on OS X, then mDNSResponder is already installed; if you’re running on Linux, then you’ll have to download, compile and install the package from Apple (again, see later episodes); if you’re on Windows, you’re on your own.
One of the handy command line tools that comes with mDNSResponder is ‘dns-sd’, which allows you to register, browse and resolve services both on the local area network and, if you’ve got permission, on a remote DNS server, which is the so-called “wide area Bonjour”.
Firstly, let’s set up a ‘browse’ for currently published SPARQL endpoints on the local network. This may seem backward as there shouldn’t be anything published just yet, but since the Bonjour protocol is all about dynamic networks and being updated on the fly as things come and go, it should be a good test of the next step. In the shell, browse for local endpoints as follows:
dns-sd -B _sparql._tcp
You should just see “browsing for _sparql._tcp” with nothing else. So far so good.
Now let’s publish DBTune’s Magnatune SPARQL endpoint. If possible, do this on a different machine on the local network, but if not, just do it in a different shell. We’ll publish the endpoint URL itself, give the service a human readable name and add some information about which vocabularies are used. In the shell, register the endpoint as follows:
dns-sd -P Magnatune _sparql._tcp local 2020 dbtune.org 22.214.171.124 \
  "path=/sparql" \
  "vocabs=http://purl.org/ontologies/mo http://www.w3.org/TR/owl-time http://purl.org/NET/c4dm/timeline.owl http://xmlns.com/foaf/0.1"
Since the service we’re registering is outside our own domain, which in this case is just the local area network (the ‘local’ part above), we register the service as a ‘proxy service’ and give the hostname and IP address where it can be found. ‘2020’ is the port on which the endpoint is running, and the rest of the arguments, ‘path=’ and ‘vocabs=’, are part of the so-called “TXT Record”, which is really the only part of the registration that is specific to SPARQL. ‘path’ simply gives the path part of the URL of the endpoint, while ‘vocabs’ gives a space-separated list of RDF vocabularies used.
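For the curious, here is a rough sketch of what happens to those ‘key=value’ arguments on the wire. In a DNS TXT record each string is stored as one length byte followed by its bytes, so no single entry can exceed 255 bytes. The function names are my own, and this illustrates the format rather than how mDNSResponder implements it.

```python
def encode_txt_record(entries: list[str]) -> bytes:
    """Pack key=value strings into TXT wire format (length byte + data)."""
    out = bytearray()
    for entry in entries:
        data = entry.encode("utf-8")
        if len(data) > 255:
            # A single length byte cannot describe more than 255 bytes.
            raise ValueError(f"TXT entry is {len(data)} bytes, limit is 255")
        out.append(len(data))
        out += data
    return bytes(out)


def decode_txt_record(raw: bytes) -> dict[str, str]:
    """Unpack TXT wire format back into a key -> value mapping."""
    result = {}
    pos = 0
    while pos < len(raw):
        length = raw[pos]
        chunk = raw[pos + 1 : pos + 1 + length].decode("utf-8")
        key, _, value = chunk.partition("=")
        result[key] = value
        pos += 1 + length
    return result


# The same two entries used in the registration above.
txt = [
    "path=/sparql",
    "vocabs=http://purl.org/ontologies/mo http://www.w3.org/TR/owl-time "
    "http://purl.org/NET/c4dm/timeline.owl http://xmlns.com/foaf/0.1",
]
raw = encode_txt_record(txt)
decoded = decode_txt_record(raw)
print(decoded["path"], len(decoded["vocabs"].split()))
```

The 255 byte limit is worth keeping in mind for ‘vocabs’: a long list of vocabulary URIs will not fit in a single entry.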
Did the browse command in the other window show anything? Hopefully it’ll show a line with some, but not all, of the details of the new registration. In order to find out the rest of the details we registered, we need to ‘resolve’ the service as follows:
dns-sd -L Magnatune _sparql._tcp
This should now give you back the information you need to figure out the SPARQL endpoint URL, a nice name for the service you can put in the user interface, and the vocabularies it uses.
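As a sketch of that last step, the snippet below turns resolved details into an endpoint URL and a list of vocabularies. The `ResolvedService` class is a hypothetical container I've made up for illustration, and the `http` scheme is an assumption, since nothing in the registration states it.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvedService:
    """Hypothetical holder for what a resolve gives back."""
    name: str
    host: str
    port: int
    txt: dict[str, str] = field(default_factory=dict)


def endpoint_url(svc: ResolvedService, scheme: str = "http") -> str:
    """Assemble the endpoint URL; the scheme is assumed, not discovered."""
    host = svc.host.rstrip(".")  # resolved host names may end in a dot
    path = svc.txt.get("path", "/")
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{host}:{svc.port}{path}"


def vocabularies(svc: ResolvedService) -> list[str]:
    """Split the space-separated 'vocabs' TXT entry into a list of URIs."""
    return svc.txt.get("vocabs", "").split()


svc = ResolvedService(
    name="Magnatune",
    host="dbtune.org.",
    port=2020,
    txt={
        "path": "/sparql",
        "vocabs": "http://purl.org/ontologies/mo http://www.w3.org/TR/owl-time "
                  "http://purl.org/NET/c4dm/timeline.owl http://xmlns.com/foaf/0.1",
    },
)
print(endpoint_url(svc))
print(vocabularies(svc))
```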
In the next episode, I'll cover doing the same thing with wide area Bonjour and show how to register and browse programmatically from Java. In the meantime, you can try out wide area browsing just by appending the domain name to browse to the dns-sd invocation, e.g.:
dns-sd -B _sparql._tcp floop.org.uk
or equally try sd.floop.org.uk (for services I'm currently dynamically registering) and even dns-sd.nc3a.nato.int for some interesting insights.
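If you're wondering how a wide area browse maps onto plain DNS: under DNS-SD, browsing amounts to a PTR query for the service type joined to the domain. The helper below is my own illustration of how that name is built, and it does no network lookup.

```python
def browse_query_name(service_type: str, domain: str = "local") -> str:
    """Build the fully qualified name that a DNS-SD browse queries via PTR."""
    # Strip stray dots so the two parts join cleanly, then add the root dot.
    return f"{service_type.strip('.')}.{domain.strip('.')}."


print(browse_query_name("_sparql._tcp", "floop.org.uk"))
```

A PTR lookup on that name with an ordinary DNS tool should list the same service instances that the browse reports.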