Friday, July 3, 2009

Biodiversity databases and OGC standards don't play well together


A few days ago Tim and I discussed how to provide OGC services for biodiversity databases such as the Global Registry of Migratory Species. What restarted the discussion this time was the discovery of the new INSPIRE Geoportal Viewer. For those who don't know it, the INSPIRE directive is pushing the creation of a common infrastructure for sharing geospatial data within Europe. The plan is to do that using open standards like the ones from the Open Geospatial Consortium (OGC).

The typical use case always described for these Spatial Data Infrastructures (SDIs) is a registry of services (Catalog Service) where a user can find geospatial data. The data is available through web services such as Web Map Service (WMS) or Web Feature Service (WFS). With a web or desktop GIS client you discover and connect to the services, display different layers, do some analysis, print results, etc. All of this uses open standards and mixes data from lots of different places.

The INSPIRE registry and Geoportal Viewer are a typical example. In the viewer you can select from a list of available services (registered in the catalog service) or enter the URL of your own WMS.

When you select a service, you get a list of the layers available on it, for example those of the Spanish SDI.

Internally this works by sending a GetCapabilities request to the WMS server. The returned XML document lists the available layers, with metadata on how to query them, who owns them, etc.
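
To make that concrete, here is a minimal sketch of what a client does with a GetCapabilities response. It assumes WMS 1.1.1 (where the document has no XML namespace); the server URL, the sample document and its layer names are made up for illustration, and a real response contains much more metadata.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def capabilities_url(base_url):
    """Build a WMS 1.1.1 GetCapabilities request URL."""
    params = {"SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetCapabilities"}
    return base_url + "?" + urlencode(params)


# Hypothetical, heavily trimmed response; real documents carry far more detail.
SAMPLE = """<WMT_MS_Capabilities version="1.1.1">
  <Capability>
    <Layer>
      <Title>Example server</Title>
      <Layer><Name>rivers</Name><Title>Rivers</Title></Layer>
      <Layer><Name>roads</Name><Title>Roads</Title></Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""


def layer_names(capabilities_xml):
    """Return the names of all requestable layers (those with a <Name>)."""
    root = ET.fromstring(capabilities_xml)
    return [
        layer.findtext("Name")
        for layer in root.iter("Layer")
        if layer.find("Name") is not None
    ]


print(capabilities_url("http://example.org/wms"))
print(layer_names(SAMPLE))
```

The client builds its layer picker from exactly this list, which is why the size of the list matters so much.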

But what happens when you have a database like the GBIF cache, with 1.8 million species? You cannot create a layer per species, or the GetCapabilities document will run to megabytes and megabytes, impossible for any client to parse. In any case, who wants to pick a layer from a list of 1.8M species?
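
A back-of-the-envelope sketch of the size problem: take the smallest layer entry a capabilities document could plausibly carry (just a name and a title, with hypothetical naming) and multiply by the number of species. Real entries also include bounding boxes, styles and so on, so this is only a rough lower bound per entry.

```python
SPECIES_COUNT = 1_800_000  # figure quoted above for the GBIF cache


def layer_entry(species_id):
    """A bare-bones <Layer> entry; the naming scheme is an assumption."""
    return (
        f"<Layer><Name>species_{species_id}</Name>"
        f"<Title>Species {species_id}</Title></Layer>"
    )


def estimated_size_bytes(species_count=SPECIES_COUNT):
    """Estimate document size using the longest id, so it slightly overestimates."""
    entry = layer_entry(species_count)
    return len(entry.encode("utf-8")) * species_count


print(f"{estimated_size_bytes() / 1_000_000:.0f} MB")
```

Even with entries this minimal the document comes out at more than a hundred megabytes, before any real metadata is added.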

Well, the way to make it work is to add a filter to the WMS request, specifying for example the species_id you are interested in. But generic clients do not support specifying this kind of filter.
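
As an illustration, this is roughly what such a filtered request could look like. Filtering is not part of the core WMS standard; the sketch assumes a server that accepts the `CQL_FILTER` vendor parameter (GeoServer does), and the server URL, the layer name `occurrences` and the id `212` are placeholders.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, species_id,
               bbox=(-180, -90, 180, 90), size=(800, 400)):
    """Build a WMS 1.1.1 GetMap URL restricted to a single species."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": "image/png",
        # Vendor extension, not standard WMS; int() guards against injection.
        "CQL_FILTER": f"species_id = {int(species_id)}",
    }
    return base_url + "?" + urlencode(params)


print(getmap_url("http://example.org/wms", "occurrences", 212))
```

A single generic layer plus a filter like this replaces millions of per-species layers, but only if the client lets the user supply the extra parameter.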

To me that means that current OGC clients, like the ones used by INSPIRE, GEOSS, national SDIs, etc., are not able to handle biodiversity OGC services. Or, to say it a different way, OGC services are not prepared to handle biodiversity databases with lots of species.

What are the possibilities?

1) OGC supports in its capabilities documents something like: "Hey! I am not a service with a set of layers, I am a datastore with potentially millions of layers. So if you want to grab anything from me, you have to provide a filter in your request". This implies that OGC does some work and, more importantly, that client software supports it. I don't think this will happen in the next few years.

2) Create a set of interesting biodiversity layers. We could match our biodiversity databases against the IUCN list of endangered species, create richness maps, etc., but access to primary data per species would not be possible.

If you think about the potential customers, we should probably be thinking of a predefined list of layers that we could all create on our OGC services and that might be interesting for lots of people. Richness, endangered species, kingdoms, whatever...

Another possibility is to create portals where the user filters for a species and then gets a customized, dynamic GetCapabilities document that includes the filter in the URL. That would be possible. But with catalog services like GEOSS, where there will be thousands of services, is biodiversity going to be so special that users will go to one of our websites before continuing in their wonderful world of web services workflows? I doubt it.
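
A rough sketch of that portal idea: generate, per species, a tiny WMS 1.1.1 capabilities document whose GetMap endpoint already carries the filter. Everything specific here is an assumption: the `CQL_FILTER` vendor parameter (as supported by GeoServer), the layer name, the species id and title, and the hope that the client keeps the query string from the advertised URL. The output is also far from a complete, valid capabilities document.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)


def species_capabilities(base_url, layer, species_id, title):
    """Return a minimal capabilities document for one species."""
    query = urlencode({"CQL_FILTER": f"species_id = {int(species_id)}"})
    # WMS 1.1.1 URL prefixes end in '?' or '&' so clients can append parameters.
    href = f"{base_url}?{query}&"

    root = ET.Element("WMT_MS_Capabilities", {"version": "1.1.1"})
    capability = ET.SubElement(root, "Capability")

    # Advertise the filtered URL as the GetMap endpoint.
    get_map = ET.SubElement(ET.SubElement(capability, "Request"), "GetMap")
    ET.SubElement(get_map, "Format").text = "image/png"
    http = ET.SubElement(ET.SubElement(get_map, "DCPType"), "HTTP")
    get = ET.SubElement(http, "Get")
    ET.SubElement(get, "OnlineResource", {
        f"{{{XLINK}}}type": "simple",
        f"{{{XLINK}}}href": href,
    })

    # A single layer, presented to the user under the species title.
    layer_el = ET.SubElement(capability, "Layer")
    ET.SubElement(layer_el, "Name").text = layer
    ET.SubElement(layer_el, "Title").text = title
    return ET.tostring(root, encoding="unicode")


print(species_capabilities("http://example.org/wms", "occurrences", 212,
                           "Example species"))
```

The user would paste the portal's per-species URL into a generic client, which then sees a perfectly ordinary one-layer service.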

Next week I am going to Geoweb 09 as an invited speaker to talk about biodiversity and the challenges of sharing it on the Geoweb. I would love to hear what you think about using OGC services within our community or any other issue related to geospatial data and analysis.