Dwins’s Weblog

I learned stuff today!

Posted in Development,Open Source Software by dwins on April 17, 2008

So, software is funny. It’s this huge complex thing where tons of abstractions are required to get anything useful done in a reasonable amount of time. (Think you’re dealing with an image? It’s really a grid of colors. Which is really a one-dimensional list of colors with a known width. Which are really just triples of numbers interpreted as red green and blue brightnesses. Which are really just sequences of binary values…) Abstractions are nice though since they keep you from making assumptions and keep things flexible. For example, GeoTools uses an abstraction called a DataStore to keep client code from caring about exactly how geospatial data is stored. Maybe it’s in a database like postgis. Maybe it’s a shapefile. Maybe you’re grabbing the data directly from the web using a WMS server someplace. It doesn’t matter, you just set up a DataStore and make a query and you get some geographic features.

This is really neat, because if someone wants to use a new type of storage, they can just write a new DataStore and all of a sudden everyone using GeoTools can use it. On the other hand, a lot of operations that you might want to run against a dataset still need to be written for each DataStore. So, each thing that a GeoTools Query knows about, a DataStore needs to know about as well. (If you’re wondering why we can’t just have these operations implemented “above” the DataStore in a way that uses that very same abstraction to avoid all this repeated code, see [1].) Today, one of the GeoServer developers wrote up a proposal to add a new feature to the Query object. To my dismay, I found a note to the effect that, since it might not be straightforward to implement this new feature, we should also add a “Capabilities” object to the DataStore, a means for DataStores to advertise which Query features they fail to implement.

The reason such a “cop-out” feature bugs me is that generally, if someone wants to query the data in a certain way, they will need the data that way regardless of whether the DataStore knows how to do it or not. So, you end up with either some code like this:

FeatureSource source = someDataStore.getFeatureSource();
FeatureCollection collection = source.getFeatures(Query.ALL);
Collections.sort(collection, new CustomSorter(mySortCriteria));

(ie, barely even using GeoTools and using a generic sort function instead of the possibly better-optimized features of the underlying database). Or, you might do something like this:

FeatureCollection collection;
FeatureSource source = someDataStore.getFeatureSource();
if (source.getQueryCapabilities().isSortingSupported()){
Query query = new Query(Query.ALL);
    collection = source.getFeatures(query);
} else {
    collection = source.getFeatures(Query.ALL);
    Collections.sort(collection, new CustomSorter(mySortCriteria);

(that is, check whether GeoTools can sort, and do it yourself if you have to.)  I may be demonstrating a creative failing on my part here, but I can’t think of any situation where a developer would say “I need this sorted… but only if you can do it for me.”  So why provide the lame sorting in datastores directly?  GeoTools already provides an AbstractDatastore which allows general-purpose code to be shared among the DataStores.  So then it would look like this:

FeatureSource source = someDataStore.getFeatureSource();
Query query = new Query(Query.all);
collection = source.getFeatures(query);

Sure, it’s an extra line, but now it’s fast when it needs to be.  Plus any bug fixes to the fallback sorting code get pushed out to all the other users of GeoTools since that’s where they live.  So, if this works out so well, why isn’t it an option?

As it turns out, GeoTools has not, thus far, been using this approach for the Query operations already supported.  (For example, there is only one DataStore implementation that actually supports sorting.)  So when we went to add a feature that could build off of other operations (in this case, the proposed paging operation is much more useful if we are imposing an order on the data by sorting it) we couldn’t simply provide a default implementation of the new operation simply because it doesn’t have the option of building on the older ones.  So, doing things ‘right’ in this case would involve updating all of the considerable number of DataStores in GeoTools to use a default sorting function, before even thinking about the paging feature.  Just because our software is Free doesn’t mean our time is worthless, so this will have to wait until we have some time to work on code cleanup as opposed to paying work.

So, the thing I learned today is that sometimes lax design today leads to bad design tomorrow.  That you can’t get rid of.  Because everyone’s dealing with the old way.


2 Responses to 'I learned stuff today!'

Subscribe to comments with RSS or TrackBack to 'I learned stuff today!'.

  1. Andrea Aime said,

    Err… the things is more like “lack of resources today won’t be cured tomorrow”. Sorting was implemented as part of the WFS 1.1 development, and resources were not fed into making all datastores support sorting (“ah, the maintainers will add that as they see fit”), but only to have that supported in PostGIS.

    The abstract data store is unfortunately not the base class for all datastores (if fact the only significant datastores using it are shapefile, property and WFS) because it was added way after the current datastores were developed and frankly it forces you into an awkward programming model that I personally don’t like.

    Which brings us to where we are today, willing to add a new feature on top of something that’s not implemented everywhere, with even less resources to tackle that. And as you noted, the problem will just get worse over time.

    About the client code, imho it should not try to make funny if-based-girations, if the datastore does not support what it needs, just blow up and say no, can’t do.

  2. dwins said,

    Sure, I hope I didn’t come across as trying to say that bad design can never be fixed. But it does seem to be much more work as more and more code is written against the flawed base, which means the longer things stay broken, the harder they are to fix.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: