Dwins’s Weblog

Summer of Code

Posted in Evangelism,Open Source Software by dwins on April 22, 2008

So, I’m mentoring a project for Google’s Summer of Code this year.  How cool is that?

But at the same time, I’m a bit disappointed.  Some of the students that I talked to about applying never bothered to put in a submission.  It seems a bit weird to me that after punching through that first psychological barrier of getting into a text chat channel, people still manage to convince themselves that it’s not worth applying.  I mean, the worst that can happen is you’d be turned down; what’s the big deal?

I see this around in general.  I bug people a bit about contributing to this or that project that they use all the time, or people come to me asking how they can become a better programmer and I tell them they should find an open source project and contribute.  (Seriously, what a great way to build your resume.  For free!)  And I get back weird answers like “I’m not good enough” or “I wouldn’t want to have people relying on me and then I’d let them down.”  I can see how it would be intimidating to try and get started on an enormous existing codebase, but it’s not like a project is going to suddenly fall over because someone contributed a flaky patch.  The way it really works (at least in GeoServer) is that you make some changes.  You show them to someone who’s already familiar with the code.  If the code is good, it goes in.  If not, we’ll give you some advice on how to make it acceptable and you can try again.  Even before someone is contributing directly to the project they can be learning.  See how that works?

So, to sum up: if you are thinking about contributing to an open source project, go ahead and do it!


I learned stuff today!

Posted in Development,Open Source Software by dwins on April 17, 2008

So, software is funny. It’s this huge complex thing where tons of abstractions are required to get anything useful done in a reasonable amount of time. (Think you’re dealing with an image? It’s really a grid of colors. Which is really a one-dimensional list of colors with a known width. Which are really just triples of numbers interpreted as red, green, and blue brightnesses. Which are really just sequences of binary values…) Abstractions are nice though since they keep you from making assumptions and keep things flexible. For example, GeoTools uses an abstraction called a DataStore to keep client code from caring about exactly how geospatial data is stored. Maybe it’s in a database like PostGIS. Maybe it’s a shapefile. Maybe you’re grabbing the data directly from the web using a WFS server someplace. It doesn’t matter: you just set up a DataStore and make a query and you get some geographic features.
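To make the image example concrete, here is a minimal sketch of those layers in Java. The FlatImage class and its method names are made up for illustration, and packing each red/green/blue triple into a single int is just one common choice, not the only way to do it.

```java
// Hypothetical illustration: a "grid of colors" stored as a flat int array.
class FlatImage {
    final int width;
    final int height;
    final int[] pixels; // one packed 0xRRGGBB value per pixel, row by row

    FlatImage(int width, int height) {
        this.width = width;
        this.height = height;
        this.pixels = new int[width * height];
    }

    // The grid abstraction: (x, y) maps to a single index in the flat list.
    int get(int x, int y) {
        return pixels[y * width + x];
    }

    // Each channel is expected to be in the range 0..255.
    void set(int x, int y, int red, int green, int blue) {
        pixels[y * width + x] = (red << 16) | (green << 8) | blue;
    }

    // The color abstraction: one number is really three brightness values.
    static int red(int rgb)   { return (rgb >> 16) & 0xFF; }
    static int green(int rgb) { return (rgb >> 8) & 0xFF; }
    static int blue(int rgb)  { return rgb & 0xFF; }
}
```

Code that calls get and set never has to know about the y * width + x arithmetic, which is the whole point of the abstraction.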

This is really neat, because if someone wants to use a new type of storage, they can just write a new DataStore and all of a sudden everyone using GeoTools can use it. On the other hand, a lot of operations that you might want to run against a dataset still need to be written for each DataStore. So, each thing that a GeoTools Query knows about, a DataStore needs to know about as well. (If you’re wondering why we can’t just have these operations implemented “above” the DataStore in a way that uses that very same abstraction to avoid all this repeated code, see [1].) Today, one of the GeoServer developers wrote up a proposal to add a new feature to the Query object. To my dismay, I found a note to the effect that, since it might not be straightforward to implement this new feature, we should also add a “Capabilities” object to the DataStore, a means for DataStores to advertise which Query features they fail to implement.

The reason such a “cop-out” feature bugs me is that generally, if someone wants to query the data in a certain way, they will need the data that way regardless of whether the DataStore knows how to do it or not. So, you end up with either some code like this:

FeatureSource source = someDataStore.getFeatureSource();
FeatureCollection collection = source.getFeatures(Query.ALL);
Collections.sort(collection, new CustomSorter(mySortCriteria));

(i.e., barely even using GeoTools and using a generic sort function instead of the possibly better-optimized features of the underlying database). Or, you might do something like this:

FeatureCollection collection;
FeatureSource source = someDataStore.getFeatureSource();
if (source.getQueryCapabilities().isSortingSupported()) {
    // The DataStore can sort, so ask for sorted results in the query
    Query query = new Query(Query.ALL);
    query.setSortBy(mySortCriteria);
    collection = source.getFeatures(query);
} else {
    // Otherwise fetch everything and sort it ourselves
    collection = source.getFeatures(Query.ALL);
    Collections.sort(collection, new CustomSorter(mySortCriteria));
}

(that is, check whether GeoTools can sort, and do it yourself if you have to.)  I may be demonstrating a creative failing on my part here, but I can’t think of any situation where a developer would say “I need this sorted… but only if you can do it for me.”  So why not provide the lame sorting in DataStores directly?  GeoTools already provides an AbstractDataStore which allows general-purpose code to be shared among the DataStores.  So then it would look like this:

FeatureSource source = someDataStore.getFeatureSource();
Query query = new Query(Query.ALL);
query.setSortBy(mySortCriteria);
FeatureCollection collection = source.getFeatures(query);

Sure, it’s an extra line, but now it’s fast when it needs to be.  Plus any bug fixes to the fallback sorting code get pushed out to all the other users of GeoTools since that’s where they live.  So, if this works out so well, why isn’t it an option?
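The shared-fallback idea can be sketched outside of GeoTools in a few lines of plain Java. Everything here (BaseStore, FileStore, canSortNatively, readSorted) is a hypothetical stand-in rather than the real GeoTools API, and features are reduced to strings to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a shared DataStore base class; not the GeoTools API.
abstract class BaseStore {
    // Every backend has to implement the plain, unsorted read.
    protected abstract List<String> readAll();

    // Backends that can sort natively override this to return true...
    protected boolean canSortNatively() {
        return false;
    }

    // ...and override this hook, which is only called when the above is true.
    protected List<String> readSorted(Comparator<String> order) {
        throw new UnsupportedOperationException("no native sort");
    }

    // Callers always get sorted data; the slow fallback lives in one place.
    public final List<String> getFeatures(Comparator<String> order) {
        if (canSortNatively()) {
            return readSorted(order);
        }
        List<String> copy = new ArrayList<>(readAll());
        copy.sort(order);
        return copy;
    }
}

// A backend with no sorting support of its own, like a flat file.
class FileStore extends BaseStore {
    @Override
    protected List<String> readAll() {
        return Arrays.asList("road", "building", "lake");
    }
}

class FallbackSortDemo {
    public static void main(String[] args) {
        BaseStore store = new FileStore();
        // prints [building, lake, road]
        System.out.println(store.getFeatures(Comparator.<String>naturalOrder()));
    }
}
```

A database-backed store would override the two hooks and push the sort down to the database, and client code would not change at all.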

As it turns out, GeoTools has not, thus far, been using this approach for the Query operations already supported.  (For example, there is only one DataStore implementation that actually supports sorting.)  So when we went to add a feature that could build off of other operations (in this case, the proposed paging operation is much more useful if we are imposing an order on the data by sorting it) we couldn’t provide a default implementation of the new operation, simply because it wouldn’t have the option of building on the older ones.  So, doing things ‘right’ in this case would involve updating all of the considerable number of DataStores in GeoTools to use a default sorting function, before even thinking about the paging feature.  Just because our software is Free doesn’t mean our time is worthless, so this will have to wait until we have some time to work on code cleanup as opposed to paying work.
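To see why paging wants sorting underneath it, here is a rough sketch of what a default paging operation might look like if a default sort were available everywhere. DefaultPaging and its page method are hypothetical names, not anything in GeoTools or in the actual proposal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of GeoTools: a default paging operation
// that is only well defined because it sorts first.
class DefaultPaging {
    static <T> List<T> page(List<T> features, Comparator<? super T> order,
                            int offset, int limit) {
        // Impose an order so that page N holds the same features every time.
        List<T> sorted = new ArrayList<>(features);
        sorted.sort(order);

        // Clamp the requested window to the data that actually exists.
        int from = Math.min(Math.max(offset, 0), sorted.size());
        int count = Math.min(Math.max(limit, 0), sorted.size() - from);
        return new ArrayList<>(sorted.subList(from, from + count));
    }
}
```

Without the sort, two requests for “features 10 through 20” could come back with different features depending on whatever order the backend happened to return them in.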

So, the thing I learned today is that sometimes lax design today leads to bad design tomorrow.  That you can’t get rid of.  Because everyone’s dealing with the old way.


Posted in Bio,Development,Open Source Software by dwins on April 17, 2008

I work on open source software.  This means that I work with lots of other people who work on open source software.  A neat thing about open-source software is that it basically requires clean code, or at least attention to that sort of thing.  After all, labelling your software as open-source when nobody but you can untangle the spaghetti code enough to get anything done is roughly analogous to giving away free soda while charging for the cups.  So, people that are ‘good’ at the whole open-source thing tend to care about design.

In particular, I work on GeoServer, a web mapping server written in Java.  GeoServer relies on GeoTools to provide database abstractions and other great stuff to help with the geospatial operations, so GeoServer can focus on enforcing security restrictions and providing a decent configuration system and generally bridging the gap between the Web and GeoTools’s Java API.  This works out great, since GeoTools can then provide similar functionality to other geospatial applications such as uDig.  In the typical open-source way of things, this means that all three projects benefit since the more users an open-source project has, the more developers it will have (generally speaking).

So, inasmuch as I am defined by what I do, that’s who I am.