Dwins’s Weblog


Workaround: SBT/Maven “Inconsistent module descriptor”

Posted in Development,Open Source Software by dwins on October 15, 2011

Here’s an issue I’ve been dealing with in GeoScript.scala for a while: when fetching dependencies I get an error message:

update: sbt.ResolveException: unresolved dependency: xml-apis#xml-apis-xerces;2.7.1: java.text.ParseException: inconsistent module descriptor file found in 'http://download.osgeo.org/webdav/geotools/xml-apis/xml-apis-xerces/2.7.1/xml-apis-xerces-2.7.1.pom': bad module name: expected='xml-apis-xerces' found='xml-apis'; bad revision: expected='2.7.1' found='xerces-2.7.1';

Sure enough, when I go to http://download.osgeo.org/webdav/geotools/xml-apis/xml-apis-xerces/2.7.1/xml-apis-xerces-2.7.1.pom I can see that even though the URL suggests the artifact would be (group: xml-apis, artifact: xml-apis-xerces, version: 2.7.1), it is in fact listed as (group: xml-apis, artifact: xml-apis, version: xerces-2.7.1).  Apparently Maven doesn’t verify those details when fetching dependencies, but Ivy (by default) does.  According to this JIRA issue I found there is an option to disable it, but SBT doesn’t seem to expose it (I fiddled around a bit with an ivysettings.xml file and the ivyXML setting but to no avail.)

Finally I just added an entry to my libraryDependencies setting like this:

"xml-apis" % "xml-apis-xerces" % "2.7.1" from "http://download.osgeo.org/webdav/geotools/xml-apis/xml-apis-xerces/2.7.1/xml-apis-xerces-2.7.1.jar"

This doesn’t avoid the warning but does let me go ahead with my build.  Good enough I guess.

But recently I've been hearing some complaints from folks checking out my code (that's right, potential contributors!) who've been confused by the error.  Since first encountering this issue I've been informed that Xerces isn't actually needed at all; the API it implements is included in modern JVMs.  So I tried just adding a dependency exclusion to my SBT build file (roughly like the sketch below).  Success! Not only was I able to run the 'update' command with no scary errors, but the test suite even runs.  Ticket closed.
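
For reference, a minimal sketch of that kind of exclusion in an SBT build; the GeoTools module and version below are placeholders rather than the exact coordinates GeoScript.scala uses:

libraryDependencies += "org.geotools" % "gt-main" % "2.7.1" exclude("xml-apis", "xml-apis-xerces")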

Except, as it turns out, Ivy needs that metadata to handle the exclusion (or something; I didn't dig in too deep).  I already had it in a local cache, thanks to having run the build with that explicit URL, but when benqua from GitHub attempted a build from scratch he ran into the same issues again.  ARRGH.

Finally I ended up sticking with the exclusion, but now the GeoScript build includes a dummy subproject that has the xml-apis dependency with the explicit URL.  The intermodule dependencies are set up so that it always runs before the main "library" subproject, meaning that the xml-apis-xerces module, with correct metadata, is in the local cache before any dependency resolution involving GeoTools begins.  Not incredibly elegant, but it does work.
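
The shape of that setup, as a rough Build.scala sketch (the project names and layout here are illustrative, not the actual GeoScript.scala build definition):

import sbt._
import Keys._

object GeoScriptBuild extends Build {
  // Dummy subproject whose only job is to resolve xml-apis-xerces from an explicit URL,
  // priming the local Ivy cache with consistent metadata for it.
  lazy val xercesShim = Project("xerces-shim", file("xerces-shim")).settings(
    libraryDependencies += ("xml-apis" % "xml-apis-xerces" % "2.7.1"
      from "http://download.osgeo.org/webdav/geotools/xml-apis/xml-apis-xerces/2.7.1/xml-apis-xerces-2.7.1.jar")
  )

  // The main library depends on the shim, so the shim's update runs first and GeoTools
  // resolution (with the xerces exclusion) finds good metadata already in the cache.
  lazy val library = Project("library", file("geoscript")).dependsOn(xercesShim)
}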

I’ll be in touch with the GeoTools guys to see about just fixing the metadata in the repository – since Maven appears to ignore it I don’t see this breaking any existing builds.

FOSS4G Code Sprint

Posted in Uncategorized by dwins on September 16, 2011

I’m at FOSS4G this week.  I’ll be at the code sprint at the end of the week and thought it might be a good idea to write up some thoughts about what I’d be interested in working on there.  I’m breaking the list down by project.

GeoScript

GeoScript seems to be getting some nice attention at FOSS4G this year.  It adapts the GeoTools geospatial library to be more convenient to use from scripting languages like Python and JavaScript.  I am the main developer of the Scala variant of GeoScript, so I’m happy to see all the excited tweets and other buzz about GeoScript.  I didn’t make it to the talks about it (Spatial Capabilities for Scripting Languages and Scripting GeoServer with GeoScript) due to another presentation I was making, but the tutorial on Thursday had a very nice turnout.  Justin Deoliveira and Tim Schaub put a lot of work into a written tutorial – but only for the Python and JavaScript variants of GeoScript.  So one (really useful) thing that could happen at the code sprint is “porting” that documentation to Scala, and maybe reviewing the GeoScript Scala documentation overall.

I've also been eyeing One-Jar as a means of producing an executable jar distribution of the Scala interpreter with GeoScript and GeoTools preloaded: a really simple way of trying out GeoScript (no installer needed) for those who don't want or need to learn SBT.

gsconfig.py

gsconfig.py is a Python library for controlling GeoServer remotely.  It's already in heavy use in the GeoNode project but can be useful for smaller tasks as well – for example, here's a script using it to copy a layer group from one GeoServer to another.  Again, documentation would be useful – one thing I feel is really missing from the current documentation is some info on hacking it :)  There are a lot of corners of the GeoServer REST API which are currently not supported in gsconfig, and it would be nice if it were easier for developers other than me to make those sorts of improvements.  Fortunately, Python is a nice language, gsconfig does a pretty simple task (effectively just slicing, dicing, and duct-taping XML documents together), and the whole thing is less than 1500 lines of code, so an overview for developers should be pretty attainable.

I also received a brisk reminder during my tutorial on GeoServer Scripting with Python that there is no documentation online about how to actually install gsconfig for use in Python scripts :)

mapnik2geotools

mapnik2geotools is a tool (in Scala, of course ;)) to translate map styles between Mapnik XML and OGC’s SLD standard (complete with support for GeoServer extensions and using GeoServer’s REST API to set up the styles in a live GeoServer instance.) It’s a rough project at this point, but I think an interesting one.  It also doesn’t use or need too many exotic features of the Scala programming language, so if you are interested in getting started with Scala, hacking on mapnik2geotools might be a good introduction to the language.  While we have used it a bit at OpenGeo for bootstrapping our OpenStreetMap styling efforts, I haven’t had a lot of time to refine it, and there is a lot of room for improvement.  I am not even going to list ideas here :) but instead refer readers to the Github issues list.

GeoServer CSS

The GeoServer CSS module is another style converter I work on.  There are some interesting problems left to be solved, and some polish to be added to the GeoServer UI related to it.  It would be interesting this weekend to think about styling raster data with CSS.  Automatic conversion of existing SLD documents into CSS would also be an interesting and useful feature.  I haven’t considered the deeper GeoServer integration too much, but the interactive styling page could use some cleanup (and maybe generalization to work with SLD styles too).  It would also be interesting to allow inline CSS in WMS requests, and possibly have a REST API for getting SLD equivalents to CSS, for use with OpenLayers.


GeoServer CSS – Conversion

Posted in Cool Stuff,Development,Open Source Software by dwins on July 25, 2010

When I first posted about GeoServer CSS, I was planning to follow up more frequently, but over the past couple of weekends I've been distracted by implementing some new features (integration with the Scala variant of GeoScript, styled marks) as well as getting away from the keyboard a bit.  This past week, however, I've been working on some speed improvements and I was thinking it would be nice to blog about those. Unfortunately, I haven't explained what the conversion process is doing in the first place, so any discussion of performance would be a bit premature.  I suppose that means I should start at the beginning.

Last time I blogged about this, I said a bit about the parser that takes in a CSS file and breaks it up into objects that the Scala code inside GeoServer CSS can manipulate more easily.  The conversion process I’m talking about today disassembles and reassembles those rules into style components from the GeoTools API, and then you have a style that you can pass to GeoTools’s style serializer or straight to the renderer or whatever you like. Or, more graphically (ASCII art, woo):

 [CSS File] -- parser --> [CSS Objects] -- translator --> [SLD Objects]

The parser isn't incredibly fast, but the real bottleneck is the translator.  Part of the point of the CSS converter in the first place is that CSS uses a very different model than SLD for combining rules: SLD uses the so-called painter's model, where each rule is applied in its entirety to any feature that matches its filter, and if multiple style rules apply to the same feature then they are drawn on top of each other.  CSS, on the other hand, allows rules to override properties from other, less specific rules. The way I came up with to deal with this impedance mismatch was to inspect the rules and produce one SLD rule for each possible combination of CSS rules.  For example, starting with this CSS source:

[a > 1] {
   fill: blue;
} 
[b < 2] {
   stroke: yellow;
}

We produce 3 SLD rules: one for the case where [a > 1 AND NOT b < 2], one for [a > 1 AND b < 2], and one for [NOT a > 1 AND b < 2].  The fourth combination, with both rules negated, is a valid combination, but we throw it out since there are no styling attributes for it.  In general, there are (2^n - 1) potential SLD rules for a stylesheet containing n CSS rules.  That's a lot; for only 10 input rules the output stylesheet could contain over 1000 rules.  I say "could" because there are some optimizations that can help to avoid enumerating all these possibilities... but as I said before, I'll leave that for another post. After combining all these style rules, we still need to re-encode them as SLD.  Even this step is not straightforward, as the CSS module unifies certain "contextual" filters with the SLD Filter concept, and also uses z-indexing instead of SLD's "stack of featuretypes" model for controlling rendering order.
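
For the curious, here's a minimal sketch of that enumeration, using simplified stand-in types rather than the module's real classes (it can be pasted into a Scala REPL):

case class CssRule(filter: String, properties: Map[String, String])
case class SldRule(filter: String, properties: Map[String, String])

// One SLD rule per non-empty subset of the CSS rules: included filters are ANDed
// together, excluded ones are negated, and later rules win when properties collide
// (a crude stand-in for CSS specificity, which the real translator handles more carefully).
def combine(rules: Seq[CssRule]): Seq[SldRule] =
  (1 until (1 << rules.size)).map { mask =>
    val (in, out) = rules.zipWithIndex.partition { case (_, i) => (mask & (1 << i)) != 0 }
    val filter = (in.map { case (r, _) => r.filter } ++
                  out.map { case (r, _) => "NOT " + r.filter }).mkString(" AND ")
    SldRule(filter, in.map { case (r, _) => r.properties }.reduce(_ ++ _))
  }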

So, when a CSS rule combination is applicable to multiple featuretypes, or is applicable in multiple zoom ranges, or has symbolizers at multiple z-indexes, the equivalent SLD style will have multiple Rules corresponding to that single combination of CSS Rules.  That means we have to inspect the selectors to figure out the right set of SLD rules to produce.  In pseudocode:

featuretypes = extract_featuretypes(rule.selectors)
symbolizers_per_z_index = split_by_zindex(rule.symbolizers)
scale_ranges = extract_scale_ranges(rule.selectors)
for (ft in featuretypes)
    for (z, symbolizers in symbolizers_per_z_index)
        ftstyle = create_featuretypestyle(ft)
        for (range in scale_ranges)
            ftstyle += create_sld_rule(range, rule.selectors, symbolizers)
        output += ftstyle

Then, on top of that, we need to keep the total number of ftstyles produced as low as possible, since each one requires an extra buffer at render time.  The code in Translator.css2sld does something similar to the pseudocode above, but shuffles all the rule combinations into one SLD style as it goes. So, clearly, there's a lot going on here.  Sometime soon I'll talk a bit about some of the things I've done to keep the conversion time under control (and, for power users, how to make your styles friendly to the converter).

Parsing CSS in Scala – Parser Combinators

Posted in Open Source Software by dwins on June 19, 2010

My previous post was the first in a series talking about the code behind the GeoServer CSS module. This is my second post in the series, where I’ll introduce how I used Scala to parse CSS syntax.

For reading CSS in the GeoServer CSS styling module, I used a CSS parser written specifically for the project.  I checked out some existing CSS parsing libraries like W3C's flute and the CSS parser in Batik, but in the end I decided against them to allow for some wacky modifications to the spec, like embedded CQL expressions and extracting metadata from comments.  Plus, I wanted to check out the parser combinator library in Scala, which turned out to be pretty cool.

The basic idea behind parser combinators is exactly what it sounds like – simple parsers can be combined using various operators to produce more complex parsers.  Each parser is a full-fledged object extending the scala.util.parsing.combinator.Parsers.Parser abstract class. Parsers are responsible for pulling a single token off the front of the input stream and returning a ParseResult that contains information about whether the attempted parse was successful, as well as the object produced and the new position of the input stream.  For example, a parser that reads a single specific character (for example, a hyphen delimiting fields) could be implemented like so:

object ParsingExample extends scala.util.parsing.combinator.Parsers {
  type Elem = Char // these parsers work over streams of characters

  object Delimiter extends Parser[String] {
    def apply(reader: Input): ParseResult[String] = {
      if (reader.atEnd) {
        Failure("end of string", reader)
      } else if (reader.first != '-') {
        Failure("no separator found", reader)
      } else {
        Success(reader.first.toString, reader.rest)
      }
    }
  }
}

You can use this to actually extract strings from input like so:

import scala.util.parsing.input.CharSequenceReader
ParsingExample.Delimiter(new CharSequenceReader("-"))

which should produce something like:

scala> ParsingExample.Delimiter(new CharSequenceReader("-"))
res0: ParsingExample.ParseResult[String] = [1.2] parsed: -
scala> res0.get
res1: String = -

Kind of boring, I know. It gets better. Now if we want to parse some input that is composed of number strings (like a phone number, for example) we can also define a DigitSequence parser:

  val DigitSequence = Parser { reader =>
    if (reader.atEnd) {
      Failure("end of string", reader)
    } else if (!reader.first.isDigit) {
      Failure("no digits found", reader)
    } else {
      var digits = reader.first.toString
      var rest = reader.rest
      while (!rest.atEnd && rest.first.isDigit) {
        digits += rest.first
        rest = rest.rest
      }
      Success(digits, rest)
    }
  }

For this one I saved a bit of typing by using the Parser convenience method, which takes a function and wraps it as a full-fledged Parser instance. Not only does this avoid an extra block for an object definition, but it also allows Scala's type inference to figure out the type of reader automatically. Cool!

Now that we have our basic parsers figured out, we can combine them to produce more complex ones very tersely:

val PhoneNumber = DigitSequence ~ Delimiter ~ DigitSequence

If we take a look at the parse result for this, however, it is a bit messier than for the simple parsers used thus far:

scala> ParsingExample.PhoneNumber(new CharSequenceReader("123-456"))  
res6: ParsingExample.ParseResult[ParsingExample.~[ParsingExample.~[String,String],String]] = [1.8] parsed: ((123~-)~456)

This is where the Parser.map method comes in. This allows you to apply some transformation to a parser’s output, usually to convert it to a more usable form. For example, you might define a case class to contain the different segments of a phone number and modify the PhoneNumber parser to output an instance of the case class:

  case class RolodexCard(pre: String, post: String)

  val PhoneNumber = DigitSequence ~ Delimiter ~ DigitSequence map {
    case pre ~ _ ~ post => RolodexCard(pre, post)
  }

Now the output looks a little nicer:

scala> ParsingExample.PhoneNumber(new CharSequenceReader("123-456"))
res0: ParsingExample.ParseResult[ParsingExample.RolodexCard] = [1.8] parsed: RolodexCard(123,456)

In addition to the generic parser combinators (which can actually help to parse any sequential stream of tokens, not just character data), there is a RegexParsers trait in the standard library that provides some extra tailoring for text handling. It offers such niceties as skipping over whitespace and implicit conversions from string literals and regular expressions to parser instances. Using it, the ParsingExample demo becomes a lot shorter:

object RegexParsingExample extends scala.util.parsing.combinator.RegexParsers {
  case class RolodexCard(pre: String, post: String)

  val Delimiter = "-"
  val DigitSequence = """\d+""".r
  val PhoneNumber = DigitSequence ~ Delimiter ~ DigitSequence map {
    case pre ~ _ ~ post => RolodexCard(pre, post)
  }
}

It’s almost like a Backus-Naur definition of the grammar. Neat!

Sequences are just the beginning: the combinator library also provides combinators for alternation, delimited sequences, optional tokens, and more. You can check out the whole arsenal in the Scaladocs: http://www.scala-lang.org/docu/files/api/scala/util/parsing/combinator/Parsers.html .
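
To give a flavor of a few of those (the names below are mine, not from the post or the standard library docs), here is a quick sketch that adds an alternation (|), an optional token (opt), and a delimited sequence (rep1sep) on top of the RegexParsers example:

object MoreCombinators extends scala.util.parsing.combinator.RegexParsers {
  val DigitSequence = """\d+""".r
  // opt: an optional leading country code such as "+1"
  val CountryCode   = opt("+" ~> DigitSequence)
  // rep1sep + |: one or more digit groups separated by either "-" or "."
  val PhoneNumber   = CountryCode ~ rep1sep(DigitSequence, "-" | ".") map {
    case cc ~ groups => (cc, groups)
  }
}

With those definitions, MoreCombinators.parseAll(MoreCombinators.PhoneNumber, "+1 555-867-5309") should succeed, yielding Some("1") for the country code and the three digit groups as a list.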

By using RegexParsers for most of the CSS parsing, I'm able to fit the whole parser into a pretty tight body of code. On the other hand, the ability to sidestep RegexParsers' whitespace handling with a custom Parser implementation makes it easy to have comments parsed immediately before a rule without having to deal with them explicitly everywhere. I can also write unit tests for individual productions if I want to, which has proved helpful in troubleshooting.

The sample code from this blog post is available on gist.github.com.

CSS in GeoServer – How’s it Work?

Posted in Open Source Software by dwins on June 19, 2010

I make maps with computers. Or rather, I make tools for making maps with computers, tools such as GeoServer and GeoExt. I also write code recreationally from time to time. A while ago I wrote a GeoServer blog post about one side project of mine, a tool to translate from CSS-like syntax to SLD styles (CSS is the Cascading Style Sheet language used for styling things like the blog you’re looking at right now; SLD is the Styled Layer Descriptor language used in GeoServer and other mapping servers to style maps). While the post on the GeoServer blog was intended to help spread user awareness of this tool, I’d like to write a few posts on this blog about the implementation. Who knows, maybe I can get some collaborators?

The CSS module started almost a year ago when I was checking out the Scala programming language.  I was interested in Scala because it's Java-compatible, a key property for working with GeoServer and GeoTools.  Scala provides excellent Java interoperability, including natural use of overloaded methods, which are a rough spot in the other JVM languages I've looked at (for example, Jython, Rhino, and JRuby) due to their lack of typed variables.  Scala also provides lots of nice language features like function values, function literals, mixins (aka traits in Scala terminology), and pattern matching, to name a few.

The best thing to do when learning a new programming language is to write some code in it, and after finishing HelloWorld.scala and a Markov text generator, I started to think about a "real" project to work on.  I had recently come across Cascadenik, so a CSS map styling tool for GeoServer seemed like a good idea (and as a bonus, a text parsing API is part of the Scala standard library!)

I'll devote the next few posts on this blog to discussing the actual code in a bit more depth, but here's the general strategy I took.  Rather than reproducing all the work that's gone into SLD-based rendering in GeoTools (including non-image outputs like KML and SVG), I decided to work on translating CSS files to SLD, using the normal GeoServer configuration and rendering system to actually generate maps.  I also decided to try to emulate the cascading model that CSS uses during the translation process, so that styling rules could share properties in a more flexible way than in the painter's model imposed by SLD.  After parsing a CSS file and jumbling its rules around to adapt to the differing models, I produced a GeoTools Style object and could either pass the Style to the rendering system, or use GeoTools serialization methods to produce an XML document.

Part 1 – Parsing CSS in Scala

A Code Smell – the Utilities Trait

Posted in Development by dwins on May 7, 2010

warning: if you’re not a programmer you should probably stop reading now (use the time you’d have spent here at MS Paint Adventures; you’ll be glad you did.)

Back in school I saw a design pattern used in some students’ projects (including my own) where someone would want to have some constants used in several places in their code.  In Python, you would handle this with a simple variable in your module:

SNOWBALLS_CHANCE=6.8e-35
RABBITS_FOOT_INFLUENCE=7.5

A C++ programmer would probably use the preprocessor to define some constants that get replaced with literals before the compiler even sees the code:

#define SNOWBALLS_CHANCE 6.8e-35
#define RABBITS_FOOT_INFLUENCE 7.5

These projects were in Java, however, and Java doesn't have free-standing values, or even a preprocessor like C++ does; every single thing has to be part of some class.  Values that aren't actually associated with class instances require the static modifier to stand apart from instances.  Code referencing such constants needs to qualify them with the name of the class they belong to, unless the referencing class subclasses it:

class LuckConstants {
    private LuckConstants() { 
        /* don't even THINK of instantiating this, guys */ 
    }
    public static final float SNOWBALLS_CHANCE = 6.8e-35f;
    public static final float RABBITS_FOOT_INFLUENCE = 7.5f;

}
class PokerPlayer {
    public static void main(String[] args) {
        System.out.println("Odds:" + LuckConstants.SNOWBALLS_CHANCE);
    }
}
class PokerDealer extends LuckConstants {
    public static void main(String[] args) {
        System.out.println("Odds (adjusted):" + RABBITS_FOOT_INFLUENCE);
    }
}

This is kind of a lot of code, and if I want to have PokerDealer inherit from some GameMaster parent class I am going to have to use the long form since Java doesn’t do multiple inheritance.  Fortunately, there is also the option of inheriting constants from an interface, which doesn’t add any restrictions to the other parent types of a class.  The IConstants interface was the most common variation I saw, and I’m not aware of any particular shortcomings with this approach.  Still, it bugged me a bit that this setup involved this kind of useless type.  What use is an interface with no methods?  So I was really glad to find out about import static, which lets Java code reference static members of classes without qualifying them and without adding extra cruft to the type hierarchy.  (I started doing Java programming just before this feature came out, and I wasn’t able to use it for a little while as Apple dragged their feet a bit bringing it to the JVM that they provide for Macs.)

Okay, fast-forward 5 or 6 years.

Imagine my chagrin when, a couple of months into my first serious Scala project, I noticed that I had written some Utility Traits that did much the same thing as those constant container interfaces, but providing some methods instead of some constants.  In Java such a thing isn't possible since interfaces can't contain implementation, but Scala's traits can.  (Otherwise they are analogous to interfaces, and even reduce to Java interfaces when they don't include implementation.)  Scala definitely has a nice equivalent to static class members (singleton objects) that would be totally applicable here, so why didn't I use that? Lame.

So when you find a trait in Scala code that doesn’t have any abstract methods, and maintains no state, it’s definitely time to consider refactoring it to an object instead.  (If it does have state, but no abstract methods, maybe it should be a class instead of a trait.)  You can easily convert any client code to the new way.  Let’s say you have a Utilities trait for doing some housework:

trait Utilities {
    def unclogDrain() {}
    def cleanGutters() {}
    def retileRoof() {}

}

class HiredHand extends Utilities {
    def doWhatYourePaidFor() {
        unclogDrain()
        cleanGutters()
        retileRoof()
    }
}

You could make it a Utilities object instead with:

object Utilities { ... }
class HiredHand {
    import Utilities._  
    // now everything from Utilities is in local scope, woo!
    ...
}

Neat!  Now to go fix up that code.

Smelter – A tool for JavaScript framework development

Posted in Ideas,Open Source Software by dwins on February 28, 2010

For much of the JavaScript work we do at OpenGeo, we use jsbuild (from jstools) to concatenate and minify the sources of the fairly sizable JavaScript libraries we rely on.  A useful feature that jstools provides is inspecting dependency annotations in the JavaScript to make sure all the scripts are concatenated in the correct order.  This is nice and automatic and everything, but minified JavaScript is not much fun to look at in Firebug.  So, we end up writing "loader" scripts to allow us to use the "raw" sources directly while testing.  The loader scripts are pretty simple; they just add some <script> tags while the page is loading to reference the non-minified sources directly.  A problem with this approach, however, is that the loader script then must enumerate, in order, the raw sources.  Much of the benefit of jsbuild's dependency analysis is lost if we must also manually maintain a one-dimensional listing of the scripts! For example, look at GeoExt's loader script in comparison with its build configuration.

In order to address this, I've been working on a JS build tool of my own, called Smelter.  Smelter reads jstools build files and dependency information and concatenates and minifies similarly, but it also provides an embedded web server that creates loader scripts based on that same dependency information.  It also allows switching between minified and loader scripts without modifying the HTML pages that reference them.  Aside from saving developers a bit of duplicated work, this also ensures that the non-minified scripts load in the same order they are included in the minified build, for fewer surprises when switching over to the minified build.
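
The loader-script side of that is conceptually tiny; as a sketch (in Scala, and not Smelter's actual code), turning a dependency-ordered list of source paths into a loader script could look like:

// Given sources already sorted into dependency order, emit a loader script that
// adds one <script> tag per raw (non-minified) file.
def loaderScript(orderedSources: Seq[String]): String =
  orderedSources
    .map(path => s"""document.write('<script src="$path"></script>');""")
    .mkString("\n")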

I’ll definitely be dog-fooding this for the JavaScript used in GeoNode, and I’d love feedback from anyone else using it.

Testing is complicated.

Posted in Uncategorized by dwins on December 16, 2009

A coworker of mine is getting more into the code side of making software work.  I was talking to him recently about increasing the test coverage on the project he's coming onto and he asked me that eternal question:

So, what do I test?

The answer, of course, is “everything.”

Less glibly, software testing is a pretty deep topic.  There are a lot of kinds of testing you can do, some very time-consuming, others very quick, and with a similar range in effectiveness at finding bugs.  Of course, there are other reasons to do tests (maybe you don't care about all bugs and just want to be able to make some performance guarantees, for example).  But most of the time when people talk about software testing they want to get an idea of how buggy the software is: how close its behavior is to what's expected or desired.  In this blog post, I'll talk about a few broad categories of testing.  But first, a quick testing glossary:

  • verification: Examining a product to make sure that it does what it was designed to do (what do the blueprints say?)
  • validation: Examining a product to make sure that it does what it is supposed to do (what did the customer want?)
  • reliability: How likely is it that a product will function when it is needed?
  • assertion: Generally, automated tests perform some number of operations with the system, and then inspect some property of the system afterward.  That property is called an assertion, and this term is used pretty universally across testing frameworks.  It is common to talk about a test or an assertion being “strong” or “weak” based on how specific the requirements are.  (x>2 is usually a weaker assertion than x == 6).
  • mocking: The practice of creating an object that *acts* like some component of the system, but without being a full implementation.  This is useful in testing because you can guarantee the return values from methods on a mock object will be appropriate for your test.  It's also nice to not actually make that bank withdrawal when testing your bill-paying application.  You can mock up objects (i.e., instances of a class in some programming language), but it is also possible to mock up an external service, or a user account, or whatever environment is needed for testing.  (A small sketch follows this list.)
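
Here is a minimal hand-rolled mock in Scala to make the bill-paying example concrete; the BankGateway interface and its methods are hypothetical, invented just for this sketch:

// Hypothetical interface the bill-paying code depends on.
trait BankGateway {
  def withdraw(account: String, cents: Long): Boolean
}

// A hand-rolled mock: never touches a real bank, always "succeeds",
// and records its calls so a test can make assertions about them.
class MockBankGateway extends BankGateway {
  var calls: List[(String, Long)] = Nil
  def withdraw(account: String, cents: Long): Boolean = {
    calls = (account -> cents) :: calls
    true
  }
}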

Smoke Testing

Smoke tests just exist to verify that, upon running your software, the magic smoke does not escape from your computer.  They generally make very weak assertions about the system, like, for example, that a homepage in a web application has a title.  But, even in this example, a passing test shows a lot of good things about the system: the templates are configured properly, the system successfully binds to a port, etc.
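
As an illustration of how weak such an assertion can be (this example is mine, and assumes a web app already running at http://localhost:8080/), a smoke test might be as small as:

import scala.io.Source

object HomepageSmokeTest {
  def main(args: Array[String]): Unit = {
    // Weak assertion: the homepage only has to come up and contain a <title> element.
    val html = Source.fromURL("http://localhost:8080/").mkString
    assert(html.contains("<title>"), "homepage is missing a <title>")
    println("smoke test passed")
  }
}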

Unit Testing

Unit tests intend to test a single component of a system.  While the precise definition of a unit is up to individual teams, it is pretty common to discuss unit tests as ones that operate on a single class (in Java code, for example.)  Ideally, in a unit test, all objects except for the one under test should be mocked.

Integration Testing

Integration testing verifies that components in a system interact correctly.  This is the type of testing that you are doing when you set up that server and just check that the homepage comes up.  Again, the line between unit and integration testing is kind of fuzzy.

Black Box Testing

Black box testing uses only the parts of a system that are intended for use by external components.  For example, if you were black-box testing a Queue class, you would not make any assertions about its internal storage, only about behavior observable through the standard Queue operations.  You also wouldn't test or modify any helper classes used in the implementation of the Queue, and code inspection would not be an option.  Black box testing encompasses techniques like:

  • fuzz testing, using randomly generated garbage data as input
  • equivalence classes, inspecting a component's specification to identify ranges of input that should be functionally equivalent (thereby allowing the tester to avoid redundant effort)
  • user testing, simply handing software to a user and asking him to identify places where it works unexpectedly or incorrectly

White Box Testing

White box testing assumes access to the code and internals of a system.  It allows techniques that modify and inspect the code, things like:

  • static analysis, where a tool like lint or findbugs inspects code to heuristically detect errors
  • code coverage analysis, where a tool like cobertura instruments the system during a run to identify unused sections of code.
  • mutation testing, randomly modifying code to gather statistics about how much has to break before automated tests start to fail
  • code review (by real live programmers!)

I guess I still haven't answered the question: what should we test?  It's going to vary a lot between projects.  A lot of shell scripts I write don't get tested at all (well, except for the one time I run them), but I'd like to know that the guys working on the software for the next plane I'm on have a pretty solid testing system.  And there are a lot of points in the space between those extrema. In general, I would say getting basic tests in place to begin with is probably good, and of course as a project goes on problem points can be identified.  Automated tests are good for ensuring old behavior doesn't change unintentionally, but there is a balance to strike between setting up that safety line and tying the whole project down.  Because (especially in web programming) things do actually change from time to time.

You Don’t Need Java to use the JVM

Posted in Open Source Software by dwins on November 30, 2009

Not too long ago I was looking for a decent web application framework for an application we're working on at OpenGeo. It's based on GeoServer and maybe some other Java servers, but the team expressed some concerns about being able to quickly turn around and maintain Java code. So I checked out some alternative JVM languages during my search. Here are my thoughts on the ones I looked at: Jython, Rhino, Groovy, and Scala.

Jython

Jython is an implementation of Python that runs on the JVM. It lets you call Java constructors and methods, as well as extend Java classes and interfaces, and it even maps bean properties to keyword arguments on constructors.  However, the interoperability is not entirely seamless in either direction.  When calling overloaded Java methods from Python code, the method is selected based on the runtime type of the arguments (nulls are especially troublesome here).  The recommended way to work around this seems to be to just perform the appropriate conversions manually (not particularly problematic).  In the other direction, things are a bit more of a hassle: since the Python code is not compiled, classes defined in Python do not exist until the script has been run (and disappear once the JVM exits).  So there's a fair bit of boilerplate required to get hold of an instance of a Python class, and integration with frameworks that rely on reflection (such as Spring) will require even more (a wrapper class to create the Python object and then delegate to its methods).  Additionally, the base types (String, Integer, etc.) used in Python scripts are not the standard Java classes, so occasionally there is an impedance mismatch due to that.  (The interpreter automatically wraps and unwraps the objects, so most of the time it is not a problem.)

Rhino

Rhino is an implementation of JavaScript that runs on the JVM. It also allows you to call Java constructors and methods and extend Java classes and interfaces.  There is also some nice sugar:

  • bean properties are mapped to simple properties in JavaScript code (obj.foo = "new value"; instead of obj.setFoo("new value"); )
  • The method overloading problem is handled by providing access through longer keys which include the type signature, as well as the short name.
  • If a method expects an interface argument and the interface only declares one method, you can pass in a function object instead of explicitly implementing the interface. (var button = new javax.swing.JButton(); button.addActionListener(function(){…}); )
  • Rhino implements the E4X extension for embedded XML literals, so manipulating XML documents is pretty painless.

Rhino has also been around for the longest of any of the languages I looked at and is a fairly robust implementation.  There is a compiler so you can generate real, reflection-friendly classes with it, and, although the JavaScript standard library leaves a bit to be desired, there are projects like Narwhal and JSAN to improve that situation.

Groovy

Groovy is a language designed specifically for use on the JVM, with an eye to interoperability with existing Java libraries, but also providing a more dynamic language.  It provides facilities like optional typing and the “elvis operator” (which works like || in other scripting languages for providing default values in expressions that would otherwise return null):

javascript: var foo = baz.mightBeNull || "defaultFoo";
groovy:     def foo = baz.mightBeNull ?: "defaultFoo";

On the other hand, valid Java code is mostly valid Groovy syntax, so transitioning an existing codebase is (supposedly) easy.  There are some gotchas, though.  For example, type checking is only enforced at runtime, so you can sit through a compile phase and still end up with a type error.  Again, compiled code is fully accessible to the Java runtime, including reflection and direct instantiation.

Scala

Scala is another language designed with JVM interoperability in mind, although it is much more of a deviation from Java than Groovy is.  We're not using it for the project that inspired this research, but I've been working with it on a little side project for some time now.  It is fully interoperable in the sense that Scala classes are Java classes too, but it resorts to some clever compile-time tricks to implement some of its niftier features, like operator overloading and traits (similar to Java's interfaces, but with the ability to provide default implementations for methods).  However, it is fully type-checked, using type inference to avoid type declarations all over the code.  It also has a number of features that don't translate well to Java code (a small sketch follows the list):

  • Extractors, used for pattern matching; implemented as functions which return instances of a class from the Scala library.  These are usable from Java, but without Scala’s syntactic sugar for using them, they are much less useful.
  • Classes may have companion objects (which serve as the holder for any "global" methods; the stuff that you would declare static in Java).  The way that the Scala compiler implements these means that static methods show up in Java code as part of a second class with a $ at the end of its name (i.e., Foo would have a Foo$ class with all the static helpers).
  • Functions as values.  In Scala code, calling a function stored in a variable is just like calling a function normally, and there is specialized syntax for function types: "var foo: (Int, Int) => Int = (a, b) => a + b", "foo(1, 2) == 3".  In Java, such functions show up as instances of scala.Function2<scala.Int, scala.Int, scala.Int> and must be called through their apply() method.
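
A small sketch of two of the features above, extractors and function values, with illustrative names that aren't from any of these projects:

object Email {
  // Extractor: an unapply method makes "Email(user, host)" usable as a pattern.
  def unapply(s: String): Option[(String, String)] =
    s.split("@") match {
      case Array(user, host) => Some((user, host))
      case _                 => None
    }
}

object ExtractorDemo {
  // Function as a value, using the (Int, Int) => Int type syntax mentioned above.
  val add: (Int, Int) => Int = (a, b) => a + b

  def describe(s: String): String = s match {
    case Email(user, host) => user + " at " + host
    case _                 => "not an email address"
  }

  def main(args: Array[String]): Unit = {
    println(describe("dwins@example.com"))  // prints "dwins at example.com"
    println(add(1, 2))                      // prints 3
  }
}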

The upshot is that Scala works quite well for working with existing Java code.  But if you are designing a library for use by general Java developers and not just Scala developers, then writing in Scala means that you need to avoid certain language features to avoid making the API cumbersome.

There are, of course, plenty of other languages available for use on the JVM.  JRuby, Clojure, and Jaskell are a few I’ve heard of, but not looked into yet.  However, of the languages I looked at, Rhino and Jython seem better suited as user-facing scripting interfaces, while Scala is a nice alternative to Java for core implementation.  If you are designing a library for consumption by Java developers though, Java itself gives you the best control over the Java classes and method signatures.

For the record, we ended up choosing Django on CPython for the current round of development, with plans to investigate moving to Jython if and when consolidating the code onto the JVM becomes more necessary.

Meme in Scala

Posted in Uncategorized by dwins on November 18, 2009

Sebastian over at Digifesto recently alerted me to a budding meme being pushed by Eric Florenzano.  The original proposal (from here) goes like this:

Rules:

  1. Implement a program that takes in a user’s name and their age, and prints hello to them once for every year that they have been alive.
  2. Post these rules, the source code for your solution, and the following list (with you included) on your blog.
  3. Bonus points if you implement it in a language not yet seen on the following list!

The List:

  1. [Python] http://www.eflorenzano.com/blog/post/trying-start-programming-meme
  2. [Bash] http://aartemenko.com/texts/bash-meme/
  3. [C] http://dakrauth.com/media/site/text/hello.c
  4. [Java] http://adoleo.com/blog/2008/nov/25/programming-meme/
  5. [Python 3] http://mikewatkins.ca/2008/11/25/hello-meme/
  6. [Ruby] http://stroky.l.googlepages.com/gem
  7. [Ruby] http://im.camronflanders.com/archive/meme/
  8. [Lisp] http://justinlilly.com/blog/2008/nov/25/back-on-the-horse/
  9. [Lua] http://aartemenko.com/texts/lua-hello-meme/
  10. [Functional Python] http://aartemenko.com/texts/python-functional-hello-meme/
  11. [Erlang] http://surfacedepth.blogspot.com/2008/11/erics-programming-meme-in-erlang.html
  12. [Haskell] http://jasonwalsh.us/meme.html
  13. [PHP] http://fitzgeraldsteele.wordpress.com/2008/11/25/memeing-in-php-2/
  14. [Javascript] http://www.taylanpince.com/blog/posts/responding-to-a-programming-meme/
  15. [Single-File Django] http://www.pocketuniverse.ca/archive/2008/november/27/florenzano-factor/

For my entry, I put together a script in Scala.

import scala.Console._

val name = readLine("What is your name? ")
val age  = readLine("How many years old are you? ").toInt
(1 to age).foreach { year => println("%2d) Hello, %s".format(year, name)) }

To run it, copy the code into a text file named hello.scala and then run: scala hello.scala. You can get the Scala interpreter from the Scala website, or from your distribution's package manager.
