Dwins’s Weblog


Parsing CSS in Scala – Parser Combinators

Posted in Open Source Software by dwins on June 19, 2010

My previous post was the first in a series talking about the code behind the GeoServer CSS module. This is my second post in the series, where I’ll introduce how I used Scala to parse CSS syntax.

For reading CSS in the GeoServer CSS styling module, I used a custom CSS parser that I wrote custom for the project.  I checked out some existing CSS parsing libraries like W3C’s flute or the CSS parser in Batik, but in the end I decided against them to allow for some wacky modifications to the spec, like embedded CQL expressions and extracting metadata from comments.  Plus, I wanted to check out the parser combinator library in Scala, which turned out to be pretty cool.

The basic idea behind parser combinators is exactly what it sounds like – simple parsers can be combined using various operators to produce more complex parsers.  Each parser is a full-fledged object extending the scala.util.parsing.combinator.Parsers.Parser abstract class. Parsers are responsible for pulling a single token off the front of the input stream and returning a ParseResult that contains information about whether the attempted parse was successful, as well as the object produced and the new position of the input stream.  For example, a parser that reads a single specific character (for example, a hyphen delimiting fields) could be implemented like so:

object ParsingExamples extends scala.util.parsing.combinator.Parsers {
 object Delimiter extends Parser[String] {
    def apply(reader: Input): ParseResult[String] = {
      if (reader atEnd) {
        Failure("end of string", reader)
      } else if (reader.first != '-') {
        Failure("no separator found", reader)
      } else {
        Success(reader.first, reader.rest)
      }
    }
  }
}

You can use this to actually extract strings from input like so:

import scala.util.parsing.input.CharSequenceReader
ParsingExample.Delimiter(new CharSequenceReader("-"))

which should produce something like:

scala> ParsingExample.Delimiter(new CharSequenceReader("123"))
res0: ParsingExample.ParseResult[String] = [1.2] parsed: -
scala> res0.get
res1: String = -

Kind of boring, I know. It gets better. Now if we want to parse some input that is composed of number strings (like a phone number, for example) we can also define a DigitSequence parser:

  val DigitSequence = Parser { reader =>
    if (reader atEnd) {
      Failure("end of string", reader)
    } else if (!reader.first.isDigit) {
      Failure("no digits found", reader)
    } else {
      var digits = reader.first.toString
      var rest = reader.rest
      while (!rest.atEnd && rest.first.isDigit) {
        digits += rest.first
        rest = rest.rest
      }
      Success(digits, rest)
    }
  }

For this one I saved a bit of typing using the Parser convenience method, which takes a function and wraps it as a full fledged Parser instance. Not only does this avoid an extra block for an object definition, but this also allows Scala’s type inference to figure out the type of reader automatically. Cool!

Now that we have our basic parsers figured out, we can combine them to produce more complex ones very tersely:

val PhoneNumber = DigitSequence ~ Delimiter ~ DigitSequence

If we take a look at the parse result for this, however, it is a bit messier than for the simple parsers used thus far:

scala> ParsingExample.PhoneNumber(new CharSequenceReader("123-456"))  
res6: ParsingExample.ParseResult[ParsingExample.~[ParsingExample.~[String,ParsingExample.Elem],String]] = [1.8] parsed: ((123~-)~456)

This is where the Parser.map method comes in. This allows you to apply some transformation to a parser’s output, usually to convert it to a more usable form. For example, you might define a case class to contain the different segments of a phone number and modify the PhoneNumber parser to output an instance of the case class:

  case class RolodexCard(pre: String, post: String)

  val PhoneNumber = DigitSequence ~ Delimiter ~ DigitSequence map {
    case pre ~ _ ~ post => RolodexCard(pre, post)
  }

Now the output looks a little nicer:

scala> ParsingExample.PhoneNumber(new CharSequenceReader("123-456"))
res0: ParsingExample.ParseResult[ParsingExample.RolodexCard] = [1.8] parsed: RolodexCard(123,456)

In addition to the generic parser combinators (which can actually help to parse any sequential stream of tokens, not just character data), there is a RegexParsers class in the standard library that provides some extra tailoring to text handling. This class offers such niceties as skipping over whitespace and implicit conversions for using string literals and regular expressions to parser instances. Using this, the ExampleParser demo becomes a lot shorter:

object RegexParsingExample extends RegexParsers {
  case class RolodexCard(pre: String, post: String)

  val Delimiter = "-"
  val DigitSequence = """\d+""".r
  val PhoneNumber = DigitSequence ~ Delimiter ~ DigitSequence map {
    case pre ~ _ ~ post => RolodexCard(pre, post)
  }
}

It’s almost like a Backus-Naur definition of the grammar. Neat!

Sequences are just the beginning, the combinator library also provides combinators for alternation, delimited sequences, optional tokens, and more. You can check out the whole arsenal in the Scaladocs: http://www.scala-lang.org/docu/files/api/scala/util/parsing/combinator/Parsers.html .

By using RegexParsers for most of the CSS parsing, I’m able to fit the whole parser into a pretty tight body of code. On the other hand, the ability to sidestep the RegexParser handling of whitespace with a custom Parser implementation makes it easy to have comments parsed immediately before a rule without having to deal with them explicitly everywhere. I can also write unit tests for individual productions if I want to, which has proved helpful in troubleshooting.

The sample code from this blog post is available on gist.github.com.

About these ads

2 Responses to 'Parsing CSS in Scala – Parser Combinators'

Subscribe to comments with RSS or TrackBack to 'Parsing CSS in Scala – Parser Combinators'.


  1. [...] 1 – Parsing CSS in Scala Possibly related posts: (automatically generated)Geoserver – A GIS Server 1 [...]


  2. [...] time I blogged about this, I said a bit about the parser that takes in a CSS file and breaks it up into objects that the Scala code inside GeoServer CSS can [...]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.

%d bloggers like this: