Friday, March 4, 2011

Writing a compiler for a DSL in python

I am writing a game in python and have decided to create a DSL for the map data files. I know I could write my own parser with regex, but I am wondering if there are existing python tools which can do this more easily, like re2c which is used in the PHP engine.

Some extra info:

  • Yes, I do need a DSL, and even if I didn't I still want the experience of building and using one in a project.
  • The DSL contains only data (declarative?), it doesn't get "executed". Most lines look like:

    SOMETHING: !abc @123 #xyz/123

    I just need to read the tree of data.

From stackoverflow
  • There are plenty of Python parsing tools: http://nedbatchelder.com/text/python-parsers.html

  • Yes, there are many -- too many -- parsing tools, but none in the standard library.

    From what what I saw PLY and SPARK are popular. PLY is like yacc, but you do everything in Python because you write your grammar in docstrings.

    Personally, I like the concept of parser combinators (taken from functional programming), and I quite like pyparsing: you write your grammar and actions directly in python and it is easy to start with. I ended up producing my own tree node types with actions though, instead of using their default ParserElement type.

    Otherwise, you can also use existing declarative language like YAML.

  • I've always been impressed by pyparsing. The author, Paul McGuire, is active on the python list/comp.lang.python and has always been very helpful with any queries concerning it.

    Torsten Marek : I would have suggested it if you hadn't done it already! PyParsing is awesome.
  • Here's an approach that works really well.

    abc= ONETHING( ... )
    xyz= ANOTHERTHING( ... )
    pqr= SOMETHING( this=abc, that=123, more=(xyz,123) )
    

    Declarative. Easy-to-parse.

    And...

    It's actually Python. A few class declarations and the work is done. The DSL is actually class declarations.

    What's important is that a DSL merely creates objects. When you define a DSL, first you have to start with an object model. Later, you put some syntax around that object model. You don't start with syntax, you start with the model.

    too much php : I know what you're saying, but writing all those comments, parenthesis, equals, prefixes is obfuscating the actual data. Also, this method doesn't port well to more verbose languages like PHP or Java.
    S.Lott : @Peter. Disagree. You can use positional args and eliminate the labels and ='s. It translates perfectly to Java. Already used it in production applications to define a declarative DSL.
    Mendelt : I've seen what you're suggesting referred to as an internal DSL. I like this method. One problem might be that while the method does port to other languages (i've seen stuff like this implemented in C#) the precise syntax of your DSL will probably change a bit. The map files won't be portable.
    S.Lott : @Mendelt: Didn't see portability as a requirement. You're right, but it doesn't seem to apply in this case.
  • I have written something like this in work to read in SNMP notification definitions and automatically generate Java classes and SNMP MIB files from this. Using this little DSL, I could write 20 lines of my specification and it would generate roughly 80 lines of Java code and a 100 line MIB file.

    To implement this, I actually just used straight Python string handling (split(), slicing etc) to parse the file. I find Pythons string capabilities to be adequate for most of my (simple) parsing needs.

    Besides the libraries mentioned by others, if I were writing something more complex and needed proper parsing capabilities, I would probably use ANTLR, which supports Python (and other languages).

  • Peter,

    DSLs are a good thing, so you don't need to defend yourself :-) However, have you considered an internal DSL ? These have so many pros versus external (parsed) DSLs that they're at least worth consideration. Mixing a DSL with the power of the native language really solves lots of the problems for you, and Python is not really bad at internal DSLs, with the with statement handy.

  • For "small languages" as the one you are describing, I use a simple split, shlex (mind that the # defines a comment) or regular expressions.

    >>> line = 'SOMETHING: !abc @123 #xyz/123'
    
    >>> line.split()
    ['SOMETHING:', '!abc', '@123', '#xyz/123']
    
    >>> import shlex
    >>> list(shlex.shlex(line))
    ['SOMETHING', ':', '!', 'abc', '@', '123']
    

    The following is an example, as I do not know exactly what you are looking for.

    >>> import re
    >>> result = re.match(r'([A-Z]*): !([a-z]*) @([0-9]*) #([a-z0-9/]*)', line)
    >>> result.groups()
    ('SOMETHING', 'abc', '123', 'xyz/123')
    

0 comments:

Post a Comment