Adventures in PHP Language Parsing

It was supposed to be simple. I wanted to find a PHP parser to help me pinpoint unused XSL resources.

After eight hours of tinkering I have discovered that:

  • phpParseTree, which seemed the closest fit to what I want, is unfortunately buggy and unmaintained.  The Windows DLL kept complaining about unexpected T_DOC_COMMENT tokens; the Linux version I built from source crashed with segmentation faults and memory corruption.  It works well enough on simple PHP examples, but I need something more robust for our codebase.
  • YAXX, the Bison extension that phpParseTree is built on, is also unmaintained.  I tried it with both the latest and the recommended versions of Bison, but I kept getting this error:

    zend_language_parser.y: fatal error: invalid token in skeleton: @output @output_parser_name@

  • PHPLint would be more useful if it didn’t have special formatting requirements.
  • PHPUnit has nifty support for various code metrics in later versions. We need to upgrade.
  • php-ast looked promising, but it is only available through its Subversion repository.  Building that from source is more of a fuss than I can be bothered with at the moment.
  • There are many other alternatives for lexing and parsing out there, each with its own quirks and limitations.  I’ve checked out ANTLR, racc, ragel, Treetop, and even the classic lex and yacc combination.  I might go back to play with Treetop when I have the time.
  • Windows is a terrible platform for working with things that need to be built from source.  Then again, I already knew that.

Next candidate: phc, billed as an open source PHP compiler.  More as soon as I manage to get it working.

Published on October 22, 2008 at 7:21 am
