Any decent PHP parser written in PHP? [closed]


Since no complete and stable parser turned up here, I decided to write one myself. Here is the result:

PHP-Parser: A PHP parser written in PHP

The project supports parsing code written for any PHP version between PHP 5.2 and PHP 8.0.

Apart from the parser itself the library provides some related components:

  • Compilation of the AST back to PHP ("pretty printing")
  • Infrastructure for traversing and changing the AST
  • Serialization to and from XML (as well as dumping in a human readable form)
  • Resolution of namespaced names (aliases etc.)

For a usage overview, see the "Usage of basic components" section of the documentation.


This isn't going to be a great option for you, as it violates the pure-PHP constraint, but:

A while ago, the php-internals folks decided to switch to Lemon as their parsing technology. There's a branch in the PHP SVN repo that contains the required changes.

They decided not to continue with this, because their Lemon-based parser turned out to be about 10-15% slower. The branch is still there, though.

There's an older Lemon parser written as a PHP extension. You might be able to work with it. There's also this PEAR package, and this other Lemon package (via this blog post about PGN).

Of course, even if you get it working, I'm not sure what you'd do with the data, or what the data even looks like.

Another wacky option would be peeking at Quercus, a PHP implementation in Java. They must have written a parser for it, so it might be worth investigating.


The metrics tool PHP Depend contains code, written entirely in PHP, that generates an AST from PHP source. It does, however, rely on PHP's own token_get_all for tokenization.

The source code is available on GitHub: https://github.com/manuelpichler/pdepend/tree/master/src/main/php/PHP/Depend

The AST implementation for some parts, such as mathematical expressions, was not yet complete when I last checked, but according to its author completing it is the goal.
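
For anyone unfamiliar with the tokenization step mentioned above, here is a minimal sketch of what token_get_all hands to a PHP-level parser. It is not taken from PHP Depend; the helper name describeTokens is made up for illustration, and the sketch assumes PHP 7.1 or later (for the short array destructuring syntax).

```php
<?php
// Hypothetical helper (not part of PHP Depend): turn a PHP source string
// into a flat list of readable token descriptions.
function describeTokens(string $source): array
{
    $result = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Most tokens arrive as [token id, source text, line number].
            [$id, $text, $line] = $token;
            if ($id === T_WHITESPACE) {
                continue; // whitespace is irrelevant for building an AST
            }
            $result[] = token_name($id) . ' ' . trim($text);
        } else {
            // Single-character tokens such as '=' or ';' are plain strings.
            $result[] = 'CHAR ' . $token;
        }
    }
    return $result;
}

$tokens = describeTokens('<?php $a = 1 + 2;');
print_r($tokens);
```

A parser written in PHP would consume a stream like this and build its tree nodes from it, which is why such projects only have to implement the grammar and not the lexer.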