While implementing syntax highlighting for my little project Grease goblin, I have been writing a tokenizer for the Lua language. I have now run out of test cases, so I'm looking for help finding parsing errors.
I'm looking for input that breaks the tokenizer, as well as for incorrect, missing or surplus tokens. For highlighting, the tokenizer must return character-perfect tokens.
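One way to read "character-perfect" is that the token texts, concatenated in order, reproduce the input exactly. The Python sketch below checks that property. It is not part of the project: the `(type, text)` token shape, the token type names and the function name `first_mismatch` are assumptions made for illustration, and the tokenizer's real output would have to be converted into that shape first.

```python
from typing import List, Optional, Tuple

# Hypothetical token shape: (token type, exact source text).
Token = Tuple[str, str]


def first_mismatch(source: str, tokens: List[Token]) -> Optional[int]:
    """Return the offset where the tokens stop reproducing `source`.

    Returns None when the concatenated token texts equal `source`.
    """
    pos = 0
    for _kind, text in tokens:
        if not source.startswith(text, pos):
            return pos  # incorrect or surplus token text at this offset
        pos += len(text)
    # Stopping short of the end means trailing tokens are missing.
    return None if pos == len(source) else pos


# Hand-written example data, not real tokenizer output.
SOURCE = "local x = 1 -- hi\n"
GOOD = [
    ("keyword", "local"), ("whitespace", " "), ("name", "x"),
    ("whitespace", " "), ("operator", "="), ("whitespace", " "),
    ("number", "1"), ("whitespace", " "), ("comment", "-- hi"),
    ("whitespace", "\n"),
]

if __name__ == "__main__":
    print(first_mismatch(SOURCE, GOOD))
    # Dropping the first whitespace token makes the check fail there.
    print(first_mismatch(SOURCE, GOOD[:1] + GOOD[2:]))
```

A check like this catches missing and surplus characters, but not a token with the right text and the wrong type, so it would complement rather than replace looking at the highlighted output.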
The test subdirectory contains a few tools for easier testing. Most notably, if you make the whole directory accessible by a PHP-enabled web server, index.html will continuously poll Test.php via AJAX, which makes it easy to play around with the syntax. The test script also runs the input file through Lua's loadstring (without actually running the code) and prints any errors after the output.
While performance could certainly be better (it is only partly optimized), I'm quite satisfied with it: on my system the tokenizer parses its own code in under 8 ms. Its weak points are single-character "pass-through" tokens, which could probably be optimized.