:Treetop => "Parser in Ruby"
I built a parser a few days ago using Treetop. Treetop is an awesome Ruby gem for writing parsers. The syntax is very clean and elegant, and you don't need to handle countless auto-generated files. Treetop grabs the grammar file on the fly and automagically loads the classes that you need. Pretty neat.
Once you parse text with your parser, Treetop returns a syntax tree. To save you from navigating that tree by hand, Treetop allows you to add methods to your grammar rules. You can then call these methods on the nodes of the syntax tree. Here's an excerpt of the grammar I'm working on:
grammar RepositoryProtocol
  rule message
    credential / response / request / reply / challenge
    {
      def node_type
        "message"
      end
    }
  end

  rule credential
    credential_hdr public_key certificates credential_end
    {
      def pk
        public_key.data
      end

      def certs
        certificates.certs
      end

      def node_type
        "credential"
      end
    }
  end

  # obviously more stuff goes down here...
end
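To make the idea of "methods on the tree" a bit more concrete, here is a plain-Ruby imitation of what the credential rule gives you. It does not use Treetop at all: FakeNode, CredentialChildren, CredentialMethods, PublicKeyMethods and the sample text are all made up for illustration, and the body of data is a guess since that part of the grammar isn't shown. The only point is that the methods from the block end up mixed into the node, next to accessors named after the rule's children.

```ruby
# Stand-in for a syntax node (hypothetical, not a Treetop class):
# it only knows its text and its child nodes.
class FakeNode
  attr_reader :text_value, :elements

  def initialize(text_value, elements = nil)
    @text_value = text_value
    @elements = elements
  end
end

# Imitates the accessor a plain sequence rule gets for a named child.
module CredentialChildren
  def public_key
    elements[1]
  end
end

# Plays the role of the { ... } block attached to the credential rule.
module CredentialMethods
  def pk
    public_key.data
  end

  def node_type
    "credential"
  end
end

# Assumed implementation of `data`; the real one is not in the excerpt.
module PublicKeyMethods
  def data
    text_value.strip
  end
end

header = FakeNode.new("BEGIN ")
key = FakeNode.new("abc123 ").extend(PublicKeyMethods)
credential = FakeNode.new("BEGIN abc123 ", [header, key])
credential.extend(CredentialChildren, CredentialMethods)

puts credential.node_type
puts credential.pk
```

With the real parser you would call node_type and pk the same way on whatever node the parse returns, instead of building the nodes by hand.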
Now let's talk about the dark side of this tool. There are five problems that I found while using it:
1- Lack of documentation: You can find pretty simplistic examples of grammars on the tool's homepage and in its GitHub repository. Unfortunately, if you need something more complex than a calculator, you're pretty much on your own. If you google for a while you will find some more complex grammars out there, but the truth is that the learning curve is a bit steep. For instance, it took me quite a while to realize that syntax nodes only provide two methods to query a node: elements (an array of child nodes) and text_value (the text contained by the node). Even at this point I'm still not 100% sure these are the only methods available. Also, if your rule is very simple (I mean a plain sequence with no or's, *'s, +'s, etc.), then Treetop will add accessor methods named after the child rules it references.
2- If you name a rule after something that Treetop uses internally, you're screwed. This really sucks. There's no error message, no list of reserved words, nothing. So if a rule fails for no apparent reason, try renaming it.
3- Nodes don't have a way to identify themselves. There's no node_type method to call to figure out what you're navigating in. If you really need to do that, then you'll have to manually add a node_type method to each rule.
4- No easy way to ignore whitespace. You have to provide a rule in your grammar to handle spaces, tabs, \r's and \n's.
5- And finally, the problem that really took me a while to figure out. Treetop doesn't construct the syntax tree by following your grammar literally. Take for example the following rules:
rule clause
  implication / single_atom
  {
    def import(rsa_key)
      internal_import(rsa_key)
    end

    def node_type
      "clause"
    end
  }
end

rule single_atom
  literal "."
  {
    def internal_import(rsa_key)
      "#{literal.import(rsa_key)}."
    end

    def imported_clause?
      literal.imported_clause?
    end

    def node_type
      "single_atom"
    end
  }
end

rule implication
  literal ":-" literal other_literals "."
  {
    def import(rsa_key)
      elements.inject("") do |imported_string, e|
        imported_string << if e.respond_to? "import"
          e.import(rsa_key)
        else
          e.text_value
        end
      end
    end

    def node_type
      "implication"
    end
  }
end

rule other_literals
  ("," literal)*
  {
    def import(rsa_key)
      elements.inject("") do |imported_clause, e|
        imported_clause << ",#{e.elements.last.import(rsa_key)}"
      end
    end
  }
end

rule literal
  ( "says(" iden "," predicate ")" space ) / predicate
  {
    def import(rsa_key)
      "says(#{rsa_key}, #{text_value})"
    end

    def node_type
      "literal"
    end
  }
end
Following the grammar in the previous example, I would expect the syntax tree for an implication clause to look like this:
But instead I get a tree like this one:
Somehow Treetop decides that clause and implication should be the same node, and adds some methods from clause and others from implication. That's why I use the strange internal_import on some paths and call import directly on others. So don't expect the syntax tree to be an exact representation of your grammar. My guess is that Treetop is doing some kind of tree optimization; as far as I understand it, an ordered choice such as implication / single_atom doesn't get a node of its own and simply hands back the node of whichever alternative matched.
Even with all its problems, I sincerely consider this tool much more intuitive than ANTLR. The syntax is clean, and if you don't try fancy tricks like writing a language translator you should be safe. So if you need a parser in your next Ruby project, make sure to give Treetop a try.
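When the tree doesn't look the way the grammar suggests, it helps to dump what you actually got. Here is a minimal sketch of such a dumper. It relies only on elements and text_value, plus node_type where a rule happens to define it. StubNode, tagged and outline are hypothetical names, the stub stands in for real nodes so the snippet runs without Treetop, and treating a nil elements as "no children" is an assumption about leaf nodes.

```ruby
# Minimal stand-in exposing the two query methods mentioned in the post.
StubNode = Struct.new(:text_value, :elements)

# Builds a stub node that also answers node_type, like a rule with a block.
def tagged(type, text, elements = nil)
  node = StubNode.new(text, elements)
  node.define_singleton_method(:node_type) { type }
  node
end

# Returns an indented outline of the tree as an array of lines.
def outline(node, depth = 0)
  label = node.respond_to?(:node_type) ? node.node_type : "(anonymous)"
  line = "#{'  ' * depth}#{label}: #{node.text_value.inspect}"
  children = node.elements || [] # assumption: leaves may report nil
  [line] + children.flat_map { |child| outline(child, depth + 1) }
end

# Hand-built tree shaped like the implication rule: literal ":-" literal other_literals "."
tree = tagged("implication", "a:-b.", [
  tagged("literal", "a"),
  StubNode.new(":-", nil),
  tagged("literal", "b"),
  StubNode.new("", []),
  StubNode.new(".", nil)
])

puts outline(tree)
```

Running the same outline method on the node returned by your parser shows which node_type ends up on each node, which makes merged nodes like clause and implication easy to spot.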







