JRuby and NekoHTML

My last post (from ages ago) dealt with creating a parser wrapped around celerity. That approach has since stopped working because of API changes, so I decided to do it properly by talking directly to NekoHTML, the parser underlying HtmlUnit. If you install celerity, you get the nekohtml jar anyway, so I might as well make some use of it.

Because I’m not so familiar with the Java XML APIs, this took me more hunting than I expected, but I’ve wrapped it up in a gem for posterity.

In other words: ew. This code is icky but (possibly) useful.


require 'nekohtml'
html = "<html><head><title>Title of Majesty</title></head></html>"
Nekohtml.parse(html).at("//TITLE").text
 # => "Title of Majesty"
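
For anyone curious what "directly talking to NekoHTML" can look like, here is a rough sketch. It is not the gem's actual code: `neko_at` is a made-up helper name, and the sketch assumes you are on JRuby with the nekohtml and xerces jars already on the classpath. It feeds the HTML string to NekoHTML's DOM parser and then runs an XPath query through the standard `javax.xml.xpath` API. The uppercase `TITLE` in the query above is there because NekoHTML uppercases element names by default.

```ruby
# Sketch only: JRuby, with the nekohtml and xerces jars on the classpath.
# `neko_at` is a hypothetical helper name, not part of any gem.
def neko_at(html, xpath_expr)
  require 'java'

  # NekoHTML's DOM parser reads from a SAX InputSource.
  parser = org.cyberneko.html.parsers.DOMParser.new
  parser.parse(org.xml.sax.InputSource.new(java.io.StringReader.new(html)))

  # Query the resulting W3C DOM document; NODE asks for the first match.
  xpath = javax.xml.xpath.XPathFactory.new_instance.new_xpath
  xpath.evaluate(xpath_expr, parser.document, javax.xml.xpath.XPathConstants::NODE)
end
```

Under those assumptions, `neko_at(html, "//TITLE").text_content` should hand back the title text, since the returned object is a plain `org.w3c.dom.Node`.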
