JRuby and NekoHTML
My last post (from ages ago) dealt with creating a parser wrapped around Celerity. That approach has since stopped working because of API changes, so I decided to do it properly and talk directly to NekoHTML, the HTML parser underlying HtmlUnit. Installing Celerity gets you the NekoHTML jar anyway, so it seemed worth putting to some use.
Because I’m not so familiar with the Java XML APIs, this took me more hunting than I expected, but I’ve wrapped it up in a gem for posterity.
In other words: ew. This code is icky but (possibly) useful.
require 'nekohtml'
html = "<html><head><title>Title of Majesty</title></head></html>"
# NekoHTML upper-cases element names by default, hence "//TITLE" rather than "//title"
Nekohtml.parse(html).at("//TITLE").text
# => "Title of Majesty"