[ANN] celerity_parser 0.1.1

Monday, June 22nd, 2009

HTML parsing in JRuby seems to be going through a slightly odd patch. Nokogiri and Hpricot both seem to have problems. There’s one project I’m working on at the moment which needs XPath support, and by chance I happen to be using Celerity, which wraps HtmlUnit. If I need an HTML parser, I thought, there must be one hidden somewhere within that I can use. For extra bonus points, I wouldn’t even need to package any native code, since Celerity already has that covered…

And so it came to pass. celerity_parser is an almost trivially thin wrapper around HtmlUnit’s HTMLParser class that’s got just enough functionality to do what I need, which is search for elements by xpath, and extract text and XHTML structure. When I say “trivially thin”, I really mean it – there’s a grand total of 2 Ruby classes, and 5 methods you might want to use.

Here’s how it works, taken from the README:


root_node = CelerityParser.parse(html_content)
found_elements = root_node.search("//html/head/title")
found_elements.first.text # => "Html page title"

That’s pretty much it. The only dependencies are jarib-celerity and JRuby itself. Enjoy, and I’m open to pull requests and suggestions if you need more than this. I haven’t run any speed tests, but it’s native Java, so it might be quite nippy.
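If you're curious what a wrapper this thin looks like, here is a rough sketch of the same shape. It is not celerity_parser's source: it swaps in REXML (which only copes with well-formed markup) for HtmlUnit so that it runs on any Ruby, and the names TinyParser, Node and to_xml are made up for illustration.

```ruby
require "rexml/document"

# Hypothetical stand-in for a thin parser wrapper. NOT celerity_parser's
# actual code: REXML replaces HtmlUnit, and every name here is an assumption.
module TinyParser
  # Wraps one parsed node and exposes only what the caller needs.
  class Node
    def initialize(element)
      @element = element
    end

    # Returns an array of wrapped nodes matching the XPath expression.
    def search(xpath)
      REXML::XPath.match(@element, xpath).map { |match| Node.new(match) }
    end

    # Returns the first text child of the wrapped element.
    def text
      @element.text
    end

    # Returns the wrapped element serialised back to markup.
    def to_xml
      @element.to_s
    end
  end

  # Parses well-formed markup and returns a Node wrapping the document.
  def self.parse(markup)
    Node.new(REXML::Document.new(markup))
  end
end

html_content = "<html><head><title>Html page title</title></head><body><p>Hi</p></body></html>"
root_node = TinyParser.parse(html_content)
found_elements = root_node.search("//html/head/title")
puts found_elements.first.text
```

The point is only that one module-level parse method plus a small node class covers the whole use case: search by XPath, then read text or markup off whatever comes back.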

Updating Metadata for 722 gems! AARGH!

Thursday, June 26th, 2008

Seriously. Is this really necessary?

alex@21:~/Documents/Projects/VPNGen$ gem search scp --remote

*** REMOTE GEMS ***

Updating metadata for 722 gems from http://gems.rubyforge.org/
..............................................................................................
..............................................................................................
..............................................................................................
...............................ERROR:  Interrupted