[ANN] celerity_parser 0.1.1

Monday, June 22nd, 2009

HTML parsing in JRuby seems to be going through a slightly odd patch. Nokogiri and Hpricot both seem to have problems. There’s one project I’m working on at the moment which needs XPath support, and by chance I happen to be using Celerity, which wraps HtmlUnit. If I need an HTML parser, I thought, there must be one hidden somewhere within that I can use. For extra bonus points, I wouldn’t even need to package any native code, since Celerity already has that covered…

And so it came to pass. celerity_parser is an almost trivially thin wrapper around HtmlUnit’s HTMLParser class that’s got just enough functionality to do what I need, which is to search for elements by XPath, and to extract text and XHTML structure. When I say “trivially thin”, I really mean it: there’s a grand total of 2 Ruby classes, and 5 methods you might want to use.

Here’s how it works, taken from the README:


root_node = CelerityParser.parse(html_content)
found_elements = root_node.search("//html/head/title")
found_elements.first.text # => "Html page title"

That’s pretty much it. The only dependencies are jarib-celerity and JRuby itself. Enjoy, and I’m open to pull requests and suggestions if you need more than this. I’ve not done any speed tests, but it’s native Java, so it might be quite nippy.
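
If you just want to get a feel for the search-then-extract shape without firing up JRuby, here’s a rough sketch using REXML from Ruby’s standard library. To be clear, this is a stand-in and not celerity_parser: REXML is an XML parser, so the input has to be well-formed XHTML, and the variable names are mine rather than anything from the gem.

```ruby
require "rexml/document"

# Stand-in for illustration only: REXML instead of HtmlUnit,
# so the markup must be well-formed XHTML (assumed sample input).
xhtml = <<~XHTML
  <html>
    <head><title>Html page title</title></head>
    <body><p>Hello</p></body>
  </html>
XHTML

doc = REXML::Document.new(xhtml)

# Search by XPath, the same kind of query as in the README example.
found_elements = REXML::XPath.match(doc, "//html/head/title")

# Extract the text content and the markup of the first match.
title_text   = found_elements.first.text
title_markup = found_elements.first.to_s

puts title_text
puts title_markup
```

The real thing copes with messy real-world HTML, which is exactly what REXML won’t do, so treat this purely as a picture of the API shape.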

Green Fields

Friday, April 24th, 2009

So… first green fields Major Project in a while. It’s a Rails app, but I’m shifting to PostgreSQL and Amazon EC2/S3 for a bunch of it, so there’s going to be a fair amount to learn here.

It also feels slightly odd to be jumping back into Rails again. I’ve not done any new Rails work for a little while; the majority of my consulting has been on apps frozen at 2.1, so it’ll be good to be working on a fresh code-base.

I’ve shut down Other People Work for the next couple of weeks to get this project out of the door, although I reckon that what with various travel and visiting plans, I’ve only got about two-thirds of that time to play with.

No time to hang around here blogging, there’s work to be done!

Updating Metadata for 722 gems! AARGH!

Thursday, June 26th, 2008

Seriously. Is this really necessary?

alex@21:~/Documents/Projects/VPNGen$ gem search scp --remote

*** REMOTE GEMS ***

Updating metadata for 722 gems from http://gems.rubyforge.org/
..............................................................................................
..............................................................................................
..............................................................................................
...............................ERROR:  Interrupted

VirtualBox, KVM, Windows and Linux

Thursday, June 12th, 2008

Today I’ve been mostly bringing a new subcontractor up to speed on a project I’ve been working on for a while. It’s quite a fun project that I’ll probably post about at some later date (think _why’s mousehole on steroids), but what I’ve spent most of my time on is wrangling virtual machine images.

For me, kvm is the first (free) virtual machine host system that makes everything Just Work. I always had problems with UML and Xen, qemu was too slow, and I always shied away from VMware. Not entirely sure why, but there you go.

Now, back to this contractor: he’s on Windows, I’m on Ubuntu Gutsy. I know that my code won’t work on Windows, because I’m daemonising and fork()ing all over the place. VMs to the rescue! It looks to me like the best combination is kvm on my side, and VirtualBox on his. There’s a very simple command that converts my qcow image to a vmdk that VirtualBox can read:

qemu-img convert etch-rodents-i386.img -O vmdk etch-rodents-i386.vmdk

I’ve not seen this documented anywhere; most Google hits mention the obsolete vditool, which I couldn’t get to run on my 64-bit host anyway.

Added bonus: the vmdk image is slightly smaller than the qcow. Win.
