<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blackkettle &#187; jruby</title>
	<atom:link href="http://blog.blackkettle.org/tags/jruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.blackkettle.org</link>
	<description>Things of Occasional Interest</description>
	<lastBuildDate>Wed, 10 Feb 2010 16:03:09 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>[ANN] celerity_parser 0.1.1</title>
		<link>http://blog.blackkettle.org/2009/06/22/ann-celerity_parser-011/</link>
		<comments>http://blog.blackkettle.org/2009/06/22/ann-celerity_parser-011/#comments</comments>
		<pubDate>Mon, 22 Jun 2009 07:22:14 +0000</pubDate>
		<dc:creator>alex</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[jruby]]></category>
		<category><![CDATA[rubygems]]></category>

		<guid isPermaLink="false">http://blog.blackkettle.org/?p=103</guid>
		<description><![CDATA[HTML parsing in JRuby seems to be going through a slightly odd patch. Nokogiri and Hpricot both seem to have problems. There&#8217;s one project I&#8217;m working on at the moment which needs XPath support, and by chance I happen to be using Celerity, which wraps HtmlUnit. If I need an HTML parser, I thought, there [...]]]></description>
			<content:encoded><![CDATA[<p>HTML parsing in JRuby seems to be going through a slightly odd patch. Nokogiri and Hpricot both seem to have problems. There&#8217;s one project I&#8217;m working on at the moment which needs XPath support, and by chance I happen to be using <a href="http://celerity.rubyforge.org">Celerity</a>, which wraps <a href="http://htmlunit.sourceforge.net">HtmlUnit</a>. If I need an HTML parser, I thought, there must be one hidden somewhere within that I can use. For extra bonus points, I wouldn&#8217;t even need to package any native code: Celerity already has that covered&#8230;</p>
<p><a href="http://github.com/regularfry/celerity_parser/tree/master">And so it came to pass.</a> celerity_parser is an almost trivially thin wrapper around HtmlUnit&#8217;s HTMLParser class that&#8217;s got <i>just</i> enough functionality to do what I need, which is search for elements by xpath, and extract text and XHTML structure. When I say &#8220;trivially thin&#8221;, I really mean it &#8211; there&#8217;s a grand total of 2 Ruby classes, and 5 methods you might want to use.</p>
<p>Here&#8217;s how it works, taken from the README:</p>
<pre><code>
root_node = CelerityParser.parse(html_content)
found_elements = root_node.search("//html/head/title")
found_elements.first.text # => "Html page title"
</code></pre>
<p>That&#8217;s pretty much it. The only dependencies are jarib-celerity and JRuby itself. Enjoy, and I&#8217;m open to pull requests and suggestions if you need more than this. I&#8217;ve not done any speed tests, but it&#8217;s native Java, so it might be quite nippy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.blackkettle.org/2009/06/22/ann-celerity_parser-011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
