<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Graham Miln &#187; France</title>
	<atom:link href="http://theworklife.com/graham-miln/category/france/feed/" rel="self" type="application/rss+xml" />
	<link>http://theworklife.com/graham-miln</link>
	<description>Adventures in life.</description>
	<lastBuildDate>Mon, 16 Aug 2010 06:14:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Property hunting in Lyon, Web::Scraper, and tbody tags</title>
		<link>http://theworklife.com/graham-miln/2010/03/06/property-hunting-in-lyon-web-scraper-and-tbody-tags/</link>
		<comments>http://theworklife.com/graham-miln/2010/03/06/property-hunting-in-lyon-web-scraper-and-tbody-tags/#comments</comments>
		<pubDate>Sat, 06 Mar 2010 05:21:45 +0000</pubDate>
		<dc:creator>Graham Miln</dc:creator>
				<category><![CDATA[France]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[apartments]]></category>
		<category><![CDATA[Lyon]]></category>
		<category><![CDATA[Web::Scraper]]></category>

		<guid isPermaLink="false">http://theworklife.com/graham-miln/?p=188</guid>
		<description><![CDATA[I found one stumbling block that took a while to overcome. After a little trial and error, I discovered that the Firefox browser returned misleading XPaths for elements embedded in tables. <a href="http://theworklife.com/graham-miln/2010/03/06/property-hunting-in-lyon-web-scraper-and-tbody-tags/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>During the last few weekends, I have needed to brush up on my web site parsing skills. The tools available have moved on nicely since my last dip into this topic.</p>
<p>I am currently keeping an eye on properties in Lyon, France. The process has been tedious and cried out for some automation. <a title="Megan Miln" href="http://theworklife.com/megan-miln">Megan</a> and I plan to return to France in the future, and this little project should ease the burden of finding an apartment or house.</p>
<div id="attachment_189" class="wp-caption aligncenter" style="width: 510px"><img class="size-full wp-image-189" src="http://theworklife.com/graham-miln/wp-content/uploads/2010/03/croix-paquet-lyon.jpg" alt="Croix-Paquet, Lyon" width="500" height="375" /><p class="wp-caption-text">Croix-Paquet, Lyon</p></div>
<p>This morning I discovered the Perl module <a href="http://search.cpan.org/~miyagawa/Web-Scraper/lib/Web/Scraper.pm">Web::Scraper</a>. It is a port of a Ruby-based tool called <a href="http://blog.labnotes.org/tag/scrapi/">scrAPI</a>. The approach taken avoids regular expression matching and opts for XPath and DOM tree selector matching, both of which are more resilient ways of addressing specific sections of a web page.</p>
<div id="attachment_190" class="wp-caption aligncenter" style="width: 510px"><img class="size-full wp-image-190" src="http://theworklife.com/graham-miln/wp-content/uploads/2010/03/lyon.jpg" alt="Apartments, Lyon" width="500" height="375" /><p class="wp-caption-text">Apartments, Lyon</p></div>
<p>I found one stumbling block that took a while to overcome. After a little trial and error, I discovered that the Firefox browser returned misleading XPaths for elements embedded in tables.</p>
<p>The XPaths provided by Firebug and XPather included browser-inserted <code>tbody</code> tags. These tags did not appear in my source web pages. Thus the browser&#8217;s XPath did not match the structure seen by Web::Scraper, which caused Web::Scraper to miss the desired content.</p>
<p>The solution was easy: strip the <code>tbody</code> steps out of the XPath, and Web::Scraper returns to working as advertised.</p>
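<p>As a rough illustration of that fix, here is a minimal sketch in Python (not the Perl used with Web::Scraper) that removes <code>tbody</code> steps from an XPath copied out of a browser tool. The helper name <code>strip_tbody</code> and the example path are hypothetical, and the sketch assumes the source page really contains no <code>tbody</code> tags of its own; if it does, those steps need to stay.</p>

```python
import re

# Matches a "tbody" location step, with an optional positional predicate
# such as [1], when it is followed by another step or ends the path.
TBODY_STEP = re.compile(r"/tbody(?:\[\d+\])?(?=/|$)")


def strip_tbody(xpath: str) -> str:
    """Return the XPath with every tbody step removed (hypothetical helper)."""
    return TBODY_STEP.sub("", xpath)


# Hypothetical path of the kind a browser tool might report.
browser_xpath = "/html/body/table/tbody/tr[2]/td[1]"
print(strip_tbody(browser_xpath))  # prints /html/body/table/tr[2]/td[1]
```

<p>The cleaned path can then be handed to the scraper in place of the one the browser reported.</p>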
<p>With this problem overcome, the project is already looking helpful.</p>
]]></content:encoded>
			<wfw:commentRss>http://theworklife.com/graham-miln/2010/03/06/property-hunting-in-lyon-web-scraper-and-tbody-tags/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
