<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: scrAPI and redirects without complete URL</title>
	<atom:link href="http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/</link>
	<description>Acquiring information, one day at a time.</description>
	<lastBuildDate>Wed, 03 Mar 2010 16:08:32 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0-alpha</generator>
	<item>
		<title>By: Rick Wargo</title>
		<link>http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/comment-page-1/#comment-4232</link>
		<dc:creator>Rick Wargo</dc:creator>
		<pubDate>Sat, 08 Sep 2007 02:39:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/#comment-4232</guid>
		<description>I&#039;m not even sure this works anymore, but here is what I wrote to scrape CareerBuilder (CB) for one company:

&lt;code&gt;require &#039;unescape&#039;

class FiberlinkJob &lt; Scraper::Base
  process &quot;td&gt;a.job_title&quot;, :title =&gt; :text, :url =&gt; &quot;@href&quot;
  process &quot;td&quot;, :description =&gt; :text
  process &quot;span.tip_11&quot;, :location =&gt; :text, :posted_on =&gt; :text,
    :jobcode =&gt; &quot;&quot;, :guid =&gt; &quot;&quot;
  
  result :title, :description, :location, :posted_on, :url, :jobcode, :guid
  
  def collect
    self.title = unescape(self.title)
    self.url = unescape(self.url)
    
    self.url = self.url.sub(/\&amp;sc.*/, &#039;&#039;)

    self.location = location.sub(/.*Location: ([^&amp;]+).*/, &#039;\1&#039;)
    self.posted_on = posted_on.sub(/.*Posted: ([A-Za-z]+).(\d+).*/, &#039;\1 \2&#039;)
    self.guid = self.url
  end
end

class Fiberlink &lt; Scraper::Base
  def Fiberlink.url
    &quot;http://www.careerbuilder.com/JobSeeker/Companies/CompanyJobResults.aspx?Comp_DID=C250M6WSR21P12J2CN&quot;
  end
    
  array :jobs
  process &quot;table#snapshotOff1 tr&quot;, :jobs =&gt; FiberlinkJob
  result :jobs
end&lt;/code&gt;

Good luck!
Rick</description>
		<content:encoded><![CDATA[<p>I&#8217;m not even sure this works anymore, but here is what I wrote to scrape CareerBuilder (CB) for one company:</p>
<p><pre><code>require &#039;unescape&#039;

class FiberlinkJob &lt; Scraper::Base
&nbsp;&nbsp;process &quot;td&gt;a.job_title&quot;, :title =&gt; :text, :url =&gt; &quot;@href&quot;
&nbsp;&nbsp;process &quot;td&quot;, :description =&gt; :text
&nbsp;&nbsp;process &quot;span.tip_11&quot;, :location =&gt; :text, :posted_on =&gt; :text,
&nbsp;&nbsp;&nbsp;&nbsp;:jobcode =&gt; &quot;&quot;, :guid =&gt; &quot;&quot;
&nbsp;&nbsp;
&nbsp;&nbsp;result :title, :description, :location, :posted_on, :url, :jobcode, :guid
&nbsp;&nbsp;
&nbsp;&nbsp;def collect
&nbsp;&nbsp;&nbsp;&nbsp;self.title = unescape(self.title)
&nbsp;&nbsp;&nbsp;&nbsp;self.url = unescape(self.url)
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;self.url = self.url.sub(/\&amp;sc.*/, &#039;&#039;)

&nbsp;&nbsp;&nbsp;&nbsp;self.location = location.sub(/.*Location: ([^&amp;]+).*/, &#039;\1&#039;)
&nbsp;&nbsp;&nbsp;&nbsp;self.posted_on = posted_on.sub(/.*Posted: ([A-Za-z]+).(\d+).*/, &#039;\1 \2&#039;)
&nbsp;&nbsp;&nbsp;&nbsp;self.guid = self.url
&nbsp;&nbsp;end
end

class Fiberlink &lt; Scraper::Base
&nbsp;&nbsp;def Fiberlink.url
&nbsp;&nbsp;&nbsp;&nbsp;&quot;http://www.careerbuilder.com/JobSeeker/Companies/CompanyJobResults.aspx?Comp_DID=C250M6WSR21P12J2CN&quot;
&nbsp;&nbsp;end
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;array :jobs
&nbsp;&nbsp;process &quot;table#snapshotOff1 tr&quot;, :jobs =&gt; FiberlinkJob
&nbsp;&nbsp;result :jobs
end</code></pre></p>
<p>Good luck!<br />
Rick</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick Bhanji</title>
		<link>http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/comment-page-1/#comment-4225</link>
		<dc:creator>Nick Bhanji</dc:creator>
		<pubDate>Fri, 07 Sep 2007 16:24:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/#comment-4225</guid>
		<description>I am trying to scrape search results from careerbuilder.com. How did you manage to handle the odd/even rows and the company name in each row? I am new to this. I am trying to use the gathered information to assist students looking for jobs.

Thanks in advance.

nick.bh</description>
		<content:encoded><![CDATA[<p>I am trying to scrape search results from careerbuilder.com. How did you manage to handle the odd/even rows and the company name in each row? I am new to this. I am trying to use the gathered information to assist students looking for jobs.</p>
<p>Thanks in advance.</p>
<p>nick.bh</p>
]]></content:encoded>
	</item>
</channel>
</rss>
