<?xml version="1.0" encoding="UTF-8"?><rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
> <channel><title>Comments on: scrAPI and redirects without complete URL</title> <atom:link href="http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/feed/" rel="self" type="application/rss+xml" /><link>http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/</link> <description>Acquiring information, one day at a time.</description> <lastBuildDate>Thu, 26 Jan 2012 11:03:28 +0000</lastBuildDate> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <item><title>By: Rick Wargo</title><link>http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/comment-page-1/#comment-4232</link> <dc:creator>Rick Wargo</dc:creator> <pubDate>Sat, 08 Sep 2007 02:39:30 +0000</pubDate> <guid
isPermaLink="false">http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/#comment-4232</guid> <description>I&#039;m not even sure this works anymore, but here is what I wrote to scrape CareerBuilder for one company:
&lt;code&gt;require &#039;unescape&#039;
class FiberlinkJob &lt; Scraper::Base
process &quot;td&gt;a.job_title&quot;, :title =&gt; :text, :url =&gt; &quot;@href&quot;
process &quot;td&quot;, :description =&gt; :text
process &quot;span.tip_11&quot;, :location =&gt; :text, :posted_on =&gt; :text,
:jobcode =&gt; &quot;&quot;, :guid =&gt; &quot;&quot;
result :title, :description, :location, :posted_on, :url, :jobcode, :guid
def collect
self.title = unescape(self.title)
self.url = unescape(self.url)
self.url = self.url.sub(/\&amp;sc.*/, &#039;&#039;)
self.location = location.sub(/.*Location: ([^&amp;]+).*/, &#039;\1&#039;)
self.posted_on = posted_on.sub(/.*Posted: ([A-Za-z]+).(\d+).*/, &#039;\1 \2&#039;)
self.guid = self.url
end
end
class Fiberlink &lt; Scraper::Base
def Fiberlink.url
&quot;http://www.careerbuilder.com/JobSeeker/Companies/CompanyJobResults.aspx?Comp_DID=C250M6WSR21P12J2CN&quot;
end
array :jobs
process &quot;table#snapshotOff1 tr&quot;, :jobs =&gt; FiberlinkJob
result :jobs
end&lt;/code&gt;
Good luck!
Rick</description> <content:encoded><![CDATA[<p>I&#8217;m not even sure this works anymore, but here is what I wrote to scrape CareerBuilder for one company:</p><p><code>require 'unescape'<br
/><br
/>class FiberlinkJob &lt; Scraper::Base<br
/> process "td>a.job_title", :title => :text, :url => "@href"<br
/> process "td", :description => :text<br
/> process "span.tip_11", :location => :text, :posted_on => :text,<br
/> :jobcode => "", :guid => ""<br
/><br
/> result :title, :description, :location, :posted_on, :url, :jobcode, :guid<br
/><br
/> def collect<br
/> self.title = unescape(self.title)<br
/> self.url = unescape(self.url)<br
/><br
/> self.url = self.url.sub(/\&#038;sc.*/, '')<br
/><br
/> self.location = location.sub(/.*Location: ([^&#038;]+).*/, '\1')<br
/> self.posted_on = posted_on.sub(/.*Posted: ([A-Za-z]+).(\d+).*/, '\1 \2')<br
/> self.guid = self.url<br
/> end<br
/> end<br
/><br
/>class Fiberlink &lt; Scraper::Base<br
/> def Fiberlink.url<br
/> "http://www.careerbuilder.com/JobSeeker/Companies/CompanyJobResults.aspx?Comp_DID=C250M6WSR21P12J2CN"<br
/> end<br
/><br
/> array :jobs<br
/> process "table#snapshotOff1 tr", :jobs => FiberlinkJob<br
/> result :jobs<br
/> end</code></p><p>Good luck!<br
/> Rick</p> ]]></content:encoded> </item> <item><title>By: Nick Bhanji</title><link>http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/comment-page-1/#comment-4225</link> <dc:creator>Nick Bhanji</dc:creator> <pubDate>Fri, 07 Sep 2007 16:24:48 +0000</pubDate> <guid
isPermaLink="false">http://www.rickwargo.com/2006/11/03/scrapi-and-redirects-without-complete-url/#comment-4225</guid> <description>I am trying to scrape search results from careerbuilder.com. How did you manage to handle the odd/even rows and the company name in each row? I am new to this. I am trying to use the gathered information to assist students looking for jobs.
Thanks in advance,
nick.bh</description> <content:encoded><![CDATA[<p>I am trying to scrape search results from careerbuilder.com. How did you manage to handle the odd/even rows and the company name in each row? I am new to this. I am trying to use the gathered information to assist students looking for jobs.</p><p>Thanks in advance,</p><p>nick.bh</p> ]]></content:encoded> </item> </channel> </rss>
