<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>epicblog &#187; Code</title> <atom:link href="http://www.rickwargo.com/category/code/feed/" rel="self" type="application/rss+xml" /><link>http://www.rickwargo.com</link> <description>Acquiring information, one day at a time.</description> <lastBuildDate>Fri, 14 Oct 2011 01:23:12 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <item><title>Updating to the Release Version of Outlook 2010 from the Beta Removes RSS Feeds</title><link>http://www.rickwargo.com/2010/05/12/updating-to-the-release-version-of-outlook-2010-from-the-beta-removes-rss-feeds/</link> <comments>http://www.rickwargo.com/2010/05/12/updating-to-the-release-version-of-outlook-2010-from-the-beta-removes-rss-feeds/#comments</comments> <pubDate>Wed, 12 May 2010 20:06:03 +0000</pubDate> <dc:creator>Rick Wargo</dc:creator> <category><![CDATA[Code]]></category> <category><![CDATA[WILT]]></category> <guid
isPermaLink="false">http://www.rickwargo.com/?p=361</guid> <description><![CDATA[Looks like there is a problem during the installation of the release version of Outlook 2010 in that the RSS feeds are not preserved (it appears it keeps the last added feed). Prior to uninstalling the beta version of Outlook 2010, make sure to export the RSS feeds as an OPML file. Then, after the [...]]]></description> <content:encoded><![CDATA[<p>Looks like there is a problem during the installation of the release version of Outlook 2010 in that the RSS feeds are not preserved (it appears it keeps the last added feed). Prior to uninstalling the beta version of Outlook 2010, make sure to export the RSS feeds as an OPML file. Then, after the installation is complete, import the OPML file to restore the RSS feeds.</p><p>To export the RSS Feeds, click on the File tab in the Ribbon and click on Open on the left pane. Select &#8220;Import&#8221; to start the Import and Export Wizard. From there, select Export RSS Feeds to an OPML file and continue as directed. Use the same process to Import the RSS Feeds from an OPML file.</p><p>If you are in the same boat as I, try restoring a previous version of the RSS Feeds file. On my Windows 7 box, it exists in:</p><p>%USERPROFILE%\AppData\Local\Microsoft\Outlook</p><p>My file ends with .sharing.xml.obi and it was about 3k.</p><p>First, close all running instances of Outlook.</p><p>Navigating to that directory, right click on the file and select Restore previous versions. Select the most recent version prior to the date of the installation.</p><p>Next, make sure to copy that file as it will be rewritten by Outlook when it is run again.</p><p>I am unable to determine how to get Outlook to recognize this file so the .obi file needs to be converted to an OPML file that can be imported into Outlook. This is simply achieved through an XSL transformation. 
By applying the following XSL on the .sharing.xml.obi file, an OPML file is created that can be used to import the RSS feeds into Outlook.</p><p>Prior to importing the RSS feeds, delete the folders, otherwise duplicate folders will appear under RSS Feeds.</p><pre>
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;xsl:stylesheet version=&quot;2.0&quot; xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;
  &lt;xsl:output method=&quot;xml&quot; version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; indent=&quot;yes&quot;/&gt;
  &lt;xsl:template match=&quot;/&quot;&gt;
    &lt;opml version=&quot;1.0&quot;&gt;
      &lt;head&gt;
        &lt;title&gt;OPML exported from Outlook&lt;/title&gt;
        &lt;dateCreated&gt;Wed, 12 May 2010 16:00:00 -0400&lt;/dateCreated&gt;
        &lt;dateModified&gt;Wed, 12 May 2010 16:00:00 -0400&lt;/dateModified&gt;
      &lt;/head&gt;
      &lt;body&gt;
        &lt;xsl:for-each select=&quot;sharing/bindings/binding&quot;&gt;
          &lt;xsl:element name=&quot;outline&quot;&gt;
            &lt;xsl:attribute name=&quot;text&quot;&gt;&lt;xsl:value-of select=&quot;local/@name&quot;/&gt;&lt;/xsl:attribute&gt;
            &lt;xsl:attribute name=&quot;type&quot;&gt;rss&lt;/xsl:attribute&gt;
            &lt;xsl:attribute name=&quot;xmlUrl&quot;&gt;&lt;xsl:value-of select=&quot;remote/@path&quot;/&gt;&lt;/xsl:attribute&gt;
          &lt;/xsl:element&gt;
        &lt;/xsl:for-each&gt;
      &lt;/body&gt;
    &lt;/opml&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre>]]></content:encoded> <wfw:commentRss>http://www.rickwargo.com/2010/05/12/updating-to-the-release-version-of-outlook-2010-from-the-beta-removes-rss-feeds/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Rewriting the From: E-mail Header Using Sendmail and MIMEDefang</title><link>http://www.rickwargo.com/2010/04/21/rewriting-the-from-e-mail-header-using-sendmail-and-mimedefang/</link> <comments>http://www.rickwargo.com/2010/04/21/rewriting-the-from-e-mail-header-using-sendmail-and-mimedefang/#comments</comments> <pubDate>Wed, 21 Apr 2010 17:38:47 +0000</pubDate> <dc:creator>Rick Wargo</dc:creator> <category><![CDATA[Code]]></category> <category><![CDATA[E-mail]]></category> <category><![CDATA[WILT]]></category> <guid
isPermaLink="false">http://www.rickwargo.com/?p=318</guid> <description><![CDATA[I frequently give out email addresses using &#60;yourdomain@mydomain.com&#62;; this way I am able to track the source of spam I receive. It also easily enables me to reject future email to that account by adding a line to sendmail&#8216;s access file along with a &#8220;pleasant&#8221; 550 response message when warranted. My problem has been how [...]]]></description> <content:encoded><![CDATA[<p>I frequently give out email addresses using &lt;yourdomain@mydomain.com&gt;; this way I am able to track the source of spam I receive. It also easily enables me to reject future email to that account by adding a line to <a
href="http://www.sendmail.org/" target="_blank">sendmail</a>&#8217;s access file along with a &#8220;pleasant&#8221; 550 response message when warranted. My problem has been how to send an e-mail from that address.</p><p>The solution is quite simple &#8211; I specify the address in the Reply-To field of my e-mail client and use <a
title="MIMEDefang" href="http://www.mimedefang.org/" target="_blank">MIMEDefang</a> to add an action that replaces the From: header with the contents of the Reply-To: header.</p><p>Fortunately, I already had MIMEDefang added into the mix as I use it and <a
href="http://spamassassin.apache.org/" target="_blank">SpamAssassin</a> for processing mail. I had some difficulty understanding where and how to add the logic in the mimedefang-filter Perl script but finally found useful information at <a
href="http://www.mickeyhill.com/mimedefang-howto/" target="_blank">http://www.mickeyhill.com/mimedefang-howto/</a> and through searching archives of the <a
href="http://lists.roaringpenguin.com/mailman/listinfo/mimedefang" target="_blank">MIMEDefang mailing list</a>.</p><p>None of my users set Reply-To, so deciding when to apply the rewrite is simple: if the mail originated on the local LAN and the Reply-To header exists and contains an email address, change the From header to the Reply-To contents and delete the Reply-To header. You may need to adjust the if condition for your own network.</p><p>This is accomplished with the following code, placed near the end of sub filter_end:</p><pre>
    # Rewrite From: header with Reply-To: if it exists
    if ($RelayAddr =~ /^192\.168\.1\./ &#038;&#038; $entity->head->get('Reply-To') =~ /@/) {
        action_change_header('From', $entity->head->get('Reply-To'));
        action_delete_header('Reply-To');
    }
</pre>]]></content:encoded> <wfw:commentRss>http://www.rickwargo.com/2010/04/21/rewriting-the-from-e-mail-header-using-sendmail-and-mimedefang/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Automatically Scraping Jobs from LinkedIn Using Ruby</title><link>http://www.rickwargo.com/2010/04/16/automatically-scraping-jobs-from-linkedin-using-ruby/</link> <comments>http://www.rickwargo.com/2010/04/16/automatically-scraping-jobs-from-linkedin-using-ruby/#comments</comments> <pubDate>Fri, 16 Apr 2010 18:08:03 +0000</pubDate> <dc:creator>Rick Wargo</dc:creator> <category><![CDATA[Code]]></category> <category><![CDATA[WILT]]></category> <guid
isPermaLink="false">http://www.rickwargo.com/?p=247</guid> <description><![CDATA[There are a number of job postings on LinkedIn that are of no interest to me; many times I have to go through page after page of jobs to look for the few that are appropriate. To make it less tedious, I have shortened the process by automating this task and filtering out the jobs [...]]]></description> <content:encoded><![CDATA[<p>There are a number of job postings on LinkedIn that are of no interest to me; many times I have to go through page after page of jobs to look for the few that are appropriate. To make it less tedious, I have shortened the process by automating this task and filtering out the jobs I do not want. I&#39;ve done so using Ruby and shell scripting. I run the script every night through cron and mail the results to me.</p><p>This process has a few hurdles to jump:</p><ol><li>automation of the login and paging of the search results</li><li>collecting the interesting job information</li><li>html email output</li><li>running an interactive browsing session in a cron process</li></ol><p>I have tackled these issues with the following tools:</p><ol><li>Firefox on Linux (Fedora)</li><li>Ruby &amp; Gems: Nokogiri (or Hpricot), FireWatir</li><li>sendmail</li><li>Xvfb</li></ol><p>This process is dependent on Firefox and Linux (Fedora) but can be modified without much difficulty to utilize other platforms. The scripts perform no error checking to keep the size small for the purpose of this blog entry. Read on for the code.</p><p><span
id="more-247"></span></p><h3>Firefox Automation</h3><p>I use <a
href="http://code.google.com/p/firewatir/" target="_blank" title="FireWatir">FireWatir</a> to automate Firefox on Fedora. For this process, I need to authenticate to LinkedIn, issue the search query and page through the results.</p><p>To authenticate, the login and password text fields must be set to your information and the form submitted. Unfortunately, the send key delay is long and the simulation of &quot;keying&quot; in the characters is very slow. To circumvent this, I have my login information and password saved in Firefox so when I visit the login page, I can authenticate just by submitting the form.</p><pre>
ff = Firefox.start("https://www.linkedin.com/secure/login?trk=hb_signin")
##ff.text_field(:id, "session_key-login").set "account@example.com"
##ff.text_field(:id, "session_password-login").set "MyP@55w0rD"
ff.form(:name, "login").submit
</pre><p>The first line starts Firefox with the <span
style="font-family: courier new,courier,monospace;">-jssh</span> switch and navigates to the LinkedIn login page. The following two lines key in the user name and password for the LinkedIn account used for login. I have these commented out to speed the process as I have Firefox cache the values. The last line submits the login form, authenticating the process to LinkedIn.</p><p><span
style="font-family: courier new,courier,monospace;">Firefox</span> is the FireWatir class for starting a Firefox browser. The <span
style="font-family: courier new,courier,monospace;">start</span> method launches a new instance of Firefox and navigates to the given page.</p><p>The <span
style="font-family: courier new,courier,monospace;">text_field</span> method finds the text field by the <span
style="font-family: courier new,courier,monospace;">id</span> attribute and sets the value to the string specified by the <span
style="font-family: courier new,courier,monospace;">set</span> method.</p><p>Finally the <span
style="font-family: courier new,courier,monospace;">form</span> method searches for a form <span
style="font-family: courier new,courier,monospace;">name</span>d &quot;login&quot; and <span
style="font-family: courier new,courier,monospace;">submit</span>s it.</p><p>To navigate the search results, one request per page is issued, explicitly specifying the page number. This is accomplished with the following code.</p><pre>ff.goto("#{url}&#038;page_num=#{pg}")</pre><p>To close the instance of Firefox instantiated by the <span
style="font-family: courier new,courier,monospace;">start</span> method, execute:</p><pre>ff.close</pre><p>near the end of the routine.</p><p>That is all that is required to automate Firefox using FireWatir. At the time of this writing, FireWatir does not run on x86_64 Linux platforms, at least on Fedora.</p><h3>Scraping</h3><p><a
href="http://hpricot.com/" target="_blank">Hpricot</a> and <a
href="http://nokogiri.org/" target="_blank">Nokogiri</a> are fantastic libraries for scraping content from web pages and XML documents through either CSS or XPath navigation. The methods are fairly similar so you can choose either library. Personally I have found Nokogiri to be much faster than Hpricot.</p><p>Given the HTML document for the page with search results, the jobs are located within the <span
style="font-family: courier new,courier,monospace;">li</span> elements where the <span
style="font-family: courier new,courier,monospace;">id</span> attributes contain &quot;vcard-&quot;. Within each list element, the job title is in the first <span
style="font-family: courier new,courier,monospace;">h2</span> element, the url for the job posting is in the anchor element and the company name, date, and location are in the paragraph with the <span
style="font-family: courier new,courier,monospace;">class</span> of &quot;company-info&quot;. Given that information, unwanted jobs can be filtered out easily and the remaining ones saved for a later email notification.</p><p>The following Ruby code accomplishes the scraping; note that it also cleans up the extracted text.</p><pre>
doc = Nokogiri::HTML(ff.html)
jobs = doc.xpath(&#039;//li[contains(@id,&quot;vcard-&quot;)]&#039;)
jobs.each do |job|
  title = job.search(&#039;h2&#039;).first.content.strip
  job_url = &quot;http://www.linkedin.com&quot; + job.search(&#039;a&#039;).first[&#039;href&#039;].to_s
  info = job.search(&#039;p[@class=&quot;company-info&quot;]&#039;).first.content.gsub(/[ \f\t\r\n]+/, &#039; &#039;).split(/ - /).map { |s| s.strip }
  company = info.shift
  date = info.pop
  location = info.join(&#039; - &#039;)
  processing_date = date if processing_date.nil?
  break if date != processing_date
  rejected += 1
  next unless companies.select { |c| company =~ c }.empty?
  next unless titles.select { |t| title =~ t }.empty?
  next unless locations.select { |l| location =~ l }.empty?
  rejected -= 1
  found += 1
  results += &quot;&lt;tr&gt;&lt;td&gt;&lt;a href=&#039;#{job_url}&#039; target=&#039;_top&#039;&gt;#{title}&lt;/a&gt;&lt;/td&gt;&lt;td&gt;#{company}&lt;/td&gt;&lt;td&gt;#{location}&lt;/td&gt;&lt;/tr&gt;\n&quot;
end
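# --- Illustrative sketch, not part of the original script ---
# The loop above relies on exclusion lists (companies, titles, locations)
# that are defined elsewhere. Assuming each one is an array of Regexp
# objects, the filtering step could be factored into a helper such as the
# hypothetical excluded? below. All patterns and sample values are made up.
def excluded?(value, patterns)
  # true when any pattern in the list matches the value
  patterns.any? { |pattern| value =~ pattern }
end

sample_companies = [/acme/i]
sample_titles    = [/intern/i, /recruiter/i]

# The company-info text is split the same way as in the loop above:
# first field is the company, last is the date, the rest is the location.
sample_info = "Example Corp - Philadelphia, PA - April 16, 2010".split(/ - /).map { |s| s.strip }
sample_company  = sample_info.shift
sample_date     = sample_info.pop
sample_location = sample_info.join(' - ')

puts "skip #{sample_company}" if excluded?(sample_company, sample_companies)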
</pre><p>To keep this as simple as possible, the job runs nightly, so only a single day of postings needs to be processed; a change in the posting date is the sentinel that ends the loop.</p><p>To filter the jobs, lists of regular expressions for companies, titles, and locations are checked to determine whether a job should be excluded.</p><p>To compose the email, the rows of a table are built as interesting jobs are found.</p><h3>E-Mail Results</h3><p>The results are easily emailed by outputting HTML preceded by email headers. This output can be processed with sendmail to deliver the mail.</p><pre>
print &lt;&lt;-_EOT1_
MIME-Version: 1.0
Content-Type: text/html
From: LinkedIn Job Reporter &lt;account\@example.com&gt;
To: account\@example.com
Subject: Today&#039;s #{subject}

&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.01//EN&quot;&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;#{subject} for #{processing_date}&lt;/title&gt;
&lt;style type=&quot;text/css&quot;&gt;
table { font-family: Verdana, sans-serif; font-style: normal; font-size: 10pt; }
body { font-family: Verdana, sans-serif; font-style: normal; font-size: 10pt; }
h1 { font-family: Verdana, sans-serif; font-size: 14pt; color: navy; }
.date { font-size: 100%; font-weight: bold; background: lightyellow; }
.header { font-size: 100%; font-weight: bold; background: lightblue; }
&lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;h1&gt;&lt;a href=&quot;#{url}&amp;page_num=1&quot; target=&quot;_top&quot;&gt;#{subject} for #{processing_date}&lt;/a&gt;&lt;/h1&gt;
&lt;p/&gt;
&lt;table border=1 cellpadding=4 cellspacing=0&gt;
&lt;tr class=header&gt;
&lt;td&gt;Title&lt;/td&gt;
&lt;td&gt;Company&lt;/td&gt;
&lt;td&gt;Location&lt;/td&gt;
&lt;/tr&gt;
#{results}
&lt;/table&gt;
&lt;p/&gt;
Found #{found} and rejected #{rejected} out of #{found+rejected} jobs.
&lt;/body&gt;
&lt;/html&gt;
_EOT1_
</pre><p>Note that the email also includes a summary line with the number of jobs found and rejected.</p><h3>Cron Job</h3><p>The trick here, since the cron process is not connected to an X Window System session, is to run it with the <a
href="http://en.wikipedia.org/wiki/Xvfb" target="_blank">X virtual frame buffer</a> window server, or <span
style="font-family: courier new,courier,monospace;">Xvfb</span>. This allows Firefox to run in the background without an active logged-in window session. The subject and search URL are passed on the command line so the script can be run multiple times with different searches, and the output from the Ruby script is then consumed by sendmail.</p><p>For some reason, FireWatir could not start an instance of Firefox unless one was already running, so the script invokes Firefox with the <span
style="font-family: courier new,courier,monospace;">-jssh</span> flag and terminates the process at the end when the X server is killed.</p><p>The url should be modified to correspond to the types of jobs desired.</p><p>The script in its entirety follows.</p><pre>
#!/bin/sh
export DISPLAY=:1
Xvfb $DISPLAY 2>/dev/null &#038;
xvfb_pid=$!
firefox -jssh 2>/dev/null&#038;
sleep 5
# Local jobs
subject='Local LinkedIn Job Results'
url='http://www.linkedin.com/jsearch?searchLocationType=I&#038;pplSearchOrigin=MDYS&#038;sortCriteria=DD&#038;countryCode=us&#038;postalCode=19422&#038;distance=75'
ruby ~rick/Develop/linkedin.rb "$subject" "$url" 2>/dev/null >/tmp/jobs$$
/usr/lib/sendmail -t < /tmp/jobs$$
# Interesting jobs worldwide
subject='Interesting Jobs Worldwide'
url='http://www.linkedin.com/jobs?runSearch=&#038;sortCriteria=1&#038;jobFunction=it&#038;experienceLevel=5&#038;trk_info=jobview_similar_fc_ex'
ruby ~rick/Develop/linkedin.rb "$subject" "$url" 2>/dev/null >/tmp/jobs$$
/usr/lib/sendmail -t < /tmp/jobs$$
rm -f /tmp/jobs$$
kill $xvfb_pid
</pre><p>I&#39;ve made the <a
href="http://www.rickwargo.com/wp-content/uploads/2010/04/linkedin.zip">code available for download</a>; it contains little more than what is presented here.</p><p><script type="text/javascript">SyntaxHighlighter.all()</script></p> ]]></content:encoded> <wfw:commentRss>http://www.rickwargo.com/2010/04/16/automatically-scraping-jobs-from-linkedin-using-ruby/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
