<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>nils-kaiser.de</title>
	<atom:link href="http://blog.nils-kaiser.de/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.nils-kaiser.de</link>
	<description>Information Technology, Organisations and Weltanschauung ;)</description>
	<pubDate>Thu, 09 Dec 2010 10:12:05 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Time to crawl back! Download Google Groups using a crawler</title>
		<link>http://blog.nils-kaiser.de/2008/07/13/time-to-crawl-back-download-google-groups-using-a-crawler/</link>
		<comments>http://blog.nils-kaiser.de/2008/07/13/time-to-crawl-back-download-google-groups-using-a-crawler/#comments</comments>
		<pubDate>Sun, 13 Jul 2008 00:51:49 +0000</pubDate>
		<dc:creator>Nils Kaiser</dc:creator>
		
		<category><![CDATA[How-To's]]></category>

		<category><![CDATA[archive]]></category>

		<category><![CDATA[crawler]]></category>

		<category><![CDATA[download]]></category>

		<category><![CDATA[Google Groups]]></category>

		<category><![CDATA[mbox]]></category>

		<category><![CDATA[Web-Harvest]]></category>

		<guid isPermaLink="false">http://blog.nils-kaiser.de/?p=18</guid>
		<description><![CDATA[For a research project, I needed an archive of a Google Group forum. Unfortunately, I had never been a member of the group and thus had not received any group messages. I tried to find a way to download an archive using the site but couldn&#8217;t find any.
However, Google Groups offers an original version [...]]]></description>
			<content:encoded><![CDATA[<p>For a research project, I needed an archive of a Google Group forum. Unfortunately, I had never been a member of the group and thus had not received any group messages. I tried to find a way to download an archive using the site but couldn&#8217;t find any.</p>
<p>However, Google Groups offers the original version of each message (click &quot;More options&quot; to the right of a message, then &quot;Show original&quot;). There is even a link that returns the raw message source (click &quot;Show only message text&quot; on the page that opens). I needed a way to collect these pages and save them to a file.</p>
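<p>As a rough illustration of what collecting these pages means at the URL level: the script further down requests each &quot;Show original&quot; link with an extra <code>output=gplain</code> parameter, resolved against the group URL. The Python sketch below mirrors that step only. The function name and the example link are made up for illustration, and it assumes the link already carries a query string, as the script does.</p>

```python
from urllib.parse import urljoin


def plain_message_url(base_url, show_original_href):
    """Build the absolute URL of the raw text for one 'Show original' link.

    Assumption: the href already contains a query string, so the extra
    parameter is appended with '&' (the same way the Web-Harvest script does).
    """
    relative = show_original_href.strip() + "&output=gplain"
    # urljoin resolves a relative or root-relative href against the group URL
    return urljoin(base_url, relative)


if __name__ == "__main__":
    # Both values are hypothetical placeholders, not real message links.
    base = "http://groups.google.com/group/example/topics?hl=en"
    href = " /group/example/msg/abc123?dmode=source "
    print(plain_message_url(base, href))
```

<p>Fetching the resulting URL should give the message exactly as it was sent, headers included, which is what an mbox file needs.</p>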
<p>The program I found is called <a href="http://web-harvest.sourceforge.net/">Web-Harvest</a> and describes itself as an &quot;<em>Open Source Web Data Extraction tool</em>&quot;. It lets you control the behavior of the crawler with XML scripts and Java expressions (it uses <a href="http://www.beanshell.org">BeanShell</a> internally), and extract values from a page (for example, links to follow) with XPath, XSLT, regular expressions, XQuery or even Java code. So I started setting up a script, and here is what came out:</p>
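<p>Independent of Web-Harvest, the mbox handling in the script is small enough to sketch in a few lines. The Python below is not part of the script; it only mirrors two of its steps under the same assumptions: every message is preceded by a fixed &quot;From &quot; separator line, and body lines that start with &quot;From &quot; (optionally preceded by &quot;&gt;&quot; characters) get one more &quot;&gt;&quot; in front. The function names are hypothetical.</p>

```python
import re

# Same fixed separator line the Web-Harvest script uses for every message.
FROM_LINE = "From - Sat Jul 12 19:52:07 2008"

# Lines starting with "From ", ">From ", ">>From ", ... must be quoted.
_FROM_PATTERN = re.compile(r"^(>*From )", re.MULTILINE)


def quote_from_lines(message):
    """Prefix every 'From '-like line with one extra '>' character."""
    return _FROM_PATTERN.sub(r">\1", message)


def to_mbox_entry(message):
    """Return one mbox entry: separator line, quoted message, blank line."""
    return FROM_LINE + "\n" + quote_from_lines(message) + "\n\n"


if __name__ == "__main__":
    raw = "Subject: hi\n\nFrom here on it gets interesting.\n"
    print(to_mbox_entry(raw), end="")
```

<p>Appending the result of <code>to_mbox_entry</code> for each downloaded message to one file gives the same layout the script writes.</p>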
<h4>Instructions</h4>
<p>What you need to download Google Groups forum messages:</p>
<ol>
<li>Download, install and run <a href="http://web-harvest.sourceforge.net/">Web-Harvest</a> (it requires <a href="http://www.java.com">Java</a>). On most platforms you should be able to start it by double-clicking the .jar file. </li>
<li>Copy &#038; paste the following script into the editor (UPDATE: The script is also available at <a href="http://pastebin.com/Xe6f4s9s">http://pastebin.com/Xe6f4s9s</a>):      
<div class="geshi no xml">
<div class="head">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;</div>
<ol>
<li class="li1">
<div class="de1"><span class="sc3"><span class="coMULTI">&lt;!--</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; A Web-Harvest script that crawls through the pages of a Google Group forum </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; and saves the messages as an mbox file.</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; By: Nils Kaiser (blog.nils-kaiser.de)</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; This work is licensed under the Creative Commons Attribution 3.0 Unported License. </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/legalcode</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; Please mention my name and blog address as above if you reuse or distribute this work.</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS &quot;AS IS&quot; </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;POSSIBILITY OF SUCH DAMAGE. &nbsp; &nbsp;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">--&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="sc3"><span class="re1">&lt;config</span> <span class="re0">charset</span>=<span class="st0">&quot;UTF-8&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="coMULTI">&lt;!-- *** EDIT: set Google Group forum to crawl (main page of a discussion);</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;make sure that the params gvc=2 and hl=en are used --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;var-def</span> <span class="re0">name</span>=<span class="st0">&quot;discussionMainUrl&quot;</span> <span class="re0">overwrite</span>=<span class="st0">&quot;false&quot;</span><span class="re2">&gt;</span></span>http://groups.google.com/group/youtube-api-gdata/topics?hl=en<span class="sc1">&amp;amp;</span>gvc=2<span class="sc3"><span class="re1">&lt;/var-def<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="coMULTI">&lt;!-- *** EDIT: set output file name --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;var-def</span> <span class="re0">name</span>=<span class="st0">&quot;outputFile&quot;</span> <span class="re0">overwrite</span>=<span class="st0">&quot;false&quot;</span><span class="re2">&gt;</span></span>c:/output/output.mbox<span class="sc3"><span class="re1">&lt;/var-def<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="sc3"><span class="coMULTI">&lt;!-- </span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; function download-multipage-list, removed typos</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; adapted from http://web-harvest.sourceforge.net/samples.php?num=0</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; Copyright © 2006 by vnikic at users.sourceforge.net</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;--&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="sc3"><span class="re1">&lt;function</span> <span class="re0">name</span>=<span class="st0">&quot;download-multipage-list&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;return<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;while</span> <span class="re0">condition</span>=<span class="st0">&quot;${pageUrl.toString().trim().length() &gt;</span> 0}&quot; maxloops=&quot;${maxloops}&quot; index=&quot;i&quot;&gt;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;empty&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;var-def name=&quot;</span>content<span class="st0">&quot;&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;html-to-xml&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;http url=&quot;</span>$<span class="br0">&#123;</span>pageUrl<span class="br0">&#125;</span><span class="st0">&quot;/&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;/html-to-xml&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;/var-def&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;var-def name=&quot;</span>nextLinkUrl<span class="st0">&quot;&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;xpath expression=&quot;</span>$<span class="br0">&#123;</span>nextXPath<span class="br0">&#125;</span><span class="st0">&quot;&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;var name=&quot;</span>content<span class="st0">&quot;/&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;/xpath&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;/var-def&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;var-def name=&quot;</span>pageUrl<span class="st0">&quot;&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="sc3">&lt;case&gt;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3">&lt;if condition=&quot;</span>$<span class="br0">&#123;</span>nextLinkUrl.toString<span class="br0">&#40;</span><span class="br0">&#41;</span>.length<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="re2">&gt;</span></span> 0}&quot;&gt;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;template<span class="re2">&gt;</span></span></span>${sys.fullUrl(pageUrl, nextLinkUrl.toString().trim())}<span class="sc3"><span class="re1">&lt;/template<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/if<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/case<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/var-def<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/empty<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;xpath</span> <span class="re0">expression</span>=<span class="st0">&quot;${itemXPath}&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;var</span> <span class="re0">name</span>=<span class="st0">&quot;content&quot;</span><span class="re2">/&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/xpath<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/while<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/return<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="sc3"><span class="re1">&lt;/function<span class="re2">&gt;</span></span></span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="coMULTI">&lt;!-- From line included; Thunderbird does not care about the receiver, and the</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; &nbsp;date is only used to check for new mail, i.e. it is useless here.</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; &nbsp;see http://www.qmail.org/man/man5/mbox.html --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;var-def</span> <span class="re0">name</span>=<span class="st0">&quot;fromLine&quot;</span><span class="re2">&gt;</span></span>From - Sat Jul 12 19:52:07 2008<span class="sc3"><span class="re1">&lt;/var-def<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="coMULTI">&lt;!-- clear output file as we use append later --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="sc3"><span class="re1">&lt;file</span> <span class="re0">action</span>=<span class="st0">&quot;write&quot;</span> <span class="re0">type</span>=<span class="st0">&quot;text&quot;</span> <span class="re0">path</span>=<span class="st0">&quot;${outputFile}&quot;</span><span class="re2">/&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="coMULTI">&lt;!-- collects all thread urls --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;var-def</span> <span class="re0">name</span>=<span class="st0">&quot;threadUrls&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;call</span> <span class="re0">name</span>=<span class="st0">&quot;download-multipage-list&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;call-param</span> <span class="re0">name</span>=<span class="st0">&quot;pageUrl&quot;</span><span class="re2">&gt;</span></span><span class="sc3"><span class="re1">&lt;var</span> <span class="re0">name</span>=<span class="st0">&quot;discussionMainUrl&quot;</span><span class="re2">/&gt;</span></span><span class="sc3"><span class="re1">&lt;/call-param<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;call-param</span> <span class="re0">name</span>=<span class="st0">&quot;nextXPath&quot;</span><span class="re2">&gt;</span></span>//div[@class=&#39;maincontbox&#39;]//a[contains(.,&#39;Older&#39;)]/@href<span class="sc3"><span class="re1">&lt;/call-param<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;call-param</span> <span class="re0">name</span>=<span class="st0">&quot;itemXPath&quot;</span><span class="re2">&gt;</span></span>//div[@class=&#39;maincontoutboxatt&#39;]//a[contains(@href,&#39;browse_thread&#39;)]/@href<span class="sc3"><span class="re1">&lt;/call-param<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;call-param</span> <span class="re0">name</span>=<span class="st0">&quot;maxloops&quot;</span><span class="re2">&gt;</span></span><span class="sc3"><span class="re1">&lt;/call-param<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/call<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/var-def<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="coMULTI">&lt;!-- open each thread --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;loop</span> <span class="re0">item</span>=<span class="st0">&quot;threadUrl&quot;</span> <span class="re0">index</span>=<span class="st0">&quot;i&quot;</span> <span class="re0">filter</span>=<span class="st0">&quot;unique&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;list<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;var</span> <span class="re0">name</span>=<span class="st0">&quot;threadUrls&quot;</span><span class="re2">/&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/list<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;body<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;empty<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="coMULTI">&lt;!-- absolutize thread url --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;var-def</span> <span class="re0">name</span>=<span class="st0">&quot;threadUrlFull&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;template<span class="re2">&gt;</span></span></span>${sys.fullUrl(discussionMainUrl, threadUrl.toString().trim())}<span class="sc3"><span class="re1">&lt;/template<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/var-def<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="coMULTI">&lt;!-- collect all original messages urls --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;var-def</span> <span class="re0">name</span>=<span class="st0">&quot;originalMsgUrls&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;call</span> <span class="re0">name</span>=<span class="st0">&quot;download-multipage-list&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;call-param</span> <span class="re0">name</span>=<span class="st0">&quot;pageUrl&quot;</span><span class="re2">&gt;</span></span><span class="sc3"><span class="re1">&lt;var</span> <span class="re0">name</span>=<span class="st0">&quot;threadUrlFull&quot;</span><span class="re2">/&gt;</span></span><span class="sc3"><span class="re1">&lt;/call-param<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;call-param</span> <span class="re0">name</span>=<span class="st0">&quot;nextXPath&quot;</span><span class="re2">&gt;</span></span>//div[@class=&#39;maincontbox&#39;]/table[1]//nobr[@id=&#39;thread_page_links_site&#39;]/a[contains(.,&#39;Newer &gt;&#39;)]/@href<span class="sc3"><span class="re1">&lt;/call-param<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;call-param</span> <span class="re0">name</span>=<span class="st0">&quot;itemXPath&quot;</span><span class="re2">&gt;</span></span>//div[@class=&#39;exh&#39;]//a[.=&#39;Show original&#39;]/@href<span class="sc3"><span class="re1">&lt;/call-param<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;call-param</span> <span class="re0">name</span>=<span class="st0">&quot;maxloops&quot;</span><span class="re2">&gt;</span></span><span class="sc3"><span class="re1">&lt;/call-param<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/call<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/var-def<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="coMULTI">&lt;!-- loop through messages in thread (original view) --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;loop</span> <span class="re0">item</span>=<span class="st0">&quot;originalMsgUrl&quot;</span> <span class="re0">index</span>=<span class="st0">&quot;j&quot;</span> <span class="re0">filter</span>=<span class="st0">&quot;unique&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;list<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;var</span> <span class="re0">name</span>=<span class="st0">&quot;originalMsgUrls&quot;</span><span class="re2">/&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/list<span class="re2">&gt;</span></span></span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;body<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;empty<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="coMULTI">&lt;!-- get original message --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;var-def</span> <span class="re0">name</span>=<span class="st0">&quot;originalMsgContent&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;http</span> <span class="re0">url</span>=<span class="st0">&quot;${sys.fullUrl(discussionMainUrl, originalMsgUrl.toString().trim() + &amp;quot;&amp;amp;output=gplain&amp;quot;)}&quot;</span><span class="re2">/&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/var-def<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="coMULTI">&lt;!-- need to quote lines starting with &quot;From &quot;, &quot;&gt;From &quot;, &quot;&gt;&gt;From &quot;, &quot;&gt;&gt;&gt;From &quot;, ...</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; &nbsp; &nbsp; see http://www.qmail.org/man/man5/mbox.html</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp; &nbsp; &nbsp; &nbsp; --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;var-def</span> <span class="re0">name</span>=<span class="st0">&quot;originalMsgContentQuoted&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;regexp</span> <span class="re0">replace</span>=<span class="st0">&quot;true&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;regexp-pattern<span class="re2">&gt;</span></span></span>(?m)^([&gt;]*From )<span class="sc3"><span class="re1">&lt;/regexp-pattern<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;regexp-source<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;var</span> <span class="re0">name</span>=<span class="st0">&quot;originalMsgContent&quot;</span><span class="re2">/&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/regexp-source<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;regexp-result<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;template<span class="re2">&gt;</span></span></span><span class="sc1">&amp;gt;</span>${_1}<span class="sc3"><span class="re1">&lt;/template<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/regexp-result<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/regexp<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/var-def<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="coMULTI">&lt;!-- process message content --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;var-def</span> <span class="re0">name</span>=<span class="st0">&quot;originalMsgContentProcessed&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;template<span class="re2">&gt;</span></span></span>${&quot;&quot; + fromLine + &quot;\n&quot; + originalMsgContentQuoted + &quot;\n\n&quot;}<span class="sc3"><span class="re1">&lt;/template<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/var-def<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="coMULTI">&lt;!-- append content of &lt;pre&gt; to file --&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;file</span> <span class="re0">action</span>=<span class="st0">&quot;append&quot;</span> <span class="re0">type</span>=<span class="st0">&quot;binary&quot;</span> <span class="re0">path</span>=<span class="st0">&quot;${outputFile}&quot;</span><span class="re2">&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;var</span> <span class="re0">name</span>=<span class="st0">&quot;originalMsgContentProcessed&quot;</span><span class="re2">/&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/file<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/empty<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/body<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/loop<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="sc3"><span class="re1">&lt;/empty<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/body<span class="re2">&gt;</span></span></span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="sc3"><span class="re1">&lt;/loop<span class="re2">&gt;</span></span></span> &nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="sc3"><span class="re1">&lt;/config<span class="re2">&gt;</span></span></span></div>
</li>
</ol>
</div>
</li>
<li>Change the URL of the Google Groups forum to download and adjust the path of the output file (see the section marked <em>&quot;*** EDIT&quot;</em> above).     <br />The URL must point to the discussion page (be careful with Google Groups forums containing multiple discussion categories!). You also need to make sure that the URL contains the &quot;&amp;gvc=2&quot; and &quot;&amp;hl=en&quot; parameters, as the script can only parse the English version of the list thread view. </li>
<li>Hit the play button in the toolbar, minimize the window and get back to whatever you were doing before&#8230; this might take a while! Also, make sure that you read the Limitations below before you start. </li>
</ol>
<p>The instructions are provided for research purposes only, see disclaimer in the script above.</p>
<h4>How it works</h4>
<p>The script starts by collecting the urls of every thread by following the <em>&quot;Older&quot; </em>link on every page. It then collects the urls of every original message, this time following the <em>&quot;Newer&quot;</em> link. By appending <em>&quot;&amp;output=gplain&quot;</em> to the url, it makes sure that the original message text page is requested.</p>
<p>The script outputs a single file in <a href="http://www.qmail.org/man/man5/mbox.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.qmail.org');">mbox file format</a>, which can easily be imported into most e-mail clients (see <a href="http://kb.wisc.edu/helpdesk/page.php?id=6436" onclick="javascript:pageTracker._trackPageview('/outbound/article/kb.wisc.edu');">this page</a> for instructions). To respect the mbox convention, body lines that start with &quot;From &quot; are quoted and a From line is added in front of each message.</p>
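<p>To illustrate the mbox step outside of Web-Harvest, here is a minimal Python sketch. It is not part of the script above; the function name <em>to_mbox_entry</em>, the placeholder sender and the fixed default timestamp are assumptions made for the example, and the quoting follows the simple convention of prefixing body lines that start with &quot;From &quot;.</p>
<pre>
```python
import re
import time

# Body lines that begin with "From " would be mistaken for a message
# separator, so they are quoted with a leading ">".
FROM_LINE = re.compile(r"^From ", re.MULTILINE)


def to_mbox_entry(raw_message, sender="unknown@example.invalid", when=None):
    """Return raw_message as a single mbox entry (illustrative sketch)."""
    # Hypothetical defaults: the sender is a placeholder and the timestamp
    # falls back to the Unix epoch when no time tuple is given.
    stamp = time.asctime(time.gmtime(0) if when is None else when)
    quoted = FROM_LINE.sub(">From ", raw_message)
    # Separator line, then the quoted message, then a blank line.
    return "From " + sender + " " + stamp + "\n" + quoted + "\n\n"
```
</pre>
<p>Appending the return value for every downloaded message to one file should give an archive that mbox-aware mail clients can read, which mirrors what the <em>file</em> element with <em>action=&quot;append&quot;</em> does in the script.</p>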
<h4>Limitations</h4>
<p>The approach used has the following limitations:</p>
<ol>
<li>All email addresses in the messages are obfuscated using dots (&#8230;). However, I reckon it makes sense and was not a major drawback for my purpose anyway. </li>
<li><a href="http://blog.nils-kaiser.de/wp-content/image.png"   rel="lightbox[roadtrip]"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="218" alt="image" src="http://blog.nils-kaiser.de/wp-content/image-thumb.png" width="246" align="right" border="0" /></a>Requests to the Google Groups forum site are blocked after a while, once a large number of automated requests is recognized. You will notice that requests take a bit longer and that the responses are only about 3 KB in size.       <br />If you open the Google Groups page in your browser, you will see the page shown on the right. <br/>How to solve this? You can either modify the script to delay the requests or crawl manually in sessions of a limited number of threads (edit the start URL, looking up how it changes when you browse to older discussion pages, and edit the <em>maxloops</em> parameter of the first call to the <em>download-multiple-list</em> helper function).</li>
</ol>
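<p>The idea of delaying the requests can be sketched in a few lines of Python. This is only an illustration and not a change to the Web-Harvest script: the function name <em>fetch_all</em>, the <em>fetch</em> callback and the five-second default are assumptions, and I do not know which request rate Google Groups actually tolerates.</p>
<pre>
```python
import time


def fetch_all(urls, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        pages.append(fetch(url))
    return pages
```
</pre>
<p>The <em>fetch</em> callback stands in for whatever performs the actual download, and the <em>sleep</em> parameter only exists so that the pause can be replaced when trying the function out.</p>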
<p>&#160;</p>
<p>Hope this helps! Feel free to change the script and to notify me of any useful additions. To start changing the script, I recommend having a look at the <a href="http://web-harvest.sourceforge.net/manual.php" onclick="javascript:pageTracker._trackPageview('/outbound/article/web-harvest.sourceforge.net');">user manual</a> and the <a href="http://web-harvest.sourceforge.net/samples.php" onclick="javascript:pageTracker._trackPageview('/outbound/article/web-harvest.sourceforge.net');">examples</a>. Also have a look at some other uses <a href="http://twit88.com/blog/2007/10/24/web-scraping-using-web-harvest/" onclick="javascript:pageTracker._trackPageview('/outbound/article/twit88.com');">here</a> and <a href="http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/" onclick="javascript:pageTracker._trackPageview('/outbound/article/twit88.com');">here</a>.     </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nils-kaiser.de/2008/07/13/time-to-crawl-back-download-google-groups-using-a-crawler/feed/</wfw:commentRss>
		</item>
		<item>
		<title>How to get back &#8220;Send to OneNote&#8221; on Vista x64</title>
		<link>http://blog.nils-kaiser.de/2008/06/20/how-to-get-back-send-to-onenote-on-vista-x64/</link>
		<comments>http://blog.nils-kaiser.de/2008/06/20/how-to-get-back-send-to-onenote-on-vista-x64/#comments</comments>
		<pubDate>Fri, 20 Jun 2008 20:11:49 +0000</pubDate>
		<dc:creator>Nils Kaiser</dc:creator>
		
		<category><![CDATA[How-To's]]></category>

		<category><![CDATA[OneNote]]></category>

		<category><![CDATA[Printer]]></category>

		<category><![CDATA[Vista 64]]></category>

		<category><![CDATA[Workaround]]></category>

		<guid isPermaLink="false">http://blog.nils-kaiser.de/?p=5</guid>
		<description><![CDATA[I decided that my first post was going to be a useful one, so here it is:
Several users have been complaining about the missing &#8220;Send to OneNote&#8221; functionality after installing Vista 64 (see here, here and the several comments on an MSDN blog). The response from Microsoft has been rather indifferent, as they decided not to fix [...]]]></description>
			<content:encoded><![CDATA[<p>I decided that my first post was going to be a useful one, so here it is:</p>
<p>Several users have been complaining about the missing &#8220;Send to OneNote&#8221; functionality after installing Vista 64 (see <a title="No onenote fix" href="http://www.jkontherun.com/2008/04/no-onenote-fix.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.jkontherun.com');">here</a>, <a title="OneNote 2007 Send-To 64-Bit Support? Not Coming " href="http://www.gottabemobile.com/OneNote+2007+SendTo+64Bit+Support+Not+Coming.aspx" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.gottabemobile.com');">here</a> and the <a title="Send to OneNote 2007, why it isn't there in 64-bit (x64)" href="http://blogs.msdn.com/descapa/archive/2007/12/04/send-to-onenote-2007-why-it-isn-t-there-in-64-bit-x64.aspx" onclick="javascript:pageTracker._trackPageview('/outbound/article/blogs.msdn.com');">several comments on an MSDN blog</a>). The response from Microsoft has been rather indifferent, as they decided not to fix it before the next version of OneNote (see <a href="http://blogs.msdn.com/david_rasmussen/archive/2008/04/21/onenote-64-bit-print-driver.aspx" onclick="javascript:pageTracker._trackPageview('/outbound/article/blogs.msdn.com');">here</a>). As it was one of the features I used most, I was seriously thinking about switching back to Vista 32 before I found a solution&#8230;</p>
<p>The workaround presented here brings back the support for printing documents into OneNote. Here is a detailed how-to:</p>
<ol>
<li>Install Zan Image Printer. The software adds a virtual printer (actually two) that generates images of the printed documents. You can get it from the <a href="http://www.zan1011.com/download.htm" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.zan1011.com');">Zan Image Printer download page</a>. The tool has a 30-day trial, and the full version is available for around $50.</li>
<li>Configure the Image Printer. The options are found by opening the printer property page and going to the &#8220;Print Settings&#8221; page on the &#8220;General&#8221; tab (sorry, I&#8217;m on a German locale):
<p style="text-align: center;"><a href="http://blog.nils-kaiser.de/wp-content/zan0.jpg"   rel="lightbox[roadtrip]"><img class="alignnone size-medium wp-image-6 aligncenter" title="zan0" src="http://blog.nils-kaiser.de/wp-content/zan0-281x300.jpg" alt="Printer Properties" width="282" height="300" /></a></p>
</li>
<li style="text-align: left;">Go to the &#8220;Image&#8221; tab, select TIFF and modify the settings according to your preferences (or follow my settings).</li>
<p style="text-align: center;"><a href="http://blog.nils-kaiser.de/wp-content/zan1.jpg"   rel="lightbox[roadtrip]"><img class="alignnone size-medium wp-image-7" title="zan1" src="http://blog.nils-kaiser.de/wp-content/zan1-300x251.jpg" alt="Zan Image Printer Properties - Image" width="300" height="251" /></a></p>
<li style="text-align: left;">And now the magic bit! Switch to the &#8220;Settings&#8221; tab and select &#8220;Application&#8221; from the list box on the left side. Browse to the OneNote executable (normally located in C:\Program Files (x86)\Microsoft Office\Office12). In the &#8220;Parameters&#8221; box, add the following text:<br /><code>/insertdoc "[%file]"</code><br />
<p style="text-align: center;"><a href="http://blog.nils-kaiser.de/wp-content/zan2.jpg"   rel="lightbox[roadtrip]"><img class="alignnone size-medium wp-image-9 aligncenter" title="zan2" src="http://blog.nils-kaiser.de/wp-content/zan2-300x251.jpg" alt="Zan Image Printer Properties - Settings" width="300" height="251" /></a></p>
</li>
<li>Try it out! Open a PDF and print it to the Zan Image Printer. The virtual printer should now generate the image and launch OneNote, which shows the usual dialog:</li>
<p style="text-align: center;"><a href="http://blog.nils-kaiser.de/wp-content/zan3.jpg"   rel="lightbox[roadtrip]"><img class="alignnone size-medium wp-image-10 aligncenter" title="zan3" src="http://blog.nils-kaiser.de/wp-content/zan3-300x106.jpg" alt="OneNote inserts printout" width="300" height="106" /></a></p>
</ol>
<p>BTW, once you are satisfied with your settings, you can disable the Zan Printer dialog on the &#8220;Save&#8221; tab.</p>
<p>Hope this is useful for you and comes in time to prevent you from reinstalling your whole system!</p>
<p>And hey MS, how do you want to convince users to install Vista 64 when your own software is not compatible???</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nils-kaiser.de/2008/06/20/how-to-get-back-send-to-onenote-on-vista-x64/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

