<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: XML data file of online, valid phishes from PhishTank</title>
	<atom:link href="http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/</link>
	<description>A blog about and from PhishTank, a collaborative clearinghouse for data about phishing.</description>
	<lastBuildDate>Tue, 30 Jun 2009 19:58:42 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: PhishTank Blog &#187; Blog Archive &#187; Data about phishers at the right cost (free)</title>
		<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/comment-page-1/#comment-173372</link>
		<dc:creator>PhishTank Blog &#187; Blog Archive &#187; Data about phishers at the right cost (free)</dc:creator>
		<pubDate>Tue, 30 Jun 2009 19:55:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/#comment-173372</guid>
		<description>[...] PhishTank works is because the data is freely available to all, from the free, open API to the XML data file or the lightweight [...]</description>
		<content:encoded><![CDATA[<p>[...] PhishTank works is because the data is freely available to all, from the free, open API to the XML data file or the lightweight [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: PhishTank Blog &#187; Blog Archive &#187; WOT uses PhishTank data</title>
		<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/comment-page-1/#comment-173371</link>
		<dc:creator>PhishTank Blog &#187; Blog Archive &#187; WOT uses PhishTank data</dc:creator>
		<pubDate>Tue, 30 Jun 2009 19:55:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/#comment-173371</guid>
		<description>[...] WOT uses data from lots of sources, including its users. PhishTank is now part of the mix, via the downloadable data file. [...]</description>
		<content:encoded><![CDATA[<p>[...] WOT uses data from lots of sources, including its users. PhishTank is now part of the mix, via the downloadable data file. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: PhishTank Blog &#187; Blog Archive &#187; SiteChecker brings PhishTank into Firefox</title>
		<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/comment-page-1/#comment-173369</link>
		<dc:creator>PhishTank Blog &#187; Blog Archive &#187; SiteChecker brings PhishTank into Firefox</dc:creator>
		<pubDate>Tue, 30 Jun 2009 19:54:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/#comment-173369</guid>
		<description>[...] developer, MASA has built several extensions. With SiteChecker, MASA used the PhishTank data file (details) to bring PhishTank&#8217;s judgments right into the [...]</description>
		<content:encoded><![CDATA[<p>[...] developer, MASA has built several extensions. With SiteChecker, MASA used the PhishTank data file (details) to bring PhishTank&#8217;s judgments right into the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: PhishTank Blog &#187; Blog Archive &#187; PhishTank data added to SURBL phishing list</title>
		<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/comment-page-1/#comment-173368</link>
		<dc:creator>PhishTank Blog &#187; Blog Archive &#187; PhishTank data added to SURBL phishing list</dc:creator>
		<pubDate>Tue, 30 Jun 2009 19:53:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/#comment-173368</guid>
		<description>[...] PhishTank data file we announced two days ago is already seeing [...]</description>
		<content:encoded><![CDATA[<p>[...] PhishTank data file we announced two days ago is already seeing [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sebastian nielsen</title>
		<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/comment-page-1/#comment-150223</link>
		<dc:creator>sebastian nielsen</dc:creator>
		<pubDate>Sat, 25 Oct 2008 23:13:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/#comment-150223</guid>
		<description>I wrote a Perl parser for the PhishTank XML data. It extracts every URL, keeps IP addresses intact, and lists all second-level domains that are in the PhishTank database. The whole list is then written to the &quot;domains&quot; list of DansGuardian&#039;s &quot;phishing&quot; filter category.

It also reads in the existing &quot;domains&quot; list at startup, so if a phishing site goes temporarily offline and is removed from online-valid.xml, it stays listed in the DansGuardian file. (This prevents a phishing site from temporarily dropping all its connections just to get removed from PhishTank, and then reopening.)

By second-level domains, I mean that a phishing URL like:
http://adsl-75-11-237-21.dsl.rcsntx.sbcglobal.net/.irs/stimulus.refund/0,,id=181665,00.html

is converted to:

sbcglobal.net

The reason I do this is that a phisher can set up millions of sites, from aaaaaaa.host.com to zzzzzzz.host.com. He only needs to point *.host.com to his IP in his DNS, set up a *.host.com virtual host, and send a random hostname to each recipient. This makes the listing at PhishTank ineffective unless I block the whole second-level domain. The only parts the phisher doesn&#039;t control are the second-level domain (which he has to purchase) and the TLD.

And here is the script: http://pastebin.com/f2f4f0f27
You can then use wget to download online-valid.xml from PhishTank and run the script I posted. Then restart DG (/etc/rc.d/dansguardian restart) to reload the blacklist. These three steps (fetching with wget, running the Perl script, and restarting DG) can be run from cron.hourly or cron.daily.</description>
		<content:encoded><![CDATA[<p>I wrote a Perl parser for the PhishTank XML data. It extracts every URL, keeps IP addresses intact, and lists all second-level domains that are in the PhishTank database. The whole list is then written to the &#8220;domains&#8221; list of DansGuardian&#8217;s &#8220;phishing&#8221; filter category.</p>
<p>It also reads in the existing &#8220;domains&#8221; list at startup, so if a phishing site goes temporarily offline and is removed from online-valid.xml, it stays listed in the DansGuardian file. (This prevents a phishing site from temporarily dropping all its connections just to get removed from PhishTank, and then reopening.)</p>
<p>By second-level domains, I mean that a phishing URL like:<br />
<a href="http://adsl-75-11-237-21.dsl.rcsntx.sbcglobal.net/.irs/stimulus.refund/0,,id=181665,00.html" rel="nofollow">http://adsl-75-11-237-21.dsl.rcsntx.sbcglobal.net/.irs/stimulus.refund/0,,id=181665,00.html</a></p>
<p>is converted to:</p>
<p>sbcglobal.net</p>
<p>The reason I do this is that a phisher can set up millions of sites, from aaaaaaa.host.com to zzzzzzz.host.com. He only needs to point *.host.com to his IP in his DNS, set up a *.host.com virtual host, and send a random hostname to each recipient. This makes the listing at PhishTank ineffective unless I block the whole second-level domain. The only parts the phisher doesn&#8217;t control are the second-level domain (which he has to purchase) and the TLD.</p>
<p>And here is the script: <a href="http://pastebin.com/f2f4f0f27" rel="nofollow">http://pastebin.com/f2f4f0f27</a><br />
You can then use wget to download online-valid.xml from PhishTank and run the script I posted. Then restart DG (/etc/rc.d/dansguardian restart) to reload the blacklist. These three steps (fetching with wget, running the Perl script, and restarting DG) can be run from cron.hourly or cron.daily.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sebastian nielsen</title>
		<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/comment-page-1/#comment-150161</link>
		<dc:creator>sebastian nielsen</dc:creator>
		<pubDate>Sat, 25 Oct 2008 16:10:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/#comment-150161</guid>
		<description>What is the purpose of phish_detail_url, when you only need to take &quot;http://www.phishtank.com/phish_detail.php?phish_id=&quot; and append the phish ID?
The phish_detail_url entry adds about 100 unnecessary bytes to each entry. In the current file, with about 5752 entries, that makes the file about 562 KB (roughly 0.5 MB) bigger.

I guess it&#039;s a safeguard in case PhishTank decides to change the format of the phish detail URL, but in that case the current URL format could be specified in the metadata, only once per XML file, like:
 and then the application developer only needs to replace %1 with the ID of the phish to link or send the user to.

I think the &quot;verified&quot; and &quot;online&quot; tags can be removed too, since they are constant.</description>
		<content:encoded><![CDATA[<p>What is the purpose of phish_detail_url, when you only need to take &#8220;http://www.phishtank.com/phish_detail.php?phish_id=&#8221; and append the phish ID?<br />
The phish_detail_url entry adds about 100 unnecessary bytes to each entry. In the current file, with about 5752 entries, that makes the file about 562 KB (roughly 0.5 MB) bigger.</p>
<p>I guess it&#8217;s a safeguard in case PhishTank decides to change the format of the phish detail URL, but in that case the current URL format could be specified in the metadata, only once per XML file, like:<br />
 and then the application developer only needs to replace %1 with the ID of the phish to link or send the user to.</p>
<p>I think the &#8220;verified&#8221; and &#8220;online&#8221; tags can be removed too, since they are constant.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sorin</title>
		<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/comment-page-1/#comment-74473</link>
		<dc:creator>Sorin</dc:creator>
		<pubDate>Thu, 13 Mar 2008 15:53:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/#comment-74473</guid>
		<description>Hi,

There are some invalid characters in the URLs.
Have a look at phish IDs: 405396 and 360933</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>There are some invalid characters in the URLs.<br />
Have a look at phish IDs: 405396 and 360933</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: iluxan</title>
		<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/comment-page-1/#comment-72175</link>
		<dc:creator>iluxan</dc:creator>
		<pubDate>Fri, 29 Feb 2008 17:14:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/#comment-72175</guid>
		<description>I really like your feed.  One thing I do have a problem with, though, is the lack of some kind of &quot;delta&quot; or &quot;versioned&quot; retrieval mechanism.

I have no safe way to know which entries were removed from the list.  I toyed with the option of assuming that whatever is missing from the current file but was sent in a previous file is deleted, but since sometimes I may get a truncated or incomplete file (network issues, etc), I run the risk of accidentally deleting all the items that were sent previously.

Is there any way to make this more explicit - some kind of versioning mechanism like the Google Safe Browsing API uses?

Thanks.</description>
		<content:encoded><![CDATA[<p>I really like your feed.  One thing I do have a problem with, though, is the lack of some kind of &#8220;delta&#8221; or &#8220;versioned&#8221; retrieval mechanism.</p>
<p>I have no safe way to know which entries were removed from the list.  I toyed with the option of assuming that whatever is missing from the current file but was sent in a previous file is deleted, but since sometimes I may get a truncated or incomplete file (network issues, etc), I run the risk of accidentally deleting all the items that were sent previously.</p>
<p>Is there any way to make this more explicit &#8211; some kind of versioning mechanism like the Google Safe Browsing API uses?</p>
<p>Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Nagle</title>
		<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/comment-page-1/#comment-47819</link>
		<dc:creator>John Nagle</dc:creator>
		<pubDate>Sun, 14 Oct 2007 04:54:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/#comment-47819</guid>
		<description>The XML file started containing useful data again on Friday, October 12th.  Thanks.

Incidentally, it would help if the file were updated as an atomic operation.  Occasionally, we see a partially written file if we happen to read it while it&#039;s being rewritten.  We have to read the file twice, 30 seconds apart, and compare, rereading until we get the same contents twice in a row.  It would be better to write a new file on each update, then move or link it to the name of the distributed file.</description>
		<content:encoded><![CDATA[<p>The XML file started containing useful data again on Friday, October 12th.  Thanks.</p>
<p>Incidentally, it would help if the file were updated as an atomic operation.  Occasionally, we see a partially written file if we happen to read it while it&#8217;s being rewritten.  We have to read the file twice, 30 seconds apart, and compare, rereading until we get the same contents twice in a row.  It would be better to write a new file on each update, then move or link it to the name of the distributed file.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Nagle</title>
		<link>http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/comment-page-1/#comment-47487</link>
		<dc:creator>John Nagle</dc:creator>
		<pubDate>Fri, 12 Oct 2007 04:56:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/#comment-47487</guid>
		<description>Something has gone very wrong with the XML file of PhishTank data at &quot;http://data.phishtank.com/data/online-valid&quot;. Today, it reads:


2007-10-12T04:30:01+00:00
0

That&#039;s the entire file.  Valid XML, no entries. Something is very broken.</description>
		<content:encoded><![CDATA[<p>Something has gone very wrong with the XML file of PhishTank data at &#8220;http://data.phishtank.com/data/online-valid&#8221;. Today, it reads:</p>
<p>2007-10-12T04:30:01+00:00<br />
0</p>
<p>That&#8217;s the entire file.  Valid XML, no entries. Something is very broken.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
