PhishTank is operated by OpenDNS, a free service that makes your Internet safer, faster, and smarter. Get started today!

October, 2006

53.com is a real bank

posted by John Roberts on October 31st, 2006 in PhishTank, Voting, Verifying phishes

Submission 19715 continues to await final judgment from the community. The phish URL is:

http://www.53.com/wps/portal/contenttype/secure/confirm_context.id

The screenshot shows Fifth Third Bank.

The technical details give the strongest evidence. Admittedly, the technical details tab did not exist when this was submitted on October 17, 2006.

Registrant:
Fifth Third Bank
38 Fountain Square Plaza
Cincinnati, OH 45263-0001
US

There are 250+ votes so far, with 60% saying “Is NOT a phish.”

Hint: This bank exists, and this site is real. If you have not voted, please vote Is NOT a phish.

The lesson is that number-only domain names do not inspire trust, but don’t dismiss them out of hand.

Simple developer method for checking individual URLs

posted by miked on October 30th, 2006 in PhishTank, API, Developers

This post was updated November 15, 2006 with the POST method to work around a limit of the original method.

When launching PhishTank, one goal was to release reliable verified phishing data to the community free of charge in an open and easily accessible format. Over the past weeks, I have had the privilege of working with many committed developers and integrators to whom we owe a great deal of gratitude for supporting this effort and helping to make PhishTank an amazing success.

Building on the API we have exposed and the downloadable data file we publish, these developers have implemented protection at layers from the mail server to the web browser (coming soon!).

However, there is still work to be done. Today we are releasing a simplified interface for checking URLs against the PhishTank database. This new interface could be used for anything from mitigating new threats on mobile platforms to easing development of check-only plugins for browsers and mail clients.

Usage is straightforward, and there are two ways to do it: a POST request, or a Base 64 encoded string placed in the request URL.

1. POST

This method is preferred, as POST eliminates the limit on URL length imposed by the original Base 64 encoded method.

  1. Start with the URL you would like to check.
    http://www.evil.com/
  2. Base 64 encode the URL string.
    http://www.evil.com/ becomes aHR0cDovL3d3dy5ldmlsLmNvbS8=
  3. Send a POST to http://checkurl.phishtank.com/checkurl/ with the Base 64 encoded string as the url parameter

The response will be in XML, in an identical format to that returned by the API check.url action.
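
As a rough sketch of the three steps above, here is one way to build the request in Python with only the standard library. The endpoint and the url parameter name come from this post. The function names (encode_url, build_post_request) are made up for illustration, and sending the parameter as a form-encoded body is an assumption about what the service expects. Sending the request is left to the caller.

```python
import base64
import urllib.parse
import urllib.request

# Endpoint as given in this post.
CHECKURL_ENDPOINT = "http://checkurl.phishtank.com/checkurl/"


def encode_url(url):
    """Step 2: Base 64 encode the URL string."""
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


def build_post_request(url):
    """Step 3: build a POST carrying the encoded string as the url parameter.

    Assumption: the parameter is sent as an ordinary form-encoded body.
    """
    body = urllib.parse.urlencode({"url": encode_url(url)}).encode("ascii")
    return urllib.request.Request(CHECKURL_ENDPOINT, data=body, method="POST")


request = build_post_request("http://www.evil.com/")
# To send it: urllib.request.urlopen(request, timeout=10).read()
```

Because the encoded string travels in the request body, its length does not count against any limit on the URL itself.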

2. Base 64 encoded

Originally, this was the only method. However, some URLs may end up too long when Base 64 encoded and included in the URL. So, while this method is still supported and live, consider it deprecated: use the first method if you’re starting from scratch.

  1. Start with the URL you would like to check.
    http://www.evil.com/
  2. Base 64 encode the URL string.
    http://www.evil.com/ becomes aHR0cDovL3d3dy5ldmlsLmNvbS8=
  3. Make the Base 64 string URL safe (aka, URL encode it to remove illegal characters).
    aHR0cDovL3d3dy5ldmlsLmNvbS8= becomes aHR0cDovL3d3dy5ldmlsLmNvbS8%3D
  4. Access http://checkurl.phishtank.com/checkurl/<string>
    http://checkurl.phishtank.com/checkurl/aHR0cDovL3d3dy5ldmlsLmNvbS8%3D

The response will be in XML, in an identical format to that returned by the API check.url action.
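
For anyone still using this older method, the four steps above might look like the following in Python. This is a minimal sketch: build_check_url is a hypothetical helper name, and only the base address and the encoding steps are taken from this post.

```python
import base64
import urllib.parse

# Base address as given in this post.
CHECKURL_BASE = "http://checkurl.phishtank.com/checkurl/"


def build_check_url(url):
    """Return the address to request for checking `url` (steps 2 to 4)."""
    # Step 2: Base 64 encode the URL string.
    encoded = base64.b64encode(url.encode("utf-8")).decode("ascii")
    # Step 3: URL encode it; safe="" also escapes "/" and "+",
    # which can appear in Base 64 output.
    url_safe = urllib.parse.quote(encoded, safe="")
    # Step 4: append the result to the base address.
    return CHECKURL_BASE + url_safe


check_url = build_check_url("http://www.evil.com/")
```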

Let us know how you use it.

Technical details tab provides ASN and whois data

posted by John Roberts on October 26th, 2006 in PhishTank, ASN, Site changes, Data, whois

Screenshot of Technical Details tab

We’ve added two pieces of technical information about each phish URL on the PhishTank phish detail page: ASN and whois.

Look for the new “View Technical Details” tab underneath the voting links.

First, we provide the ASN. ASN stands for Autonomous System Number, a way of uniquely identifying networks on the Internet. For more details, see the Wikipedia entry. RSS feeds by ASN are still on the to-do list. Stay tuned.

Second, when available, we provide the whois information. Depending on the registrar, this data may or may not be useful, well-formatted (we echo it back to you pretty much as is), or available. But we’ll try to provide it for every suspected phish going forward, and I’m inquiring about better data sources. (If you are a better data source, please get in touch!)

This data is not yet available via the API, but we plan to add it eventually, starting with ASN.

Better screenshots running on PhishTank

posted by John Roberts on October 24th, 2006 in PhishTank, Voting, Site changes, Screenshots

Site screenshots are mugshots for phish URLs. So, I’m happy to say that miked has just improved the PhishTank “camera” — the software that takes screenshots.

The results? More screenshots. Faster screenshots. Better screenshots.

This is a leap forward since a good screenshot, in concert with a close examination of the phish URL, is enough to judge “phishiness” right there and then, without needing to visit a potentially shady site.

We haven’t re-taken every site’s screenshot, as it’s impossible for those that are down and may be confusing for those already judged, but all new submissions (and most of the “living” ones from the past) should now be represented.

Please continue to “flag” bad or missing screenshots — it’s been helpful in debugging. Site admins can now retake screenshots more easily, too.

14 percent of phishing scams may be successful, says IU

posted by Allison on October 19th, 2006 in PhishTank, Phishing news, Data

The phish fighters at Indiana University’s School of Informatics today released the findings of a study [PDF] that evaluates the success rate of various types of phishing scams. The study, titled “Designing Ethical Phishing Experiments: A study of (ROT13) rOnl query features,” was also covered by Network World.

The researchers simulated a phishing attack by sending out bogus e-mails claiming to be from eBay and including what looked like a link to eBay. When all was said and done, 14 percent of the e-mail recipients clicked the link, suggesting that roughly one in seven people who receive phishing e-mails visit the phishing site.

I commend IU for showing that phishing may be a bigger problem than people realize.

PhishTank data added to SURBL phishing list

posted by John Roberts on October 19th, 2006 in PhishTank, Email, Data

The PhishTank data file we announced two days ago is already seeing action!

Thanks to Jeff Chan of the SURBL project, the data is now part of the SURBL phishing list, as he announced today. As Justin Mason points out, that means the data is now available for use in SpamAssassin 3.0.0 and above.

PhishTank on the (5 o’clock) news

posted by Allison on October 18th, 2006 in PhishTank, Members, Community, PhishTank in the news

PhishTank on the news

Why were the OpenDNS offices empty by 4:45 yesterday? Because we were hurrying to a neighborhood haunt to watch PhishTank on TV!

Our very own John Roberts was interviewed for a segment called “ConsumerWatch: How To Fight Back Against Phishing” on KPIX, the local CBS affiliate in San Francisco. The segment came out awesome. You can watch it here. Note that submission #19362 got its 5 seconds of fame. Bet billwake didn’t think it would end up on TV when he submitted it. Thanks to billwake for submitting and Simurgh, krellis, alanjshea, hawk82, jbrunette, polymorp, IntrepidEddie, jkrieger3, irixman, someone1234, miowpurr, bastardblaster, clubjuggle, dr1, Sierran52, lyagushka and jpohl for verifying.

Some of us (not mentioning any names) never made it back to the office, which might explain why this post is just going up now, halfway through the day. ;)

XML data file of online, valid phishes from PhishTank

posted by John Roberts on October 17th, 2006 in PhishTank, API, Data, XML

The best judgment of the PhishTank community is represented by the ever-changing list of suspected phishes that are both online and valid, meaning “verified as a phish” by the members of PhishTank. You can page through this list on the site, in reverse chronological order (by submission time). The PhishTank API is more powerful and more granular, but it does not offer a way to get a bulk list.

However, the PhishTank data also is effective when distributed and available for use in local applications, whether local is your personal router, the gateway of your ISP, a corporate firewall or elsewhere. Now the data is available as a regularly updated XML data file.

Basic file details

  • Format: XML
  • Update frequency: Hourly.
    I encourage you to fetch it no more often than once an hour.
  • File size: Varies. Edited: April 11, 2007: Can be as large as 10 MB, so it may not open easily in a browser. Right now, with 1125 verified, online phishes, the file is a bit over 600 KB.

File location

http://data.phishtank.com/data/online-valid/

Edited: January 25, 2007: If your script needs a filename, the filename is index.php in that directory. Otherwise, please do not use the filename; it interferes with our mirroring of the data file in multiple locations.

Considerations

Phish sites go up and down at various times. Usually, a single phish URL doesn’t stay online for very long, so it’s important to consider not only the timestamp of the data file, but the time elapsed since both the submission of the phish to PhishTank and its verification by the PhishTank community. There are exceptions, but if a phish URL is more than a week or two old, then the host where it’s living is not paying attention. Over time, PhishTank will start to provide more data about the hosts, so you can see which hosts tend to allow this kind of activity to continue.

The file has an ETag header and a Last-Modified header. Please respect these when fetching the file. We may support gzip in the future, to further reduce bandwidth for all parties.
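
One way to respect those headers is a conditional request: send back the ETag and Last-Modified values from your previous fetch, and skip the download if nothing has changed. The sketch below assumes the server answers such a request with 304 Not Modified, which this post does not promise. The function names (conditional_headers, fetch_if_changed) are invented for illustration.

```python
import urllib.error
import urllib.request

# File location as given in this post.
DATA_URL = "http://data.phishtank.com/data/online-valid/"


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the values saved after the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(etag=None, last_modified=None):
    """Return (data, etag, last_modified); data is None if the file is unchanged."""
    request = urllib.request.Request(
        DATA_URL, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, etag, last_modified
        raise
```

A script run hourly would store the returned ETag and Last-Modified values and pass them to the next call.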

Field definitions

To help you use the data in this file, I’ve described each of the fields below.

meta is the wrapper for information about the file itself.

generated_at is the time the file was last generated as an ISO 8601 date string. The ISO standard incorporates the timezone; PhishTank uses UTC.
Sample value: 2006-10-17T00:17:02+00:00

total_entries is the count of how many valid phish URLs are in the file at that time. This will always be a positive integer.
Sample value: 1125

entries is the overall container for all the individual phish records as a collection.

entry is the container for data about each individual phish.

url is the phish URL. The value (a URL) is presented as CDATA because phishers are not polite folks, and occasionally use non-valid characters in their URLs. Some browsers are more forgiving about the standards, and interpret (or ignore) the non-valid characters, so the URL is a phish, even though it might fail in other browsers.
Sample value: <![CDATA[http://www.firstgenericbank.account-updateinfo.com]]>
Note: This URL is an example only. The domain is owned by OpenDNS, operators of PhishTank, for demonstration purposes.

phish_id is the PhishTank ID for the phish URL. All data in PhishTank is tied to this ID. You may or may not need this piece of information, but it’s useful for us. This will always be a positive integer.
Sample value: 19845

phish_detail_url is the PhishTank detail page for the phish URL, where you can view data about the phish, including a screenshot and the community votes. More data will be added to this page over time.
Sample value: <![CDATA[http://www.phishtank.com/phish_detail.php?phish_id=19845]]>

submission is a container for submission_time currently, and may contain additional fields in the future.

submission_time is the time the phish was submitted to PhishTank, in UTC. Same timestamp format as generated_at.
Sample value: 2006-10-17T19:21:30+00:00

verification is a container for information about a phish URL’s verification, including verified and verification_time currently. This container may have additional fields in the future.

verified indicates whether or not a suspected phish has been judged by the PhishTank community. In this data file, of all online, valid phishes, the value will always be yes.
Sample value: yes

verification_time is the time the phish was judged by the PhishTank community, in UTC. In this file, it’s the time the phish was verified as a phish. Same timestamp format as generated_at. It may be interesting to compare verification_time and submission_time.
Sample value: 2006-10-17T23:06:28+00:00

status is the container for online, currently. This container may have additional fields in the future.

online notes whether a phish URL is live and responding. In this data file, of all online, valid phishes, the value will always be yes.
Sample value: yes

Attribution and usage

This data is free. It may be used in commercial products or non-commercial products, by organizations or individuals.

If you use the data, we would appreciate public attribution for the data to PhishTank, preferably with a link to the PhishTank home page. We will soon publish a page with some guidelines about how to use the PhishTank logo (if you want to) and otherwise attribute the data to PhishTank. For now, contact us if you have anything special… kind words and a link are the general goals! ;-)

We’re curious to learn how this data gets used, so please let us know, either in the comments or via the contact form.

Example XML

<?xml version="1.0" encoding="utf-8"?>
<output>
<meta>
<generated_at>2006-10-17T18:17:01+00:00</generated_at>
<total_entries>1</total_entries>
</meta>
<entries>
<entry>
<url><![CDATA[http://www.firstgenericbank.account-updateinfo.com]]></url>
<phish_id>19845</phish_id>
<phish_detail_url><![CDATA[http://www.phishtank.com/phish_detail.php?phish_id=19845]]></phish_detail_url>
<submission>
<submission_time>2006-10-17T03:00:18+00:00</submission_time>
</submission>
<verification>
<verified>yes</verified>
<verification_time>2006-10-17T13:13:37+00:00</verification_time>
</verification>
<status>
<online>yes</online>
</status>
</entry>
</entries>
</output>
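
To show how the fields fit together, here is a small Python sketch that reads the example above with the standard library and compares verification_time to submission_time. The element names come from the field definitions in this post. The parse_entries function and the keys of the dictionaries it returns are hypothetical choices, not part of any PhishTank library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The example file from this post, embedded so the sketch runs offline.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<output>
<meta>
<generated_at>2006-10-17T18:17:01+00:00</generated_at>
<total_entries>1</total_entries>
</meta>
<entries>
<entry>
<url><![CDATA[http://www.firstgenericbank.account-updateinfo.com]]></url>
<phish_id>19845</phish_id>
<phish_detail_url><![CDATA[http://www.phishtank.com/phish_detail.php?phish_id=19845]]></phish_detail_url>
<submission>
<submission_time>2006-10-17T03:00:18+00:00</submission_time>
</submission>
<verification>
<verified>yes</verified>
<verification_time>2006-10-17T13:13:37+00:00</verification_time>
</verification>
<status>
<online>yes</online>
</status>
</entry>
</entries>
</output>"""


def parse_entries(xml_bytes):
    """Return one dictionary per entry element in the data file."""
    root = ET.fromstring(xml_bytes)
    records = []
    for entry in root.iter("entry"):
        # The timestamps are ISO 8601 strings with a UTC offset.
        submitted = datetime.fromisoformat(
            entry.findtext("submission/submission_time"))
        verified_at = datetime.fromisoformat(
            entry.findtext("verification/verification_time"))
        records.append({
            "phish_id": int(entry.findtext("phish_id")),
            "url": entry.findtext("url"),  # CDATA arrives as plain text
            "submitted": submitted,
            "verified_at": verified_at,
            "time_to_verify": verified_at - submitted,
            "online": entry.findtext("status/online") == "yes",
        })
    return records


records = parse_entries(SAMPLE)
```

For a file as large as the real one can get, a streaming parser such as ET.iterparse may be a better fit than loading everything at once.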

PhishTank improvements, including a third choice and new API calls

posted by John Roberts on October 11th, 2006 in PhishTank, API, Statistics, Site changes

Since my long post on Friday, PhishTank has been updated in many ways.

I don’t know

Most visible change? Responding to a common request, we’ve added a third choice when voting on a suspected phish: I don’t know

Crop of phish detail, with new 'I don't know' choice and timestamp and colored voting links

Voting “I don’t know” is not encouraged, but it’s necessary at times. An “I don’t know” vote doesn’t influence the final judgment of the community in any way, and it doesn’t appear in statistics, site-wide or personal.

Most important for the very active members of the community: if you vote “I don’t know,” you will not see that suspected phish ID again from the “Next Unverified Phish” link.

New API actions

submit.url and submit.email API actions are now documented and available for use. We also cleaned up the documentation a bit more. Questions welcomed.

Lots of other changes

Flag radio buttons

Here’s a catalog of changes rolled out since Friday morning:

  • What is phishing? page now includes an annotated website example. (And the fictitious URLs are already registered by us, to deter typosquatting there.)
  • The “Something wrong with this submission?” window was updated in a few ways. In most browsers, you can now click on the words, not just the radio buttons. There are also two new choices: “Screenshot Issues” and “Invalid URL.” These “flags” are read, though they remain invisible to anyone but an admin. You only need to submit it once, I promise, even if you can’t see the result.
  • The “Is a phish” and “Is NOT a phish” vote links are now differently colored buttons to limit mistakes. (see the screenshot above)
  • More granular stats. More still to come here (most accurate submitters and most accurate verifiers on tap), but now you can see submission numbers broken down, and the total number of PhishTank members. Also, the start date for stats graphs is now September 30th, right before launch.
  • More code to limit/eliminate duplicates (flag ‘em if you see ‘em), and we cleaned out some cruft that had gotten in there earlier.
  • Some limits to keep over-eager submissions (intentional or otherwise) from flooding the site.
  • Session timeout was increased, so you should be able to stay logged in longer.
  • My Account graphs revised to handle larger numbers more effectively. Some of you needed that!
  • The personal RSS feed should now have more informative titles.
  • The phish detail page now displays the current time in UTC, to make it easier to compare to the submission time.
  • If you flag a suspected phish via the “Something wrong with this submission?” link, that suspected phish should not show up via the “Next Unverified Phish” link until the flag is resolved. This is useful for power users.

Behind the scenes, we’re also adding more measures to ensure the site stays online, functional, and fast. There was a brief outage a bit after midday UTC today, October 10; we’ve changed a few things to avoid a repeat.

We’re still working on a host of other improvements. Keep the suggestions coming!

Oh, and when I write that “we” made changes, I mostly mean miked and aaron.

When the community doesn’t reach a consensus

posted by John Roberts on October 10th, 2006 in PhishTank, Community, Voting

We set up community voting at PhishTank because we think multiple insights make for a better community judgment. This is similar to “Linus’s Law,” as formulated by Eric Raymond: “Given enough eyeballs, all bugs are shallow.”

We’re not the first to re-word that concept, but here’s the PhishTank version:

Given enough eyeballs, all phishes can be identified.

In a related post, Jeff Veen wrote about bloggers and the media and ways of reacting to changing forces:

Or will [organizations] find inspiration in, say, the Digg model, harnessing countless tiny points of participation to harness the collective intelligence of their audience and feeding it back into their product?

PhishTank is certainly about collective intelligence.

But sometimes it’s not that easy. Intelligent people can disagree!

Suspected phish ID 11983 is the first really challenging submission, where the community has not reached consensus yet despite over a week of vigorous voting. As we approach midnight UTC on Tuesday, October 10, this submission has over 315 votes, and it’s nearly 50-50 as to whether this is a phish or not. (Note: The # of votes is never shown publicly to non-admins.)

To me, this is not a phish, and I voted that way. My thinking? The URL is greatstudentloanpayoff.com, and when you get there… it’s for Great Student Loan Payoff. This looks less than beneficial, and I’m not going to give my information, but there is no attempt to pretend to be something other than what it is: an attempt to legally get your Social Security Number and permission to email you marketing messages.

My take? Don’t do it. But it’s not a phish.

For the terminally undecided among you, we have some site changes now live which I’ll talk about in a separate post shortly. While you wait for those words, go ahead and vote.
