PhishTank is operated by OpenDNS, a free service that makes your Internet safer, faster, and smarter.

'API' Posts

Try the PhishTank Addin for Microsoft Outlook and Outlook Express

posted by John Roberts on December 16th, 2006 in PhishTank, API, Email, Outlook

[Screenshot: PhishTank Addin for Microsoft Outlook]

What if you could get all the intelligence of PhishTank right in the application where you receive the suspicious emails? For Outlook and Outlook Express users, you can now!

The team at Project Honey Pot used the PhishTank API to bring PhishTank capabilities right into Microsoft Outlook and Microsoft Outlook Express, the popular Windows email clients.

The key feature is the ability to check a suspicious email against the PhishTank database right in your email client and, if necessary, report a phish. It’s seamless, giving you a one-click way to make your Internet use safer. PhishTank benefits too, since submitting suspected phishes becomes even easier.

You do need a free PhishTank user account to benefit from the Addin. For security’s sake, you will be asked (once) to authorize the use of the Addin with your PhishTank account.

Go learn more and get the software.

Kudos to Brandon and Eric, especially, and thanks to the entire team at Unspam (the company behind Project Honey Pot).

Update to simple method for checking individual URLs

posted by John Roberts on November 15th, 2006 in PhishTank, API, Developers

A couple of weeks ago, Mike introduced a simple developer method for checking individual URLs for “phishiness” outside of the API. There have been edge cases where the submitted URL was too long, exceeding the practical length limit for a GET request.

So, the method has been updated, and you should read the details. The original method will be supported, but it’s being deprecated in favor of a POST-based method.

We’ve had a request for a PhishTank bookmarklet… anyone out there want to write one? We’ll promote it. I think this POST method is probably a nice, lightweight way to implement it, but I’m not a developer. ;-)

Data about phishers at the right cost (free)

posted by John Roberts on November 14th, 2006 in PhishTank, API, Community, PhishTank in the news, Data, XML

I read the SecurityProNews article “Sites Want To Hook And Gut Phishers” with interest this morning. The article’s summary:

A trio of websites offer people the opportunity to report the phish emails they receive in order to thwart the various scams and their perpetrators.

Three different sites are included in the round-up: PhishTank, CastleCops, and Symantec’s Phish Report Network.

At OpenDNS (operators of PhishTank), we’re fans of CastleCops. Their work is excellent, and their efforts in the broader anti-abuse community are notable. We shared our gratitude in July.

However, I don’t think the Phish Report Network site belongs in the same category, for two key reasons: the lack of information about submissions and the hefty price of their data.

Submitting to a black hole

Submitting phish to the Phish Report Network is dumping your submissions into a black hole. (And they didn’t even accept submissions from individuals until October 2006… wonder if PhishTank’s launch had something to do with that?)

I just took a live phish site from PhishTank and submitted it, after agreeing to a license and filling out a Captcha. Those hoops are not necessarily a bad idea to weed out spurious submissions, but here’s all I was told after the submission was received.

CONFIRMATION

Your submission has been sent Tue Nov 14 09:46:06 PST 2006. To make another submission, click here.

Sincerely,

Symantec Security Response

Couldn’t the page at least say thanks?

Outside of the lack of human touch, there is no insight into what the final judgment might be, when such judgment will be rendered, and by whom. There is literally no way to follow up.

PhishTank shows you your activity, and gives you email updates (if you want them) and an RSS feed to track your submissions. Go to your My Account page to learn how your contributions are being judged.

The price of data

The data gathered and verified by Symantec’s site is only available if you pay for it. How much? US$50,000 per year.

On behalf of OpenDNS, I inquired about a license to the data on July 12, 2006. On August 8, 2006, I got an apologetic response for the delay. On August 9, 2006, I got a copy of the contract, with its US$50,000 price tag for the year. I declined to go any further.

I have nothing against businesses charging for a service, and perhaps Symantec has customers who consider this a valuable source of data. It’s hard to know, since they give out little information about who’s using the data and how much data there is. PhishTank statistics, by contrast, are wide open.

PhishTank was set up to help the Internet at large and to solve a business problem for OpenDNS (the common need for better data about phishing sites). PhishTank works because the data is freely available to all, whether through the free, open API, the XML data file, or the lightweight check-URL method.

My suggestion to Symantec? Add data from PhishTank to your Phish Report Network. It’s free. And if you’d like to share your submissions with PhishTank, we’re happy to help make it work.

Mozilla found the data worth testing with, at least.

PhishTank data’s so good, it’s the standard

posted by Allison on November 14th, 2006 in PhishTank, API, Data, Firefox, SiteChecker


Everyone who has ever submitted a phish to or verified a phish for PhishTank deserves a pat on the back today. Congrats to all of you for contributing to the phishing data source chosen by Mozilla to compare phishing protection in Firefox 2.0 to Internet Explorer 7.

That’s right. You read correctly. Mozilla chose PhishTank over all of the other available phishing data sources to test the effectiveness of new phishing protection features in the two browsers.

The testing worked like this: Mozilla contracted the third-party evaluator Smartware to track the respective accuracy rates of Firefox 2.0 and IE7 in identifying phishing scams. These were the same scams that you originally netted and verified.

In the end, Firefox 2.0 found and blocked 243 phishing Web sites that IE7 failed to identify, and was deemed the better of the two at keeping you safe from phishing.

Brian Krebs of The Washington Post went into greater detail about the testing and mentioned PhishTank SiteChecker, a Firefox extension.

Though we admittedly have Firefox and Internet Explorer on the brain today, we urge everyone making a browser to use PhishTank data (API, Data File, Check URL Method).

Help a developer debug a PHP class for using the PhishTank API

posted by John Roberts on November 10th, 2006 in PhishTank, API, Developers, PHP

David Branco is working on a PHP class, which he calls PhishTank Runner. The goal of PhishTank Runner is to make working with the PhishTank API very easy in that language. We haven’t had time to take a look at the code ourselves, but we shouldn’t be the bottleneck. If you’re a PHP developer, or otherwise experienced, David is eager for feedback. His email address is in the code.

The PHP source code is here:
http://www.neoeliteusa.com/demo/phishtank.class.phps

We’re not “endorsing” this code, but I’m pleased that David is interested in helping out, and I think constructive criticism helps us all in this regard. This is a new step for us, but we want to continue to encourage developers to help us spread the PhishTank community’s work to as many places as possible. There won’t be one way, but many.

We know the PhishTank API documentation would benefit from code examples, so if there’s good stuff out there people are willing to share, please let us know.

Simple developer method for checking individual URLs

posted by miked on October 30th, 2006 in PhishTank, API, Developers

This post was updated November 15, 2006 with the POST method to work around a limit of the original method.

When launching PhishTank, one goal was to release reliable verified phishing data to the community free of charge in an open and easily accessible format. Over the past weeks, I have had the privilege of working with many committed developers and integrators to whom we owe a great deal of gratitude for supporting this effort and helping to make PhishTank an amazing success.

Building on the API we have exposed and the downloadable data file we publish, these developers have implemented protection at layers from the mail server to the web browser (coming soon!).

However, there is still work to be done. Today we are releasing a simplified interface for checking URLs against the PhishTank database. This new interface could be used for anything from mitigating new threats on mobile platforms to easing development of check-only plugins for browsers and mail clients.

Usage is simple and straightforward, in either of two ways: a POST request, or a GET request with the Base 64 encoded URL in the path.

1. POST

This method is preferred, as POST eliminates the limit on URL length imposed by the original Base 64 encoded method.

  1. Start with the URL you would like to check.
    http://www.evil.com/
  2. Base 64 encode the URL string.
    http://www.evil.com/ becomes aHR0cDovL3d3dy5ldmlsLmNvbS8=
  3. Send a POST to http://checkurl.phishtank.com/checkurl/ with the Base 64 encoded string as the url parameter

The response will be in XML, in an identical format to that returned by the API check.url action.
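The three POST steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not an official client: it assumes the url parameter is sent as an ordinary form-encoded body, and the helper names (encode_url, build_post_request, check_url) are hypothetical. Since the 2006 checkurl endpoint may no longer respond, building the request is kept separate from sending it.

```python
import base64
import urllib.parse
import urllib.request

CHECKURL_ENDPOINT = "http://checkurl.phishtank.com/checkurl/"


def encode_url(url: str) -> str:
    """Step 2: Base 64 encode the URL string (UTF-8 bytes assumed)."""
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


def build_post_request(url: str) -> urllib.request.Request:
    """Step 3: POST the Base 64 string as the `url` parameter.

    Assumption: the parameter travels in a standard form-encoded body.
    urlencode() escapes characters such as '=' so the value survives intact.
    """
    body = urllib.parse.urlencode({"url": encode_url(url)}).encode("ascii")
    return urllib.request.Request(
        CHECKURL_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def check_url(url: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw XML response body."""
    with urllib.request.urlopen(build_post_request(url), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    req = build_post_request("http://www.evil.com/")
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Running it prints the method and endpoint, then the form body url=aHR0cDovL3d3dy5ldmlsLmNvbS8%3D; note that the encoder escapes the trailing = of the Base 64 string. Call check_url() to actually send the request and receive the XML.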

2. Base 64 encoded

Originally, this was the only method. However, some URLs may end up too long when Base 64 encoded and included in the URL. So, while this method is still supported and live, consider it deprecated: use the first method if you’re starting from scratch.

  1. Start with the URL you would like to check.
    http://www.evil.com/
  2. Base 64 encode the URL string.
    http://www.evil.com/ becomes aHR0cDovL3d3dy5ldmlsLmNvbS8=
  3. Make the Base 64 string URL safe (that is, URL encode it to escape characters that are not allowed in a URL).
    aHR0cDovL3d3dy5ldmlsLmNvbS8= becomes aHR0cDovL3d3dy5ldmlsLmNvbS8%3D
  4. Access http://checkurl.phishtank.com/checkurl/<string>
    http://checkurl.phishtank.com/checkurl/aHR0cDovL3d3dy5ldmlsLmNvbS8%3D

The response will be in XML, in an identical format to that returned by the API check.url action.
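For comparison, the deprecated GET method reduces to string building. The following is a minimal sketch in which build_get_url is a hypothetical helper name. It escapes every reserved character, including /, which standard Base 64 can produce, so the encoded value stays a single path segment.

```python
import base64
import urllib.parse

CHECKURL_ENDPOINT = "http://checkurl.phishtank.com/checkurl/"


def build_get_url(url: str) -> str:
    """Build the deprecated GET-style check URL for a suspected phish URL."""
    # Step 2: Base 64 encode the URL string.
    encoded = base64.b64encode(url.encode("utf-8")).decode("ascii")
    # Step 3: make it URL safe. safe="" also escapes '/', which standard
    # Base 64 can emit and which would otherwise split the path segment.
    path_segment = urllib.parse.quote(encoded, safe="")
    # Step 4: append it to the checkurl endpoint.
    return CHECKURL_ENDPOINT + path_segment


print(build_get_url("http://www.evil.com/"))
# prints http://checkurl.phishtank.com/checkurl/aHR0cDovL3d3dy5ldmlsLmNvbS8%3D
```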

Let us know how you use it.

XML data file of online, valid phishes from PhishTank

posted by John Roberts on October 17th, 2006 in PhishTank, API, Data, XML

The best judgment of the PhishTank community is represented by the ever-changing list of suspected phishes that are both online and valid, meaning “verified as a phish” by the members of PhishTank. You can page through this list on the site, in reverse chronological order (by submission time). The PhishTank API is more powerful and more granular, but it does not offer a way to get a bulk list.

However, the PhishTank data also is effective when distributed and available for use in local applications, whether local is your personal router, the gateway of your ISP, a corporate firewall or elsewhere. Now the data is available as a regularly updated XML data file.

Basic file details

  • Format: XML
  • Update frequency: Hourly.
    I encourage you to fetch it no more often than once an hour.
  • File size: Varies. Edited: April 11, 2007: Can be as large as 10MB, so it may not open easily in a browser. Right now, with 1125 verified, online phishes, the file is a bit over 600KB.

File location

http://data.phishtank.com/data/online-valid/

Edited: January 25, 2007: If your script needs a filename, the file is index.php in that directory. However, please request the directory URL rather than the filename; using the filename interferes with our mirroring of the data file in multiple locations.

Considerations

Phish sites go up and down at various times. Usually, a single phish URL doesn’t stay online for very long, so it’s important to consider not only the timestamp of the data file, but the time elapsed since both the submission of the phish to PhishTank and its verification by the PhishTank community. There are exceptions, but if a phish URL is more than a week or two old, then the host where it’s living is not paying attention. Over time, PhishTank will start to provide more data about the hosts, so you can see which hosts tend to allow this kind of activity to continue.
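To make the age considerations above concrete, here is a small Python sketch that measures how much time has passed since the file was generated and since a phish was submitted and verified. The 14-day LONG_LIVED threshold is an assumption taken from the “week or two” rule of thumb, not a PhishTank setting, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Rule-of-thumb threshold from the text ("a week or two"); an assumption,
# not a PhishTank setting.
LONG_LIVED = timedelta(days=14)


def parse_ts(value: str) -> datetime:
    """Parse a PhishTank ISO 8601 timestamp like 2006-10-17T19:21:30+00:00."""
    return datetime.fromisoformat(value)


def phish_ages(generated_at: str, submission_time: str,
               verification_time: str, now: datetime) -> dict:
    """Elapsed time since the file was generated, the phish was submitted,
    and the phish was verified, all measured against a timezone-aware `now`."""
    return {
        "file": now - parse_ts(generated_at),
        "since_submission": now - parse_ts(submission_time),
        "since_verification": now - parse_ts(verification_time),
    }


def is_long_lived(submission_time: str, now: datetime) -> bool:
    """True if the phish URL was submitted more than LONG_LIVED ago."""
    return now - parse_ts(submission_time) > LONG_LIVED


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(phish_ages("2006-10-17T18:17:01+00:00", "2006-10-17T03:00:18+00:00",
                     "2006-10-17T13:13:37+00:00", now))
```

A URL flagged by is_long_lived() suggests a host that is not paying attention, which is the signal the paragraph above describes.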

The file has an ETag header and a Last-Modified header. Please respect these when fetching the file. We may support gzip in the future, to further reduce bandwidth for all parties.
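Respecting the ETag and Last-Modified headers means sending them back as If-None-Match and If-Modified-Since on the next fetch, so the server can answer 304 Not Modified instead of resending the file. The following is a minimal standard-library sketch that assumes the server honors these conditional headers. The function names are hypothetical, and DATA_URL is the directory URL given above, not index.php.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

DATA_URL = "http://data.phishtank.com/data/online-valid/"


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> dict:
    """Headers that let the server answer 304 Not Modified if nothing changed."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(etag: Optional[str] = None,
                     last_modified: Optional[str] = None,
                     timeout: float = 30.0
                     ) -> Tuple[Optional[bytes], Optional[str], Optional[str]]:
    """Fetch the data file only if it changed since the last fetch.

    Returns (body, etag, last_modified); body is None on 304 Not Modified.
    Store the returned validators and pass them in on the next hourly run.
    """
    req = urllib.request.Request(DATA_URL,
                                 headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return (resp.read(), resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Unchanged: keep using the copy (and validators) already on hand.
            return None, etag, last_modified
        raise
```

urllib reports a 304 as an HTTPError, which is why the not-modified case is handled in the except branch rather than by checking a status code on the response.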

Field definitions

To help you use the data in this file, I’ve described each of the fields below.

meta is the wrapper for information about the file itself.

generated_at is the time the file was last generated, as an ISO 8601 date string. ISO 8601 timestamps carry a timezone offset; PhishTank uses UTC.
Sample value: 2006-10-17T00:17:02+00:00

total_entries is the count of how many valid phish URLs are in the file at that time. This will always be a positive integer.
Sample value: 1125

entries is the overall container for all the individual phish records as a collection.

entry is the container for data about each individual phish.

url is the phish URL. The value (a URL) is presented as CDATA because phishers are not polite folks and occasionally use non-valid characters in their URLs. Some browsers are more forgiving about the standards and interpret (or ignore) the non-valid characters, so the URL still works as a phish in those browsers, even though it might fail in others.
Sample value: <![CDATA[http://www.firstgenericbank.account-updateinfo.com]]>
Note: This URL is an example only. The domain is owned by OpenDNS, operators of PhishTank, for demonstration purposes.

phish_id is the PhishTank ID for the phish URL. All data in PhishTank is tied to this ID. You may or may not need this piece of information, but it’s useful for us. This will always be a positive integer.
Sample value: 19845

phish_detail_url is the PhishTank detail page for the phish URL, where you can view data about the phish, including a screenshot and the community votes. More data will be added to this page over time.
Sample value: <![CDATA[http://www.phishtank.com/phish_detail.php?phish_id=19845]]>

submission is a container that currently holds only submission_time; it may contain additional fields in the future.

submission_time is the time the phish was submitted to PhishTank, in UTC. Same timestamp format as generated_at.
Sample value: 2006-10-17T19:21:30+00:00

verification is a container for information about a phish URL’s verification. It currently holds verified and verification_time, and may contain additional fields in the future.

verified indicates whether or not a suspected phish has been judged by the PhishTank community. In this data file, of all online, valid phishes, the value will always be yes.
Sample value: yes

verification_time is the time the phish was judged by the PhishTank community, in UTC. In this file, it’s the time the phish was verified as a phish. Same timestamp format as generated_at. It may be interesting to compare verification_time and submission_time.
Sample value: 2006-10-17T23:06:28+00:00
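The comparison suggested above amounts to subtracting two timestamps. Here is a minimal sketch, with verification_delay as a hypothetical helper, that uses the sample values from the field definitions:

```python
from datetime import datetime, timedelta


def verification_delay(submission_time: str, verification_time: str) -> timedelta:
    """How long the community took to verify a phish after it was submitted."""
    submitted = datetime.fromisoformat(submission_time)
    verified = datetime.fromisoformat(verification_time)
    return verified - submitted


print(verification_delay("2006-10-17T19:21:30+00:00", "2006-10-17T23:06:28+00:00"))
# prints 3:44:58
```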

status is a container that currently holds only online; it may contain additional fields in the future.

online notes whether a phish URL is live and responding. In this data file, of all online, valid phishes, the value will always be yes.
Sample value: yes

Attribution and usage

This data is free. It may be used in commercial products or non-commercial products, by organizations or individuals.

If you use the data, we would appreciate public attribution for the data to PhishTank, preferably with a link to the PhishTank home page. We will soon publish a page with some guidelines about how to use the PhishTank logo (if you want to) and otherwise attribute the data to PhishTank. For now, contact us if you have anything special… kind words and a link are the general goals! ;-)

We’re curious to learn how this data gets used, so please let us know, either in the comments or via the contact form.

Example XML

<?xml version="1.0" encoding="utf-8"?>
<output>
  <meta>
    <generated_at>2006-10-17T18:17:01+00:00</generated_at>
    <total_entries>1</total_entries>
  </meta>
  <entries>
    <entry>
      <url><![CDATA[http://www.firstgenericbank.account-updateinfo.com]]></url>
      <phish_id>19845</phish_id>
      <phish_detail_url><![CDATA[http://www.phishtank.com/phish_detail.php?phish_id=19845]]></phish_detail_url>
      <submission>
        <submission_time>2006-10-17T03:00:18+00:00</submission_time>
      </submission>
      <verification>
        <verified>yes</verified>
        <verification_time>2006-10-17T13:13:37+00:00</verification_time>
      </verification>
      <status>
        <online>yes</online>
      </status>
    </entry>
  </entries>
</output>
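Because the file is ordinary XML, the standard library is enough to read it. The sketch below maps each entry onto the fields defined above. The PhishEntry class and parse_online_valid function are hypothetical names, and treating a total_entries mismatch as an error is a design choice you may want to relax.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class PhishEntry:
    url: str
    phish_id: int
    phish_detail_url: str
    submission_time: str
    verified: bool
    verification_time: str
    online: bool


def parse_online_valid(xml_bytes: bytes) -> List[PhishEntry]:
    """Parse the online-valid data file into a list of PhishEntry records.

    ElementTree unwraps CDATA sections, so `url` comes back as plain text.
    """
    root = ET.fromstring(xml_bytes)
    entries = []
    for entry in root.iter("entry"):
        entries.append(PhishEntry(
            url=entry.findtext("url", default="").strip(),
            phish_id=int(entry.findtext("phish_id")),
            phish_detail_url=entry.findtext("phish_detail_url", default="").strip(),
            submission_time=entry.findtext("submission/submission_time"),
            verified=entry.findtext("verification/verified") == "yes",
            verification_time=entry.findtext("verification/verification_time"),
            online=entry.findtext("status/online") == "yes",
        ))
    # Sanity check against the count the file reports about itself.
    declared = int(root.findtext("meta/total_entries"))
    if declared != len(entries):
        raise ValueError(f"total_entries says {declared}, found {len(entries)}")
    return entries
```

Applied to the example XML above, parse_online_valid() returns a single PhishEntry with phish_id 19845 and the URL unwrapped from its CDATA section.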

PhishTank improvements, including a third choice and new API calls

posted by John Roberts on October 11th, 2006 in PhishTank, API, Statistics, Site changes

Since my long post on Friday, PhishTank has been updated in many ways.

I don’t know

Most visible change? Responding to a common request, we’ve added a third choice when voting on a suspected phish: “I don’t know.”

[Screenshot: crop of a phish detail page, showing the new “I don’t know” choice, the timestamp, and the colored voting links]

Voting “I don’t know” is not encouraged, but it’s necessary at times. An “I don’t know” vote doesn’t influence the final judgment of the community in any way, and it doesn’t appear in statistics, site-wide or personal.

Most important for the very active members of the community: if you vote “I don’t know,” you will not see that suspected phish ID again from the “Next Unverified Phish” link.

New API actions

submit.url and submit.email API actions are now documented and available for use. We also cleaned up the documentation a bit more. Questions welcomed.

Lots of other changes

[Screenshot: flag radio buttons]

Here’s a catalog of changes rolled out since Friday morning:

  • The “What is phishing?” page now includes an annotated website example. (We have already registered the fictitious URLs ourselves, to deter typosquatting.)
  • The “Something wrong with this submission?” window was updated in a few ways. In most browsers, you can now click on the words, not just the radio buttons. There are also two new choices: “Screenshot Issues” and “Invalid URL.” These “flags” are read, though they remain invisible to anyone but an admin. You only need to submit it once, I promise, even if you can’t see the result.
  • The “Is a phish” and “Is NOT a phish” vote links are now different colored buttons to limit mistakes. (see the screenshot above)
  • More granular stats. More still to come here (most accurate submitters and most accurate verifiers on tap), but now you can see submission numbers broken down, and the total number of PhishTank members. Also, the start date for stats graphs is now September 30th, right before launch.
  • More code to limit/eliminate duplicates (flag ’em if you see ’em), and we cleaned out some cruft that had gotten in there earlier.
  • Some limits to keep over-eager submissions (intentional or otherwise) from flooding the site.
  • Session timeout was increased, so you should be able to stay logged in longer.
  • My Account graphs revised to handle larger numbers more effectively. Some of you needed that!
  • The personal RSS feed should now have more informative titles.
  • The phish detail page now displays the current time in UTC, to make it easier to compare to the submission time.
  • If you flag a suspected phish via the “Something wrong with this submission?” link, that suspected phish should not show up via the “Next Unverified Phish” link until the flag is resolved. This is useful for power users.

Behind the scenes, we’re also adding more measures to ensure the site stays online, functional, and fast. There was a brief outage a bit after midday UTC today, October 10; we’ve changed a few things to avoid a repeat.

We’re still working on a host of other improvements. Keep the suggestions coming!

Oh, and when I write that “we” made changes, I mostly mean miked and aaron.

More details about how PhishTank works and what is coming next

posted by John Roberts on October 6th, 2006 in PhishTank, API, Community, Voting, Email, RSS, ASN

We’ve been thrilled with the enthusiastic embrace of PhishTank by an active community. Check those stats! Despite our unspoken office contest to submit and verify as many phishes as possible, all the OpenDNS employees are being blown off the Top Submitters/Verifiers lists (or soon will be) by active individuals around the Internet. That’s a good sign!

This is day five. We’ve been making adjustments and changes all week in response to comments and learnings. We’re not done, so keep telling us how to improve.

There are a lot of different questions we’ve fielded, and ideas we’ve heard. Here are some answers and comments and a quick look ahead on PhishTank.

Screenshots

We know that screenshots of suspected phish sites are valuable in judging a submitted URL, and help avoid visiting a potential phishing site (which should be done with care!). We also know that sometimes the screenshot doesn’t work very well. Please use the “Something wrong with this submission?” link on the right-hand side to alert us. We’ll add a specific choice for “Screenshot problem” shortly. The development team has a ticket for improving this key feature. It’s not a binary issue, but it will get better.

Duplicate URLs

There should not be any…but there are some as I type this. We know why this mistake happened, and it’s being fixed today. My apologies.

Wrong URL picked from email submissions

With some phish submissions via email, the PhishTank software chooses the wrong URL as the phish URL to judge. We’re working to improve our choice, of course. If we’ve got it wrong, please tell us via the “Something wrong with this submission?” link, rather than voting on an obviously biffed URL.

Redirects

Some phishing sites mask their final destination URL by using open redirect URLs at legitimate services. The final destination should certainly be marked as a phish, but the phish URL being judged is often the masked URL. Our take, for now, is that both the full original URL, including the redirect, and the final destination URL are phishing. The point? If someone can click on the URL and get to a phishing site, it’s bad news. This is an understandably grey area, and we’re happy to revisit as the data tells new stories.

Flags

Flags are what we call the notes appended to individual phish IDs via the “Something wrong with this submission?” link. These are read with interest, and help us as PhishTank administrators know where to focus our attention. Please continue to use them!

We are considering whether or not to make them visible to more than just administrators. They are informative, but we’re wondering whether they will bias votes. PhishTank doesn’t tell you how others have voted on a submission until you vote, because we hope you make your own judgment.

We’re undecided here. Thoughts on making these notes visible?

Judging a site that is offline

We’re continuing to tweak our code for judging (and re-checking) whether a submission is online or offline. We know it’s not 100% accurate, in part due to the normal volatility of phishing sites. If a site is offline, please do not vote. Instead, flag it for review via the “Something wrong with this submission?” link. We use these examples to test and improve our software for checking online status. Our belief is that it’s inappropriate to vote on a site that is not available. Of course, some URLs on their own show phishing intent and no possibility of mistakenly hurting legit folks if identified as phishes; there are grey areas. Help us work to define them further.

Making a mistake

I’ve already made a few mistakes, judging a submission as a phish (or “NOT a phish”) because my mouse finger was moving faster than my brain.

The good news? The community gets it right, and a single mistaken vote won’t damage the overall judgment.

There is no need to notify us if you make a mistake. We’re not going to change individual votes. Your choices do matter: better choices will increase the “weight” of your future votes. Still, we’re also going to bake in a (small) allowance for this kind of mistake when judging an individual’s contributions.

We’re going to modify the two links (Is a phish / NOT a phish) to try to make them more distinct and less prone to mistakes.

Displaying suspected phish emails

Several people have asked why we don’t display the suspected phish emails, too. We do store the submitted email, and try to append extra information based on headers where possible. Viewing the email might help in making a better judgment, but there are two elements holding us up.

First, we’re concerned about usability. Before launch, some of the email information was displayed. The individual phish detail page was cluttered. We didn’t solve that problem before launch, but it is solvable.

Second, under no circumstances should PhishTank display personal information about the submitter. With email submissions, that requires extra care. Until we get it right, we will leave the source of the email (for example) behind the scenes.

We are considering screenshots of the emails, although rendering varies even more across email clients than across web browsers.

MTA (Mail Transfer Agent) information from the email is something we hope to break out, too, for display and API query.

In any event, we know the email itself has valuable information for PhishTank beyond just the phishing URL, and we’re thinking it through.

whois and ASN data

We are adding whois and ASN (Autonomous System Number) data to the submissions. This data is not displayed yet, primarily because the output of these two lookups (especially whois) varies so widely. We’ll figure it out.

Coming sooner, probably, are RSS feeds by ASN, so webhosts, ISPs and other organizations can subscribe to notifications about verified phishes on their networks. PhishTank doesn’t do takedowns, but certainly hopes that the data proves useful for those in a position to act.

RSS feeds

The focus for sharing information has been the API (check out the new diagram). But we believe in the simplicity of RSS feeds, too. Beyond the RSS feed for this blog, the site already offers individuals a personal feed to track their contributions. Find it on the My Account page.

We will offer more RSS feeds over time, like the ASN feeds noted above.

Text file of all verified phishes

The API does not offer a way to pull every single verified phish, purposefully. It would not be efficient for developers or PhishTank. However, we’ve heard many requests for a straightforward text file, updated frequently, that lists every verified phish.

We will offer such a file. The goal is to have it up and running sometime next week, barring other interruptions. Availability will be announced on this blog (http://www.phishtank.com/blog/) and in the API documentation.

More API calls coming

There’s more to come with the API. Most immediately, the API will offer calls to submit an email or URL to PhishTank, in addition to checking them, as it does now. All that’s needed is some documentation. Stay tuned. If you want something else from the API, just ask. We’ll try to say yes to all reasonable requests; we don’t want to build applications, we want to enable application building.

A few people have written in asking about API limits. I’ll just quote the specific section of the FAQ:

There is no set usage limit. Extreme use will be noted, and we would ask that you contact PhishTank if you plan to use the API heavily. We welcome such usage, but would prefer to hear about it before it begins. PhishTank reserves the right to terminate API usage for accounts which abuse the free privilege.

As we learn more, we’ll get more specific.

Phew… more than enough for now. Comments invited and expected.

API = All Programmers Invited

posted by John Roberts on October 3rd, 2006 in PhishTank, API

To us, API means All Programmers Invited. I want to extend that invitation now, on behalf of PhishTank and its developers, especially miked, who did most of the coding for PhishTank.

Read the documentation. Imagine where data about phishing sites could be applied, shared, and otherwise brought to light where users live. Remember that we’re not done.

More output formats? More actions? Example code? Diagrams? (We think all of these would be useful.)

Be specific… if you have good reasons for your choices, that’s more compelling.

An API is just a toolkit. The interesting part is what you build with the tools provided. We’re going to celebrate your creations.

I’m pleased to share that our friends at Project Honey Pot (a great anti-spam network) are using the PhishTank API to develop PhishTank buttons for Microsoft Outlook and Outlook Express. (Ready later this month; you can ask to be notified.)

This may be less useful, but I know what I want: a PhishTank screensaver (reminiscent of Berkeley Systems’ After Dark?), where suspected phishes are “swimming” by and I can cast a vote or three before I get started on my normal tasks. Not quite SETI@home, but a painless way to do a good deed daily.

If you want to build that — or anything else — using PhishTank data, let us know how the API can support you. Remember, it’s free, and open for both non-commercial and commercial usage.

We look forward to your RSVP to our invitation. Comment below or write to us privately.

p.s. - Mike and others were disappointed to miss all the API fun down at Yahoo’s hack day, but had their own hack weekend so you could use the results.

p.p.s. - Of course, API means Application Programming Interface.
