PhishTank is operated by OpenDNS.

More details about how PhishTank works and what is coming next

posted by John Roberts on October 6th, 2006 in PhishTank, API, Community, Voting, Email, RSS, ASN

We’ve been thrilled by how enthusiastically an active community has embraced PhishTank. Check those stats! Despite our unspoken office contest to submit and verify as many phishes as possible, active individuals around the Internet are knocking every OpenDNS employee off the Top Submitters/Verifiers lists (or soon will). That’s a good sign!

This is day five. We’ve been making adjustments and changes all week in response to your comments and what we’ve learned. We’re not done, so keep telling us how to improve.

We’ve fielded a lot of different questions and heard a lot of ideas. Here are some answers and comments, and a quick look at what’s ahead for PhishTank.

Screenshots

We know that screenshots of suspected phish sites are valuable for judging a submitted URL, and they help you avoid visiting a potential phishing site (which should only be done with care!). We also know that the screenshot sometimes doesn’t work very well. Please use the “Something wrong with this submission?” link on the right-hand side to alert us. We’ll add a specific “Screenshot problem” choice shortly. The development team has a ticket for improving this key feature. There is no single fix, but it will get better.

Duplicate URLs

There should not be any…but there are some as I type this. We know why this mistake happened, and it’s being fixed today. My apologies.
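The post doesn’t say what caused the duplicates. A common safeguard, sketched here as an assumption rather than as PhishTank’s actual fix, is to normalize each URL before checking whether it was already submitted. The names `canonical_url` and `SubmissionStore` are hypothetical, and the normalization rules are deliberately minimal (case of scheme and host, default ports, empty paths, fragments).

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_url(url: str) -> str:
    """Normalize a URL so trivially different spellings compare equal (illustrative rules)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Drop an explicit default port, e.g. "example.com:80" -> "example.com".
    default_port = DEFAULT_PORTS.get(scheme)
    if default_port is not None and netloc.endswith(f":{default_port}"):
        netloc = netloc[: -len(f":{default_port}")]
    path = parts.path or "/"
    # The fragment is never sent to the server, so it cannot distinguish two pages.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


class SubmissionStore:
    """Hypothetical in-memory store that assigns one phish ID per canonical URL."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._next_id = 1

    def submit(self, url: str) -> Tuple[int, bool]:
        """Return (phish_id, is_new); a repeat submission gets the existing ID."""
        key = canonical_url(url)
        if key in self._ids:
            return self._ids[key], False
        phish_id = self._next_id
        self._next_id += 1
        self._ids[key] = phish_id
        return phish_id, True
```

Case in the path and query is left alone on purpose, because servers may treat it as significant.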

Wrong URL picked from email submissions

With some phish submissions via email, the PhishTank software chooses the wrong URL as the phish URL to judge. We’re working to improve our choice, of course. If we’ve got it wrong, please tell us via the “Something wrong with this submission?” link, rather than voting on an obviously biffed URL.

Redirects

Some phishing sites mask their final destination by using open redirect URLs at legitimate services. The final destination should certainly be marked as a phish, but the URL being judged is often the masking redirect URL. Our take, for now, is that both the full original URL, including the redirect, and the final destination URL are phishing. The point? If someone can click on the URL and get to a phishing site, it’s bad news. This is admittedly a grey area, and we’re happy to revisit it as the data tells new stories.
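That policy fits in a small function. This sketch assumes the redirect hops have already been collected (for example, from the `Location` headers returned while fetching the URL) into a hypothetical `redirects` mapping; it follows the chain, guards against loops, and returns the submitted URL plus the final destination. Intermediate hops are left out, matching the wording of the policy.

```python
from typing import Dict, List, Set


def redirect_chain(start: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow a precomputed redirect map from `start`, stopping on loops or after max_hops."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        if nxt in seen:  # redirect loop: stop at the last new URL
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


def urls_to_mark(start: str, redirects: Dict[str, str]) -> Set[str]:
    """Both the submitted (masking) URL and the final destination count as phishing."""
    chain = redirect_chain(start, redirects)
    return {chain[0], chain[-1]}
```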

Flags

Flags are what we call the notes appended to individual phish IDs via the “Something wrong with this submission?” link. These are read with interest, and help us as PhishTank administrators know where to focus our attention. Please continue to use them!

We are considering whether to make them visible to more than just administrators. They are informative, but we wonder whether they would bias votes. PhishTank doesn’t tell you how others have voted on a submission until you vote, because we hope you make your own judgment.

We’re undecided here. Thoughts on making these notes visible?

Judging a site that is offline

We’re continuing to tweak our code for judging (and re-checking) whether a submission is online or offline. We know it’s not 100% accurate, in part due to the normal volatility of phishing sites. If a site is offline, please do not vote. Instead, flag it for review via the “Something wrong with this submission?” link. We use these examples to test and improve our software for checking online status. We believe it’s inappropriate to vote on a site that is not available. Of course, some URLs show phishing intent on their own, and marking them as phishes carries no risk of mistakenly hurting legitimate sites; there are grey areas. Help us define them further.
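For illustration only, here is a rough shape such a check might take, assuming that a 2xx response (after following redirects) means “online” and that any HTTP error, DNS failure, refused connection, or timeout means “offline.” Because phishing sites come and go, the hypothetical `recheck` helper only reports a site as offline after several consecutive failed attempts. The function names, attempt count, and delay are assumptions, not PhishTank’s settings.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def is_online(url: str, timeout: float = 10.0) -> bool:
    """Rough check: True only if the URL (after redirects) answers with a 2xx status."""
    try:
        request = urllib.request.Request(url, headers={"User-Agent": "status-check-sketch/0.1"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        return False  # 4xx/5xx: page removed or server failing
    except (OSError, ValueError, http.client.HTTPException):
        return False  # DNS failure, refused connection, timeout, or malformed URL


def recheck(url: str, attempts: int = 3, delay: float = 60.0,
            check: Callable[[str], bool] = is_online) -> bool:
    """Report offline only after `attempts` consecutive failures, `delay` seconds apart."""
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```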

Making a mistake

I’ve already made a few mistakes, judging a submission as a phish (or “NOT a phish”) because my mouse finger was moving faster than my brain.

The good news? The community gets it right, and a single mistaken vote won’t damage the overall judgment.

There is no need to notify us if you make a mistake. We’re not going to change individual votes. Your choices do matter: better choices will increase the “weight” of your future votes. Still, we’re also going to bake in a (small) allowance for this kind of mistake when judging an individual’s contributions.
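The post doesn’t spell out the weighting formula, so the following is a hypothetical sketch of the idea: each voter’s weight is a smoothed accuracy score, so new voters start in the middle, a long good record earns more influence, and an occasional slip barely moves it. A submission is decided only once enough weighted votes agree. `Voter`, `verdict`, the prior, and both thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Voter:
    agreed: int = 0  # past votes that matched the final community verdict
    total: int = 0   # past votes on submissions that have since been decided

    def weight(self, prior: float = 2.0) -> float:
        # Smoothed accuracy: a newcomer scores 0.5, and a couple of mistakes on a
        # long record lower the weight only slightly (the "small allowance").
        return (self.agreed + prior / 2) / (self.total + prior)


def verdict(votes: List[Tuple[Voter, bool]], threshold: float = 0.75,
            min_weight: float = 3.0) -> Optional[bool]:
    """True = phish, False = not a phish, None = not yet decided."""
    yes = sum(voter.weight() for voter, is_phish in votes if is_phish)
    no = sum(voter.weight() for voter, is_phish in votes if not is_phish)
    total = yes + no
    if total < min_weight:
        return None  # not enough weighted votes yet
    if yes / total >= threshold:
        return True
    if no / total >= threshold:
        return False
    return None
```

With these numbers, four votes from voters with a strong record outweigh one dissenting newcomer, which is the “community gets it right” behavior described above.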

We’re going to modify the two links (Is a phish / NOT a phish) to try to make them more distinct and less prone to mistakes.

Displaying suspected phish emails

Several people have asked why we don’t display the suspected phish emails, too. We do store the submitted email, and try to append extra information based on headers where possible. Viewing the email might help in making a better judgment, but there are two elements holding us up.

First, we’re concerned about usability. Before launch, some of the email information was displayed. The individual phish detail page was cluttered. We didn’t solve that problem before launch, but it is solvable.

Second, under no circumstances should PhishTank display personal information about the submitter. With email submissions, that requires extra care. Until we get it right, we will keep details such as the source of the email behind the scenes.
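A submitter’s address can turn up in more places than the `From:` or `To:` lines; phish links often carry it as a percent-encoded query-string argument. A minimal sketch of one redaction step, assuming the submitter’s address is known, follows. `redact_submitter` is a hypothetical helper, and a real implementation would also need to handle other encodings and personal details such as names.

```python
import re
from urllib.parse import quote


def redact_submitter(text: str, submitter: str, placeholder: str = "[redacted]") -> str:
    """Replace the submitter's address, plain or percent-encoded, case-insensitively."""
    variants = {submitter, quote(submitter, safe="")}  # e.g. "alice%40example.org"
    # Try longer variants first so an encoded form is not partially matched.
    alternatives = sorted(variants, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in alternatives), re.IGNORECASE)
    # A function replacement avoids backslash processing in `placeholder`.
    return pattern.sub(lambda _match: placeholder, text)
```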

We are considering screenshots of the emails, although rendering varies even more across email clients than it does across web browsers.

MTA (Mail Transfer Agent) information from the email is something we hope to break out, too, for display and API query.
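Each MTA that handles a message prepends its own `Received:` header, so reading those headers bottom-up gives the delivery path from the sender’s side toward the recipient. A minimal sketch of pulling that path out with Python’s standard `email` package follows; `received_chain` is a hypothetical name, and it does not try to parse hosts or IP addresses out of each header, since the format varies between MTAs.

```python
import email
from typing import List


def received_chain(raw_message: str) -> List[str]:
    """Return the Received: headers oldest hop first, each unfolded onto one line."""
    msg = email.message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    # Each MTA prepends its header, so the raw order is newest first.
    return [" ".join(str(hop).split()) for hop in reversed(hops)]
```

Only the hops added by servers you trust are reliable; the oldest headers can be forged by whoever sent the message.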

In any event, we know the email itself has valuable information for PhishTank beyond just the phishing URL, and we’re thinking it through.

whois and ASN data

We are adding whois and ASN (Autonomous System Number) data to the submissions. It isn’t displayed yet, primarily because the output of these two lookups (especially whois) is so varied. We’ll figure it out.

Coming sooner, probably, are RSS feeds by ASN, so webhosts, ISPs and other organizations can subscribe to notifications about verified phishes on their networks. PhishTank doesn’t do takedowns, but certainly hopes that the data proves useful for those in a position to act.
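A per-ASN feed could be as simple as filtering verified phishes by the ASN recorded at submission time and rendering them as RSS 2.0. The sketch below assumes a hypothetical `VerifiedPhish` record and uses only the standard library; the feed titles and `guid` format are made up for illustration and are not PhishTank’s.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from email.utils import format_datetime
from typing import List


@dataclass
class VerifiedPhish:
    """Hypothetical record; the ASN would come from a lookup at submission time."""
    phish_id: int
    url: str
    asn: int
    verified_at: datetime  # timezone-aware


def asn_feed(asn: int, phishes: List[VerifiedPhish], site: str = "http://www.phishtank.com/") -> str:
    """Render the verified phishes hosted in one ASN as an RSS 2.0 document, newest first."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Verified phishes in AS{asn}"
    ET.SubElement(channel, "link").text = site
    ET.SubElement(channel, "description").text = f"Verified phishing URLs hosted in AS{asn}"
    matching = sorted((p for p in phishes if p.asn == asn), key=lambda p: p.verified_at, reverse=True)
    for phish in matching:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = phish.url
        ET.SubElement(item, "guid", isPermaLink="false").text = f"phish-{phish.phish_id}"
        # RSS 2.0 expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(phish.verified_at)
    return ET.tostring(rss, encoding="unicode")
```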

RSS feeds

The focus for sharing information has been the API (check out the new diagram). But we believe in the simplicity of RSS feeds, too. Beyond the RSS feed for this blog, the site already offers individuals a personal feed to track their contributions. Find it on the My Account page.

We will offer more RSS feeds over time, like the ASN feeds noted above.

Text file of all verified phishes

By design, the API does not offer a way to pull every single verified phish; that would not be efficient for developers or for PhishTank. However, we’ve heard many requests for a straightforward text file, updated frequently, that lists every verified phish.

We will offer such a file. Our goal is to have it up and running sometime next week, barring other interruptions. Availability will be announced on this blog (http://www.phishtank.com/blog/) and in the API documentation.
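The file format isn’t specified yet. Assuming one URL per line (with `#` starting a comment line), a client such as the webmail redirect script described in the first comment below could keep a local copy and answer most lookups without calling the API, reloading only when the file changes on disk. `parse_verified` and `LocalPhishList` are hypothetical names.

```python
import os
from typing import Iterable, Optional, Set


def parse_verified(lines: Iterable[str]) -> Set[str]:
    """Collect URLs from an assumed one-URL-per-line format, skipping blanks and comments."""
    urls: Set[str] = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            urls.add(line)
    return urls


class LocalPhishList:
    """Answer lookups from a local copy of the file, reloading it when it changes on disk."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime: Optional[float] = None
        self._urls: Set[str] = set()
        self._reload_if_changed()

    def _reload_if_changed(self) -> None:
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:
            with open(self.path, encoding="utf-8") as handle:
                self._urls = parse_verified(handle)
            self._mtime = mtime

    def contains(self, url: str) -> bool:
        self._reload_if_changed()
        return url.strip() in self._urls
```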

More API calls coming

There’s more to come with the API. Most immediately, the API will offer calls to submit an email or URL to PhishTank, in addition to checking them as it does now. All that’s needed is some documentation. Stay tuned. If you want something else from the API, just ask. We’ll try to say yes to all reasonable requests; we don’t want to build applications, we want to enable application building.

A few people have written in asking about API limits. I’ll just quote the specific section of the FAQ:

There is no set usage limit. Extreme use will be noted, and we would ask that you contact PhishTank if you plan to use the API heavily. We welcome such usage, but would prefer to hear about it before it begins. PhishTank reserves the right to terminate API usage for accounts which abuse the free privilege.

As we learn more, we’ll get more specific.
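Whatever the limits turn out to be, a client can keep its own footprint small by caching answers for a while. A minimal sketch, assuming `check` is whatever function performs the actual lookup (the API call itself is not shown), with an arbitrary one-hour lifetime:

```python
import time
from typing import Callable, Dict, Tuple


class CachedChecker:
    """Wrap a URL-check function with a time-limited cache to avoid repeat API calls."""

    def __init__(self, check: Callable[[str], bool], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check  # stand-in for the real API lookup (not shown)
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bool]] = {}
        self.api_calls = 0

    def __call__(self, url: str) -> bool:
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: answer locally
        self.api_calls += 1
        result = self._check(url)
        self._cache[url] = (now, result)
        return result
```

Keep the lifetime short: a URL that looked clean an hour ago may since have been verified as a phish.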

Phew… more than enough for now. Comments invited and expected.

20 Responses

  1. Polymorp

    The txt file of all verified phishes would help me out, as I plan to add a phish check to the redirect script that all the web links within our webmail system pass through. It’ll help reduce the number of times the script has to bug the API.
    For me, Phishtank has proven a great stress reliever for when I need a break from coding: just flick to Phishtank and verify a couple of possible phishing sites.

  2. firx

    Hi John and all. How about flagging when a link immediately delivers malware? It would no doubt influence votes, but it might also spare well-meaning yet unprepared verifiers some grief. Maybe there could be some way to flag links as ‘safe’ instead; that might give people more confidence to chip in and verify.

    Also, to verify a link this morning I used ‘user: www.phishtank.com’ and ‘pass: phishtanked’. Anyone know what ‘Viagra’ is?

  3. aeroshark

    More info on how to mark spam sites would be nice, e.g. there are lots of pharmacy & mortgage sites. An option to mark as spam would be good.

  4. Blain

    I’ve been around since last night, and I’ve noticed some stuff that could stand some tweaking IMO, some of which you talk about here.

    1. First are the sites where the link is down, but the uri clearly indicates that the site is phish, for example http://www.ebay.com.blah.blah/blah&blah… or http://ebay-verify.com/foo/bar. The fact that the specific configuration in the uri has been taken down doesn’t mean that the rest of the domain/subdomain isn’t clearly a phishing source. I don’t mind flagging those as broken, but I can still see value in voting them as phish. I’ll follow your suggestion here henceforth until I’m told otherwise, but this is something to keep in mind.

    2. I need a way to indicate that I can’t vote on a given uri so it gets out of my queue. When the site is in a language I don’t speak (or, in some cases, that my browser can’t even render), I can’t always tell what’s going on enough to determine if we’re phish or not-phish. Especially since the “view next unverified phish” seems to load those unverified phish at random — it keeps sending me the ones that I can’t tell, and it takes several times through the pattern to figure out if I’m all caught up on the new ones and these are all that’s left, or if I’m just getting them for random purposes.

    3. The “add a phish” form requires me to include the body of the email I found the phish in. When I get to the phish through a redirect from another submitted phish, I don’t have an email text to include, and it won’t accept it with that field blank.

    Beyond that, I think this is great. I hope that this data is used to shut these things down more quickly. When they stop working, hopefully they will stop happening.

  5. Blain

    Oh, I forgot:

    4. I’m not getting anything from the RSS feed on my account updates other than the title.

  6. Blain

    Okay, now I’m on a roll, but I think this should do:

    5. When I flag a uri as “the site is down” it should leave my queue also, for the same reason as the uris where I can’t vote.

  7. Ilgaz

    There are very suspicious submissions: URLs that are either completely legit or carry a referrer link. Some people are either trying to make the system unusable by wearing everyone out, or trying to earn money through referrer URLs (generally for pyramid scams).

    There should be a rather harsh policy against that kind of abuse. There is even a professor who posts dozens of valid URLs with his personal homepage at the end. It may be some kind of social experiment, but not one that should come at the cost of people’s spare time and a (kind of) security service.

    PhishTank is a community-based system; it should ask about, or copy, Spamcop.NET’s policies, which have been tested by years of abuse and attacks.

    http://www.spamcop.net/fom-serve/cache/167.html for reference

  8. clock — watching time, the only true currency » » PhishTank’s first week

    […] There’s still a lot to do, but seven days in, there are almost 2,000 submissions to the site. […]

  9. Blain

    Okay, another thing that would help a lot — please strip non-document uris from the list — a .jpg, .png or .gif is not a phish, even though it might be used in one.

    And if you could test the uri for well-formedness, that would help too, as http://0x is not going to be a phish either.

    Both of these things are going to suck up your resources, and, particularly, your most precious resource — user time — without producing anything of use.

    And it would be really good if somewhere in the process the uris for the legitimate versions of these documents (if any) are kept track of so that we don’t inadvertently list some real document from Ebay, Paypal, etc.

    I don’t think I’ll come up with more suggestions before my previous ones are released from mod.

  10. Chris Granger

    I’ve been verifying a few of these sites that have been voted 90% NOT phish and I’d just like to clarify something. I consider Anatrim, Pharmacy Express and the rest of the pharmacy spam sites ‘phishy’ because they ask you to enter your credit card details to order their product, yet the sites are run by known criminals who definitely shouldn’t be given access to your personal information.

    What’s your take on this?

  11. Justin Mason

    Hi guys –

    Any thoughts on the URIBL idea? It’s the “industry standard” way for MTA filters (incl. SpamAssassin) to look up these kinds of services.

  12. Evil-Dragon

    Could certain URLs be automatically classed as not phishing? Or could there be a way to flag a URL so that it is banned from being added?

    E.g.
    http://mail.yahoo.com
    http://images.paypal.com/en_US/i/scr/pixel.gif
    http://surgemail.com
    http://www.w3.org/TR/REC-html40
    https://www.paypalobjects.com/en_US/i/scr/pixel.gif
    http://paypal.com/en_US/i/scr/pixel.gif

    I’ve seen them more than a few times now.

    Just my thoughts on the matter.

  13. John

    Thanks everyone for your comments. Reading and reviewing… and we’ll be doing more this week on several of these fronts. One of us will post again with an update in the next couple of days.

  14. barbedtreble

    You write:
    “Second, under no circumstances should PhishTank display
    personal information about the submitter.”
    But I’ve seen someone’s email address as a CGI GET argument in a URL.

    Don’t forget that emails sometimes use quoted-printable encoding, with soft line breaks and stuff. I’ve seen links listed ending in ‘=’, presumably missing the remainder.

  15. Blain

    Chris — That it would be stupid to give someone your credit card, and that they are criminals, doesn’t make them phish. This is about phish. If the project can expand to cover sites by criminals where it would be stupid to give them your credit card, that might be cool. But it’s not my understanding of what we’re *here* for.

  16. Chris Granger

    Blain - my concern is that most people aren’t aware of what kind of people are running the various pharmacy and mortgage refinance sites. Sure, you and I wouldn’t give these guys our credit card numbers and so on, but we’re not the ones who need protecting.

    A site that requests personal information for dubious purposes is ‘phishy’ in my view. Mortgage refinance sites ‘for US residents only’, yet hosted in China, this sort of thing… Some of these sites even follow the usual method of using images, text copy or site layouts ripped from legitimate sites to build trust.

    I’ll leave it to PhishTank’s administrators to determine the standard for phishiness though. I’m simply skipping these sites for the time being.

  17. Jost Krieger

    > Wrong URL picked from email submissions

    Phishtank seems to pick up URLs from the header of the mail, which may or may not be useful.

    While this is the case, please remove all SpamAssassin or other extra headers before submitting. This helped a lot for my submissions.

    Jost |8-))

  18. Fred Showker

    This is a great site — our applause to you !!!

    Archiving and talking about phishing is one thing, however what do you intend to actually do about it?

    We’ve been tracking and stalking phishers since they first appeared on the scene in the late 1990s. For the past two years, we’ve reported as many as 20 phishing attempts per day — but the sad news is, 99% are either never closed down, or the criminal is back at another IP within hours.

    One particularly frequent phisher hits with broadcast spam redirected to one of several spoof web sites, then disappears for a week or so, then comes back using the SAME IP addresses. This is clear indication that the ISP community is either out to lunch, or in complicity with the criminal activities. Abuse departments are too closely focused on “WHO” sent the spam/phishing attempt, and NOT the spamvertised site.

    I’m probably the ONLY person in the world advocating that these IP blocks be blocked at the DNS level. It’s really the only way to get the ISPs’ attention. And, if no one complains, then another open proxy or rogue IP has ceased to exist on the internet.

    GO AFTER THE ISP of the SPAMVERTISED SITE. Once the industry learns to police their own act, phishing will be a thing of the past.

    The problem is ICANN. They neglect to enforce their own regulations, allowing rogue registrars to kite domains and allowing forged WHOIS information, all of which is very friendly to phishing and spamming ISPs. (Joker.com comes to mind.)

    Until measures are deployed ‘upstream’ as high as the DNS, then phishing will continue.

    Fred

  19. Frank

    Cool project, great-looking and working website!

    A few points:
    a) RSS feeds by ASN would be great. Sadly, the ASNs that have the worst track record are the least likely to subscribe. I would strongly consider automatic notifications to each ASN’s contact info.
    b) It would be great if you made an RBL for all the domains so that I can use SU-RBL to filter incoming email. Perhaps separate RBLs for those that are new versus confirmed, so that my spam engine can assign them the appropriate scores.
    c) As mentioned by another poster, a whitelist of domains needs to be generated. In the same way that 8e6 Technologies whitelists the top ### websites, PhishTank ought to do the same.
    d) I would encourage PhishTank to work with other Anti-phishing organizations, most notably the “Anti-Phishing Working Group”.
    e) I would encourage PhishTank to exchange phish emails and URLs with other competing anti-phish organizations as a measure of good will. Too bad there isn’t one clearing house, with each anti-phish website just leveraging the same basic source set but offering different services.

    Frank

  20. FilipZ

    Great project - I wish I was able to use PhishTank as one of URIBL sources with my Mdaemon server.

    Comment regarding screenshots: in the past few days I hardly ever get a valid screenshot - 4 out of 5 times it’s “Screenshot has not been taken yet…” Why’s that?
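Several commenters ask about a URIBL. PhishTank doesn’t offer one here, but URI blocklists such as SURBL generally follow the DNS blocklist convention: look up the URL’s host prepended to the list’s zone (with IPv4 literals written in reversed octet order), and treat an answer in 127.0.0.0/8 as “listed.” The sketch below uses a hypothetical zone name; real URIBLs usually reduce the host to its registered domain (using the Public Suffix List) before querying, a step omitted here.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def uribl_query_name(url: str, zone: str) -> str:
    """Build the DNS name to look up for a URL's host in a URIBL zone."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    try:
        octets = str(ipaddress.IPv4Address(host)).split(".")
    except ValueError:
        return f"{host}.{zone}"  # domain names are prepended as-is
    # IPv4 literals are reversed, as with ordinary DNS blocklists.
    return ".".join(reversed(octets)) + "." + zone


def is_listed(url: str, zone: str) -> bool:
    """True if the URIBL answers with an address in 127.0.0.0/8 (the usual 'listed' signal)."""
    try:
        answer = socket.gethostbyname(uribl_query_name(url, zone))
    except (OSError, UnicodeError):  # NXDOMAIN, resolver errors, or an invalid name
        return False
    return answer.startswith("127.")
```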

