
PhishTank improvements, including a third choice and new API calls

posted by John Roberts on October 11th, 2006 in API, PhishTank, Site changes, Statistics

Since my long post on Friday, PhishTank has been updated in many ways.

I don’t know

Most visible change? Responding to a common request, we’ve added a third choice when voting on a suspected phish: I don’t know

Crop of phish detail, with new 'I don't know' choice and timestamp and colored voting links

Voting “I don’t know” is not encouraged, but it’s necessary at times. An “I don’t know” vote doesn’t influence the final judgment of the community in any way, and it doesn’t appear in statistics, site-wide or personal.

Most important for the very active members of the community: if you vote “I don’t know,” you will not see that suspected phish ID again from the “Next Unverified Phish” link.

New API actions

submit.url and submit.email API actions are now documented and available for use. We also cleaned up the documentation a bit more. Questions welcomed.

Lots of other changes

Flag radio buttons

Here’s a catalog of changes rolled out since Friday morning:

  • What is phishing? page now includes an annotated website example. (And the fictitious URLs are already registered by us, to deter typosquatting there.)
  • The “Something wrong with this submission?” window was updated in a few ways. In most browsers, you can now click on the words, not just the radio buttons. There are also two new choices: “Screenshot Issues” and “Invalid URL.” These “flags” are read, though they remain invisible to anyone but an admin. You only need to submit it once, I promise, even if you can’t see the result.
  • The “Is a phish” and “Is NOT a phish” vote links are now different colored buttons to limit mistakes. (see the screenshot above)
  • More granular stats. More still to come here (most accurate submitters and most accurate verifiers on tap), but now you can see submission numbers broken down, and the total number of PhishTank members. Also, the start date for stats graphs is now September 30th, right before launch.
  • More code to limit/eliminate duplicates (flag ‘em if you see ‘em), and we cleaned out some cruft that had gotten in there earlier.
  • Some limits to keep over-eager submissions (intentional or otherwise) from flooding the site.
  • Session timeout was increased, so you should be able to stay logged in longer.
  • My Account graphs revised to handle larger numbers more effectively. Some of you needed that!
  • The personal RSS feed should now have more informative titles.
  • The phish detail page now displays the current time in UTC, to make it easier to compare to the submission time.
  • If you flag a suspected phish via the “Something wrong with this submission?” link, that suspected phish should not show up via the “Next Unverified Phish” link until the flag is resolved. This is useful for power users.
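
The voting and flagging rules described in this post can be summarized in a small sketch. This is only an illustration of the behavior described above, not PhishTank's actual code: every name in it (`Submission`, `tally`, `next_unverified`, the vote labels) is hypothetical, and it assumes that a flag hides a submission only from the member who raised it.

```python
from dataclasses import dataclass, field

# Hypothetical vote labels; the site's real internal values are not known here.
PHISH, NOT_PHISH, DONT_KNOW = "phish", "not_phish", "dont_know"


@dataclass
class Submission:
    phish_id: int
    votes: dict = field(default_factory=dict)     # member name -> vote label
    open_flags: set = field(default_factory=set)  # members with an unresolved flag


def tally(sub):
    """Count only decisive votes; "I don't know" votes are left out entirely."""
    counts = {PHISH: 0, NOT_PHISH: 0}
    for vote in sub.votes.values():
        if vote in counts:
            counts[vote] += 1
    return counts


def next_unverified(queue, member):
    """Return the first submission this member has neither voted on nor flagged."""
    for sub in queue:
        if member in sub.votes:       # any vote, including "I don't know"
            continue
        if member in sub.open_flags:  # hidden until the flag is resolved
            continue
        return sub
    return None
```

In this sketch, a member who votes "I don't know" on one submission and flags another is simply moved on to the next one in the queue, and the "I don't know" vote never reaches the counts.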

Behind the scenes, we’re also adding more measures to ensure the site stays online, functional, and fast. There was a brief outage a bit after midday UTC today, October 10; we’ve changed a few things to avoid a repeat.

We’re still working on a host of other improvements. Keep the suggestions coming!

Oh, and when I write that “we” made changes, I mostly mean miked and aaron.

14 Responses to “PhishTank improvements, including a third choice and new API calls”

  1. priruss says:

    On a future tweak list, please consider adding an option that lets you shun and/or
    not view submissions from a particular poster. One frequent PT contributor will be
    Name Number One on my list – he/she/it posts nothing but pharmacy spam and has never
    to my knowledge posted a legitimate phish. IMO, it’s not simple cluelessness on the
    part of this poster, it’s a desire to spam a “captive” audience.

  2. Tim Wilde says:

    You say “flag ‘em if you see ‘em” for duplicates – what type of flag would be appropriate there? “Other” with an appropriate comment, or something else?

  3. Alan Shea says:

    Fantastic, folks! I like the changes you’ve made, they are just right. They will help us keep throwing the phish back in the tank.

  4. John Roberts says:

    Priruss, we’re watching these. The “Is NOT a phish” votes also help devalue spurious contributions. We’re not going to offer an “ignore” functionality, but we will take action on limiting/devaluing contributions from those who prove to be abusing the system.

    Tim, “Other” with an appropriate comment, please. We’re still looking for loopholes to close.

  5. Hi,

    Is there a group/mailing list to discuss API issues? I am trying to hack up a Ruby API library for this, but am constantly getting a response “Cant get shared secret.”

    –Amit

  6. John Roberts says:

    Amit, not yet. I’ve emailed you privately.

    Others with API questions: please reach out to me at my first name at opendns dot com.

    We’ll consider a mailing list or forum.

  7. Char says:

    Can you guys, as the admins, ban the introduction of certain sites? One user in particular is constantly submitting his personal home page, yet I have never seen him submit an actual phish.

  8. Blain says:

    Okay, all of these improvements are to the good. A couple more suggestions:

    1. Automagic screening of obvious phish.

    That is, when the uri includes the name of an obvious phish-target like Ebay or Paypal, but in a way that attempts to mask that it’s not actually that target. If this is the case, then automagically pinging the document to see if it loads should be enough to get it listed — there is no way you’re going to find legitimate sites in those situations.

    2. Bayesian Analysis for pre-screening.
    A la POPfile for email classification. This could be done on the text of the emails submitted as well as on the source code of the documents themselves. A little bit of training can go a long way in teaching a classifier what is and isn’t phish. This would make it easier for project members who would only need to confirm or reject the program’s classification.

    The latter would take a different kind of programming, but there are open source projects using this kind of classification to draw upon. It could also spawn phishers throwing lots of bloating code into sites to try to confuse the classifiers, but it would be less likely to be successful in this environment than it would in email, and it’s not very effective in the world of email as it is. And, if you GPLish it, it could make the basis for a number of products used to classify bad websites of various kinds.

  9. Char says:

    Maybe a bug? I’ve noticed the site will go a very long period of time with no phishes to verify, and then bam! There are hundreds. Is this intentional? Some glitch? Wouldn’t it be easier to verify them as they came in?

  10. funchords says:

    If you flag a suspected phish via the “Something wrong with this submission?” link, that suspected phish should not show up via the “Next Unverified Phish” link until the flag is resolved. This is useful for power users.

    I don’t think this is working. It seems to me that it keeps coming up until I finally give up and mark “I don’t know.”

    Here’s one issue. The Phishers are buying domain names, and they take 0-48 hours to broadcast through and clear caches and etcetera. So when a phish URL fails DNS lookup, it might be that my DNS servers don’t have the new data yet. Failing DNS lookup (unknown host) my response should be “I don’t know.” Yet the URL has obvious phish poop in it, such as paypal/https/update — I worry that I should mark these as “Phish” even though the site won’t come up.

    Lead us, oh great ones!!

  11. Blain says:

    The email submission process is pretty seriously flawed. I’ve looked at my last few email submissions, and the uri that’s being grabbed for verification is innocuous. The last one is for the w3c site, while the phish I submitted was about a credit union, complete with an easy-to-see fake link (when you view the phish as text). If email submission is going to be viable, we need a human intervention step that verifies which uri is the problem, or the automagic process has to be able to strip out obviously okay uris (like w3.org).

  12. John Roberts says:

    Blain, we’re working on the email parsing, and improving our whitelisting so that w3.org (for example) doesn’t crop up.

    We go through the flags, and often change the URL based on the feedback there.

  13. micha says:

    Hello,

    Nice work! One minor detail that would be helpful: show all timestamps with an offset, adjusted to the user’s time zone.

    Greetings

  14. jaded says:

    One thing I’ve noticed is many people have submitted “spam” instead of “phish.” Perhaps if there were a link to flag them as “spam” we could properly identify them as legit instead of having people mark them as phish because they don’t like spam.

    On a related note, is there a way for a voter to add a note regarding a particular phish? This site: http://www.phishtank.com/phish_detail.php?phish_id=20327
    was a phish, but it was not obvious until I checked the URL:
    http://israelibrokerageserviceslimired.com/index.php?sect_id=6&form_id=1&position=Financial+manager+for+cooperation+with+private+individuals&country=usa
    Note the R substituted for a T in the word “limired”. And after I voted, I discovered that 50% of the voters have been taken in by this phish. A simple note might have helped them recognize phish from legit.
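
Several comments above (the first suggestion in comment 8 and the "paypal/https/update" example in comment 10) describe the same pattern: a URL that names a well-known target while living on some other host. A minimal sketch of that check follows. The brand list and the function names are hypothetical, and nothing here reflects how PhishTank actually screens submissions; a match is best treated as a hint for reviewers rather than a verdict.

```python
from urllib.parse import urlsplit

# Hypothetical list: brand keyword -> domains that legitimately belong to it.
BRAND_DOMAINS = {
    "paypal": {"paypal.com"},
    "ebay": {"ebay.com"},
}


def host_belongs_to(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def suspicious_brands(url):
    """Return brand keywords that appear in the URL while the host is not the brand's."""
    host = (urlsplit(url).hostname or "").lower()
    text = url.lower()
    hits = []
    for brand, domains in BRAND_DOMAINS.items():
        if brand in text and not any(host_belongs_to(host, d) for d in domains):
            hits.append(brand)
    return hits
```

A plain keyword match like this would not catch the misspelled "limired" domain from comment 14; that case would need a separate comparison against known names.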
