API = All Programmers Invited
posted by John Roberts on October 3rd, 2006 in API, PhishTank
To us, API means All Programmers Invited. I want to extend that invitation now, on behalf of PhishTank and its developers, especially miked, who did most of the coding for PhishTank.
Read the documentation. Imagine where data about phishing sites could be applied, shared, and otherwise brought to light where users live. Remember that we’re not done.
More output formats? More actions? Example code? Diagrams? (We think all of these would be useful.)
Be specific… if you have good reasons for your choices, that’s more compelling.
An API is just a toolkit. The interesting part is what you build with the tools provided. We’re going to celebrate your creations.
I’m pleased to share that our friends at Project Honey Pot (a great anti-spam network) are using the PhishTank API to develop PhishTank buttons for Microsoft Outlook and Outlook Express. (Ready later this month; you can ask to be notified.)
This may be less useful, but I know what I want: a PhishTank screensaver (reminiscent of Berkeley Systems’ After Dark?), where suspected phishes are “swimming” by and I can cast a vote or three before I get started on my normal tasks. Not quite SETI@HOME, but a painless way to do a good deed daily.
If you want to build that — or anything else — using PhishTank data, let us know how the API can support you. Remember, it’s free, and open for both non-commercial and commercial usage.
We look forward to your RSVP to our invitation. Comment below or write us privately.
p.s. – Mike and others were disappointed to miss all the API fun down at Yahoo’s hack day, but had their own hack weekend so you could use the results.
p.p.s. – Of course, API means Application Programming Interface.


Be sure to check with spamcop.com and ask them to add a Phish button!!
I think it would be of use if the listed phishing URLs within certain parameters, such as date added, could be retrieved rather than queried. This would allow other database services to simply append to their records.
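To make the date-range suggestion concrete, here is a minimal sketch of an incremental export: given records that carry the date each phish was added, return only the URLs added on or after a cutoff, oldest first. The record layout (`url` and `added` fields) and the sample entries are hypothetical; they are not part of the PhishTank API.

```python
from datetime import date

# Hypothetical export records; field names are illustrative only,
# not part of the PhishTank API.
PHISHES = [
    {"url": "http://bank.example.com/login/", "added": date(2006, 10, 1)},
    {"url": "http://pay.example.net/verify.php", "added": date(2006, 10, 3)},
    {"url": "http://www.example.org/~user/signin/", "added": date(2006, 10, 4)},
]


def added_since(records, since):
    """Return the URLs added on or after `since`, oldest first."""
    recent = [r for r in records if r["added"] >= since]
    recent.sort(key=lambda r: r["added"])
    return [r["url"] for r in recent]
```

A consumer would remember the date of its last pull and append `added_since(records, last_pull)` to its own database, which is the "append to their records" workflow described above.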
Thanks
Chris
2 questions:
1. Why a new api? Why not publish via dns, like surbl and uribl?
2. I don’t like sending all my queries over the net. Can I rsync the database?
Thanks. Interesting stuff.
Perhaps Matt is suggesting the phish button on spamcop.net, rather than spamcop.com?
Cheers
Warning: The real spamcop is at http://www.spamcop.NET; spamcop.COM is a fake site that likely collects personal mail for spammers.
Would love to see a Mozilla plugin for this. Of course, it may already be in the works or I just didn’t RTFM very well. Maybe Netcraft can get in on this too? They already have a nice toolbar. I’m not much of a programmer, so I’ll leave that up to the folks who code well.
I’m still not exactly sure how to program with your API in PHP. Perhaps over time there will be more examples for me to learn from. Then I shall include this wonderful site with some of mine.
Ken: OpenDNS already supply a DNS-RBL for the purpose. However, that only works on domain names. This service works on the entire URL.
To the devs: I’ve noticed that the API is putting all the answers in encapsulation. Rather unnecessary… what gives?
I’ve created some quick Python code to handle forming queries (including signatures), connecting to the API server, and sending any request to it. I’ve shown an example sending misc.ping and printing out the ‘pong’ that comes back. It should be easily extended to do anything you need.
http://onca.acinonyx.net/~chris/phish-test.py
On top of the standard Python distribution, it needs FourThought’s 4Suite package, from here: http://4suite.org. If you don’t need XPath to traverse the XML files, you can get away with using the standard xml.dom.minidom parser instead of Ft.Xml.Domlette.NonvalidatingParser that 4Suite provides.
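For readers taking the standard-library route mentioned above, here is a minimal sketch of pulling a value out of an XML response with `xml.dom.minidom`. The response body is made up for illustration (the real format is in the PhishTank API documentation), and `first_text` is a hypothetical helper, not part of Chris's script.

```python
from xml.dom import minidom

# A made-up response body for illustration only; consult the PhishTank
# API documentation for the real response format.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<response>
  <result>pong</result>
</response>"""


def first_text(xml_text, tag):
    """Return the text content of the first <tag> element, or None."""
    doc = minidom.parseString(xml_text)
    nodes = doc.getElementsByTagName(tag)
    if not nodes:
        return None
    # Join text and CDATA children so wrapped values come back intact.
    return "".join(
        child.data
        for child in nodes[0].childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    ).strip()
```

`first_text(SAMPLE_RESPONSE, "result")` returns `"pong"`. Without XPath you walk the tree by tag name, which is enough for small, flat responses like a ping.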
Enjoy!
PhishTank Thunderbird Extension Spec…
I was recently pondering some ideas about this new service called PhishTank. The API looks awesome, if only I understood how to make extensions. It seems overly complex, although I’m sure simple for someone who has a great understanding of java…
We’d be interested in trying this out in SpamAssassin — however, it’d pretty much have to be using a domain-name-listing URIBL distributed over DNS, as Ken suggested, for speed, efficiency and cacheability.
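For comparison, a SURBL/URIBL-style lookup is just a DNS query: the client prepends the domain to the list's zone and checks whether an A record comes back. The sketch below assumes a hypothetical zone `phish.uribl.example`; PhishTank does not publish such a zone, so only the name-building part can be exercised offline.

```python
import socket

# Hypothetical zone name; no such PhishTank list is assumed to exist.
ZONE = "phish.uribl.example"


def query_name(domain, zone=ZONE):
    """Build the DNS name a URIBL client would look up for `domain`."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, zone=ZONE):
    """True if the zone answers with an A record for the domain.

    URIBLs conventionally answer listed names with a 127.0.0.x address
    and return NXDOMAIN for everything else.
    """
    try:
        socket.gethostbyname(query_name(domain, zone))
    except OSError:
        # NXDOMAIN or any resolver failure: treat as not listed.
        return False
    return True
```

Because the answer is an ordinary DNS record, every resolver between the mail filter and the list operator can cache it, which is the speed and cacheability argument made above.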
Chris Cogdon says that this is already being published — are there more details?
Justin: Looks like I misunderstood what OpenDNS is about. Their service automatically deflects DNS queries away from phishing sites. You ‘use’ them by making the OpenDNS servers your resolver, or upstream DNS cache (if you’re configuring bind). I could not see an interface for the more traditional RBL configuration I think you’re doing.
For that, I’d have a look at the lists available at, or referenced from, spews.org
Remember, though, that just because there’s a phishing page at a site, it does not mean the entire site is phishing-related. You DO want a URIBL, but you CAN’T distribute that through DNS. There are certainly ways to mangle a URL so it fits into the domain name system, but they’re far from complete.
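One such mangling, sketched here as an assumption rather than anything PhishTank offers: lightly normalize the full URL, hash it, and use the hex digest as a single DNS label under a hypothetical zone `urlhash.example`. A SHA-1 hex digest is 40 characters, within DNS's 63-character label limit.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

ZONE = "urlhash.example"  # hypothetical zone for URL-hash lookups


def normalize(url):
    """Lower-case the scheme and network location, drop the fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def url_query_name(url, zone=ZONE):
    """Fold a full URL into one DNS label by hashing its normalized form.

    A SHA-1 hex digest is 40 characters, under the 63-character label limit.
    """
    digest = hashlib.sha1(normalize(url).encode("utf-8")).hexdigest()
    return f"{digest}.{zone}"
```

This shows why such schemes are "far from complete": any change the normalizer does not undo, such as an extra query parameter or a renamed directory, produces a different hash and misses the listing.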
I’d settle for an rsync-able gzipped text file of all confirmed phishing URLs, updated regularly. That would be very nice!
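On the consuming side, such a file is easy to use. Here is a minimal sketch, assuming a hypothetical layout of one URL per line with `#` comment lines, that loads the file into a set for fast local lookups without sending queries over the net.

```python
import gzip


def load_phish_urls(path):
    """Read one URL per line from a gzipped text file into a set.

    Blank lines and lines starting with '#' are skipped.
    """
    urls = set()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                urls.add(line)
    return urls
```

After each rsync pull, reloading the set gives constant-time membership checks (`url in urls`). rsync also transfers only the changed blocks, although gzip compression can reduce how much of the file stays unchanged between versions.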
my $.02 = Most phishing domains are throwaway domains registered yesterday for spam today, so you don’t really need the full url, just the domain. Otherwise, phishers could just change a file or directory name & use some apache url magic to make urls somewhat unique but really be hitting the same phish hook. It costs them money to register domain names, so they are less likely to change. You can do this via DNS; surbl and uribl both do, and phishtank would be a nice addition!
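Ken's argument can be illustrated by reducing URLs to host names: path and query variations that a phisher might rotate all collapse to the same host. This is a minimal sketch using the standard library's `urllib.parse`; the sample URLs are made up.

```python
from urllib.parse import urlsplit


def phish_hosts(urls):
    """Collapse full URLs to the set of distinct host names they point at."""
    hosts = set()
    for url in urls:
        # .hostname is lower-cased, with any port and userinfo removed.
        host = urlsplit(url).hostname
        if host:
            hosts.add(host)
    return hosts


variants = [
    "http://phish.example/bank/login.php",
    "http://phish.example/bank2/login.php?id=123",
    "http://PHISH.example:8080/x/",
]
```

`phish_hosts(variants)` returns `{"phish.example"}`: three distinct URLs, one listing. The trade-off, raised in the reply below, is that a host-level listing also covers every legitimate page on that host.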
Ken: I have to disagree with you there. In my experience, mostly from verifying phishing attempts here, almost all phishing URLs are based on existing domains where the server has been compromised, and the phishing web pages placed in a subdirectory of the existing website.
In some cases, the phishing URL is a sub-domain of the existing domain, e.g. http://www.53.com.existing.com. This may be “preferable” to the phisher, but would only be possible if the domain did wildcard matching (a * in the domain), AND the web server was either modifiable or answered to any domain.
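To see why domain-level listing is awkward here, a deliberately naive guess at the registered domain (the last two labels of the host) is enough: the look-alike host in the example belongs to `existing.com`, so listing by domain would flag the whole compromised but otherwise legitimate site. The helper is hypothetical; real code would consult the Public Suffix List instead of assuming two labels.

```python
from urllib.parse import urlsplit


def naive_registered_domain(url):
    """Guess the registered domain as the last two labels of the host.

    Deliberately naive: it ignores multi-label public suffixes such as co.uk.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:])
```

`naive_registered_domain("http://www.53.com.existing.com/login")` returns `"existing.com"`, while `naive_registered_domain("http://example.co.uk/")` returns `"co.uk"`, which shows why the two-label shortcut is not safe in general.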
I.e., neither is a “throwaway” domain. However, you are correct in that the phishing sites are very transitory: a phishing database for the purposes of blocking or tagging needs to be updated and accessible very quickly.