Yahoo! has had to fight abuse of its users, content and services from the time when Internet access was expensive and anything but ubiquitous. During the last decade, Internet access has become inexpensive while the user community has become truly global. A long-standing definition of abuse at Yahoo! was encapsulated in the phrase “using our services in an excessive, malicious manner or for illegal purposes.” A strategy developed around detection and rejection of such usage stood us in good stead. Perpetrators of abuse were typically identifiable by their source, automation characteristics and unnaturally high attempt rates. Platform-level solutions were developed and deployed to address three elements of this strategy:
Solutions based on this framework are deployed on almost all of Yahoo!’s front-end infrastructure. The framework currently detects and rejects anywhere from a small percentage to a majority of the requests as abusive, depending on the service. Individual requests are classified for a serve-or-reject decision in a few milliseconds. Several thousand filters are deployed across the network.
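To make the rate-based approach concrete, here is a minimal sketch of a filter that rejects a source once its attempt rate within a sliding window becomes unnaturally high. It is illustrative only and is not Yahoo!'s implementation: the `RateFilter` class, its parameters and the threshold values are all assumptions introduced for this example.

```python
from collections import defaultdict, deque


class RateFilter:
    """Hypothetical filter: reject a source that exceeds max_requests
    within the last window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # source -> timestamps of that source's recent requests
        self._seen = defaultdict(deque)

    def decide(self, source, now):
        """Return "serve" or "reject" for a request from `source` at time `now`."""
        recent = self._seen[source]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        recent.append(now)
        return "reject" if len(recent) > self.max_requests else "serve"


# Assumed threshold: at most 3 requests per source in any 10-second window.
rate_filter = RateFilter(max_requests=3, window_seconds=10.0)
decisions = [rate_filter.decide("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0)]
```

A filter of this kind depends on abuse arriving from an identifiable source at a high rate, which is the assumption that later stopped holding.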
Yahoo! was a pioneer in the development of CAPTCHA, which has been our principal human-vs-bot challenge validation strategy. If we suspected but were not sure that a request was abusive, we would issue a CAPTCHA challenge. This worked well for a long time, since significant abuse was almost always driven by automated software. Human users could pass the challenge; automated software could not. CAPTCHA usage across the Yahoo! network has gone up greatly over the last few years. Mitigation was handled on two fronts:
In the 2006-07 timeframe, we started to see significant changes in sources and patterns of abuse. The first big set of changes observed was in registration volumes, with apparent “mass registration” campaigns succeeding and accompanied by unnatural spikes in successful CAPTCHA solving. Distributed automation of account creation was enabled by dedicated software available from sources in the underground economy. The software incorporated implementations of CAPTCHA-solving algorithms enabled by innovative university research, further augmented by good engineering talent available in the developing regions of the world that were now on the Internet. In turn, we invested in detection techniques as well as CAPTCHA tweaks to defeat automated solving. The iterations got to a point where genuine users started facing hardships, particularly in their inability to get through CAPTCHA.
The abusers then changed tactics in favor of leveraging the global economy and the availability of low-cost but well-educated and trained labor in developing countries. Individual abuse-related tasks such as filling in registration forms, posting comments and solving CAPTCHAs were parceled out at run-time to service providers that recruited masses of expert but cheap labor. Further, the availability of general-purpose as well as specialized botnets (for a fee) distributed the source of abuse and took “flying-under-the-radar” to a new level of sophistication.
Starting in 2010, the Abuse team at Yahoo! has been working on a two-pronged strategy to develop the next generation of abuse defenses. We determined that rate-based detection and source- or other handle-based identification were not going to be sustainable. Instead, each request must be classified algorithmically using data in the request, derived information and learning from prior signals. Further, pluggable strategies specific to service contexts need to be deployed and then updated at short notice. Approaches based on machine learning, large-scale data analysis and clustering algorithms are available. This framework has been in development under the Project Blackbird moniker.
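The idea of classifying each request with pluggable, context-specific strategies can be sketched roughly as follows. This is not the Project Blackbird design. The `Classifier` class, the two signal functions, the request fields and the score thresholds are all hypothetical, chosen only to show how strategies might be registered per service context and swapped at short notice.

```python
from typing import Callable, Dict, List

Request = Dict[str, object]
Signal = Callable[[Request], float]  # returns an abuse score in [0, 1]


def form_filled_too_fast(request: Request) -> float:
    # Hypothetical signal: a form completed in under 2 seconds looks automated.
    return 0.9 if float(request.get("fill_seconds", 60.0)) < 2.0 else 0.0


def missing_user_agent(request: Request) -> float:
    # Hypothetical signal: an empty User-Agent is suspicious but not conclusive.
    return 0.6 if not request.get("user_agent") else 0.0


class Classifier:
    """Scores one request using the signals registered for its service context."""

    def __init__(self, reject_at: float = 0.8, challenge_at: float = 0.5):
        self.reject_at = reject_at        # assumed threshold
        self.challenge_at = challenge_at  # assumed threshold
        self._signals: Dict[str, List[Signal]] = {}

    def register(self, context: str, signal: Signal) -> None:
        """Plug a signal into a service context, e.g. "registration"."""
        self._signals.setdefault(context, []).append(signal)

    def classify(self, context: str, request: Request) -> str:
        """Return "serve", "challenge" or "reject" for a single request."""
        signals = self._signals.get(context, [])
        # Use the strongest signal; a context with no signals scores 0.0.
        score = max((signal(request) for signal in signals), default=0.0)
        if score >= self.reject_at:
            return "reject"
        if score >= self.challenge_at:
            return "challenge"
        return "serve"


classifier = Classifier()
classifier.register("registration", form_filled_too_fast)
classifier.register("registration", missing_user_agent)
verdict = classifier.classify("registration", {"fill_seconds": 0.5, "user_agent": "x"})
```

Taking the maximum score is a deliberately simple combination rule. A learned model that weighs signals against prior outcomes would replace it in any system that follows the approach described above.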
The second part of the strategy focused on the development of new challenge and validation techniques beyond classic CAPTCHA. A number of new variants, as well as radically different techniques, have been developed and are in gradual deployment.
We will describe the salient features of the new framework including some of the significant abstractions in the form of internal web services, and demonstrate some of the new challenge techniques.
Shyam manages the Abuse Engineering team at Yahoo!. His prior work was in enterprise computing product management, IT infrastructure optimization and graphics software development.