Abuse Prevention in the Globally Distributed Economy

Shyam Mittur (Yahoo! Inc.)
Operations Mission City B1

Yahoo! has had to fight abuse of its users, content and services from the time when Internet access was expensive and anything but ubiquitous. During the last decade, Internet access has become inexpensive while the user community has become truly global. A long-standing definition of abuse at Yahoo! was encapsulated in the phrase “using our services in an excessive, malicious manner or for illegal purposes.” A strategy developed around detection and rejection of such usage stood us in good stead. Perpetrators of abuse were typically identifiable by their source, automation characteristics and unnaturally high attempt rates.

Platform-level solutions were developed and deployed to address three elements of this strategy:
  • Identifying abusive requests
  • Challenging suspect sources
  • Remediation
A particularly innovative framework under the YDoD (Yahoo! Department of Defense) moniker has been in place for eight years. This framework provides several major features:
  • A language for writing filters that define the handles on which service requests are counted
  • An algorithm that ensures the most prolific abusers get the most attention
  • A plug-in design that allows work on abuse prevention to proceed independent of the work on service delivery
  • A configurable set of actions that can be taken upon abuse detection
  • A client-server design that nullifies distribution strategies attempting “fly-under-the-radar” abuse
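
As an illustration of the first two features, handle-based counting can be sketched as follows. This is a minimal sketch under stated assumptions, not the YDoD filter language or its algorithm: the `HandleCounter` class, the `by_source_ip` filter, the dict-shaped request and the fixed threshold are all hypothetical, and the client-server aggregation of counts across hosts is left out.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical request shape: a plain dict of request attributes.
Request = Dict[str, str]
# A "filter" here is just a function that maps a request to a handle string.
HandleFilter = Callable[[Request], str]


class HandleCounter:
    """Counts requests per handle and reports the busiest handles."""

    def __init__(self, handle_filter: HandleFilter, threshold: int) -> None:
        self.handle_filter = handle_filter
        self.threshold = threshold  # assumed request limit per handle
        self.counts: Counter = Counter()

    def record(self, request: Request) -> str:
        """Count the request and return 'serve' or 'reject'."""
        handle = self.handle_filter(request)
        self.counts[handle] += 1
        return "reject" if self.counts[handle] > self.threshold else "serve"

    def top_abusers(self, n: int) -> List[Tuple[str, int]]:
        """Return the n handles with the highest counts, busiest first."""
        return self.counts.most_common(n)


def by_source_ip(request: Request) -> str:
    """Hypothetical filter: use the source address as the handle."""
    return "ip:" + request["ip"]


counter = HandleCounter(by_source_ip, threshold=3)
traffic = [{"ip": "10.0.0.1"}] * 5 + [{"ip": "10.0.0.2"}] * 2
decisions = [counter.record(r) for r in traffic]
```

In this toy run the first source exceeds the threshold on its fourth request, while the second source stays under it. Ranking handles by count is one simple way to direct attention to the most prolific sources; a production system would also need time windows and counts shared across front-end hosts.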

Solutions based on this framework are deployed on almost all of Yahoo!’s front-end infrastructure. The framework currently detects and rejects anywhere from a small percentage to a majority of the requests as abusive, depending on the service. Individual requests are classified for a serve-or-reject decision in a few milliseconds. Several thousand filters are deployed across the network.

Yahoo! was a pioneer in the development of CAPTCHA, which has been our principal human-vs-bot challenge validation strategy. If we suspected, but were not sure, that a request was abusive, we would issue a CAPTCHA challenge. This worked well for a long time, since significant abuse was almost always driven by automated software: human users could pass the challenge, while automated software could not. CAPTCHA usage across the Yahoo! network has gone up greatly over the last few years.
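
The "suspected but not sure" case amounts to a three-way decision. The sketch below assumes a numeric suspicion score and two cut-off values; the score, the `decide` function and both thresholds are hypothetical and are used here only to make the idea concrete.

```python
def decide(suspicion: float, challenge_at: float = 0.5, reject_at: float = 0.9) -> str:
    """Map an assumed suspicion score in [0, 1] to an action."""
    if suspicion >= reject_at:
        return "reject"      # confident the request is abusive
    if suspicion >= challenge_at:
        return "challenge"   # unsure: issue a CAPTCHA, serve only if it is solved
    return "serve"           # no meaningful suspicion
```

The middle band is where a challenge is useful: it lets a genuine user through at the cost of one extra step, instead of forcing a serve-or-reject guess.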

Mitigation was handled on two fronts:
  • When automated abuse detection denied a service request, we displayed a special page that genuine users could use to report a false positive. We would then fine-tune detection.
  • An internal service suspended or deactivated accounts that were detected to be the source of gross abuse. Yahoo! has long had a strategy of being liberal on registration of new accounts, while being very aggressive on deactivating abusive accounts.

In the 2006-07 timeframe, we started to see significant changes in sources and patterns of abuse. The first big set of changes observed was in registration volumes, with apparent “mass registration” campaigns succeeding and accompanied by unnatural spikes in successful CAPTCHA solving. Distributed automation of account creation was enabled by dedicated software available from sources in the underground economy. The software incorporated implementations of CAPTCHA-solving algorithms enabled by innovative university research, further augmented by good engineering talent available in the developing regions of the world that were now on the Internet. In turn, we invested in detection techniques as well as CAPTCHA tweaks to defeat automated solving. The iterations got to a point where genuine users started facing hardships, particularly in their inability to get through CAPTCHA.

The abusers now changed tactics in favor of leveraging the global economy and the availability of low-cost but well-educated and trained labor in developing countries. Individual tasks related to abuse like filling registration forms, posting comments, solving CAPTCHA and so on were parceled out at run-time to service providers that recruited masses of expert but cheap labor. Further, the availability of general-purpose as well as specialized botnets (for a fee) distributed the source of abuse and took “flying-under-the-radar” to a new level of sophistication.

Starting in 2010, the Abuse team at Yahoo! has been working on a two-pronged strategy to develop the next generation of abuse defenses. We determined that rate-based detection and source- or other handle-based identification were not going to be sustainable. Instead, each request must be classified algorithmically using data in the request, derived information and learning from prior signals. Further, pluggable strategies specific to service contexts need to be deployed and then updated at short notice. Approaches based on machine learning, large-scale data analysis and clustering algorithms are available. This framework has been in development under the Project Blackbird moniker.
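
One way to picture per-request classification with pluggable, service-specific strategies is a registry of scoring functions. This is a sketch under stated assumptions and not the Project Blackbird design: `register`, `classify`, `registration_strategy`, the feature names and the weights are all hypothetical, and a hand-written rule stands in for a learned model.

```python
from typing import Callable, Dict

Request = Dict[str, float]             # hypothetical: numeric request features
Strategy = Callable[[Request], float]  # returns an abuse score in [0, 1]

_strategies: Dict[str, Strategy] = {}


def register(service: str, strategy: Strategy) -> None:
    """Install or replace the strategy for one service context."""
    _strategies[service] = strategy


def classify(service: str, request: Request, reject_at: float = 0.5) -> str:
    """Score one request with its service's strategy; serve if none is installed."""
    strategy = _strategies.get(service)
    if strategy is None:
        return "serve"
    return "reject" if strategy(request) >= reject_at else "serve"


def registration_strategy(request: Request) -> float:
    """Toy rule standing in for a learned model (assumed features and weights)."""
    score = 0.0
    if request.get("form_fill_seconds", 60) < 2:   # derived timing signal
        score += 0.4
    if request.get("prior_abuse_reports", 0) > 0:  # signal learned from earlier requests
        score += 0.4
    return score


register("registration", registration_strategy)
```

Because each service looks up its strategy at classification time, calling `register` again with a new function changes the behavior for that service without touching the serving code, which is the property the "deployed and then updated at short notice" requirement points at.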

The second part of the strategy focused on the development of new challenge and validation techniques beyond classic CAPTCHA. A number of new variants, as well as radically different techniques, have been developed and are in gradual deployment.

We will describe the salient features of the new framework including some of the significant abstractions in the form of internal web services, and demonstrate some of the new challenge techniques.

Shyam Mittur

Yahoo! Inc.

Shyam manages the Abuse Engineering team at Yahoo!. His prior work was in enterprise computing product management, IT infrastructure optimization and graphics software development.

