Building a Scalable Ad Network: Lessons Learned

When faced with the building an ad network, traditional systems couldn’t satisfy the requirements. The secret: distributed systems.

by Gary Orenstein and Mark Balch

Mobile and online advertising networks face unique vital infrastructure and service challenges compared to other Web companies. Advertising networks have to serve billions of small files, including text, images, and video, and they have to track every click to ensure targeted ad relevancy and proper billing for ads served.

Because ad networks get paid by the number of ads placed, any availability gaps directly impact revenue. Service-level agreements dictate that ads not served quickly result in lost revenue for both advertisers and the ad network.

Online ad serving must be continuously available and able to meet market demand. One mobile advertising company recently reported over 110 billion ad impressions served in three years, with growth expected to continue. Statistics like these showcase the critical nature of infrastructure and the need to move beyond existing solutions which simply cannot keep up with traffic increases.

This article will briefly outline the challenges and describe a new architecture to support expanding workload requirements. Ad serving companies following these guidelines will be able to eliminate infrastructure bottlenecks, ensure proper click-through tracking and billing, and prepare to easily handle rapid expansion.

Infrastructure Requirements for a Scalable Ad Network

Ad networks must handle a range of workloads from small file serving to logging, all while maintaining high-availability. Here are some of the requirements for this project.

Small-file serving: Most ads are relatively small in size, and most ad networks handle millions to billions of requests, which leads to intense pressure on file serving infrastructure. To keep up with the workload, systems must handle:

  • Low latency: Rapid ad serving with low-latency responses is imperative for ad networks to get paid. Typically, ad networks commit to a service level agreement that ads will be delivered in a set number of milliseconds. If they cannot serve an ad within that window, they lose the opportunity to generate impressions and click-throughs, reducing revenue.
  • Random access: Each end user receives a unique ad or combination of ads. Therefore, systems must support random access to a large library of ad content. As the amount of content scales, predictable response times to randomly accessed content must scale, too. Random-access patterns to a large data set are not easily cached, making fast and efficient access to disk critical.
  • Load-balanced performance and peak-load handling: Ad-serving systems must ensure that they can evenly balance workloads across all resources. These systems must also scale to support peak loads. Preparing for seasonal traffic should not require months of planning. You should be able to add new nodes that seamlessly join the system and deliver a proportional increase in overall performance.
  • Logging: Logging workloads are practically the exact opposite of small-file serving. Instead of getting small files out of the system quickly, log workloads drive updates (small and large) into the file-serving infrastructure. Large ad networks rely on logging for ad relevancy and accurate billing. Log data comes to the system from a few to hundreds or thousands of Web servers. In these cases, the ability to handle hundreds of updates per second, often as atomic appends, determines system success.
  • High Availability: Operating 24/7, ad networks must maintain high availability to support partner Web sites and prevent lost revenue opportunities. From a systems perspective, this means being impervious to individual disk failure or server node failure. File systems must maintain ad-serving capabilities throughout such disruptions.

Challenges of Traditional Systems

Traditional systems, such as those designed just five years ago, did not anticipate the unique requirements of Internet scale applications and segments such as ad serving. Legacy file system products cannot effectively or economically meet these workload demands, and fall short in the following categories.

Expensive hardware: Traditional systems typically rely on expensive, proprietary hardware that has a short lifespan and high ongoing maintenance costs. Upgrades are typically complete system replacements leading to premature spending up front during the migration process.

Poor performance: Traditional systems usually take a scale-up approach, meaning that they tend to place too many disk drives behind a single controller, leading to poor performance when running at full capacity (see Figure 1).

Scale-up Architectures Create Disk Controller Bottlenecks
Figure 1: Scale-up Architectures Create Disk Controller Bottlenecks

Inability to scale quickly and to large capacities: Traditional systems generally have large “step” upgrades, meaning that upon reaching certain system capacity thresholds, entire new systems must be added. This leads to excessive spending plus delays in provisioning and deploying storage.

Not Web-optimized: Traditional systems were never optimized for Web workloads. They do not necessarily handle small files well, they do not handle logging well, and they do not include the integrated abilities of a key-value store. All of these functions can help an ad network consolidate systems and reduce overall costs.

The challenges are summarized in Figure 2.

Challenges of Building an Ad Network with Traditional Systems
Figure 2: Challenges of Building an Ad Network with Traditional Systems

New Approach with Distributed Systems

The scale and workload characteristics of ad networks demand a new approach with distributed systems. This design benefits ad networks in allowing them to build more robust, less expensive file serving infrastructure. The following characteristics make this possible.

Use commodity hardware: Use of industry-standard commodity hardware guarantees price transparency on the equipment. By avoiding proprietary hardware architectures, customers can save on capital costs, and maintain lower operating costs.

Achieve linear, low-latency performance: By taking a scale-out approach, where each file serving node includes a combination of CPU, network bandwidth, memory, and storage capacity, a distributed approach delivers linear, low-latency performance for hundreds to thousands of application servers that are serving millions to billions of ads. Traditional systems keel over under loads of this magnitude.

Scale within a single namespace: Distributed systems provide a single namespace up to hundreds of petabytes. Contrast this to traditional systems that lack a single namespace where each system was a fixed resource repository and led to managing multiple storage islands. Scale in a single namespace dramatically simplifies system management and ensures that application workloads can expand dramatically without requiring a changing the file serving and storage infrastructure.

Optimize for Web workloads: Ad networks encompass a combination of Web workloads including small-file serving, logging, and analytics. Traditional systems are not optimized for these workloads and can require multiple systems. A distributed approach with optimizations for Web workloads drives system consolidation, simplifies management, and delivers a more cost-effective approach.

A summary of the benefits of a distributed system is shown in Figure 3.

Benefits of Building an Ad Network with a Distributed File Serving System
Figure 3: Benefits of Building an Ad Network with a Distributed File Serving System

Gary Orenstein is vice president of technical solutions for MaxiScale. Gary is the author of IP Storage Networking: Straight to the Core and is a regular contributor to GigaOM and hosts The Cloud Computing Show (http://cloudcomputingshow.blogspot.com/).