Akamai Pushes the InfoEdge

Akamai uses massive parallelism to reach end users globally.

Imagine: You have 13,000-plus networked PCs, dispersed throughout 63 countries. Not unusual for a big multinational, but here's the hard part: Those 13,000-plus PCs are all servers on somebody else's network, not yours.

About a thousand somebodies, in fact. And those servers must control not only your own data transmissions but also those of your customers … securely, 24x7, without fail.

That's exactly what Akamai did. The company aimed to solve the infamous "last mile" problem—pushing data from the Internet's edge into users' PCs quickly, efficiently and without the usual Internet congestion logjams—without building the massive communications infrastructure usually required for such tasks.

Did Akamai succeed? That's for the market to decide. The solution the company chose, however, offers fascinating insights into the workings of a massively distributed network.

Akamai co-founder Jonathan Seelig (co-founder and CTO Daniel Lewin was killed in September in the World Trade Center attacks) talks about the early days at Akamai with a near-religious fervor. "A few years ago if you asked people how their computer connected to Yahoo, they'd draw a line going from their computer and Yahoo's server, with a big cloud in between. People always talked about the cloud, but didn't necessarily worry about what went on inside the cloud. But inside the cloud, it was really messy."

The number of hops between a Web site and its visitor depends not just on geography but also on prevailing conditions online. If there's congestion, the information can travel hundreds or even thousands of miles out of the way. This becomes especially important with large downloads and dynamic pages. The closer the server is to the recipient, the less likely these delays are.

Seelig and his cohorts reasoned that placing the heaviest Web loads near the user would eliminate many of the bottlenecks and optimize transmission speeds. But instead of developing a Web network to rival provider giants like UUNET, Akamai decided to use those networks.

The company chose relatively inexpensive, off-the-shelf, Intel-based servers, co-locating them with hundreds of ISPs in the United States, then around the world. The servers were to constantly analyze traffic flow and re-route information during periods of outage, peak usage or other potential congestion problems. The network became one of the first successful content-acceleration services on the Web.
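
The re-routing idea described above can be illustrated with a minimal sketch. It is not Akamai's actual algorithm, which the article does not describe. The EdgeServer fields, the MAX_LOAD threshold and the pick_server helper are all hypothetical, chosen only to show how a request might skip servers that are down or overloaded and go to the closest healthy one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeServer:
    """Hypothetical view of one co-located server as seen by a request router."""
    name: str           # illustrative identifier, not a real Akamai host name
    latency_ms: float   # measured round-trip time from the user
    load: float         # fraction of capacity in use, 0.0 to 1.0
    online: bool = True


MAX_LOAD = 0.9  # assumed cutoff above which a server counts as congested


def pick_server(servers: List[EdgeServer]) -> Optional[EdgeServer]:
    """Return the lowest-latency server that is online and not congested."""
    healthy = [s for s in servers if s.online and s.load < MAX_LOAD]
    if not healthy:
        return None  # caller would fall back to the origin site
    return min(healthy, key=lambda s: s.latency_ms)


servers = [
    EdgeServer("isp-a", latency_ms=12.0, load=0.95),                # congested
    EdgeServer("isp-b", latency_ms=20.0, load=0.40),                # healthy
    EdgeServer("isp-c", latency_ms=8.0, load=0.10, online=False),   # outage
]

choice = pick_server(servers)
print(choice.name if choice else "fall back to origin")  # prints "isp-b"
```

In this sketch the nearest server (isp-c) is skipped because it is offline and the next nearest (isp-a) because it is overloaded, so the request goes to isp-b. A production system would presumably refresh these measurements continuously rather than use fixed values.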

Algorithms and software to power the network, based on a modified Linux kernel, were developed in-house. "When people heard what we were doing, a lot of vendors came to us about their network management tools," says Seelig. "But, frankly, we're trying to solve a very different problem. Our machines aren't sitting on our network, they're sitting on other people's networks. Off-the-shelf management products weren't intended to make decisions about what 13,000 different PCs on other people's networks were doing."

Massive Parallelism

Akamai also planned to go big right from the start. "We knew we wanted massive parallelism from the start," Seelig adds. "Put thousands of servers on an automated, distributed network system, and you have something that's not easily duplicated."

Managing that many servers on networks that Akamai actually had little control over proved a tough problem. "We can't do the sexy things like mapping content at the edge of the network, looking for the best available server to deliver to a particular user," says Jonathan Stefansky, Akamai's vice president for network infrastructure and architecture, "if we can't get the unsexy things like our core infrastructure right."

The servers would be delivering status reports and billing information as well. "We eat our own dog food," Stefansky says. Most of the servers are multipurpose: they both serve data and report information for monitoring. "But we do have some servers that act as dedicated aggregators," he adds. "They compile information needed for billing and push it back to the Network Operations Command Centers (NOCCs)." Customers' content delivery and reporting are done in real time; Akamai's own billing information collection proceeds at a slightly slower pace.
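
To make the aggregator role concrete, here is a minimal sketch of rolling per-server traffic reports up into per-customer totals for billing. The report layout, the field names and the aggregate_for_billing function are assumptions made for illustration; the article does not describe Akamai's actual report format or billing pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical report record: (server_id, customer, bytes_served)
Report = Tuple[str, str, int]


def aggregate_for_billing(reports: Iterable[Report]) -> Dict[str, int]:
    """Sum bytes served per customer across all reporting servers."""
    totals: Dict[str, int] = defaultdict(int)
    for _server_id, customer, bytes_served in reports:
        totals[customer] += bytes_served
    return dict(totals)


# Made-up sample data, not real traffic figures
reports = [
    ("edge-001", "customer-a", 1500),
    ("edge-002", "customer-a", 2500),
    ("edge-002", "customer-b", 700),
]

billing = aggregate_for_billing(reports)
print(billing)  # prints {'customer-a': 4000, 'customer-b': 700}
```

Because billing can lag behind real-time delivery, a batch-style roll-up like this one would fit the slower pace the article mentions, though that pairing is an inference rather than something the source states.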

"Our NOCCs are purely for monitoring reports that the servers send back," says Seelig. "The reports go to Intel systems running an Oracle database on the back end. Visualization and reporting take place on workstations. Because of that, I can get NOCC functionality from the PC sitting on my desk."

Along the way, Seelig says, he's become an enthusiastic crusader for distributed networks. "When you run a large, business-critical network you become used to being awakened at 3 a.m. because some router failed or a server is down and it's a catastrophe. The neat thing about this kind of distributed architecture is that being awakened in the middle of the night is a rare thing. It's a relief, and it's great incentive to move to a distributed architecture."

The security perils of distributed networks versus centralized data are well known. Seelig argues, however, that the scale of the Akamai network brings inherent safety. "We actually end up with a lot of security through massive parallelism."

"There are actually some problems with centralization," Stefansky points out. "For one thing, it's much harder to deploy cloaking—hiding the origin site—for big, centralized sites. And we can see a denial of service attack and route around it, which is also more difficult to do with a central point for attack. Plus, massively scaling HTTPS/SSL from a single location becomes very difficult."

Instead, the biggest challenge remains just keeping all parts of the network online—especially, says Stefansky, managing their far-flung geography. "It's quite a problem sometimes. We have more control over servers in the United States. In other countries we find lots of barriers to entry—time-of-day problems, cultural changes, content filtering issues. We've had to develop pockets of expertise in import/export within the company to understand these issues and learn how to apply those rules to content delivery."

Details: Akamai

Team Leaders: Jonathan Seelig, co-founder and VP of strategy and corporate development; Jonathan Stefansky, VP of network infrastructure and architecture.

Organization: Akamai Technologies Inc.

Location: Cambridge, Mass.

Web Site: www.akamai.com

Goal: Run the largest cohesive, managed IP network in the world without actually owning the networks its servers sit on.

Requirements: "One of our metrics for success is how many hops, on average, there are between the user and Akamai servers. Say you have a 1Mbps DSL; we want you to be able to go out over your DSLAM and ideally, the first server you meet is an Akamai server."—Seelig

Scope:

  • More than 13,000 servers distributed through 63 countries.
  • Servers and routers placed in the networks of more than 1,000 ISPs.

Equipment/Platform: Intel-based dual-processor servers running a modified Red Hat Linux kernel.

Products Considered: "When people heard what we were doing, a lot of vendors came to us about their network management tools," says Seelig. "But, frankly, we're trying to solve a very different problem. Our machines aren't sitting on our network, they're sitting on other people's networks. Off-the-shelf management products weren't intended to make decisions about what 13,000 different PCs on other people's networks were doing."

Products Used: In-house network management and content distribution software.

Development Environment: Red Hat Linux

Future Challenges: "It's hard to see how big is big enough, how distributed is distributed enough," says Stefansky. "We'll scale according to two criteria: customer demand and how close we can get to end users."

About the Author

Cynthia Morgan is a longtime technology journalist and former Senior Editor for Enterprise Systems.