The Life and Times of AdMob Engineering

Archive for the ‘Uncategorized’ category

ADTree for inventory management

November 12th, 2008

Recently, we at the Admob engineering team have been looking for efficient solutions to build an inventory management/forecasting service to help our traffic allocation algorithms. This requires building and mining data sets of request and ad inventory logs. A survey of the existing literature in data structures for quick and (potentially repetitive) counting on data sets pointed to ADTrees (Alternating Decision trees) which has been popularized in the works of Andrew Moore. An ADTree is an efficient representation of a data table consisting of multiple attributes each taking one of a certain number of possible values. The ADTree consists of alternating levels (or types) of nodes, vary and decision (AD) nodes, each node following a parent, child relationship with another node in the previous and next levels respectively. Each vary node represents varying one of the attributes while each AD node under a vary node represents one of the possible values that attribute could assume. The memory efficiency (sparseness) in this table is realized by storing a NULL in place of the the most common value for an attribute in that subtree. This is similar to the classic Huffman codes where length of the code is inversely proportional to the observed frequency of a symbol in the sequence being compressed.

ADNodes and VaryNodes of the tree

This paper provides a simple recursive procedure to construct such a tree on Page 89 but leaves the details of implementing the procedure to query the tree to the reader. Since the tree population procedure is explicitly treated in the paper, we discuss the query procedure (AD_COUNT below) from another follow up result by the same authors here

Preconditions:

Query_list <- list of attribute-value pairs sorted by attribute
Index <- 0
ADnode <- root ADNODE of ADtree

AD_COUNT(ADnode, Query_list, index)

If index equals the size of Query_list then
Return ADnode’s count
Varynode <- Vary node child of ADnode that corresponds to indexth attribute in Query_list
Next_ADnode <- ADNODE child of Varynode that corresponds to indexth value in Query_list
If Next_ADnode’s count is 0 then
Return 0
If Next_ADnode is a MCV then
Count <- AD_COUNT(ADdnode, Query_list, index + 1)
For each s in siblings of Next_ADnode do
Count <- Count - AD_COUNT(s, Query_list, index+1)
Return Count
Return AD_COUNT(Next_ADnode, Query_list, index+1)

The query procedure above turns out to be crucial since the ADTree is a compressed representation of a datatable and hence has reduced redundancy. This translates to multiple traversals through the tree for certain queries involving most common values (MCV) for any (or all) attributes in the tree. The trick used in the above query procedure is

The novel aspect of our implementation of the above ADTree is that, we appealed to the hadoop mapreduce framework of distributed computing for parallel population the ADTree corresponding to every site in the Admob network.

Hence, we are able to populate, track and forecast inventory simultaneously for all the sites in the network. Such an inventory management tool has applications in traffic allocation, bid management, inventory forecasting, auction design and many more. This approach advances state of the art of application of mapreduce to parallel management of request/ad inventory.

- Kamakshi Sivaramakrishnan, Optimization Team

A few months ago we set out to build a high speed reverse proxy server. From the beginning we had a strict set of requirements to meet. It had to be fast, handle a large number of simultaneous connections and scale to a high throughput. Essentially a reverse proxy server dispatches in-bound network traffic to a set of servers and presents a single interface to the caller. For example, a reverse proxy could be used for load balancing a cluster of web servers or routing certain traffic based on configurable rules to back end servers. Our proxy server was going to sit in front of all our ad servers and route many tens of millions of HTTP requests per day!

At a high level the reverse proxy server needs to have the following components:

  • I/O Services, Filters and Handlers. These modules would manage the inbound and outbound connections and filter traffic.
  • Protocol decoding and encoding. It needs to support both HTTP 1.0 and 1.1.
  • Statistics Manager for collecting and reporting real-time and historical metrics.
  • Server Monitoring and Management.
  • HTTP Server for serving static and servlet based content.

Building such a server requires careful design consideration. In order to support a large number of client connections we opted to use asynchronous I/O for handling both the server side socket and back end connections. Java’s implementations is called NIO and lives under the java.nio.* package. Traditionally synchronous I/O requires a single thread per connection which blocks until inbound data arrives to be read or outbound data is ready to be written. On the other hand with asynchronous I/O the server is notified when I/O events are ready to be consumed. The I/O threads do not block while waiting for reads and writes. Instead they continue processing events for other sessions. The server needs to keep track of where each client is within the I/O transaction. This model allows a small group of I/O threads to handle a large number of simultaneous connections and to pass execution of complex tasks to a larger pool of worker threads. Asynchronous I/O is limited by the typical constraints such as CPU, bandwidth, File Descriptors, etc.. while synchronous I/O, in addition to these, is limited by how many threads can run inside a JVM.

There are number of frameworks out there which make it easier to write NIO based servers in Java. The most popular of these are Apache Mina and Grizzly. Both are good frameworks, however Mina has a cleaner API (in our opinion) and excellent documentation. Stay tuned as we talk more about building the various modules including performance results and tuning tips for scaling our server!

Moe Alabed, Server Development Team

Recently, one of my papers (primarily drawn from the work in my PhD thesis) was accepted for publication in the IEEE Transactions of Information Theory. The paper is titled ‘Universal Denoising of Discrete-time Continuous-Amplitude Signals’, an abstract of which (reproduced from the manuscript) is as follows.

We consider the problem of reconstructing a discrete-time signal (sequence) with continuous-valued components corrupted by a known memoryless channel. When performance is measured using a per-symbol loss function satisfying mild regularity conditions, we develop a sequence of denoisers that, although independent of the distribution of the underlying `clean’ sequence, is universally optimal in the limit of large sequence length. This sequence of denoisers is universal in the sense of performing as well as any sliding window denoising scheme which may be optimized for the underlying clean signal. Our results are initially developed in a “semi-stochastic” setting, where the noiseless signal is an unknown individual sequence, and the only source of randomness is due to the channel noise. It is subsequently shown that in the fully stochastic setting, where the noiseless sequence is a stationary stochastic process, our schemes universally attain optimum performance. The proposed schemes draw from nonparametric density estimation techniques and are practically implementable. We demonstrate efficacy of the proposed schemes in denoising gray-scale images in the conventional additive white Gaussian noise setting, with additional promising results for less conventional noise distributions.

In other words, suppose we are presented with some vital data (most often result of measurements made on a system) that is unfortunately corrupted by noise and we are interested in the underlying clean data. The work in this paper involves a new denoising (or noise removal) algorithm that retrieves the underlying clean data from its corresponding noise corrupted version. The corruption source is modeled as a memoryless channel that can be described by its probability distribution function that is (assumed to be) known . The algorithm performs denoising by learning statistics of the noise corrupted data using nonparametric density estimation (smoothing). It is then “psuedo-inverted” to estimate the statistics of the underlying clean data followed by Bayesian estimation of the “actual” (clean) data values itself. The main contribution of the work is the algorithm analysis that establishes the asymptotic optimality of the scheme. A central ideas used in the analysis are, concentration inequalities (in probability) that measures the deviation of the loss incurred from applying the algorithm on the noisy data to a “genie-aided” scheme that has knowledge of the statistics of the underlying clean data. The genie-aided scheme has the luxury of making Bayesian optimal decision and we show that we asymptotically approach its performance by applying the proposed algorithm under mild regularity assumptions on the channel.

Information Theory, in its essence, is a very academic discipline with the publications therein being heavy in mathematical analysis of algorithms for problems such as data compression, prediction, filtering etc. However the problems being analyzed and studied have significant practical implications in building engineering systems. On the face of it, it appears to be a whole different universe from mobile advertising. But many of the analytical tools used in design and analysis of information processing algorithms also find application in optimization of mobile advertising. For example, universal prediction schemes have potential applications in CTR prediction which is vital in revenue estimation and hence traffic allocation problems in an advertisement network

- Kamakshi Sivaramakrishnan, Optimization Team

Engineering a Good Chili

July 23rd, 2008

A couple of months back we held our 1st Annual AdMob Engineering Chili Cookoff. We all love good food at AdMob. We’re especially serious about all things spicy. A cookoff seemed like a good way for us to hang out in the park for an afternoon and give one team of engineers and/or product managers bragging rights over the superiority of their culinary creation.

We had a ton of fun, and in the voting following the event, we had a clear winner. The winning team had a well-crafted strategy. They knew that a sizable fraction of AdMob engineering was vegetarian, so they avoided meat in their recipe. They also appreciated that even though almost everyone in engineering claimed to like spicy food, that everyone’s spiciness threshold was different. Accordingly, they made two batches of their chili and provided a variety of sauces and fresh chilies so that folks could tune spiciness precisely to their taste. The winning team also used great ingredients, sought advice on their recipe from experienced veggie chili makers, and did a good job of managing the logistics of chili construction by careful division of labor and time management.

Not to destroy that which is uniquely artistic about cooking good food for friends, but a lot of what led to the winning team’s success was good product design and engineering. They designed their product to appeal to the widest possible audience. They consulted their peers for best practices. They didn’t skimp on the quality of their components. And they shipped their complicated product on time by planning ahead, splitting the work up into semi-independent chunks, and carefully coordinating the integration of their individual work early and often.

Now here’s the winning recipe as conceived of and prepared by Alvin, Anu, Josh, Prasana, and Sashi.

Ingredients:

  • 2 tablespoons olive oil
  • 1 onion, chopped
  • 2 carrots, peeled, thinly sliced
  • 1 red bell pepper, seeded, chopped
  • 3 large jalapeño chilies, seeded, minced (about 4 1/2 tablespoons)
  • 3 habanero chilis (for the spicy version only)
  • 3 Choyote squashes
  • 1 28-ounce can crushed tomatoes with added puree
  • 3 cups water
  • 2 15-ounce cans black beans, rinsed, drained
  • 2 15-ounce cans kidney beans, rinsed, drained
  • 2 cans of garbanzo beans
  • 2 cans of lima beans
  • 2 tablespoons white wine vinegar
  • 5 garlic cloves, minced
  • 1 1/2 teaspoons ground cumin
  • 1 1/2 teaspoons ground coriander
  • 1/2 teaspoon ground cinnammon
  • 1/2 to 1 bag of Morningstar Veggie Beef Crumbles

Preparation:

Heat 2 tablespoons olive oil in heavy large pot over medium-high heat. Add onion, carrots, red bell pepper, and jalapeños and sauté until onion and carrots are almost tender, about 8 minutes. Add tomatoes, 3 cups water, beans, veggie beef crumbles, white wine vinegar, garlic, and spices. Bring to boil. Reduce heat to medium-high and cook, uncovered, until mixture thickens, stirring often, about 20 minutes. Ladle chili into bowls and serve.

–Kevin Scott

Creating New Ads

July 19th, 2008

A month or so ago we released a long overdue redesign of the most important feature on our website, the Create Ad page. This wasn’t a simple visual overhaul, but a throw-everything-away-and-start-from-scratch project. You can see the new page here.

We took what was originally a 5 page wizard and made a single page “wizard”. (Can a wizard be a single page?) We also added a long awaited feature of the ability to add images to their Text Link ads. This is important as images drive user engagement by helping advertisers better communicate their message.

We had to overcome numerous obstacles trying to simplify the wizard into a single page but I’m only going to cover the engineering ones.

When we started the project we knew that it was going to be heavy in regards to javascript. The final javascript payload for the single page ad wizard weighs in at ~55k and 1500 lines. We had several engineers concurrently working the project so making sure we didn’t step on each other toes was crucial. We divided the page up into modules and laid down some basic rules.
Each module …

  • follows the module pattern
  • will have an init() function to be called onDocumentReady
  • will have an getValues() that returns values to be passed back when the entire page is submitted
  • needs minimal knowledge of other modules

Following these rules we were able to parallelize development much easier. Another benefit is we are able to initialize the page easily with the following code:

MB.createAd.modules = ['copy', 'adText', 'images',
                      'channels', 'targeting', 'preview', 'submit'];
Ext.each(MB.createAd.modules, function(item) {
 MB.createAd[item].init.defer(10, MB.createAd[item]);
}, this);

This runs the init() for each modules pausing 10ms between each. This allows the browser to catch up and render what’s in the queue before executing the next init().

The largest problem that we had was that we needed a tree widget with very specific requirements. We’ve standardized on the Ext JS library, however the baseline tree widget was missing a 3 state checkbox mode. That is if any child nodes of a leaf are checked, the parent node will be in a partially checked state. You can see what I mean on the right.

To solve this problem we developed a new TreeNodeUI class as an extension to Ext JS’s tree. You can see the class here. We want to release this tree as an easily re-usable extension for the Ext JS community. We’re in the process of packaging and documenting so stay tuned!

–Wayne Pan, Lead, Frontend Engineering

In our last post we quickly sketched the basic mechanism by which most mobile content and service providers fetch ads from AdMob using our server-to-server ad request API. We also mentioned that we support ad requests made directly from mobile browsers via Javascript, a mechanism that should be immediately familiar to those with experience running ads online. We didn’t go into details about this second way of requesting AdMob ads, not because we don’t love Javascript ad requests or the more-capable handsets that support them. Quite the contrary. If we had our way, every handset in the world, right now, would be as capable as the iPhone and its brethren.

The big question that we didn’t resolve last post is “Why do you need a server-to-server ad request API in the first place?” A fine question indeed, and one with a multi-faceted answer. Before we dive into the answer though, let’s start by throwing out a non-factor.

Our server-to-server ad request API isn’t a technological differentiator. Our ad business is all about optimally connecting advertisers and mobile consumers. How the ad gets requested is more-or-less immaterial. We have a couple of ad request mechanisms now. We will surely add more in the future. For instance, if the mobile ecosystem collectively switched over to browsers whose primary scripting language was Haskell and used VRML for content markup, we would probably ship a Haskell ad request API and VRML-tailored ad units. Proud as we might be of such a request API written for the programming tool of choice to discriminating hackers, it still wouldn’t be a technological differentiator.

Now back to the question: why do we currently need a server-to-server API for mobile ad serving?

First and foremost, the vast majority of handsets on our mobile network don’t support Javascript. Javascript-capable handsets are growing in popularity. But the price points of these devices right now place a rate limiter on their world-wide adoption. Within a scant few years Moore’s Law will result in mobile devices that are even more powerful than today’s überhandsets and that are at the same time cheap enough for most of the world’s population to afford. That promise is one of the reasons we’re so excited about mobile. Today, however, we still need a server-to-server ad request mechanism.

Second, but still crucial, today’s mobile data networks are nowhere near as fast and reliable as the broadband networks most of us use for our day-to-day Internet activities. The last hop on these networks–the wireless one out to the handset–is typically the slowest and least reliable link. Consequently, we need to use this resource very carefully. With Javascript ad requests originating from the phone, we essentially have an extra round trip over this last hop. Because of this ads may load slower or might get dropped altogether. That can in turn slow down rendering of a site’s main content or even leave rendered pages in an unreadable state. Neither is an awesome experience for the user.

On the other hand, with server-to-server ad requests for text ads, there is no ad request round trip over the wireless hop to the handset. The ad is already bundled with the content payload. Ad retrieval takes place over the Internet, typically over networks whose slowest link is substantially faster than residential broadband. This saves the handset from having to make a DNS lookup, and opening and subsequently closing a TCP connection to the ad server for retrieving the ad. Over a slow wireless network, eliminating these things can be a huge performance win and results in a superior user experience.

–Kevin Scott, VP of Engineering

One of the most frequent questions that we get from technical folks is, “How is mobile ad serving different from online ad serving?” There are many things that are different between these two worlds. Each has a different mix of markup languages, different user interface constraints, different browser capabilities, different types of networking infrastructure and protocol quirks, etc. The difference that’s often most surprising to engineers however–especially ones familiar with online ad serving–is the basic mechanism by which ads are requested and delivered.

In the online world the browser is most often responsible for both fetching and rendering ads. The content owner modifies the markup of their content to include placeholders for ads. The placeholders are often called tags and take the form of a bit of Javascript code or a clickable img tag. When the browser loads a document containing ad tags, it executes some Javascript and/or starts an image load in order to fetch each ad.

In the mobile world the browser is most often only responsible for rendering ads. The content owner typically modifies their application servers to request ads from a mobile ad server. The application server copies returned ad(s) into the appropriate place(s) in the content that they’re preparing to send back to the mobile phone. When the phone receives the page that it requested, the document already contains ad markup ready to be rendered.

The diagram below illustrates a single mobile request for a page with ads.

  1. A user navigates to a new page causing the mobile device to send a request to the carrier gateway asking it to retrieve the specified content.
  2. The carrier gateway forwards the request on to the content owner’s application server via HTTP.
  3. If the requested content contains an AdMob ad, the content owner’s application server bundles up some information about the request context and sends an HTTP ad request to AdMob.
  4. AdMob parses the ad request, finds all ads that could possibly match the request, ranks the ads, selects the top ranked ad and sends the ad back in the markup language appropriate to the requesting device.
  5. The content owner’s application server copies the returned ad into the appropriate spot and returns the content + ad(s) to the carrier gateway.
  6. The carrier gateway forwards the content to the user’s mobile device.

This isn’t to say that every mobile ad request uses this request and delivery mechanism. Some mobile ad requests look an awful lot like their online brethren. For instance, AdMob has a special version of our ad request code for the iPhone. The iPhone code is a bit of Javascript that gets installed within content markup and makes the browser responsible for both fetching and rendering the ad.

We’ll continue this article next week with an answer to the obvious follow up question from this post: “Why do you need to do things differently on mobile?

Copyright © The Life and Times of AdMob Engineering. All rights reserved.