The Life and Times of AdMob Engineering

Posts tagged ‘optimization’

ADTree for inventory management

November 12th, 2008

Recently, we at the Admob engineering team have been looking for efficient solutions to build an inventory management/forecasting service to help our traffic allocation algorithms. This requires building and mining data sets of request and ad inventory logs. A survey of the existing literature in data structures for quick and (potentially repetitive) counting on data sets pointed to ADTrees (Alternating Decision trees) which has been popularized in the works of Andrew Moore. An ADTree is an efficient representation of a data table consisting of multiple attributes each taking one of a certain number of possible values. The ADTree consists of alternating levels (or types) of nodes, vary and decision (AD) nodes, each node following a parent, child relationship with another node in the previous and next levels respectively. Each vary node represents varying one of the attributes while each AD node under a vary node represents one of the possible values that attribute could assume. The memory efficiency (sparseness) in this table is realized by storing a NULL in place of the the most common value for an attribute in that subtree. This is similar to the classic Huffman codes where length of the code is inversely proportional to the observed frequency of a symbol in the sequence being compressed.

ADNodes and VaryNodes of the tree

This paper provides a simple recursive procedure to construct such a tree on Page 89 but leaves the details of implementing the procedure to query the tree to the reader. Since the tree population procedure is explicitly treated in the paper, we discuss the query procedure (AD_COUNT below) from another follow up result by the same authors here

Preconditions:

Query_list <- list of attribute-value pairs sorted by attribute
Index <- 0
ADnode <- root ADNODE of ADtree

AD_COUNT(ADnode, Query_list, index)

If index equals the size of Query_list then
Return ADnode’s count
Varynode <- Vary node child of ADnode that corresponds to indexth attribute in Query_list
Next_ADnode <- ADNODE child of Varynode that corresponds to indexth value in Query_list
If Next_ADnode’s count is 0 then
Return 0
If Next_ADnode is a MCV then
Count <- AD_COUNT(ADdnode, Query_list, index + 1)
For each s in siblings of Next_ADnode do
Count <- Count – AD_COUNT(s, Query_list, index+1)
Return Count
Return AD_COUNT(Next_ADnode, Query_list, index+1)

The query procedure above turns out to be crucial since the ADTree is a compressed representation of a datatable and hence has reduced redundancy. This translates to multiple traversals through the tree for certain queries involving most common values (MCV) for any (or all) attributes in the tree. The trick used in the above query procedure is

The novel aspect of our implementation of the above ADTree is that, we appealed to the hadoop mapreduce framework of distributed computing for parallel population the ADTree corresponding to every site in the Admob network.

Hence, we are able to populate, track and forecast inventory simultaneously for all the sites in the network. Such an inventory management tool has applications in traffic allocation, bid management, inventory forecasting, auction design and many more. This approach advances state of the art of application of mapreduce to parallel management of request/ad inventory.

- Kamakshi Sivaramakrishnan, Optimization Team

Recently, one of my papers (primarily drawn from the work in my PhD thesis) was accepted for publication in the IEEE Transactions of Information Theory. The paper is titled ‘Universal Denoising of Discrete-time Continuous-Amplitude Signals’, an abstract of which (reproduced from the manuscript) is as follows.

We consider the problem of reconstructing a discrete-time signal (sequence) with continuous-valued components corrupted by a known memoryless channel. When performance is measured using a per-symbol loss function satisfying mild regularity conditions, we develop a sequence of denoisers that, although independent of the distribution of the underlying `clean’ sequence, is universally optimal in the limit of large sequence length. This sequence of denoisers is universal in the sense of performing as well as any sliding window denoising scheme which may be optimized for the underlying clean signal. Our results are initially developed in a “semi-stochastic” setting, where the noiseless signal is an unknown individual sequence, and the only source of randomness is due to the channel noise. It is subsequently shown that in the fully stochastic setting, where the noiseless sequence is a stationary stochastic process, our schemes universally attain optimum performance. The proposed schemes draw from nonparametric density estimation techniques and are practically implementable. We demonstrate efficacy of the proposed schemes in denoising gray-scale images in the conventional additive white Gaussian noise setting, with additional promising results for less conventional noise distributions.

In other words, suppose we are presented with some vital data (most often result of measurements made on a system) that is unfortunately corrupted by noise and we are interested in the underlying clean data. The work in this paper involves a new denoising (or noise removal) algorithm that retrieves the underlying clean data from its corresponding noise corrupted version. The corruption source is modeled as a memoryless channel that can be described by its probability distribution function that is (assumed to be) known . The algorithm performs denoising by learning statistics of the noise corrupted data using nonparametric density estimation (smoothing). It is then “psuedo-inverted” to estimate the statistics of the underlying clean data followed by Bayesian estimation of the “actual” (clean) data values itself. The main contribution of the work is the algorithm analysis that establishes the asymptotic optimality of the scheme. A central ideas used in the analysis are, concentration inequalities (in probability) that measures the deviation of the loss incurred from applying the algorithm on the noisy data to a “genie-aided” scheme that has knowledge of the statistics of the underlying clean data. The genie-aided scheme has the luxury of making Bayesian optimal decision and we show that we asymptotically approach its performance by applying the proposed algorithm under mild regularity assumptions on the channel.

Information Theory, in its essence, is a very academic discipline with the publications therein being heavy in mathematical analysis of algorithms for problems such as data compression, prediction, filtering etc. However the problems being analyzed and studied have significant practical implications in building engineering systems. On the face of it, it appears to be a whole different universe from mobile advertising. But many of the analytical tools used in design and analysis of information processing algorithms also find application in optimization of mobile advertising. For example, universal prediction schemes have potential applications in CTR prediction which is vital in revenue estimation and hence traffic allocation problems in an advertisement network

- Kamakshi Sivaramakrishnan, Optimization Team

Copyright © The Life and Times of AdMob Engineering. All rights reserved.