The Life and Times of AdMob Engineering

Archive for November, 2008

ADTree for inventory management

November 12th, 2008

Recently, we at the Admob engineering team have been looking for efficient solutions to build an inventory management/forecasting service to help our traffic allocation algorithms. This requires building and mining data sets of request and ad inventory logs. A survey of the existing literature in data structures for quick and (potentially repetitive) counting on data sets pointed to ADTrees (Alternating Decision trees) which has been popularized in the works of Andrew Moore. An ADTree is an efficient representation of a data table consisting of multiple attributes each taking one of a certain number of possible values. The ADTree consists of alternating levels (or types) of nodes, vary and decision (AD) nodes, each node following a parent, child relationship with another node in the previous and next levels respectively. Each vary node represents varying one of the attributes while each AD node under a vary node represents one of the possible values that attribute could assume. The memory efficiency (sparseness) in this table is realized by storing a NULL in place of the the most common value for an attribute in that subtree. This is similar to the classic Huffman codes where length of the code is inversely proportional to the observed frequency of a symbol in the sequence being compressed.

ADNodes and VaryNodes of the tree

This paper provides a simple recursive procedure to construct such a tree on Page 89 but leaves the details of implementing the procedure to query the tree to the reader. Since the tree population procedure is explicitly treated in the paper, we discuss the query procedure (AD_COUNT below) from another follow up result by the same authors here

Preconditions:

Query_list <- list of attribute-value pairs sorted by attribute
Index <- 0
ADnode <- root ADNODE of ADtree

AD_COUNT(ADnode, Query_list, index)

If index equals the size of Query_list then
Return ADnode’s count
Varynode <- Vary node child of ADnode that corresponds to indexth attribute in Query_list
Next_ADnode <- ADNODE child of Varynode that corresponds to indexth value in Query_list
If Next_ADnode’s count is 0 then
Return 0
If Next_ADnode is a MCV then
Count <- AD_COUNT(ADdnode, Query_list, index + 1)
For each s in siblings of Next_ADnode do
Count <- Count – AD_COUNT(s, Query_list, index+1)
Return Count
Return AD_COUNT(Next_ADnode, Query_list, index+1)

The query procedure above turns out to be crucial since the ADTree is a compressed representation of a datatable and hence has reduced redundancy. This translates to multiple traversals through the tree for certain queries involving most common values (MCV) for any (or all) attributes in the tree. The trick used in the above query procedure is

The novel aspect of our implementation of the above ADTree is that, we appealed to the hadoop mapreduce framework of distributed computing for parallel population the ADTree corresponding to every site in the Admob network.

Hence, we are able to populate, track and forecast inventory simultaneously for all the sites in the network. Such an inventory management tool has applications in traffic allocation, bid management, inventory forecasting, auction design and many more. This approach advances state of the art of application of mapreduce to parallel management of request/ad inventory.

- Kamakshi Sivaramakrishnan, Optimization Team

Copyright © The Life and Times of AdMob Engineering. All rights reserved.