<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stochastic Nonsense</title>
	<atom:link href="http://blog.earlh.com/index.php/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.earlh.com</link>
	<description></description>
	<lastBuildDate>Mon, 19 Sep 2011 03:30:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Labeled boxplot in R</title>
		<link>http://blog.earlh.com/index.php/2011/09/labeled-boxplot-in-r/</link>
		<comments>http://blog.earlh.com/index.php/2011/09/labeled-boxplot-in-r/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 03:24:03 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=739</guid>
		<description><![CDATA[As generated by R&#8217;s boxplot function. I individually labeled the median, quartiles, min, max, and outliers for inclusion in a presentation where the audience can&#8217;t be assumed to know how to interpret box plots. Please feel free to use this &#8230; <a href="http://blog.earlh.com/index.php/2011/09/labeled-boxplot-in-r/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.earlh.com/wp-content/uploads/2011/09/labeled-boxplot-02.jpg"><img src="http://blog.earlh.com/wp-content/uploads/2011/09/labeled-boxplot-02.jpg" alt="boxplot with labeled parts" title="labeled boxplot 02" width="602" height="792" class="aligncenter size-full wp-image-740" /></a></p>
<p>As generated by R&#8217;s boxplot function.  I individually labeled the median, quartiles, min, max, and outliers for inclusion in a presentation where the audience can&#8217;t be assumed to know how to interpret box plots.  Please feel free to use this image if you have a similar need.</p>
<p>In text, shamelessly stolen from a <a href='http://chartsgraphs.wordpress.com/2008/11/18/boxplots-r-does-them-right/'> climate blog</a>,</p>
<blockquote><p>
The rectangle shows the interquartile range (IQR); it goes from the first quartile (the 25th percentile) to the third quartile (the 75th percentile). The whiskers go from the minimum value to the maximum value unless the distance from the minimum value to the first quartile is more than 1.5 times the IQR. In that case the whisker extends out to the smallest value within 1.5 times the IQR from the first quartile. A similar rule is used for values larger than 1.5 times IQR from the third quartile. A special symbol shows the values, called outliers, which are smaller or larger than the whiskers
</p></blockquote>
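<p>To make the quoted rule concrete, here is a minimal sketch in Ruby (rather than R) of how the box, whiskers, and outliers can be computed.  It assumes R&#8217;s default behaviour, where boxplot draws the box from Tukey&#8217;s hinges as computed by fivenum.  The hinges are close to, but not always identical to, the quartiles that quantile reports.  The method names fivenum and boxplot_stats echo R&#8217;s functions, but these are hypothetical Ruby helpers, not ports of R&#8217;s source.</p>
<pre class="brush:ruby;">
# Tukey's five-number summary (minimum, lower hinge, median, upper hinge,
# maximum), following the index arithmetic of R's fivenum().
# Assumes a non-empty array of numbers.
def fivenum(values)
  x = values.sort
  n = x.length
  n4 = ((n + 3) / 2) / 2.0   # integer division first, so floor((n + 3) / 2) / 2
  positions = [1, n4, (n + 1) / 2.0, n + 1 - n4, n]
  # positions are 1-based and may fall halfway between two elements;
  # average the elements on either side (they coincide for whole positions)
  positions.map { |pos| 0.5 * (x[pos.floor - 1] + x[pos.ceil - 1]) }
end

# Box, whisker ends, and outliers per the rule quoted above: whiskers stop at
# the most extreme points within coef * IQR of the box; anything beyond is an outlier.
def boxplot_stats(values, coef = 1.5)
  _, lower_hinge, median, upper_hinge, _ = fivenum(values)
  iqr = upper_hinge - lower_hinge
  low_fence = lower_hinge - coef * iqr
  high_fence = upper_hinge + coef * iqr
  inside, outside = values.partition { |v| v.between?(low_fence, high_fence) }
  {
    whisker_low: inside.min,
    lower_hinge: lower_hinge,
    median: median,
    upper_hinge: upper_hinge,
    whisker_high: inside.max,
    outliers: outside.sort
  }
end

stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
puts stats[:outliers].inspect   # prints [100]
puts stats[:whisker_high]       # prints 9
</pre>
<p>On that sample the box runs from 3 to 8, so the fences sit at -4.5 and 15.5.  The upper whisker therefore stops at 9, the largest point inside the fences, and 100 is drawn as an outlier.</p>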
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/09/labeled-boxplot-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Calculating the hessian of the logistic log likelihood</title>
		<link>http://blog.earlh.com/index.php/2011/09/calculating-the-hessian-of-the-logistic-log-likelihood/</link>
		<comments>http://blog.earlh.com/index.php/2011/09/calculating-the-hessian-of-the-logistic-log-likelihood/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 03:19:25 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Classifiers]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=743</guid>
		<description><![CDATA[I may be the only person who feels this way, but it&#8217;s awfully easy to read a paper or a book, see some equations, think about them a bit, then sort of nod your head and think you understand them. &#8230; <a href="http://blog.earlh.com/index.php/2011/09/calculating-the-hessian-of-the-logistic-log-likelihood/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I may be the only person who feels this way, but it&#8217;s awfully easy to read a paper or a book, see some equations, think about them a bit, then sort of nod your head and think you understand them.  However, when you go to actually implement them you look back and the jump from the symbols on the page to code that runs on a computer is little bigger than you thought.  So this is mostly me thinking aloud, but I was reading about optimization functions that rely on the hessian and I wrote this out to make sure I understood this well enough to calculate it if I want.</p>
<p>I picked some random training data.  First, we set up the design matrix x, the dependent variable y, and the theta (or beta) at which we will evaluate the Hessian:<br />
<center><br />
\(<br />
x = \left[ \begin{array}{cc}<br />
2 &#038; 3 \\ 4 &#038; 7 \\ 5 &#038; 6 \end{array} \right];<br />
y = \left[ \begin{array}{c} 0 \\ 1 \\ 1 \end{array} \right];<br />
\theta = \left[ \begin{array}{c} 1 \\ 2 \end{array} \right]<br />
\)<br />
</center></p>
<pre class="brush:text">
y <- matrix(nrow=3, c(0, 1, 1))
x <- matrix(nrow=3, ncol=2, byrow=T, c(2, 3, 4, 7, 5, 6))
theta <- matrix(nrow=2, c(1,2))
</pre>
<p>Typically you'd put a column of ones on the left of the design matrix as an intercept term, but I didn't set my problem up that way.  The Hessian of a scalar-valued function of n parameters is the n by n matrix of its second partial derivatives.  In our case there are two parameters, one per explanatory variable, so<br />
<center><br />
\(<br />
H[l] = \left[ \begin{array}{cc}<br />
\frac{\partial^2 l}{\partial \theta_1^2} &#038; \frac{\partial^2 l}{\partial \theta_1\,\partial \theta_2}  \\<br />
\frac{\partial^2 l}{\partial \theta_2\,\partial \theta_1} &#038; \frac{\partial^2 l}{\partial \theta_2^2}<br />
\end{array} \right]<br />
\)<br />
</center></p>
<p>We denote the ith of m training examples as<br />
<center><br />
\((x^{(i)}, y^{(i)}), i = 1\ldots m\)<br />
</center><br />
where the superscript in parentheses is an index, not exponentiation.  Here each \(x^{(i)}\) is a column vector and each \(y^{(i)}\) is either 0 or 1.</p>
<p>We also need some functions.  R looks up free variables in the environment where a function was defined, so logistic_loglik can use the global x and y without having them passed in.<br />
<center><br />
\(<br />
g(\theta; x) = \frac{ 1 }{ 1 + e^{ - x^{ \mathrm{T} }\theta } }<br />
\)<br />
\(<br />
l(\theta; x, y) = \sum_{i=1}^{m} \left[ y^{(i)} \log( g(\theta; x^{(i)})) + (1 - y^{(i)})\log(1 - g(\theta; x^{(i)})) \right]<br />
\)<br />
</center></p>
<pre class="brush:text">
g <- function(x, theta) 1 / (1 + exp(-1 * x %*% theta))

logistic_loglik <- function(theta){
  sum(log(g(x, theta)) * y) + sum((1 - y) * log(1 - g(x, theta)))
}
</pre>
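<p>The hand calculation below relies on a closed form for the second derivatives, and it is worth deriving once.  Write \(g(x^{(i)})\) as shorthand for \(g(\theta; x^{(i)})\) and \(x^{(i)}_j\) for the jth component of \(x^{(i)}\).  Using the sigmoid identity \(\partial g / \partial \theta_j = g(1 - g)\, x_j\) and differentiating l twice gives<br />
<center><br />
\(<br />
\frac{\partial l}{\partial \theta_j} = \sum_{i=1}^{m} \left( y^{(i)} - g(x^{(i)}) \right) x^{(i)}_j<br />
\)<br />
\(<br />
\frac{\partial^2 l}{\partial \theta_j\,\partial \theta_k} = -\sum_{i=1}^{m} g(x^{(i)}) \left( 1 - g(x^{(i)}) \right) x^{(i)}_j x^{(i)}_k<br />
\)<br />
</center><br />
The labels y drop out of the second derivative entirely.  In matrix form this is \(H = -X^{\mathrm{T}} D X\).  Here the rows of X are the training examples, and D is the m by m diagonal matrix with entries \(g(x^{(i)})(1 - g(x^{(i)}))\).  Since those entries are positive, H is negative semidefinite and l is concave.  The R loop below computes the second formula one entry at a time.</p>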
<p>Finally, we can use the numDeriv package to calculate the Hessian and compare with a hand calculation:</p>
<pre class="brush:text">
library(numDeriv)   # provides hessian(), a numerical approximation to the Hessian

H <- hessian(logistic_loglik, theta)

#
# hand calculate the hessian at theta just to make sure I understand
#
m <- nrow(x) 		# ie number of training examples
H_hand <- matrix(nrow=nrow(theta), ncol=nrow(theta))
for (row in 1:nrow(H_hand)){
	for (col in 1:ncol(H_hand)){
		H_hand[ row, col ] <- 0
		for (j in 1:m){
			h <- g(x[j, ], theta)
			H_hand[row, col ] <- H_hand[ row, col ] + x[j, row] * x[j, col] * h * (1 - h)
		}
	}
}
H_hand <- H_hand * -1

# error
err <- norm(H - H_hand)
print(sprintf('error: norm = %f', err))
</pre>
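<p>You can see why optimization algorithms that require the Hessian can be slow.  The hand calculation touches every training example once for every entry of the Hessian, i.e. once for every pair of explanatory variables, so the work grows like \(m n^2\) for m training examples and n explanatory variables.</p>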
<p>Also, MathJax is an awesome and painless way to get <a href='http://www.mathjax.org/'>LaTeX on your blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/09/calculating-the-hessian-of-the-logistic-log-likelihood/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Saving both stdout and stderr while echoing to screen</title>
		<link>http://blog.earlh.com/index.php/2011/08/saving-both-stdout-and-stderr-while-echoing-to-screen/</link>
		<comments>http://blog.earlh.com/index.php/2011/08/saving-both-stdout-and-stderr-while-echoing-to-screen/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 07:11:15 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[bash]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=727</guid>
		<description><![CDATA[As mentioned before, tee, a useful but horridly named utility, allows you to save stdout while echoing it to the screen. Sometimes, however, you need both stderr and stdout. Bash allows you to combine stderr and stdout by appending 2&#62;&#38;1 &#8230; <a href="http://blog.earlh.com/index.php/2011/08/saving-both-stdout-and-stderr-while-echoing-to-screen/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As mentioned before, <a href="http://blog.earlh.com/index.php/2011/07/saving-output-of-a-command-and-echoing-to-the-screen/">tee</a>, a useful but horridly named utility, allows you to save stdout while echoing it to the screen.  Sometimes, however, you need both stderr and stdout.  Bash allows you to combine stderr and stdout by appending 2&#62;&amp;1 to your command.  Thus</p>
<pre class="brush:bash;">
$ hadoop --jar $J/job.jar --job asdf 2>&#038;1 | tee -a log.asdf.00
</pre>
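<p>appends both stderr and stdout to the specified log file, interleaved in the order they reach the pipe, while still echoing everything to the screen.  One caveat: many programs buffer stdout, but not stderr, when stdout isn&#8217;t a terminal, so lines from the two streams can still show up slightly out of order.</p>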
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/08/saving-both-stdout-and-stderr-while-echoing-to-screen/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Advantage made my cat have seizures</title>
		<link>http://blog.earlh.com/index.php/2011/08/advantage-made-my-cat-have-seizures/</link>
		<comments>http://blog.earlh.com/index.php/2011/08/advantage-made-my-cat-have-seizures/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 06:32:59 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Cat Blogging]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=717</guid>
		<description><![CDATA[For all of the web searchers, apparently Advantage can make cats have seizures. Six months ago Lucy the cat got fleas, most likely from the vet where she&#8217;d been for some minor surgery. We applied canine advantage to our dog &#8230; <a href="http://blog.earlh.com/index.php/2011/08/advantage-made-my-cat-have-seizures/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>For all of the web searchers, apparently Advantage can make cats have seizures.</p>
<p>Six months ago Lucy the cat got fleas, most likely from the vet, where she&#8217;d been for some minor surgery.  We applied canine Advantage to our dog and feline Advantage to the cat.  Within minutes of applying the Advantage to the cat, she had a small seizure.  I&#8217;d had her 12 years at the time and, to the best of my knowledge, she&#8217;d never had a seizure before.  We decided never to reapply Advantage to her.</p>
<p>This week, our dog got fleas.  We applied canine Advantage to the dog and nothing to the cat.  She had a seizure a day later.  To the best of our knowledge, it was her first seizure since the last time we used Advantage.  Our vet thinks it may be because she likes to spend the day sleeping on the dog&#8217;s bed in his upstairs kennel, which gets a lot of sun.</p>
<p>These are the only two times I know of her having a seizure since I adopted her in October 1999.  So if you found this via Google, you&#8217;re not alone.  I&#8217;d stay away from Advantage, or at least closely monitor your cat if you use it.  Our vet recommended Frontline for the next time either of them gets fleas; if we have to use it, I&#8217;ll post again.</p>
<div id="attachment_720" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.earlh.com/wp-content/uploads/2011/08/IMG_20110318_111141.jpeg"><img class="size-medium wp-image-720" title="Lucy sitting on my chest" src="http://blog.earlh.com/wp-content/uploads/2011/08/IMG_20110318_111141-300x225.jpg" alt="" width="300" height="225" /></a><p class="wp-caption-text">Lucy</p></div>
<p>Other accounts of cats having seizures after exposure to Advantage are available <a href="http://www.purelypets.com/wwwboard-1/messages/1351.html">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/08/advantage-made-my-cat-have-seizures/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bash tricks: drop the first line of a file</title>
		<link>http://blog.earlh.com/index.php/2011/07/bash-tricks-drop-the-first-line-of-a-file/</link>
		<comments>http://blog.earlh.com/index.php/2011/07/bash-tricks-drop-the-first-line-of-a-file/#comments</comments>
		<pubDate>Wed, 27 Jul 2011 05:27:22 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[bash]]></category>
		<category><![CDATA[Data Munging]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=711</guid>
		<description><![CDATA[I work with a bunch of data that often comes in text files. I regularly want to cut off the header / first line, but I thought that to use tail you had to know how many lines are in &#8230; <a href="http://blog.earlh.com/index.php/2011/07/bash-tricks-drop-the-first-line-of-a-file/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I work with a bunch of data that often comes in text files.  I regularly want to cut off the header / first line, but I thought that to use tail you had to know how many lines are in your file.  It turns out that if you just use</p>
<pre class="brush:bash">
$ tail -n +2 [file]
</pre>
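<p>it will skip the first line without forcing you to know how many lines there are.  The leading + tells tail to start output at line 2, counting from the top, rather than printing the last 2 lines.</p>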
<pre class="brush:bash;">
earl $ cat a.csv
1
2
3
4
earl $ tail -n +2 a.csv
2
3
4
</pre>
<p>This is much more convenient for piping into awk or other commands downstream.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/07/bash-tricks-drop-the-first-line-of-a-file/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Amazon price jumps</title>
		<link>http://blog.earlh.com/index.php/2011/07/amazon-price-jumps/</link>
		<comments>http://blog.earlh.com/index.php/2011/07/amazon-price-jumps/#comments</comments>
		<pubDate>Sun, 24 Jul 2011 04:37:24 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=704</guid>
		<description><![CDATA[Does anyone else regularly see large price swings on Amazon? I&#8217;m in the habit of adding books to my shopping cart until I run out of things to read at home, then buying whatever is in my cart at the &#8230; <a href="http://blog.earlh.com/index.php/2011/07/amazon-price-jumps/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.earlh.com/wp-content/uploads/2011/07/Amazon.com-Shopping-Cart-20110723-price-jumps.jpg"><img src="http://blog.earlh.com/wp-content/uploads/2011/07/Amazon.com-Shopping-Cart-20110723-price-jumps.jpg" alt="" title="Amazon.com Shopping Cart - 20110723 price jumps" width="576" height="397" class="aligncenter size-full wp-image-705" /></a></p>
<p>Does anyone else regularly see large price swings on Amazon?  I&#8217;m in the habit of adding books to my shopping cart until I run out of things to read at home, then buying whatever is in my cart at the time.  So books sit in my cart for a month or so, and every time I visit the cart, Amazon notifies me if prices have changed.  Managing Humans dropped by $9 and The Algorithm Design Manual jumped by $25.  The next two price changes are more typical, moving by a couple of cents.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/07/amazon-price-jumps/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Saving output of a command and echoing to the screen</title>
		<link>http://blog.earlh.com/index.php/2011/07/saving-output-of-a-command-and-echoing-to-the-screen/</link>
		<comments>http://blog.earlh.com/index.php/2011/07/saving-output-of-a-command-and-echoing-to-the-screen/#comments</comments>
		<pubDate>Tue, 19 Jul 2011 20:33:20 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[bash]]></category>
		<category><![CDATA[Data Munging]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=699</guid>
		<description><![CDATA[When using bash, it&#8217;s really nice to both save the output of a command to a file and print it on the screen. I couldn&#8217;t find something that did this so I wrote my own ruby script. A utility that &#8230; <a href="http://blog.earlh.com/index.php/2011/07/saving-output-of-a-command-and-echoing-to-the-screen/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When using bash, it&#8217;s really nice to both save the output of a command to a file and print it on the screen.  I couldn&#8217;t find something that did this so I wrote my own ruby script.  A utility that does exactly what you want is actually included in a standard linux install, but with a filename that I simply couldn&#8217;t google.  tee does what you need:</p>
<pre class="brush:bash;">
$ hadoop --jar $J/job.jar --job asdf | tee -a log.asdf.00
</pre>
<p>appends stdout to log.asdf.00 (the -a flag appends instead of overwriting) and echoes it to the screen.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/07/saving-output-of-a-command-and-echoing-to-the-screen/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Online display advertising ecosystem</title>
		<link>http://blog.earlh.com/index.php/2011/07/online-display-advertising-ecosystem/</link>
		<comments>http://blog.earlh.com/index.php/2011/07/online-display-advertising-ecosystem/#comments</comments>
		<pubDate>Tue, 19 Jul 2011 20:08:37 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Computational Advertising]]></category>
		<category><![CDATA[Online Advertising]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=666</guid>
		<description><![CDATA[The biggest divide in the online advertising world is search advertising vs display advertising, and search sounds exactly like what it is &#8212; search is generally the ads next to searches on Google, Yahoo, Bing, etc. Search is bigger than &#8230; <a href="http://blog.earlh.com/index.php/2011/07/online-display-advertising-ecosystem/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.earlh.com/wp-content/uploads/2011/07/display_advertising_ecosystem_011011.png"><img src="http://blog.earlh.com/wp-content/uploads/2011/07/display_advertising_ecosystem_011011-1024x741.png" alt="" title="display_advertising_ecosystem_011011" width="940" height="680" class="aligncenter size-large wp-image-667" /></a></p>
<p>The biggest divide in the online advertising world is search advertising vs display advertising, and search is exactly what it sounds like &#8212; generally the ads next to search results on Google, Yahoo, Bing, etc.  Search is bigger than display by revenue <a href='#link_01'>[1]</a>, and much more concentrated. The big benefit of search is that it corresponds much better to intent than display advertising &#8212; when you search for hotels in Palm Springs, you&#8217;re most likely in the market for a hotel in Palm Springs. The other thing search has going for it is that it&#8217;s easy and quantifiable &#8212; you can sign into Google AdWords with nothing more than your credit card, type up some text ads, and be running measurable campaigns the next day.</p>
<p>The display advertising world is structured differently. Display ads are the pictures you see plastered all over the sites you visit. There&#8217;s less raw intent, so individual impressions earn a lot less money.  Prices are typically quoted as cpm, cost per mille, ie cost per 1k impressions. It&#8217;s also important to understand a bit of the market&#8217;s history &#8212; in the beginning (the 90s), people pretty naively bought ad impressions in units of 1k. Performance was often evaluated on raw impressions or on ctr (click-through rate).  Ads were often sold on the basis of quantifiability &#8212; you could, for the first time, measure how many ads were seen (in the sense that a user loaded the page), who clicked, how often, where he or she went, etc. As search advertising evolved, I think a lot of the people chasing quantifiable advertising moved to search, while display became more about branding. This, btw, is the value of Facebook &#8212; brand advertisers want to be able to precisely target age, gender, income, and other demographics, and on Facebook users freely and generally accurately share this information.</p>
<p>The display ecosystem has a bunch of moving pieces. If you look at the display advertising tech landscape graphic from Luma Partners, you&#8217;ll see:</p>
<p><b>Agencies</b> &#8212; these are the (7?) big advertising agencies that most large accounts go through. Companies like Toyota, GM, General Mills, etc, will give these agencies 10s to 100s of millions of dollars to run ad campaigns on their behalf. </p>
<p><b>Media Buying Desks</b> &#8212; The ad agencies weren&#8217;t really capable of managing digital campaigns. When ad agencies came about, your media outlets were maybe 10 national TV networks, radio stations, local newspapers, and a couple of national magazines. The media buying process was pretty simple &#8212; the agencies would send out an RFP that said, e.g., we want manly men in their 40s who buy outdoorsy cologne, and the aforementioned publishers would respond and say how their audience matched that profile. Compare this to the online world &#8212; there are thousands of premier publishers such as the NYT, ESPN, online versions of magazines, etc. Trafficking ads is an order of magnitude more work &#8212; buy what, where, on which site, when, with what creatives, etc. So the agencies built or bought companies that can buy digital media, traffic campaigns, etc. E.g. Vivaki belongs to Publicis and b3 to WPP.</p>
<p><b>Ad Exchanges</b> are remnant ad sources. Basically, there is premier and remnant inventory. Premier inventory is something like display ads next to high quality reporting on ESPN or ads on articles on Ars Technica. These are often sold by in-house salespeople in a process remarkably similar to how everything used to work, though people mostly email PDFs instead of sending faxes. Every ad impression that isn&#8217;t sold as premier is referred to as remnant, and these remnant impressions are offered to ad exchanges such as Right Media &#8212; rmx, owned by Yahoo &#8212; in exchange for a cut. So the way this works is I can buy, with some rules, 1MM impressions on rmx, and rmx will place them on its publishers, such as ESPN, in ad slots that ESPN didn&#8217;t sell. These impressions go for an order of magnitude less money than premier. RMX is one of the more technically sophisticated exchanges. The benefit for publishers is they get some money for inventory they didn&#8217;t fill. Just to be clear, a good ecpm (effective cpm) for premier might be $20-$40 and a good ecpm for remnant might be $3-$5.</p>
<p><b>DSP</b> ie Demand-Side Platform &#8212; There isn&#8217;t necessarily a common definition of DSP. I&#8217;d say they are more technically advanced ad exchanges that are starting to blur the lines between remnant and premier ads. They also help you manage line items, creatives, and everything else. The other thing DSPs do is help advertisers that aren&#8217;t big enough to go to one of the big 7 agencies. This might be advertisers spending $10 &#8211; $50k / month, like your local Toyota dealership instead of the national dealer chain, etc.</p>
<p><b>Ad Servers</b> &#8212; these help publishers. See e.g. OpenX, DoubleClick Dart, etc. Particularly for larger publishers, coordinating all these ad purchases is complicated. Your advertisers want to give you rules. These include frequency caps (only so many impressions to a given user per some amount of time), time of day, and what pages an ad can run on (few people want to run next to naked folks, etc). They also want to be able to update and optimize their creatives, or even swap out the creatives or the landing page they point to. Advertisers, or their agencies, also demand reporting &#8212; how many times was an ad seen? On what pages did users click on it? Etc. Within publishers, the ad sales or monetization folks don&#8217;t want to have to release the site every time they tweak ads. Ad servers are internal or external software that manages all this and can be quite complex.</p>
<p><b>Data optimization</b> &#8212; this requires some explanation. In the beginning, people basically bought broad swaths of display ads. The value of optimization is that the more targeted you can make your ad, the more value it has. My favorite example is ESPN &#8212; say 10% of their online audience is female. Say you&#8217;re an advertiser that wants to sell women&#8217;s sports jerseys, your ctr amongst women is 5%, your conversion rate is 5%, and a conversion is worth $50. Then your value per 1k impressions is 1000 * .1 * .05 * .05 * 50 = $12.50, so you can pay at most a $12.50 cpm and still break even. However, say I could pick out the women (with some error, obviously) and enrich the audience so that women are 50% of your impressions. Suddenly advertising on ESPN is worth 5 times as much to the advertiser (1000 * .5 * .05 * .05 * 50 = $62.50), and hence ESPN can charge 5 times as much. This is the value of data optimization. It&#8217;s done in many ways. These range from things as simple as geo targeting and day parting to more sophisticated demographic estimation, retargeting, behavioral retargeting, etc.</p>
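<p>The arithmetic above is easy to rerun with different numbers.  Here is a small Ruby sketch under the same simplifying assumptions as the example.  It assumes only impressions shown to the target audience produce clicks, and that ctr and conversion rate stay the same when the audience is enriched.  break_even_cpm is a hypothetical helper name.</p>
<pre class="brush:ruby;">
# Expected value of 1,000 impressions, i.e. the highest cpm an advertiser can
# pay and still break even. Mirrors the ESPN example: only impressions shown to
# the target audience are assumed to produce clicks and conversions.
def break_even_cpm(target_share:, ctr:, conversion_rate:, value_per_conversion:)
  1000 * target_share * ctr * conversion_rate * value_per_conversion
end

broad    = break_even_cpm(target_share: 0.10, ctr: 0.05, conversion_rate: 0.05, value_per_conversion: 50)
enriched = break_even_cpm(target_share: 0.50, ctr: 0.05, conversion_rate: 0.05, value_per_conversion: 50)
puts format("broad: $%.2f  enriched: $%.2f  ratio: %.1f", broad, enriched, enriched / broad)
# prints broad: $12.50  enriched: $62.50  ratio: 5.0
</pre>
<p>Raising the target share from 10% to 50% multiplies the break-even cpm by 5, which matches the lift described above.</p>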
<p><b> Retargeters</b> &#8212; Retargeting is a simple idea.  Say that I see cookies going to a site like a bmw forum.  I might reasonably intuit that these cookies are interested in bmws and choose to show bmw display ads to these cookies as they browse the internet.</p>
<p>Behavioral retargeting is the next step beyond retargeting &#8212; retargeting is nice, but it suffers from a couple of flaws.  First, it has limited reach, ie there are only so many cookies that go to a bmw forum.  BMW probably wants to reach more purchasers than just those.  Second, it doesn&#8217;t really help generate intent &#8212; if you&#8217;re going to a bmw forum, you&#8217;re probably already pretty interested in bmws, so you may not be the best person for bmw to advertise to.  Behavioral retargeting means any of a variety of ways of trying to figure out which cookies to advertise to in order to get broader reach or cheaper acquisitions than retargeting.</p>
<p>The other big movement going on in the display world is the evolution of how people buy ads.  In the early days &#8212; the 90s &#8212; people tended to buy online ads in a high touch process with salespeople.  Then ad networks came along, which brought in more buyers with fewer salespeople.  Companies like Right Media &#8212; which Yahoo bought &#8212; followed, letting advertisers create bidding rules that run on the exchange&#8217;s servers.  So I can say that I want to, across many websites, target cookies that have visited a site or set of sites (retargeting), or show them only so many ads per day, etc.  I can also say that, based on a variety of characteristics of a cookie, the site that cookie is visiting, and the web page they are viewing, $X is my bid for that cookie.  RTB, or real time bidding, is the new new &#8212; now, instead of giving limited rules to someone like RMX, you register with Google (the largest RTB platform) or Yahoo&#8217;s RTB, and their servers send you a bid request for each impression.  Your servers, located in a data center near theirs, typically have 100ms to respond with a bid for that cookie on that page for that impression.  See <a href='#link_02'>[2]</a>.  This is obviously an enormous tech investment.  DSPs also help with this; most companies aren&#8217;t capable of building it out in house, nor would it be worthwhile.</p>
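<p>To make the idea of bidding rules concrete, here is a toy sketch in Ruby of the per-impression decision: a retargeting rule plus a per-cookie daily frequency cap.  Everything in it is hypothetical.  The BidRequest fields, the class and method names, and the prices are illustrative, and they don&#8217;t correspond to Right Media&#8217;s, Google&#8217;s, or any other exchange&#8217;s actual protocol.  A real bidder would also reset the counts daily and answer within the exchange&#8217;s deadline.</p>
<pre class="brush:ruby;">
require "set"

# Hypothetical bid request: the fields are illustrative only and do not
# match any exchange's real protocol.
BidRequest = Struct.new(:cookie_id, :visited_sites, keyword_init: true)

# A toy rule-based bidder: bid more for retargeted cookies, and stop bidding
# on a cookie once it has been shown daily_cap ads.
class RuleBasedBidder
  def initialize(retarget_sites:, daily_cap:, base_cpm:, retarget_cpm:)
    @retarget_sites = Set.new(retarget_sites)
    @daily_cap = daily_cap
    @base_cpm = base_cpm
    @retarget_cpm = retarget_cpm
    @shown_today = Hash.new(0)   # cookie_id => impressions won so far today
  end

  # Returns a cpm bid in dollars, or nil to pass on this impression.
  def bid(request)
    return nil if @shown_today[request.cookie_id] >= @daily_cap
    retargeted = request.visited_sites.any? { |site| @retarget_sites.include?(site) }
    retargeted ? @retarget_cpm : @base_cpm
  end

  # Call when the exchange reports that our bid won and the ad was shown.
  def record_impression(cookie_id)
    @shown_today[cookie_id] += 1
  end
end

bidder = RuleBasedBidder.new(retarget_sites: ["bmwforum.example"], daily_cap: 2,
                             base_cpm: 0.50, retarget_cpm: 3.00)
request = BidRequest.new(cookie_id: "c1", visited_sites: ["bmwforum.example"])
puts bidder.bid(request)        # prints 3.0
2.times { bidder.record_impression("c1") }
p bidder.bid(request)           # prints nil (daily cap reached)
</pre>
<p>The sketch counts impressions only when the exchange reports a win, rather than on every bid.  That keeps the frequency cap from being used up by auctions the bidder lost.</p>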
<p>NB: I work for one of these companies.  My posts are not now, have never been, and probably never will be the official opinion of my employers nor endorsed by any of them. You should not infer anything about my employers from anything I write.  Seriously.  Don&#8217;t be a tool.</p>
<p id='link_01'>
[1] <a href='http://techcrunch.com/2011/05/27/online-advertising-revenues-up-23-percent-since-q1-2010-reach-7-3-billion/'> Techcrunch: Online Advertising Revenues Up 23 Percent Since Q1 2010, Reach $7.3 Billion</a></p>
<p id='link_02'>
[2] <a href='http://www.businessinsider.com/real-time-bidding-2010-8'> Business Insider: The Rise Of Real-Time Bidding Is The Biggest Online Advertising Story Of 2010</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/07/online-display-advertising-ecosystem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unique is broken in R</title>
		<link>http://blog.earlh.com/index.php/2011/07/unique-is-broken-in-r/</link>
		<comments>http://blog.earlh.com/index.php/2011/07/unique-is-broken-in-r/#comments</comments>
		<pubDate>Sun, 10 Jul 2011 04:38:12 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Programming Languages Suck]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R Tip]]></category>
		<category><![CDATA[Suck]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=686</guid>
		<description><![CDATA[Are you kidding me? $ R > unique(1,1,2,3,4) [1] 1 This was the source of yesterday&#8217;s nasty to track down bug. What you really want is unique on a vector, as in: > unique(c(1,1,2,3,4)) [1] 1 2 3 4 I &#8230; <a href="http://blog.earlh.com/index.php/2011/07/unique-is-broken-in-r/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Are you kidding me?</p>
<pre class="brush:plain">
$ R
> unique(1,1,2,3,4)
[1] 1
</pre>
<p>This was the source of yesterday&#8217;s nasty, hard-to-track-down bug.  What you really want is unique on a vector, as in:</p>
<pre class="brush:plain">
> unique(c(1,1,2,3,4))
[1] 1 2 3 4
</pre>
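<p>I can&#8217;t believe someone decided to let this silently fail in the manner most likely to screw the user.  The culprit is the signature.  unique(x, incomparables = FALSE, ...) treats only its first argument as the data, and the remaining arguments are matched to other parameters such as incomparables rather than being deduplicated.  Functions such as sum and max, by contrast, take their data through ..., so they behave as expected:</p>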
<pre class="brush:plain">
> max(1,2,3,4) == max(c(1,2,3,4))
[1] TRUE
> sum(1,2,3,4) == sum(c(1,2,3,4))
[1] TRUE
> 
</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/07/unique-is-broken-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Formatted numbers in Ruby</title>
		<link>http://blog.earlh.com/index.php/2011/06/formatted-numbers-in-ruby/</link>
		<comments>http://blog.earlh.com/index.php/2011/06/formatted-numbers-in-ruby/#comments</comments>
		<pubDate>Thu, 30 Jun 2011 02:19:42 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Programming Languages Suck]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=651</guid>
		<description><![CDATA[In C or C++, it can be a pain to get thousands separators in printf. In ruby, it can be trivial, as long as you use the right libraries. If you have ActiveSupport installed (which I believe comes with Rails), &#8230; <a href="http://blog.earlh.com/index.php/2011/06/formatted-numbers-in-ruby/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In C or C++, it&#8217;s can be a pain to get <a href='http://blog.earlh.com/index.php/2011/06/thousands-separator-in-printf-in-c/'> thousands separators in printf</a>.  In ruby, it can be trivial, as long as you use the right libraries.  If you have ActiveSupport installed (which I believe comes with Rails), you&#8217;re all set.  Note that you don&#8217;t have to be using Rails; this will work in a plain ruby script.</p>
<pre class="brush:ruby;">
$ irb
irb(main):001:0> require 'action_view'
=> true
irb(main):002:0> include ActionView::Helpers::NumberHelper
=> Object
irb(main):003:0> number_with_delimiter(123456)
=> "123,456"
irb(main):004:0> number_to_human(123456)
=> "123 Thousand"
</pre>
<p>number_with_delimiter is great, and number_to_human is a nice bonus.</p>
<p>For the record, software versions and the install command:</p>
<pre class="brush:bash">
$ ruby --version
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.7.4]
$ gem --version
1.3.7
$ gem install actionpack
</pre>
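<p>If pulling in ActionPack just for thousands separators feels heavy, a regular expression in plain Ruby covers the common case.  This is a minimal sketch that assumes integer input.  with_delimiter is a hypothetical helper, not part of Ruby&#8217;s standard library.  Unlike number_with_delimiter, it does not handle decimal fractions or the other options that helper accepts.</p>
<pre class="brush:ruby;">
# Insert the delimiter after every digit that is followed by a whole number
# of three-digit groups up to the end of the string. Integers only.
def with_delimiter(number, delimiter = ",")
  number.to_s.gsub(/(\d)(?=(\d{3})+\z)/, "\\1#{delimiter}")
end

puts with_delimiter(123456)      # prints 123,456
puts with_delimiter(-1234567)    # prints -1,234,567
puts with_delimiter(999)         # prints 999
</pre>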
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/06/formatted-numbers-in-ruby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

