<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stochastic Nonsense &#187; R</title>
	<atom:link href="http://blog.earlh.com/index.php/category/r/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.earlh.com</link>
	<description></description>
	<lastBuildDate>Mon, 19 Sep 2011 03:30:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Labeled boxplot in R</title>
		<link>http://blog.earlh.com/index.php/2011/09/labeled-boxplot-in-r/</link>
		<comments>http://blog.earlh.com/index.php/2011/09/labeled-boxplot-in-r/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 03:24:03 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=739</guid>
		<description><![CDATA[As generated by R&#8217;s boxplot function. I individually labeled the median, quartiles, min, max, and outliers for inclusion in a presentation where the audience can&#8217;t be assumed to know how to interpret box plots. Please feel free to use this &#8230; <a href="http://blog.earlh.com/index.php/2011/09/labeled-boxplot-in-r/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.earlh.com/wp-content/uploads/2011/09/labeled-boxplot-02.jpg"><img src="http://blog.earlh.com/wp-content/uploads/2011/09/labeled-boxplot-02.jpg" alt="boxplot with labeled parts" title="labeled boxplot 02" width="602" height="792" class="aligncenter size-full wp-image-740" /></a></p>
<p>As generated by R&#8217;s boxplot function.  I individually labeled the median, quartiles, min, max, and outliers for inclusion in a presentation where the audience can&#8217;t be assumed to know how to interpret box plots.  Please feel free to use this image if you have a similar need.</p>
<p>In text, shamelessly stolen from a <a href='http://chartsgraphs.wordpress.com/2008/11/18/boxplots-r-does-them-right/'> climate blog</a>,</p>
<blockquote><p>
The rectangle shows the interquartile range (IQR); it goes from the first quartile (the 25th percentile) to the third quartile (the 75th percentile). The whiskers go from the minimum value to the maximum value unless the distance from the minimum value to the first quartile is more than 1.5 times the IQR. In that case the whisker extends out to the smallest value within 1.5 times the IQR from the first quartile. A similar rule is used for values larger than 1.5 times IQR from the third quartile. A special symbol shows the values, called outliers, which are smaller or larger than the whiskers
</p></blockquote>
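<p>The whisker rule in the quote can be checked by hand.  This is only a rough sketch on made-up data (the vector x is invented for illustration); it uses the hinges from fivenum, which is what boxplot.stats works from, and compares the result against boxplot.stats itself:</p>

```r
# made-up data with one obvious outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

five <- fivenum(x)                # min, lower hinge, median, upper hinge, max
iqr  <- five[4] - five[2]         # height of the box

# fences sit 1.5 * IQR beyond the box
lower_fence <- five[2] - 1.5 * iqr
upper_fence <- five[4] + 1.5 * iqr

# whiskers stop at the most extreme points still inside the fences
whiskers <- range(x[x >= lower_fence & x <= upper_fence])
outliers <- x[x < lower_fence | x > upper_fence]

# what R's boxplot machinery computes for the same data
stats <- boxplot.stats(x)
print(whiskers)
print(outliers)
```

<p>Here the upper whisker stops at the largest value inside the fence rather than at the maximum, and the remaining point is drawn as an outlier.</p>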
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/09/labeled-boxplot-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unique is broken in R</title>
		<link>http://blog.earlh.com/index.php/2011/07/unique-is-broken-in-r/</link>
		<comments>http://blog.earlh.com/index.php/2011/07/unique-is-broken-in-r/#comments</comments>
		<pubDate>Sun, 10 Jul 2011 04:38:12 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Programming Languages Suck]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R Tip]]></category>
		<category><![CDATA[Suck]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=686</guid>
		<description><![CDATA[Are you kidding me? $ R > unique(1,1,2,3,4) [1] 1 This was the source of yesterday&#8217;s nasty to track down bug. What you really want is unique on a vector, as in: > unique(c(1,1,2,3,4)) [1] 1 2 3 4 I &#8230; <a href="http://blog.earlh.com/index.php/2011/07/unique-is-broken-in-r/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Are you kidding me?</p>
<pre class="brush:plain">
$ R
> unique(1,1,2,3,4)
[1] 1
</pre>
<p>This was the source of yesterday&#8217;s nasty, hard-to-track-down bug.  What you really want is unique on a vector, as in:</p>
<pre class="brush:plain">
> unique(c(1,1,2,3,4))
[1] 1 2 3 4
</pre>
<p>I can&#8217;t believe someone decided to let this silently fail in the manner most likely to screw the user.  Note that other functions such as sum and max behave as expected:</p>
<pre class="brush:plain">
> max(1,2,3,4) == max(c(1,2,3,4))
[1] TRUE
> sum(1,2,3,4) == sum(c(1,2,3,4))
[1] TRUE
> 
</pre>
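<p>The reason is in the signature: unique is documented as unique(x, incomparables = FALSE, ...), so only the first argument is treated as data and the rest are matched to other parameters.  If you want the sum / max style of calling, a small wrapper does it.  A minimal sketch; uniqueOf is a made-up name, not something base R provides:</p>

```r
# hypothetical helper: combine all arguments into one vector, then deduplicate
uniqueOf <- function(...) unique(c(...))

a <- uniqueOf(1, 1, 2, 3, 4)      # scalars
b <- uniqueOf(c(1, 1), 2, 3, 4)   # vectors and scalars mixed
print(a)
print(b)
```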
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/07/unique-is-broken-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding the sort order of an array in R or Ruby</title>
		<link>http://blog.earlh.com/index.php/2011/06/finding-the-sort-order-of-an-array-in-r-or-ruby/</link>
		<comments>http://blog.earlh.com/index.php/2011/06/finding-the-sort-order-of-an-array-in-r-or-ruby/#comments</comments>
		<pubDate>Sun, 26 Jun 2011 21:58:18 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Programming Languages Suck]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R Tip]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=628</guid>
		<description><![CDATA[Suppose you have an array that you&#8217;d like to sort by another array. A common use case might be a set of arrays of somethings and for each something you generate a score in say [0,1]. Now you&#8217;d like to &#8230; <a href="http://blog.earlh.com/index.php/2011/06/finding-the-sort-order-of-an-array-in-r-or-ruby/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Suppose you have an array that you&#8217;d like to sort by another array.  A common use case might be a set of arrays of somethings and for each something you generate a score in say [0,1].  Now you&#8217;d like to sort your somethings by their scores.</p>
<p>Concretely, say you have an array of scores:</p>
<pre class="brush:text;">
scores: [0.3347867, 0.9069004, 0.4391635, 0.8376249, 0.7133011]
indices: [0, 1, 2, 3, 4]
</pre>
<p>and you want the indices of the sorted scores, i.e.</p>
<pre class="brush:text;">
scores_sorted: [0.3347867, 0.4391635, 0.7133011, 0.8376249, 0.9069004]
indices_sorted: [0, 2, 4, 3, 1]
</pre>
<p>In R, you can use order (R indexes from 1, so the permutation it returns is 1-based), as in</p>
<pre class="brush:text;">
$ R
> scores <- runif(5)
> perm <- order(scores)
> data.frame(score=scores, order=perm)
      score order
1 0.3347867     1
2 0.9069004     3
3 0.4391635     5
4 0.8376249     4
5 0.7133011     2
>
> # and just to check
> scores[ perm ]
[1] 0.3347867 0.4391635 0.7133011 0.8376249 0.9069004
</pre>
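<p>It is easy to confuse order with rank when the permutation is printed next to the scores, so here is a small sketch using the same five scores as above.  order says which element comes first, second, and so on; rank says where each element lands:</p>

```r
scores <- c(0.3347867, 0.9069004, 0.4391635, 0.8376249, 0.7133011)

perm  <- order(scores)   # indices that sort the scores (1-based)
ranks <- rank(scores)    # position of each score in the sorted result

# subtract 1 to get the 0-based indices used in the example at the top
perm0 <- perm - 1L

print(perm)
print(ranks)
print(perm0)
```

<p>With no ties, applying order twice gives the ranks.</p>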
<p>You can do something similar in Ruby:</p>
<pre class="brush:ruby;">
irb > scores = [0.3347867, 0.9069004, 0.4391635, 0.8376249, 0.7133011]
=> [0.3347867, 0.9069004, 0.4391635, 0.8376249, 0.7133011]
irb > scores.zip( (1..scores.length).to_a )
=> [[0.3347867, 1], [0.9069004, 2], [0.4391635, 3], [0.8376249, 4], [0.7133011, 5]]
irb >
irb > scores.zip( (1..scores.length).to_a ).sort_by{ |e| e.first }
=> [[0.3347867, 1], [0.4391635, 3], [0.7133011, 5], [0.8376249, 4], [0.9069004, 2]]
irb >
irb > perm = scores.zip( (1..scores.length).to_a ).sort_by{ |e| e.first }.map{ |e| e[1] - 1 }
=> [0, 2, 4, 3, 1]
irb >
irb > scores.values_at(*perm)
=> [0.3347867, 0.4391635, 0.7133011, 0.8376249, 0.9069004]
</pre>
<p>And finally in C++, you can leverage qsort_r; this function was designed to be a reentrant / threadsafe qsort so you&#8217;re given a void* to pass a block of memory into your comparison function.  You can use this to sort the indices array by the scores:</p>
<pre class="brush:cpp">
#include <stdio.h>   // sprintf, printf
#include <stdlib.h>  // qsort_r

// [...]
// utility fns that join an array of u (unsigned int) or f (double) into a string
char* vsprintf_u(char* buff, unsigned int* array, unsigned int len){
	char* orig = buff; buff += sprintf(buff, "["); for (int i=0; i < len; i++) buff += sprintf(buff, "%3u, ", array[i]); sprintf(buff, "]");
	return orig;
}
char* vsprintf_f(char* buff, double* array, unsigned int len){
	char* orig = buff;
	buff += sprintf(buff, "["); for (int i=0; i < len; i++) buff += sprintf(buff, "%1.4f, ", array[i]); sprintf(buff, "]");
	return orig;
}

/**
 * qsort_r comparison fn: sort array indices by scores
 */
int score_comparator(void* scoresv, const void* leftv, const void* rightv){
	unsigned int* left = (unsigned int*)leftv;
	unsigned int* right = (unsigned int*)rightv;
	double* scores = (double*)scoresv;

	if (scores[ *left ] < scores[ *right ])
		return -1;
	else if (scores[ *left ] == scores[ *right ])
		return 0;
	return 1;
}

// [...]

printf("test code:\n");
double scores[5] = {0.3347867, 0.9069004, 0.4391635, 0.8376249, 0.7133011};
unsigned int perm[] = {0, 1, 2, 3, 4};
char buff[4192];

printf("presort:  %s\n", vsprintf_f(buff, scores, 5));
printf("presort:  %s\n", vsprintf_u(buff, perm, 5));

qsort_r(perm, 5, sizeof(unsigned int), scores, &#038;score_comparator);

printf("postsort: %s\n", vsprintf_u(buff, perm, 5));
printf("postsort: [");
for (unsigned int i=0; i < 5u; i++)
	printf("%1.4f, ", scores[ perm[ i ] ]);
printf("]\n");
</pre>
<p>which, when run, produces</p>
<pre class="brush:text">
$ ./a.out
presort:  [0.3348, 0.9069, 0.4392, 0.8376, 0.7133, ]
presort:  [  0,   1,   2,   3,   4, ]
postsort: [  0,   2,   4,   3,   1, ]
postsort: [0.3348, 0.4392, 0.7133, 0.8376, 0.9069, ]
</pre>
<p>Note the canonical way to sort somethings by a float in C++ is to bang everything into a struct or class and sort the structs/classes directly.  However, this is often pretty inconvenient, and if you have a lot of whatever you want to sort, copying everything into structs/classes just to attach a score field uses too much memory.</p>
<p>I think it's obvious why I prefer to program in R.</p>
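<p>NB: I am developing for OS X, which ships the BSD flavor of qsort_r.  If you are targeting linux, glibc's qsort_r permutes the arguments: the comparison function comes before the context pointer, and the comparator receives the context as its last argument instead of its first.  You may also need to define _GNU_SOURCE before including stdlib.h to get the declaration.  Sigh.</p>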
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/06/finding-the-sort-order-of-an-array-in-r-or-ruby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting the value of a variable from a string in R</title>
		<link>http://blog.earlh.com/index.php/2011/06/getting-the-value-of-a-variable-from-a-string-in-r/</link>
		<comments>http://blog.earlh.com/index.php/2011/06/getting-the-value-of-a-variable-from-a-string-in-r/#comments</comments>
		<pubDate>Sat, 25 Jun 2011 21:50:10 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[R Tip]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=618</guid>
		<description><![CDATA[It&#8217;s often convenient to use reflection to get the value of a variable from the name as a string. In R, you can use the get function to do this. In R : blog $ R > x = 3 &#8230; <a href="http://blog.earlh.com/index.php/2011/06/getting-the-value-of-a-variable-from-a-string-in-r/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s often convenient to use reflection to get the value of a variable from the name as a string.  In R, you can use the get function to do this.</p>
<p>In R:</p>
<pre class="brush: text">
blog $ R
> x = 3
> get('x')
[1] 3
>
</pre>
<p>In Ruby:</p>
<pre class="brush:ruby">
blog $ irb
irb(main):001:0> x = 3
=> 3
irb(main):002:0> eval 'x'
=> 3
</pre>
<p>though Ruby&#8217;s eval is more general: it is the equivalent of <a href="http://blog.earlh.com/index.php/2009/06/eval-in-r-running-code-from-a-string/">eval in R</a>, which lets you evaluate arbitrary code in a string.</p>
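<p>For completeness, here is a small sketch of get next to its relatives in R.  The variable names are made up for illustration; exists guards against looking up a name that isn't there, assign is the write-side counterpart of get, and eval(parse(text=...)) is the general run-a-string route:</p>

```r
x <- 3
varname <- "x"

# look up a variable by name
value <- get(varname)

# get() errors on a missing name, so check with exists() first
fallback <- if (exists("y_not_defined_here")) get("y_not_defined_here") else NA

# assign() creates or overwrites a variable by name
assign("z", value * 2)

# evaluating arbitrary code held in a string
evaluated <- eval(parse(text = "x + 1"))

print(c(value, z, evaluated))
```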
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2011/06/getting-the-value-of-a-variable-from-a-string-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interactive Plotting in R</title>
		<link>http://blog.earlh.com/index.php/2010/01/interactive-plotting-in-r/</link>
		<comments>http://blog.earlh.com/index.php/2010/01/interactive-plotting-in-r/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 21:53:34 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Data Munging]]></category>
		<category><![CDATA[Plotting]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R Tip]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=538</guid>
		<description><![CDATA[There are many ways to compare univariate distributions; one of my favorites is violin plots. However, if you are only comparing two distributions, then the best solution is often a scatter plot. To that end, I&#8217;ve built some code that &#8230; <a href="http://blog.earlh.com/index.php/2010/01/interactive-plotting-in-r/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>There are many ways to compare univariate distributions; one of my favorites is <a href="http://blog.earlh.com/index.php/2009/07/visualizing-and-comparing-distributions-part-8-of-a-series/">violin plots</a>.  However, if you are only comparing two distributions, then the best solution is often a scatter plot.  To that end, I&#8217;ve built some code that creates an interactive scatter plot of two distributions and allows you to interactively print arbitrary strings on the graph when you select / deselect points.  This creates a slightly kludgy but very handy tool for comparing distributions by hand.</p>
<p>Unfortunately, truly interactive plotting isn&#8217;t really a part of R and you are thus forced to lean on external tools.  I picked JGR, the Java GUI for R.  The easiest way to use it is to get the <a href="http://jgr.markushelbig.org/Download.html">JGR launch tool</a>.</p>
<p>Basically, I have data with multiple tests; a single line shows the results for one item across several tests.  I wish to compare the distributions.</p>
<pre class="brush:text;">
> head(age)
    name    default      test1      test2
1 item 1 0.02110710 0.01900870 0.02030870
2 item 2 0.03160770 0.02926650 0.03345660
3 item 3 0.03909570 0.03702500 0.04016650
4 item 4 0.00262195 0.00225917 0.00302822
5 item 5 0.01668860 0.01555010 0.01783400
6 item 6 0.04223370 0.03904630 0.04123270
</pre>
<p><a href='http://blog.earlh.com/wp-content/uploads/2010/01/iplot.test_.csv_.txt'>test data</a></p>
<p>This function throws up a window and lets you draw a box around items to see their information displayed in the upper left.</p>
<pre class="brush:text;">
  library('iplots')

  visCompare <- function(dat, xname, yname){  

    # override this to display your preferred text
    makeDispString <- function(row){
      sprintf('%s : %s = %0.3f; %s = %0.3f; diff = %0.3f', row$name,
        xname, row[[xname]], yname, row[[yname]], row[[xname]] - row[[yname]])
    }

    ypoint <- 0.05 + max(dat[[yname]])

    iplot(x=dat[[xname]], y=dat[[yname]], xlab=xname, ylab=yname,
      ylim=c(0, ypoint + 0.05), xlim=c(0, max(dat[[xname]])), lwd=2)

    iabline(coef=c(0,1))
    d <- iplot.data()
    cat('Select break from the menu to exit loop')

    txtObj <- NULL

    while (!is.null(ievent.wait())){
      if (iset.sel.changed()){
        cat("sel changed\n")
        s <- iset.selected()

        if (length(s) >= 1){
          if (!is.null(txtObj) ){
            iobj.rm( txtObj )
          }

          aa <- paste( makeDispString(dat[s[1:min(3, length(s))],]), collapse="\n")
          cat(paste(aa, "\n"))
          txtObj <- itext(x=0, y=ypoint, labels=aa)
        }

      } else {
        if ( !is.null(txtObj)){

          cat(paste('removing ', txtObj, "\n"))
          iobj.rm( txtObj )
          txtObj <- NULL
        }
      }

    }
  }
</pre>
<p>To test, you can use these two bits of code:</p>
<pre class="brush:text;">
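if (F){
	# read the test data into the data frame that visCompare expects
	age <- read.csv(file='iplot.test.csv.txt', header=T, sep=',')
	visCompare(age, 'default', 'test2')
}
if (F){
	age <- read.csv(file='iplot.test.csv.txt', header=T, sep=',')
	# visCompare as defined above only takes (dat, xname, yname); to use a custom
	# formatter like this one, swap it in for makeDispString inside the function
	myDispFn <- function(a){ return(paste(a$name, 'blah blah', sep=' : ')) }
	visCompare(age, 'default', 'test2')
}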
</pre>
<p>And here are the results: first, a visual check via scatterplot of the differences of the two distributions:<br />
<a href="http://blog.earlh.com/wp-content/uploads/2010/01/JGR01.png"><img src="http://blog.earlh.com/wp-content/uploads/2010/01/JGR01-300x236.png" alt="" title="JGR01 -- scatterplot of two distributions" width="300" height="236" class="aligncenter size-medium wp-image-550" /></a></p>
<p>and with the ability to highlight points and see what you're looking at:<br />
<a href="http://blog.earlh.com/wp-content/uploads/2010/01/JGR02.png"><img src="http://blog.earlh.com/wp-content/uploads/2010/01/JGR02-300x237.png" alt="" title="JGR -- scatterplot with info for item" width="300" height="237" class="aligncenter size-medium wp-image-551" /></a></p>
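<p>If you want to check the label text without firing up JGR, the string building can be run on its own.  A rough sketch in plain R, using the first three rows of the data shown above; labelRows is a made-up name that mirrors what makeDispString does inside visCompare:</p>

```r
# first three rows of the test data shown earlier
age <- data.frame(
  name    = c("item 1", "item 2", "item 3"),
  default = c(0.02110710, 0.03160770, 0.03909570),
  test2   = c(0.02030870, 0.03345660, 0.04016650),
  stringsAsFactors = FALSE
)

# hypothetical helper: one label per row, same format string as makeDispString
labelRows <- function(dat, xname, yname) {
  sprintf("%s : %s = %0.3f; %s = %0.3f; diff = %0.3f",
          dat$name, xname, dat[[xname]], yname, dat[[yname]],
          dat[[xname]] - dat[[yname]])
}

labels <- labelRows(age, "default", "test2")
cat(paste(labels, collapse = "\n"), "\n")
```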
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2010/01/interactive-plotting-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Querying Postgres or Greenplum From R on a Mac, Installation Instructions</title>
		<link>http://blog.earlh.com/index.php/2010/01/querying-postgres-or-greenplum-from-r-on-a-mac-installation-instructions/</link>
		<comments>http://blog.earlh.com/index.php/2010/01/querying-postgres-or-greenplum-from-r-on-a-mac-installation-instructions/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 21:19:28 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Data Munging]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R Tip]]></category>
		<category><![CDATA[greenplum]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[R and Databases]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=530</guid>
		<description><![CDATA[NB: this works on 64b versions of R; I tested it with the R64 app with R version 2.10.1 on Snow Leopard Step by step instructions for talking to Postgres or Greenplum: install macports install postgres; I used 8.4 sudo &#8230; <a href="http://blog.earlh.com/index.php/2010/01/querying-postgres-or-greenplum-from-r-on-a-mac-installation-instructions/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>NB: this works on 64 bit versions of R; I tested it with the R64 app with R version 2.10.1 on Snow Leopard.</p>
<p>Step by step instructions for talking to Postgres or Greenplum:</p>
<ol>
<li> install <a href="http://www.macports.org/">macports</a></li>
<li> install postgres; I used 8.4<br />
<code>
<pre class="brush:bash;">
sudo port install postgresql84
</pre>
<p></code>
</li>
<li> in a shell, create an environment variable PG_CONFIG pointing to the pg_config binary installed by postgres.  In my installation, this is something like<br />
<code>
<pre class="brush:bash;">
export PG_CONFIG=/opt/local/lib/postgresql84/bin/pg_config
</pre>
<p></code>
</li>
<li> in the same shell, tell R to install the RPostgreSQL package *from source*, i.e.<br />
<code>
<pre class="brush:text;">
$ R
> install.packages('RPostgreSQL', type='source')
</pre>
<p></code>
</li>
<li>test the installation works:<br />
<code>
<pre class="brush:text;">
> library('RPostgreSQL')
Loading required package: DBI
> drv <- dbDriver('PostgreSQL')
> db <- dbConnect(drv, host='greenplum.ip', user='earl', dbname='dbname')
> dbGetQuery(db, 'select 1')
?column?
1       1
</pre>
<p></code>
</li>
</ol>
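<p>Since the install only works if R was started from the shell where PG_CONFIG is set, it can be worth checking from inside R before kicking off the compile.  A minimal sketch; checkPgConfig is a made-up helper name, not part of RPostgreSQL:</p>

```r
# hypothetical helper: TRUE only if the path is non-empty and points at an existing file
checkPgConfig <- function(path = Sys.getenv("PG_CONFIG")) {
  nzchar(path) && file.exists(path)
}

if (checkPgConfig()) {
  message("PG_CONFIG looks fine: ", Sys.getenv("PG_CONFIG"))
} else {
  message("PG_CONFIG is unset or wrong; set it in the shell before starting R")
}
```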
<p>Diagnosing error messages / problems:</p>
<ul>
<li>If R says<br />
<code>
<pre class="brush:text;">
Warning message:
In install.packages("RPostgreSQL") : package ‘RPostgreSQL’ is not available
</pre>
<p></code><br />
you must tell R to install the package from source, as above, with type='source' </li>
<li>If you get compilation errors when installing the package that mention libpq-fe.h, then R <a href="http://blog.earlh.com/index.php/2009/12/querying-postgres-or-greenplum-from-r-on-a-mac/">can&#8217;t find pg_config</a></li>
<li>If the package installs but when loading it you get errors involving <a href="http://blog.earlh.com/index.php/2010/01/querying-databases-from-r-on-a-mac/">missing symbol _PQbackendPID</a> then you are mixing 32 and 64 bit software.</li>
</ul>
<p>Follow the links for instructions to fix your problems.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2010/01/querying-postgres-or-greenplum-from-r-on-a-mac-installation-instructions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Querying Databases From R on a Mac</title>
		<link>http://blog.earlh.com/index.php/2010/01/querying-databases-from-r-on-a-mac/</link>
		<comments>http://blog.earlh.com/index.php/2010/01/querying-databases-from-r-on-a-mac/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 00:28:08 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Data Munging]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R Tip]]></category>
		<category><![CDATA[greenplum]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[R and Databases]]></category>
		<category><![CDATA[R Tips]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=520</guid>
		<description><![CDATA[I use a mac, currently running OS 10.6 / Snow Leopard, and I&#8217;d like to query our greenplum / postgres database from R. This used to work with R 2.9, but I unfortunately had to upgrade R, and R 2.10 &#8230; <a href="http://blog.earlh.com/index.php/2010/01/querying-databases-from-r-on-a-mac/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I use a mac, currently running OS 10.6 / Snow Leopard, and I&#8217;d like to query our greenplum / postgres database from R.  This used to work with R 2.9, but I unfortunately had to upgrade R, and R 2.10 on the mac is a 64 bit app.  So, I want to use either RODBC or RPostgreSQL packages under 64 bit R on a mac to query postgres / greenplum.</p>
<p>First, I tried just <a href="http://blog.earlh.com/index.php/2009/12/querying-postgres-or-greenplum-from-r-on-a-mac/">installing RPostgreSQL</a> as before.  Unfortunately, I started getting weird errors when I attempted to load the package:</p>
<p><code>
<pre  class="brush:text;">
>library('RPostgreSQL')
Loading required package: DBI
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared library '/Library/Frameworks/R.framework/Resources/library/RPostgreSQL/libs/x86_64/RPostgreSQL.so':
  dlopen(/Library/Frameworks/R.framework/Resources/library/RPostgreSQL/libs/x86_64/RPostgreSQL.so, 6): Symbol not found: _PQbackendPID
  Referenced from: /Library/Frameworks/R.framework/Resources/library/RPostgreSQL/libs/x86_64/RPostgreSQL.so
  Expected in: flat namespace
 in /Library/Frameworks/R.framework/Resources/library/RPostgreSQL/libs/x86_64/RPostgreSQL.so
Error: package/namespace load failed for 'RPostgreSQL'
</pre>
<p></code></p>
<p>The key bit of the error message is the missing symbol: _PQbackendPID.  Some googling suggested this could be caused by mixing 32 and 64 bit libs.  I used file to check and yes, indeed, I had a 32 bit version of Postgres that was refusing to talk to a 64 bit version of R.  Suck.</p>
<p>In brief, the solution is to use MacPorts to install a 64 bit postgres (in this case, postgres 8.4):<br />
<code>
<pre class="brush:bash;">
sudo port install postgresql84
</pre>
<p></code></p>
<p>You can use the file command to see which architecture your installed postgres was built for:<br />
<code>
<pre  class="brush:bash;">
laptop:src earl$ file `echo $PG_CONFIG`
/opt/local/lib/postgresql84/bin/pg_config: Mach-O 64-bit executable x86_64
</pre>
<p></code></p>
<p>Checking my previous postgres 8.4 install, from the Postgres Plus prebuilt package, produces<br />
<code>
<pre  class="brush:bash;">
file /Library/PostgresPlus/8.4SS/bin/pg_config
/Library/PostgresPlus/8.4SS/bin/pg_config: Mach-O universal binary with 2 architectures
/Library/PostgresPlus/8.4SS/bin/pg_config (for architecture ppc):	Mach-O executable ppc
/Library/PostgresPlus/8.4SS/bin/pg_config (for architecture i386):	Mach-O executable i386
</pre>
<p></code><br />
Notice the lack of any 64 bit support.</p>
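<p>The other half of the comparison is R itself.  A small sketch for checking, from inside R, which architecture the running R was built for (rIs64Bit is a made-up helper name):</p>

```r
# pointer size in bytes: 8 on a 64 bit build of R, 4 on a 32 bit build
rIs64Bit <- function() .Machine$sizeof.pointer == 8

cat("R arch:", R.version$arch, "\n")
cat("64 bit R:", rIs64Bit(), "\n")
```

<p>If this and the file output for pg_config disagree, expect the missing symbol error above.</p>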
<p>Then open a terminal, set the PG_CONFIG environment variable to point to the right location, run R from that terminal, and install the package.<br />
<code>
<pre class="brush:text;">
laptop: work earl$ export PG_CONFIG=/opt/local/lib/postgresql84/bin/pg_config

laptop: work earl$ R64
> install.packages('RPostgreSQL', type='source')
</pre>
<p></code></p>
<p>If PG_CONFIG is unset or points to the wrong place, this is the relevant bit of the compilation error message you will receive:<br />
<code>
<pre class="brush:text;">
checking for "/libpq-fe.h"... no
configure: error: File libpq-fe.h not in ; installation may be broken.
ERROR: configuration failed for package ‘RPostgreSQL’
* removing ‘/Library/Frameworks/R.framework/Versions/2.10/Resources/library/RPostgreSQL’
</pre>
</code>
<p>Otherwise, RPostgreSQL will compile and install.  Seriously, though, there *must* be a better way of distributing software on macs.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2010/01/querying-databases-from-r-on-a-mac/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Querying Postgres or Greenplum from R on a Mac</title>
		<link>http://blog.earlh.com/index.php/2009/12/querying-postgres-or-greenplum-from-r-on-a-mac/</link>
		<comments>http://blog.earlh.com/index.php/2009/12/querying-postgres-or-greenplum-from-r-on-a-mac/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 16:00:50 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Data Munging]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R Tip]]></category>
		<category><![CDATA[greenplum]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[R and Databases]]></category>
		<category><![CDATA[R Tips]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=501</guid>
		<description><![CDATA[So, I&#8217;m using snow leopard, and I want to query our postgres / greenplum database. First things first: I&#8217;m familiar with the RODBC package on CRAN. This installs fine, since it&#8217;s a binary package. I also installed the ODBC Administrator &#8230; <a href="http://blog.earlh.com/index.php/2009/12/querying-postgres-or-greenplum-from-r-on-a-mac/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>So, I&#8217;m using snow leopard, and I want to query our postgres / greenplum database.</p>
<p>First things first: I&#8217;m familiar with the <a href="http://cran.r-project.org/web/packages/RODBC/index.html">RODBC</a> package on CRAN.  This installs fine, since it&#8217;s a binary package.  I also installed the ODBC Administrator app that you have to download from Apple <a href="http://support.apple.com/downloads/ODBC_Administrator_Tool_for_Mac_OS_X">here</a>.  Now all I need is the postgres ODBC driver, which is harder to get your hands on than you&#8217;d think.  I first installed postgres84 via ports, but that didn&#8217;t seem to include the ODBC driver.  I then installed the full postgres84 package in a pre-packaged distro from <a href="http://www.enterprisedb.com/products/pgdownload.do#osx">EnterpriseDB</a>.  This required rebooting my mac and then manually disabling the postgres db (since I only want the ODBC drivers) by removing the obvious files from /Library/LaunchDaemons.  Then&#8230; no love.  I started ODBC Administrator, selected a System DSN, chose the psqlODBC driver, and ended up with a screen that had no prompts, just a bunch of key / value pairs with no suggestions as to what might be required (typically some variation of host, hostname, user, username, etc.).  Unfortunately, clicking on the key field in the rows doesn&#8217;t allow me to edit them; hitting enter allows me to modify the key, but hell if I know how to modify the value.</p>
<p>So my next attempt was installing the <a href="http://cran.r-project.org/web/packages/RPostgreSQL/index.html">RPostgreSQL</a> package from CRAN.<br />
<code>
<pre class="brush:text;">
install.packages('RPostgreSQL')
</pre>
<p></code><br />
fails, as by default R will only grab binary packages and this is a source package.  You will have to do this:<br />
<code>
<pre class="brush:text;">
install.packages('RPostgreSQL', type='source')
</pre>
<p></code></p>
<p>This, of course, then fails to build, complaining that it can&#8217;t find libpq-fe.h.  Awesome.</p>
<p>If you look hard enough, the missing header file should be wherever you installed postgres: either in /opt/local/something if you used ports to install postgres, or in /Library/PostgresPlus/8.4SS if you installed the binary distribution as I did.  Inside that directory lives an include directory which has our .h file.  Setting PG_INCDIR to that path, e.g.<br />
<code>
<pre class="brush:bash;">
export PG_INCDIR="/Library/PostgresPlus/8.4SS/include"
</pre>
<p></code></p>
<p>then running R from that shell gets far enough that rerunning install.packages from R produces a complaint about a missing lib:<br />
<code>
<pre class="brush:text;">
> install.packages('RPostgreSQL')
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package ‘RPostgreSQL’ is not available
> ? install.packages
> install.packages('RPostgreSQL', type='source')
also installing the dependency ‘DBI’

trying URL 'http://cran.stat.ucla.edu/src/contrib/DBI_0.2-5.tar.gz'
Content type 'application/x-tar' length 308395 bytes (301 Kb)
opened URL
==================================================
downloaded 301 Kb

trying URL 'http://cran.stat.ucla.edu/src/contrib/RPostgreSQL_0.1-6.tar.gz'
Content type 'application/x-tar' length 141399 bytes (138 Kb)
opened URL
==================================================
downloaded 138 Kb

* Installing *source* package ‘DBI’ ...
** R
** inst
** preparing package for lazy loading
Creating a new generic function for "summary" in "DBI"
** help
*** installing help indices
 >>> Building/Updating help pages for package 'DBI'
     Formats: text html latex example
  DBI-internal                      text    html    latex
  DBIConnection-class               text    html    latex   example
  DBIDriver-class                   text    html    latex   example
  DBIObject-class                   text    html    latex   example
  DBIResult-class                   text    html    latex   example
  dbCallProc                        text    html    latex
  dbCommit                          text    html    latex   example
  dbConnect                         text    html    latex   example
  dbDataType                        text    html    latex   example
  dbDriver                          text    html    latex   example
  dbGetInfo                         text    html    latex   example
  dbListTables                      text    html    latex   example
  dbReadTable                       text    html    latex   example
  dbSendQuery                       text    html    latex   example
  dbSetDataMappings                 text    html    latex   example
  fetch                             text    html    latex   example
  make.db.names                     text    html    latex   example
  print.list.pairs                  text    html    latex   example
** building package indices ...
* DONE (DBI)
* Installing *source* package ‘RPostgreSQL’ ...
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for pg_config... no
configure: checking for PostgreSQL header files
checking for "/Library/PostgresPlus/8.4SS/include/libpq-fe.h"... yes
configure: creating ./config.status
config.status: creating src/Makevars
** libs
** arch - i386
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/Library/PostgresPlus/8.4SS/include -I/usr/local/include    -fPIC  -g -O2 -c RS-DBI.c -o RS-DBI.o
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/Library/PostgresPlus/8.4SS/include -I/usr/local/include    -fPIC  -g -O2 -c RS-PostgreSQL.c -o RS-PostgreSQL.o
gcc -arch i386 -std=gnu99 -dynamiclib -Wl,-headerpad_max_install_names -mmacosx-version-min=10.4 -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/lib -o RPostgreSQL.so RS-DBI.o RS-PostgreSQL.o -L -lpq -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
ld: library not found for -lpq
collect2: ld returned 1 exit status
make: *** [RPostgreSQL.so] Error 1
ERROR: compilation failed for package ‘RPostgreSQL’
* Removing ‘/Library/Frameworks/R.framework/Versions/2.9/Resources/library/RPostgreSQL’

The downloaded packages are in
	‘/private/var/folders/-E/-E9MDL2qECqW8Ik4CfUX6U+++TM/-Tmp-/RtmpvTtehd/downloaded_packages’
Updating HTML index of packages in '.Library'
Warning message:
In install.packages("RPostgreSQL", type = "source") :
  installation of package 'RPostgreSQL' had non-zero exit status
</pre>
<p></code></p>
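<p>Thanks to an email to the <a href="http://www.mail-archive.com/r-help@r-project.org/msg67209.html">R help list</a>, the answer is to tell the package's configure script where to find pg_config, by setting the PG_CONFIG environment variable before starting R.  In the failed build above, configure reports "checking for pg_config... no" and the link line ends up with an empty -L flag, so ld can't find libpq; once pg_config is found, the link line gets -L/Library/PostgresPlus/8.4SS/lib.  E.g.:<br />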
<code>
<pre class="brush:text;">
earl:bin $ export PG_CONFIG=/Library/PostgresPlus/8.4SS/bin/pg_config
earl:bin $ R

R version 2.9.2 (2009-08-24)
[...]
> install.packages('RPostgreSQL', type='source')
--- Please select a CRAN mirror for use in this session ---
[...]
checking for pg_config... /Library/PostgresPlus/8.4SS/bin/pg_config
checking for "/Library/PostgresPlus/8.4SS/include/libpq-fe.h"... yes
configure: creating ./config.status
config.status: creating src/Makevars
** libs
** arch - i386
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/Library/PostgresPlus/8.4SS/include -I/usr/local/include    -fPIC  -g -O2 -c RS-DBI.c -o RS-DBI.o
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/Library/PostgresPlus/8.4SS/include -I/usr/local/include    -fPIC  -g -O2 -c RS-PostgreSQL.c -o RS-PostgreSQL.o
gcc -arch i386 -std=gnu99 -dynamiclib -Wl,-headerpad_max_install_names -mmacosx-version-min=10.4 -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/lib -o RPostgreSQL.so RS-DBI.o RS-PostgreSQL.o -L/Library/PostgresPlus/8.4SS/lib -lpq -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
** R
** inst
** preparing package for lazy loading
Creating a new generic function for "format" in "RPostgreSQL"
Creating a new generic function for "print" in "RPostgreSQL"
** help
*** installing help indices
 >>> Building/Updating help pages for package 'RPostgreSQL'
     Formats: text html latex example
  PostgreSQL                        text    html    latex   example
  PostgreSQLConnection-class        text    html    latex   example
  PostgreSQLDriver-class            text    html    latex   example
  PostgreSQLObject-class            text    html    latex   example
  PostgreSQLResult-class            text    html    latex   example
  S4R                               text    html    latex   example
  dbApply-methods                   text    html    latex   example
  dbApply                           text    html    latex   example
  dbBuildTableDefinition            text    html    latex
  dbCallProc-methods                text    html    latex
  dbCommit-methods                  text    html    latex   example
  dbConnect-methods                 text    html    latex   example
  dbDataType-methods                text    html    latex   example
  dbDriver-methods                  text    html    latex   example
  dbGetInfo-methods                 text    html    latex   example
  dbListTables-methods              text    html    latex   example
  dbObjectId-class                  text    html    latex   example
  dbReadTable-methods               text    html    latex   example
  dbSendQuery-methods               text    html    latex   example
  dbSetDataMappings-methods         text    html    latex   example
  fetch-methods                     text    html    latex   example
  isIdCurrent                       text    html    latex   example
  make.db.names-methods             text    html    latex   example
  postgresqlDBApply                 text    html    latex   example
  postgresqlSupport                 text    html    latex
  safe.write                        text    html    latex   example
  summary-methods                   text    html    latex
** building package indices ...
* DONE (RPostgreSQL)

The downloaded packages are in
	‘/private/var/folders/-E/-E9MDL2qECqW8Ik4CfUX6U+++TM/-Tmp-/RtmpurzqTb/downloaded_packages’
Updating HTML index of packages in '.Library'
> library(RPostgreSQL)
Loading required package: DBI
</pre>
<p></code></p>
<p>You can now test this:<br />
<code>
<pre class="brush:text;">
> library('RPostgreSQL')
Loading required package: DBI
> drv <- dbDriver('PostgreSQL')
> drv
 PostgreSQLDriver:(1825)
> db <- dbConnect(drv, host='greenplum.ip', user='earl', dbname='db01')
> db
 PostgreSQLConnection:(1825,0)
> dbGetQuery(db, 'select 1')
  ?column?
1        1
>
>
>
> dbGetQuery(db, 'select count(*) from earl_fav_wd')
  count
1    34
>
</pre>
<p></code></p>
<p>Success!  I can query my greenplum db from R.  Also, I hate computers.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2009/12/querying-postgres-or-greenplum-from-r-on-a-mac/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Plotting in Grids</title>
		<link>http://blog.earlh.com/index.php/2009/12/plotting-in-grid/</link>
		<comments>http://blog.earlh.com/index.php/2009/12/plotting-in-grid/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 04:03:20 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Plotting]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[plotting series]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=474</guid>
		<description><![CDATA[This is post #12 in a running series about plotting in R. I regularly find myself wanting to show arrays or grids of plots in R. This is straightforward using par and mfrow as long as you want a symmetric, &#8230; <a href="http://blog.earlh.com/index.php/2009/12/plotting-in-grid/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div style="border: 1px solid rgb(230, 219, 85); margin: 0px auto; padding: 10px; width: 70%; background-color: #F5F5F5; font-size: 0.9em; text-align: center;">
	This is post #12 in a running <a href="http://blog.earlh.com/index.php/plotting-in-r-a-series/"> series </a> about plotting in R.
</div>
<p>I regularly find myself wanting to show arrays or grids of plots in R.  This is straightforward using par and mfrow as long as you want a symmetric, evenly spaced grid of plots.  Unfortunately, this often is not what I want.  Even more unfortunately, this is a hard question to google for.  I&#8217;ve tried array of plots, grid of plots, matrix of plots, asymmetric grids of plots, asymmetric arrays, uneven grids of plots, uneven mfrow, uneven mfcol, etc, and nothing worked.  (Searches listed here in the hopes that other people with the same question will find the answer.)</p>
<p>I actually didn&#8217;t think this could be accomplished without using lattice or ggplot2, but I recently discovered that it can be done with R&#8217;s base plotting functions.  The function layout provides what we&#8217;re looking for.  It takes a matrix whose entries say which plot goes in each cell of the grid: the first plot you draw goes where the matrix has a 1, the second where it has a 2, and so on.  After creating your layout, you can use layout.show to see where your plots will go.  Let&#8217;s take a look at some examples.</p>
<p>This creates a two by two grid, exactly as mfrow does.<br />
<code>
<pre class="brush: text;">
# 2 by 2 grid, the same as mfrow=c(2,2)
pp <- layout(matrix(c(1,2,3,4), 2, 2, byrow=TRUE))
layout.show(pp)
</pre>
<p></code><br />
<center><br />
<a href="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.00.png"><img src="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.00-300x300.png" alt="plot12.00" title="plot12.00" width="300" height="300" class="aligncenter size-medium wp-image-485" /></a><br />
</center></p>
<p>For comparison, this creates a 2 by 2 grid as mfcol does.  The only difference is the order of the plot numbers in the matrix.<br />
<code>
<pre class="brush: text;">
# 2 by 2 grid, the same as mfcol=c(2,2)
pp <- layout(matrix(c(1,2,3,4), 2, 2, byrow=FALSE))
layout.show(pp)
</pre>
<p></code><br />
<center><br />
<a href="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.01.png"><img src="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.01-300x300.png" alt="plot12.01" title="plot12.01" width="300" height="300" class="aligncenter size-medium wp-image-485" /></a><br />
</center></p>
<p>We can put 0 in any position in the matrix to not plot there.<br />
<code>
<pre class="brush: text;">
# no plotting in the first quadrant
pp <- layout(matrix(c(1,0,2,3), 2, 2, byrow=TRUE))
layout.show(pp)
</pre>
<p></code><br />
<center><br />
<a href="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.02.png"><img src="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.02-300x300.png" alt="plot12.02" title="plot12.02" width="300" height="300" class="aligncenter size-medium wp-image-485" /></a><br />
</center></p>
<p>Now, let's have one plot use all of the left column.  The trick to making a plot span several cells like this is to repeat its number in every cell it should cover -- note that 1 occurs twice in the layout matrix.<br />
<code>
<pre class="brush: text;">
# now a fat plot on the left and two small plots in the right column
pp <- layout(matrix(c(1, 1, 2, 3), 2, 2, byrow=FALSE))
layout.show(pp)
</pre>
<p></code><br />
<center><br />
<a href="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.03.png"><img src="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.03-300x300.png" alt="plot12.03" title="plot12.03" width="300" height="300" class="aligncenter size-medium wp-image-485" /></a><br />
</center></p>
<p>Finally, we can set widths for the columns (or for the rows -- just use heights instead of widths).<br />
<code>
<pre class="brush: text;">
# same as above, but with the left column having 3/4 of the width
pp <- layout(matrix(c(1, 1, 2, 3), 2, 2, byrow=FALSE), widths=c(3,1))
layout.show(pp)
</pre>
<p></code><br />
<center><br />
<a href="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.04.png"><img src="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.04-300x300.png" alt="plot12.04" title="plot12.04" width="300" height="300" class="aligncenter size-medium wp-image-485" /></a><br />
</center></p>
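<p>Here is a minimal sketch of the heights variant.  The layout itself (one short, wide plot across the top row and two plots underneath) is made up for illustration and is not one of the figures above; the name nfig is arbitrary.  It relies on layout returning the number of figures it defined, which is what layout.show expects.</p>
<pre class="brush: text;">
# a short, wide plot across the top row and two plots underneath
# (illustrative layout, not one of the figures above)
m <- matrix(c(1, 1,
              2, 3), 2, 2, byrow=TRUE)

# heights works on rows the way widths works on columns:
# the top row gets 1/4 of the height, the bottom row 3/4
nfig <- layout(m, heights=c(1, 3))
layout.show(nfig)
</pre>
<p>Repeating 1 across the top row makes the first plot span both columns, just as repeating it down a column did above.</p>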
<p>Now, let's show off what I originally wanted to do: display a plot of two dimensions of a distribution, along with the marginal distributions.  I'm wrapping the functionality up into a function so it's easy to reuse.  I use plot to show the sample and barplot to show the distribution as calculated by hist.</p>
<p><code>
<pre class="brush: text;">
# now let's demonstrate with a plot of the multivariate normal and histograms of the marginal distributions
# use package MASS to get the mvrnorm function

plotWithMarginals <- function(x, y){

	# find the largest absolute value over both dimensions, then set up
	# symmetric breaks so that even if x, y are on very different ranges things work
	mm <- max(abs(range(x, y)))
	breaks <- seq(-mm, mm, length.out=1001)

	hist0 <- hist(x, breaks=breaks, plot=FALSE)
	hist1 <- hist(y, breaks=breaks, plot=FALSE)

	# create a grid and check it out to make sure that it's what we want:
	# scatterplot (1) bottom left, x marginal (2) above it, y marginal (3) to its right
	pp <- layout(matrix(c(2,0,1,3), 2, 2, byrow=TRUE), widths=c(3,1), heights=c(1,3), respect=TRUE)
	layout.show(pp)

	rang <- c(-mm, mm)

	par(mar=c(3,3,1,1))
	plot(x, y, xlim=rang, ylim=rang, xlab='', ylab='')

	# now plot marginals
	top <- max(hist0$counts, hist1$counts)
	par(mar=c(0,3,1,1))
	barplot(hist0$counts, axes=FALSE, ylim=c(0, top), space=0)

	par(mar=c(3,0,1,1))
	barplot(hist1$counts, axes=FALSE, xlim=c(0,top), space=0, horiz=TRUE)
}

# mvrnorm samples from a multivariate normal distribution
library(MASS)
</pre>
<p></code></p>
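<p>To see what hist hands to barplot, here is a small sketch with made-up numbers (the vectors a and b and the choice of 4 bins are assumptions for illustration only).  With plot=FALSE, hist draws nothing and just returns the bin counts; because both calls share the same breaks, bin i covers the same interval for both vectors.</p>
<pre class="brush: text;">
# toy data, purely illustrative
a <- c(-1.5, -0.2, 0.1, 0.3, 1.9)
b <- c(-0.4, 0.2, 0.6)

# symmetric breaks wide enough to cover both vectors
mm <- max(abs(range(a, b)))
breaks <- seq(-mm, mm, length.out=5)     # 5 break points give 4 equal-width bins

ha <- hist(a, breaks=breaks, plot=FALSE)
hb <- hist(b, breaks=breaks, plot=FALSE)

ha$counts   # one count per bin; they add up to length(a)
hb$counts   # same bins, so the two vectors are directly comparable
</pre>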
<p>Now that all the prep is done, this shows a multivariate normal distribution with no correlation between the two variables.  Note the shape of the marginal distributions.<br />
<code>
<pre class="brush: text;">
eye2 <- matrix(c(1,0,0,1), 2, 2)
sample <- mvrnorm(n=10000, mu=c(0,0), Sigma=eye2)
plotWithMarginals(sample[,1], sample[,2])
</pre>
<p></code></p>
<p><center><br />
<a href="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.05.png"><img src="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.05-300x300.png" alt="plot12.05" title="plot12.05" width="300" height="300" class="aligncenter size-medium wp-image-485" /></a><br />
</center></p>
<p>And finally, for contrast, a correlated multivariate normal.<br />
<code>
<pre class="brush: text;">
yescorr <- matrix(c(1, 0.9, 0.9, 1), 2, 2, byrow=TRUE)
sample <- mvrnorm(n=10000, mu=c(0,0), Sigma=yescorr)
plotWithMarginals(sample[,1], sample[,2])
</pre>
<p></code></p>
<p><center><br />
<a href="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.06.png"><img src="http://blog.earlh.com/wp-content/uploads/2009/12/plot12.06-300x300.png" alt="plot12.06" title="plot12.06" width="300" height="300" class="aligncenter size-medium wp-image-485" /></a></center></p>
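<p>As a quick numeric sanity check on the two pictures, the sample correlation should land near the off-diagonal entry of Sigma.  This is only a sketch: the seed and the names s0 and s1 are arbitrary choices, not part of the plots above.</p>
<pre class="brush: text;">
library(MASS)
set.seed(1)   # arbitrary seed, only so the numbers are reproducible

# uncorrelated and correlated samples, same sizes as in the plots
s0 <- mvrnorm(n=10000, mu=c(0,0), Sigma=matrix(c(1, 0, 0, 1), 2, 2))
s1 <- mvrnorm(n=10000, mu=c(0,0), Sigma=matrix(c(1, 0.9, 0.9, 1), 2, 2))

cor(s0[,1], s0[,2])   # should be close to 0
cor(s1[,1], s1[,2])   # should be close to 0.9
</pre>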
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2009/12/plotting-in-grid/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Querying Databases in R</title>
		<link>http://blog.earlh.com/index.php/2009/08/querying-databases-in-r/</link>
		<comments>http://blog.earlh.com/index.php/2009/08/querying-databases-in-r/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 16:00:36 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Data Munging]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R Tip]]></category>
		<category><![CDATA[data frame]]></category>
		<category><![CDATA[greenplum]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[R and Databases]]></category>
		<category><![CDATA[R Tips]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=449</guid>
		<description><![CDATA[One of the first things you&#8217;ll want to do in R is set it up to talk to databases. The easiest way to do this is using ODBC, via package RODBC. To get the package, run > install.packages('RODBC') Once you &#8230; <a href="http://blog.earlh.com/index.php/2009/08/querying-databases-in-r/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>One of the first things you&#8217;ll want to do in R is set it up to talk to databases.  The easiest way to do this is using ODBC, via package RODBC.</p>
<p>To get the package, run<br />
<code>
<pre class="brush:text;">
> install.packages('RODBC')
</pre>
<p></code></p>
<p>Once you have RODBC installed, using it from R is very simple: a bit of setup, then sqlQuery will run your SQL and return the results in a data frame.<br />
<code>
<pre class="brush: text;">
library(RODBC)

db <- odbcConnect( dsn='your dsn name' )
sql <- 'select page_id, count(*) as cnt
           from document_ads
           group by page_id
           having count(*) > 1'

results <- sqlQuery(db, sql, errors=TRUE, rows_at_time=1024)
str(results)
'data.frame':	282432 obs. of  2 variables:
 $ page_id: int  17646774 17115332 17606022 15899428 17099174 17283774 8604200 16315025 17259751 17283270 ...
 $ cnt            : int  489 1119 132 113 148 200 112 121 1135 633 ...
</pre>
<p></code></p>
<p>On Windows, you set up the DSNs in the ODBC Data Sources inside the Control Panel; on Mac OS X, MySQL includes a program called ODBC Administrator; on Linux, you'll have to install <a href="http://www.easysoft.com/developer/interfaces/odbc/linux.html">unixODBC</a>.</p>
<p>Also, it's often convenient to write code that caches your query results, particularly if the query takes a while.  I've found that the easiest thing to do is to write the results into a data file and check whether the file exists, like so:<br />
<code>
<pre class="brush:text;">
filename <- 'query cache.RData'
if (!file.exists(filename)){
   # don't have a cached copy so run the query
   library(RODBC)
   # [snip] connect and define sql as in the example above
   query1 <- sqlQuery(db, sql, errors=TRUE, rows_at_time=1024)

   # save the query results for the future
   save(list=c('query1', 'sql'), file=filename)
   rm(list=c('query1', 'sql') )
}
load(file=filename)
</pre>
<p></code></p>
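<p>If you cache more than one query, the pattern above can be wrapped in a small helper.  This is only a sketch: cachedResult and slowQuery are made-up names, not part of RODBC, and the helper uses saveRDS / readRDS instead of save / load so that the cached value can be returned directly.  The compute argument is any function that produces the result, for example one that calls sqlQuery.</p>
<pre class="brush: text;">
# hypothetical helper: return the cached value if the file exists,
# otherwise call compute(), cache its result, and return it
cachedResult <- function(filename, compute){
   if (!file.exists(filename)){
      saveRDS(compute(), file=filename)
   }
   readRDS(filename)
}

# stand-in for a slow query, so this runs without a database
slowQuery <- function(){
   data.frame(page_id=c(1, 2, 3), cnt=c(10, 20, 30))
}

cache <- tempfile(fileext='.rds')
first  <- cachedResult(cache, slowQuery)   # runs slowQuery and writes the file
second <- cachedResult(cache, slowQuery)   # reads the file; slowQuery is not called
</pre>
<p>With RODBC, the call would look something like cachedResult('query cache.rds', function() sqlQuery(db, sql)), assuming db and sql are defined as in the earlier example.</p>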
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2009/08/querying-databases-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

