<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stochastic Nonsense &#187; mysql</title>
	<atom:link href="http://blog.earlh.com/index.php/tag/mysql/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.earlh.com</link>
	<description></description>
	<lastBuildDate>Mon, 19 Sep 2011 03:30:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Querying Databases in R</title>
		<link>http://blog.earlh.com/index.php/2009/08/querying-databases-in-r/</link>
		<comments>http://blog.earlh.com/index.php/2009/08/querying-databases-in-r/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 16:00:36 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Data Munging]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R Tip]]></category>
		<category><![CDATA[data frame]]></category>
		<category><![CDATA[greenplum]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[R and Databases]]></category>
		<category><![CDATA[R Tips]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=449</guid>
		<description><![CDATA[One of the first things you&#8217;ll want to do in R is set it up to talk to databases. The easiest way to do this is using ODBC, via package RODBC. To get the package, run > install.packages('RODBC') Once you &#8230; <a href="http://blog.earlh.com/index.php/2009/08/querying-databases-in-r/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>One of the first things you&#8217;ll want to do in R is set it up to talk to databases.  The easiest way to do this is using ODBC, via package RODBC.</p>
<p>To get the package, run<br />
<code>
<pre class="brush:text;">
> install.packages('RODBC')
</pre>
<p></code></p>
<p>Once you have RODBC installed, using it from R is very simple: a bit of setup, then sqlQuery will run your SQL and return the results in a data frame.<br />
<code>
<pre class="brush: text;">
library(RODBC)

db <- odbcConnect( dsn='your dsn name' )
sql <- 'select page_id, count(*) as cnt
           from document_ads
           group by page_id
           having count(*) > 1'

results <- sqlQuery(db, sql, errors=TRUE, rows_at_time=1024)
str(results)
'data.frame':	282432 obs. of  2 variables:
 $ page_id: int  17646774 17115332 17606022 15899428 17099174 17283774 8604200 16315025 17259751 17283270 ...
 $ cnt    : int  489 1119 132 113 148 200 112 121 1135 633 ...
</pre>
<p></code></p>
<p>On Windows, you set up the DSNs in the ODBC Data Sources panel inside the Control Panel; on MacOS, MySQL includes a program called ODBC Administrator; on Linux, you'll have to install <a href="http://www.easysoft.com/developer/interfaces/odbc/linux.html">unixODBC</a>.</p>
<p>Also, it's often convenient to write code that caches your query results, particularly if the query takes a while.  I've found that the easiest thing to do is write the results into a data file and check for the file's existence, like so:<br />
<code>
<pre class="brush:text;">
filename <- 'query cache.RData'
if (!file.exists(filename)){
   # don't have a cached copy so run the query
   library(RODBC)
   [snip]
   query1 <- sqlQuery(db, sql, errors=TRUE, rows_at_time=1024)

   # save the query results for the future
   save(list=c('query1', 'sql'), file=filename)
   rm(list=c('query1', 'sql') )
}
load(file=filename)
</pre>
<p></code></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2009/08/querying-databases-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL, Batch Imports, and Rails</title>
		<link>http://blog.earlh.com/index.php/2009/08/mysql-batch-imports-and-rails/</link>
		<comments>http://blog.earlh.com/index.php/2009/08/mysql-batch-imports-and-rails/#comments</comments>
		<pubDate>Thu, 13 Aug 2009 13:00:48 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Data Munging]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=426</guid>
		<description><![CDATA[I really love Rails, but it&#8217;s not the most performant code in the world. Though it doesn&#8217;t often arise in CRUD programming, if you do any sort of stats, ML, or data analytics, you&#8217;ll frequently find yourself wanting to import &#8230; <a href="http://blog.earlh.com/index.php/2009/08/mysql-batch-imports-and-rails/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I really love Rails, but it&#8217;s not the most performant code in the world.  Though it doesn&#8217;t often arise in CRUD programming, if you do any sort of stats, ML, or data analytics, you&#8217;ll frequently find yourself wanting to import lots of data into your database.  You could create an ActiveRecord object for each row, but this is glacial, requiring one round trip to the database server per row, and is likely to abuse the kindness of your DBA.  Instead, there is a wonderful gem called ar-extensions that gives you access to MySQL&#8217;s native bulk import facilities.  To use it, you just call Model.import with arrays of data and their corresponding fields.  For example, say I have a table like this:</p>
<p><code>
<pre class="brush:sql;">
mysql> describe adsense_analytics_days;
+------------------+----------+------+-----+---------+----------------+
| Field            | Type     | Null | Key | Default | Extra          |
+------------------+----------+------+-----+---------+----------------+
| id               | int(11)  | NO   | PRI | NULL    | auto_increment |
| page_id          | int(11)  | NO   | MUL | NULL    |                |
| impressions      | int(11)  | YES  |     | NULL    |                |
| clicked          | int(11)  | YES  |     | NULL    |                |
| ecpm             | float    | YES  |     | NULL    |                |
| ctr              | float    | YES  |     | NULL    |                |
| cpc              | float    | YES  |     | NULL    |                |
| revenue          | float    | YES  |     | NULL    |                |
| start_date       | date     | YES  | MUL | NULL    |                |
| end_date         | date     | YES  |     | NULL    |                |
| created_at       | datetime | YES  |     | NULL    |                |
+------------------+----------+------+-----+---------+----------------+
11 rows in set (0.09 sec)

mysql> 
</pre>
<p></code><br />
This has a corresponding model, AdsenseAnalyticsDay.  Batch importing with Rails is then trivial:</p>
<p><code>
<pre class="brush: ruby;">
require 'ar-extensions'
require 'ar-extensions/import/mysql'

# instead of
if false
  rows.each do |row|
    AdsenseAnalyticsDay.create( ) # etc
  end
end

# you can accomplish a bulk import from, eg, a csv as such:
f = File.new('bulk_import.csv', 'r')
data = []
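while line = f.gets
  # print roughly one line in a thousand as a progress indicator
  puts line if rand(1000) >= 999
  # columns: pid, impr, clicked, ecpm, ctr, cpc, revenue, start_date, end_date, created_at
  d = line.chomp.split(',')
  (0..2).each{ |i| d[i] = d[i].to_i }  # integer columns
  (3..6).each{ |i| d[i] = d[i].to_f }  # float columns
  data << d[0..8]  # created_at (d[9]) is not imported
end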
f.close

fields = [:page_id, :impressions, :clicked, :ecpm, :ctr, :cpc, :revenue, :start_date, :end_date]
AdsenseAnalyticsDay.import(fields, data, {:validate => false })
</pre>
<p></code></p>
<p>where the csv looks like:<br />
<code>
<pre class="brush: bash;">
MacBook-2:work earl$ head bulk_import.csv
0,344,5,0.755814,0.0145349,0.052,0.26,2009-08-06,2009-08-06,2009-08-10 19:49:12
1,8,1,0,0.125,0,0,2009-08-06,2009-08-06,2009-08-10 19:49:12
2,32,9,76.875,0.28125,0.273333,2.46,2009-08-06,2009-08-06,2009-08-10 19:49:12
4,16,1,1.875,0.0625,0.03,0.03,2009-08-06,2009-08-06,2009-08-10 19:49:12
6,17,2,8.82353,0.117647,0.075,0.15,2009-08-06,2009-08-06,2009-08-10 19:49:12
12,15,1,80,0.0666667,1.2,1.2,2009-08-06,2009-08-06,2009-08-10 19:49:12
34,5,0,0,0,0,0,2009-08-06,2009-08-06,2009-08-10 19:49:12
36,2,0,0,0,0,0,2009-08-06,2009-08-06,2009-08-10 19:49:12
39,46,2,11.7391,0.0434783,0.27,0.54,2009-08-06,2009-08-06,2009-08-10 19:49:12
41,3,0,0,0,0,0,2009-08-06,2009-08-06,2009-08-10 19:49:12
MacBook-2:work earl$ 
</pre>
<p></code></p>
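<p>For files too large to hold comfortably in memory, one option is to parse and import in batches.  The following is only a minimal sketch, not part of ar-extensions: parse_row and each_batch are hypothetical helper names, the batch sizes are arbitrary, and the import call appears only as a comment so the snippet runs without Rails.</p>

```ruby
require 'stringio'

# Convert one CSV line into typed values, in the column order of the sample file:
# page_id, impressions, clicked, ecpm, ctr, cpc, revenue, start_date, end_date, created_at
def parse_row(line)
  d = line.chomp.split(',')
  d[0..2].map(&:to_i) + d[3..6].map(&:to_f) + d[7..8]  # created_at is left out
end

# Read lines from io and yield arrays of at most batch_size parsed rows.
def each_batch(io, batch_size = 1000)
  io.each_line.each_slice(batch_size) do |lines|
    yield lines.map { |line| parse_row(line) }
  end
end

# Usage, with an in-memory stand-in for File.open('bulk_import.csv')
sample = StringIO.new([
  '0,344,5,0.755814,0.0145349,0.052,0.26,2009-08-06,2009-08-06,2009-08-10 19:49:12',
  '1,8,1,0,0.125,0,0,2009-08-06,2009-08-06,2009-08-10 19:49:12',
  '2,32,9,76.875,0.28125,0.273333,2.46,2009-08-06,2009-08-06,2009-08-10 19:49:12'
].join("\n"))

each_batch(sample, 2) do |batch|
  # In a Rails app with ar-extensions loaded, this is where you would call:
  #   AdsenseAnalyticsDay.import(fields, batch, {:validate => false })
  puts "batch of #{batch.size} rows, first page_id #{batch.first.first}"
end
```

<p>Each batch becomes one import call, so memory use is bounded by the batch size rather than the file size.</p>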
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2009/08/mysql-batch-imports-and-rails/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Writing MySQL Query Results to Disk</title>
		<link>http://blog.earlh.com/index.php/2009/07/writing-mysql-query-results-to-disk/</link>
		<comments>http://blog.earlh.com/index.php/2009/07/writing-mysql-query-results-to-disk/#comments</comments>
		<pubDate>Wed, 15 Jul 2009 01:57:07 +0000</pubDate>
		<dc:creator>earl</dc:creator>
				<category><![CDATA[Data Munging]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[tsv]]></category>

		<guid isPermaLink="false">http://blog.earlh.com/?p=185</guid>
		<description><![CDATA[Notes to myself: how to easily write query results to disk using mysql. mysql -h main-backup.local -u earl -e "select count(*) from adsense_analytics_days;" -p collegelist_development > csvname.csv; where h specifies the name of the mysql server, u the username, e &#8230; <a href="http://blog.earlh.com/index.php/2009/07/writing-mysql-query-results-to-disk/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Notes to myself: how to easily write query results to disk using mysql.</p>
<p><code>
<pre class="brush: text;">
mysql -h main-backup.local -u earl -e "select count(*) from adsense_analytics_days;" -p collegelist_development > csvname.csv;
</pre>
<p></code><br />
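where -h specifies the name of the MySQL server, -u the username, and -e the query to run; -p makes mysql prompt for a password, and the final argument (here collegelist_development) is the database.</p>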
<p>This will output a TSV file; to turn it into CSV, try <a href="http://blog.earlh.com/index.php/2009/07/howto-remove-tabs-from-csv-files/">using sed to transform tabs into commas</a> or <a href="http://blog.earlh.com/index.php/2009/07/howto-remove-tabs-from-csv-files-a-second-method/">using tr</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.earlh.com/index.php/2009/07/writing-mysql-query-results-to-disk/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

