Category Archive

The following is a list of all entries from the Data Munging category.

Horizontal Paging of Greenplum or Postgres Queries

When using gpsql or pgsql to query greenplum or postgres respectively, query results which exceed the width of your term will wrap in a very annoying fashion. To get horizontal paging, set the environmental variable PAGER:

export PAGER=’less -RSFX’

then either in your psql or gpsql session, or in your .psqlrc file,

\pset pager always

Note that if [...]


Querying Postgres or Greenplum From R on a Mac, Installation Instructions

NB: this works on 64b versions of R; I tested it with the R64 app with R version 2.10.1 on Snow Leopard
Step by step instructions for talking to Postgres or Greenplum:

install macports
install postgres; I used 8.4

sudo port install postgresql84

in a shell, create an environmental variable PG_CONFIG pointing to the pg_config binary installed [...]


Querying Databases From R on a Mac

I use a mac, currently running OS 10.6 / Snow Leopard, and I’d like to query our greenplum / postgres database from R. This used to work with R 2.9, but I unfortunately had to upgrade R, and R 2.10 on the mac is a 64 bit app. So, I want to use [...]


Querying Postgres or Greenplum from R on a Mac

So, I’m using snow leopard, and I want to query our postgres / greenplum database.
First things first: I’m familiar with the RODBC package on CRAN. This installs fine, since it’s a binary package. I also installed the ODBC Administrator app that you have to download from apple here . Now all [...]


Querying Databases in R

One of the first things you’ll want to do in R is set it up to talk to databases. The easiest way to do this is using ODBC, via package RODBC.
To get the package, run

> install.packages(RODBC)

Once you have RODBC installed, you call it in R as follows. But it’s very simple: a bit [...]


MySQL, Batch Imports, and Rails

I really love Rails, but it’s not the most performant code in the world. Though it doesn’t often arise in CRUD programming, if you do any sort of stats, ML, or data analytics, you’ll frequently find yourself wanting to import lots of data into your db. You could create an ActiveRecord object for [...]


Examining Data Frames — head and tail

head and tail, for those familiar with the unix command line, are two very handy utilities for looking at data frames. Along with str, which displays the structure of a data frame, they help you look at your data:

> d d
> str(d)
‘data.frame’: 50 obs. of 2 variables:
$ mean: int 1 [...]


Picking Subsets of CSV/TSV Files With awk

Say you have a csv or tsv file, and you want to only select the bits where a particular column is not zero. Start with a csv like this:

earl$ head ttt
104834, 0, 206, 104578, false
104837, 4, 206, 103566, false
104854, 0, 193, 101063, false
104856, 0, 195, 101851, false
8469683, 0, 149, 50191, false
121867, 4, 207, 107816, [...]


Howto Remove Tabs From CSV Files, A Second Method

As mentioned before, you regularly want to transform tsv files into csv files. While tr is a much less powerful program than sed or awk, it is much easier to use:

tr ‘\t’ ‘,’ < input_file > output_file


Writing MySQL Query Results to Disk

Notes to myself: how to easily write query results to disk using mysql.

mysql -h main-backup.local -u earl -e “select count(*) from adsense_analytics_days;” -p collegelist_development > csvname.csv;

where h specifies the name of the mysql server, u the username, e the query, p the database.
This will output a tsv file; to turn it into csv try using [...]