Category Archives: Data Munging
Bash tricks: drop the first line of a file
I work with a bunch of data that often comes in text files. I regularly want to cut off the header / first line, but I thought that to use tail you had to know how many lines are in … Continue reading
Posted in bash, Data Munging
1 Comment
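For the header-dropping trick teased above, here is a minimal sketch of one common approach. It may or may not be the one the full post settles on, and the file names sample.csv and body.csv are made up for illustration.

```shell
# Hypothetical sample file with a header row (file names are assumptions).
printf 'id,value\n1,10\n2,20\n' > sample.csv

# "tail -n +2" starts output at line 2, so the header is dropped
# without needing to know how many lines the file has.
tail -n +2 sample.csv > body.csv

# sed can do the same thing by deleting line 1.
sed '1d' sample.csv
```

The `+` in front of the number is what matters: `tail -n 2` prints the last two lines, while `tail -n +2` prints everything from the second line onward.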
Saving output of a command and echoing to the screen
When using bash, it’s really nice to both save the output of a command to a file and print it on the screen. I couldn’t find anything that did this, so I wrote my own Ruby script. A utility that … Continue reading
Posted in bash, Data Munging
Leave a comment
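The standard tee utility covers the need described in the post above. This is a sketch using tee, not the author's Ruby script, and output.log is a hypothetical file name.

```shell
# Show a command's output on screen and save a copy to a file.
# output.log is a hypothetical file name.
echo "hello" | tee output.log

# -a appends to the file instead of overwriting it.
echo "again" | tee -a output.log

# Redirect stderr into the pipe to capture error messages as well.
{ echo "out"; echo "err" >&2; } 2>&1 | tee -a output.log
```

Without `2>&1`, anything written to stderr still shows up on the screen but does not reach the file.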
Horizontal Paging of Greenplum or Postgres Queries
When using gpsql or pgsql to query greenplum or postgres respectively, query results that exceed the width of your terminal will wrap in a very annoying fashion. To get horizontal paging, set the environment variable PAGER: export PAGER='less -RSFX' then … Continue reading
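A short sketch of the PAGER setting from the excerpt above, with what each less flag does. The excerpt is cut off after "then", so nothing here should be read as the rest of the author's steps.

```shell
# Use straight quotes; curly quotes pasted from a web page are not
# treated as quoting characters by the shell.
export PAGER='less -RSFX'
# -R  pass color escape sequences through to the terminal
# -S  chop long lines instead of wrapping (scroll sideways with the arrow keys)
# -F  exit immediately if the output fits on one screen
# -X  skip terminal init/deinit, so output stays visible after quitting

# Confirm the value the current shell will hand to child processes.
echo "$PAGER"
```

The `-S` flag is the one that stops wide rows from wrapping; the others only make less behave more pleasantly for short results.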
Interactive Plotting in R
There are many ways to compare univariate distributions; one of my favorites is violin plots. However, if you are only comparing two distributions, then the best solution is often a scatter plot. To that end, I’ve built some code that … Continue reading
Posted in Data Munging, Plotting, R, R Tip, Visualization
Leave a comment
Querying Postgres or Greenplum From R on a Mac, Installation Instructions
NB: this works on 64-bit versions of R; I tested it with the R64 app with R version 2.10.1 on Snow Leopard. Step-by-step instructions for talking to Postgres or Greenplum: install macports; install postgres (I used 8.4); sudo … Continue reading
Querying Databases From R on a Mac
I use a Mac, currently running OS X 10.6 / Snow Leopard, and I’d like to query our greenplum / postgres database from R. This used to work with R 2.9, but I unfortunately had to upgrade R, and R 2.10 … Continue reading
Posted in Data Munging, R, R Tip
Tagged greenplum, postgres, R and Databases, R Tips
Leave a comment
Querying Postgres or Greenplum from R on a Mac
So, I’m using Snow Leopard, and I want to query our postgres / greenplum database. First things first: I’m familiar with the RODBC package on CRAN. This installs fine, since it’s a binary package. I also installed the ODBC Administrator … Continue reading
Posted in Data Munging, R, R Tip
Tagged greenplum, postgres, R and Databases, R Tips
Leave a comment
Querying Databases in R
One of the first things you’ll want to do in R is set it up to talk to databases. The easiest way to do this is using ODBC, via the RODBC package. To get the package, run > install.packages("RODBC") Once you … Continue reading
Posted in Data Munging, R, R Tip
Tagged data frame, greenplum, mysql, postgres, R and Databases, R Tips
Leave a comment
MySQL, Batch Imports, and Rails
I really love Rails, but it’s not the most performant code in the world. Though it doesn’t often arise in CRUD programming, if you do any sort of stats, ML, or data analytics, you’ll frequently find yourself wanting to import … Continue reading
Examining Data Frames — head and tail
head and tail, for those familiar with the unix command line, are two very handy utilities for looking at data frames. Along with str, which displays the structure of a data frame, they help you look at your data: > … Continue reading