Monthly Archives: July 2009

Zipline Construction Video

A video of Scribd building the zip line, from Tim Morgan.

Posted in Uncategorized

Picking Subsets of CSV/TSV Files With awk

Say you have a csv or tsv file, and you want to select only the rows where a particular column is not zero. Start with a csv like this:

earl$ head ttt
104834, 0, 206, 104578, false
104837, 4, 206, … Continue reading
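A minimal sketch of that kind of filter, assuming the column of interest is the second one. The file name sample.csv and every value in it are made up for illustration; they are not the data from the post.

```shell
# Hypothetical sample data (values invented for this example).
cat > sample.csv <<'EOF'
104834, 0, 206, 104578, false
104837, 4, 206, 104579, true
104840, 0, 207, 104580, false
104843, 7, 207, 104581, true
EOF

# -F',' splits each line on commas. Adding 0 forces a numeric comparison,
# so a field like " 0" (with its leading space) counts as zero.
# A pattern with no action prints every line for which it is true.
nonzero_rows=$(awk -F',' '$2 + 0 != 0' sample.csv)
printf '%s\n' "$nonzero_rows"
```

With this sample, only the rows whose second field is 4 or 7 are printed. To test a different column, change $2 to that column's number.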

Posted in Data Munging

Filled Line Plots / Graphs in R — Part 10 in a Series

This is post #10 in a running series about plotting in R. Filled line plots are otherwise known as filled curves. Say that, instead of drawing a single line, you want to draw a filled curve. R’s basic plot doesn’t make this especially easy, though … Continue reading

Posted in Plotting, R, Visualization | 1 Comment

Building a Zip Line at Scribd

Scribd built a zip line! Chris Seifert and I, along with our coworkers’ help, built a zip line over 3 nights at Scribd. Which is why I haven’t been posting more. Scribd has a long office space, with 6 pairs … Continue reading

Posted in Uncategorized | 1 Comment

Multiple Y Axes in R Plots — Part 9 in a Series

This is post #09 in a running series about plotting in R. Frequently, you want to plot data that is not at all on the same scale. In R, this is done via plotting a second graph on top of … Continue reading

Posted in Plotting, R, Visualization | 2 Comments

In Which Lucy Goes to the Vet

The vet seemed to think she’s healthy and all, except she’s a 10 pound 2 oz cat in a 9 pound cat body. Unfortunately for her. So the diet will continue.

Posted in Uncategorized

Howto Remove Tabs From CSV Files, A Second Method

As mentioned before, you regularly want to transform tsv files into csv files. While tr is a much less powerful program than sed or awk, it is much easier to use:

tr '\t' ',' < input_file > output_file
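To see the effect on a concrete file, here is a small sketch. The contents of input_file below are hypothetical and exist only to have something to convert.

```shell
# Create a tiny tab-separated file (made-up contents).
printf 'id\tname\tscore\n1\tann\t90\n' > input_file

# tr replaces every tab character with a comma, one character at a time.
tr '\t' ',' < input_file > output_file

cat output_file
```

Since tr only swaps characters, this works cleanly when the fields themselves contain no commas or tabs.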

Posted in Data Munging | 1 Comment

Writing MySQL Query Results to Disk

Notes to myself: how to easily write query results to disk using mysql.

mysql -h main-backup.local -u earl -e "select count(*) from adsense_analytics_days;" -p collegelist_development > csvname.csv;

where h specifies the name of the mysql server, u the username, e … Continue reading

Posted in Data Munging | 1 Comment

Howto Swap the Order of Columns in a CSV or TSV File – Use awk

Sample file: tab separated

col1   col2   col3
val11  val12  val13
val21  val22  val23
val31  val32  val33

blog earl$ awk 'BEGIN {FS="\t"; OFS=", "} {print $1,$3,$2}' < input.tsv

In this case, FS is the field separator for the input and OFS is … Continue reading
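A runnable sketch of the column swap, assuming a three-column file like the sample. It sets FS and OFS in a BEGIN block so that both apply from the very first line; the file name input.tsv follows the command above, and the rows are a shortened copy of the sample.

```shell
# Build a small tab-separated file modelled on the sample.
printf 'col1\tcol2\tcol3\nval11\tval12\tval13\nval21\tval22\tval23\n' > input.tsv

# FS: input field separator (tab). OFS: output field separator (comma + space).
# print $1, $3, $2 emits the first column, then the third, then the second.
swapped=$(awk 'BEGIN {FS="\t"; OFS=", "} {print $1, $3, $2}' input.tsv)
printf '%s\n' "$swapped"
```

The header line comes out as "col1, col3, col2", and each data row is reordered the same way.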

Posted in Data Munging

Howto Transform TSV to CSV, or Just Remove Tabs

Unfortunately, statistics and machine learning seem to degenerate into a giant mess of getting data from multiple sources, munging it together, transforming it, and formatting the output, even before you can get to the work proper. A common problem is … Continue reading

Posted in Data Munging | 1 Comment