Zipline Construction Video
Filed in Uncategorized, July 29, 2009, 5:11 pm
Scribd building the zipline from Tim Morgan.
Picking Subsets of CSV/TSV Files With awk
Filed in Data Munging, July 28, 2009, 3:50 pm
Say you have a csv or tsv file and you want to select only the rows where a particular column is not zero. Start with a csv like this:
earl$ head ttt
104834, 0, 206, 104578, false
104837, 4, 206, 103566, false
104854, 0, 193, 101063, false
104856, 0, 195, 101851, false
8469683, 0, 149, 50191, false
121867, 4, 207, 107816, [...]
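A minimal awk sketch of the idea. It assumes comma-separated input with optional spaces after each comma (as in ttt above) and that the column to test is given by number. `nonzero_rows` is a hypothetical helper name, not something from the original post:

```shell
# nonzero_rows FILE COL (hypothetical helper): print only the rows of FILE
# whose column number COL is not zero.
# -F', *' splits fields on a comma followed by any number of spaces;
# '$col != 0' is a pattern with no action, so matching lines are printed as-is.
nonzero_rows() {
  awk -F', *' -v col="$2" '$col != 0' "$1"
}
```

For example, `nonzero_rows ttt 2` keeps the rows of ttt whose second column is non-zero. Because awk compares numeric-looking fields as numbers, a value like `0.0` is also treated as zero. For a tsv file, swap the separator for `-F'\t'`.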
Filled Line Plots / Graphs in R — Part 10 in a Series
Filed in Plotting, R, Visualization, 1:16 am
This is post #10 in a running series about plotting in R.
Otherwise known as filled curves.
Say you want to draw a filled curve instead of a single line. R’s basic plot doesn’t make this especially easy, though packages such as ggplot2 make it much easier, as we’ll see [...]
Building a Zip Line at Scribd
Filed in Uncategorized, July 24, 2009, 7:27 pm
Scribd built a zip line! Chris Seifert and I, with help from our coworkers, built a zip line over 3 nights at Scribd, which is why I haven’t been posting more.
Scribd has a long office space, with 6 pairs of 8-sided concrete columns running down the middle. Chris and I decided to [...]
Multiple Y Axes in R Plots — Part 9 in a Series
Filed in Plotting, R, Visualization, July 20, 2009, 9:00 am
This is post #9 in a running series about plotting in R.
Frequently you want to plot data series that are on very different scales. In R, one way to do this is to plot a second graph on top of the first and build the axis labels by hand. Here’s a rough [...]
In Which Lucy Goes to the Vet
Filed in Uncategorized, July 19, 2009, 8:50 pm
The vet seemed to think she’s healthy, except that she’s a 10 lb 2 oz cat in a 9 lb cat body, unfortunately for her. So the diet will continue.
How to Remove Tabs From CSV Files, A Second Method
Filed in Data Munging, July 16, 2009, 2:40 am
As mentioned before, you often need to transform tsv files into csv files. While tr is a much less powerful program than sed or awk, it is much easier to use:
tr '\t' ',' < input_file > output_file
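One caveat with the tr approach: if a field already contains a comma, the output gains an extra column. Below is a minimal awk sketch that handles this by quoting such fields CSV-style. It assumes tab-separated input on stdin; `tsv_to_csv` is a hypothetical helper name, and this is a swapped-in alternative rather than the method from this post:

```shell
# tsv_to_csv (hypothetical helper): read tab-separated lines on stdin and
# write comma-separated lines on stdout.
# Any field containing a comma or a double quote is wrapped in double quotes,
# with embedded quotes doubled, so commas inside values stay in one column.
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
  {
    $1 = $1    # force awk to rebuild the line with OFS even if nothing is quoted
    for (i = 1; i <= NF; i++) {
      if ($i ~ /[",]/) {
        gsub(/"/, "\"\"", $i)    # escape embedded double quotes
        $i = "\"" $i "\""        # wrap the whole field in quotes
      }
    }
    print
  }'
}
```

It is used the same way as tr: `tsv_to_csv < input_file > output_file`. Fields without commas or quotes pass through unchanged, so for simple data the result matches the tr one-liner.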
Writing MySQL Query Results to Disk
Filed in Data Munging, July 14, 2009, 6:57 pm
Notes to myself: how to easily write query results to disk using mysql.
mysql -h main-backup.local -u earl -e "select count(*) from adsense_analytics_days;" -p collegelist_development > csvname.csv
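where -h specifies the MySQL server host, -u the username, and -e the query to run; -p makes mysql prompt for the password, and the final argument (collegelist_development) is the database.
Because mysql writes tab-separated rows when its output is redirected, this will output a tsv file (despite the .csv name); to turn it into csv try using [...]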
How to Swap the Order of Columns in a CSV or TSV File – Use awk
Filed in Data Munging, 6:52 pm
Sample file (tab separated):
col1 col2 col3
val11 val12 val13
val21 val22 val23
val31 val32 val33
blog earl$ awk 'BEGIN {FS="\t"; OFS=", "} {print $1, $3, $2}' < input.tsv
In this case, FS is the field separator for the input and OFS is the field separator for the output. Thus, to go from tsv to tsv we would set both to "\t" (by default awk splits input on runs of spaces and tabs and joins output with a single space); csv [...]
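A small sketch that turns the separators into parameters. It assumes three-column input as in the sample file, and `swap_cols` is a hypothetical helper name. FS and OFS are set in a BEGIN block so the header line is split with the intended separator too; an FS assignment inside the main action only takes effect from the next record:

```shell
# swap_cols IN_SEP OUT_SEP (hypothetical helper): read stdin split on IN_SEP
# and print columns 1, 3, 2 joined with OUT_SEP.
swap_cols() {
  awk -v ifs="$1" -v ofs="$2" 'BEGIN { FS = ifs; OFS = ofs } { print $1, $3, $2 }'
}
```

For example, `swap_cols '\t' '\t' < input.tsv` keeps tsv output, while `swap_cols '\t' ',' < input.tsv` writes csv. awk expands escape sequences such as `\t` in `-v` assignments, so passing the two literal characters works.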
How to Transform TSV to CSV, or Just Remove Tabs
Filed in Data Munging, 6:51 pm
Unfortunately, statistics and machine learning often seem to degenerate into a giant mess of getting data from multiple sources, munging it together, transforming it, and formatting the output before you can even get to the real work. A common problem is taking tab-separated values (tsv) files, perhaps produced as the output of a mysql [...]