Category Archives: Data Munging

Picking Subsets of CSV/TSV Files With awk

Say you have a CSV or TSV file and you want to select only the rows where a particular column is not zero. Start with a CSV like this: earl$ head ttt 104834, 0, 206, 104578, false 104837, 4, 206, … Continue reading
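
The excerpt cuts off before the full post shows its command, so what follows is only a minimal sketch of the general idea, not necessarily the post's own method. It assumes the column of interest is the second one and that fields are separated by a comma and a space; the second sample row is made up for illustration.

```shell
# Sample rows shaped like the excerpt; the second row is invented for this sketch.
sample='104834, 0, 206, 104578, false
104900, 4, 206, 104600, true'

# -F', ' splits each line on comma-plus-space.
# The pattern $2 != 0 keeps only rows whose second field is non-zero;
# with no action given, awk prints the matching line unchanged.
filtered=$(printf '%s\n' "$sample" | awk -F', ' '$2 != 0')
printf '%s\n' "$filtered"
```

For a tab-separated file, the same pattern should work with -F'\t' in place of -F', '.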

Posted in Data Munging

Howto Remove Tabs From CSV Files, A Second Method

As mentioned before, you regularly want to transform TSV files into CSV files. While tr is a much less powerful program than sed or awk, it is much easier to use: tr '\t' ',' < input_file > output_file

Posted in Data Munging

Writing MySQL Query Results to Disk

Notes to myself: how to easily write query results to disk using mysql. mysql -h main-backup.local -u earl -e "select count(*) from adsense_analytics_days;" -p collegelist_development > csvname.csv; where h specifies the name of the mysql server, u the username, e … Continue reading

Posted in Data Munging

Howto Swap the Order of Columns in a CSV or TSV File – Use awk

Sample file: tab separated col1 col2 col3 val11 val12 val13 val21 val22 val23 val31 val32 val33 blog earl$ awk 'BEGIN {FS="\t"; OFS=", "} {print $1,$3,$2}' < input.tsv In this case, FS is the field separator for the input and OFS is … Continue reading

Posted in Data Munging

Howto Transform TSV to CSV, or Just Remove Tabs

Unfortunately, statistics and machine learning seem to degenerate into a giant mess of getting data from multiple sources, munging it together, transforming it, and formatting the output, even before you can get to the work proper. A common problem is … Continue reading

Posted in Data Munging

Removing Extra Column of Data from CSVs in R — R Tip

When R writes a csv file, you get an extra column of data, like so: > s > plot(x=s$x, y=s$y) > > write.csv(x=s, file='s0.csv') When you peek in the csv file, you see this: blog earl$ head s0.csv … Continue reading

Posted in Data Munging, R, R Tip

Examining CSV Data Columns From a Shell

It’s very handy to be able to pop open a shell and peek in your csv files. awk is a command that will do just that — it divides each line into fields based either on a whitespace separator or … Continue reading
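
As a rough illustration of that field splitting (a sketch under assumptions, since the excerpt is truncated before any command appears), the following prints the second column of a comma-separated input. The sample data and the choice of column are hypothetical.

```shell
# Hypothetical three-column CSV used only for this sketch.
sample='id,name,score
1,ann,90
2,bob,85'

# -F',' tells awk to split on commas instead of whitespace;
# $2 refers to the second field of each line.
second_column=$(printf '%s\n' "$sample" | awk -F',' '{ print $2 }')
printf '%s\n' "$second_column"
```

Note that a plain comma separator does not understand quoted fields, so values that themselves contain commas would be split incorrectly.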

Posted in Data Munging, R, R Tip

Removing Quotes From csv Files

Many programs, particularly Excel, have an annoying habit of dumping crap such as quotes or currency symbols into your csv files. I pointed out earlier a simple way to deal with this in R, but if you’re more comfortable with … Continue reading
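
The excerpt ends before naming the tool the post goes on to use, so this is just one possible shell approach, not necessarily the author's: tr -d deletes every occurrence of the listed characters. The sample line is made up, and the sketch assumes the only unwanted characters are double quotes and dollar signs.

```shell
# Hypothetical line of the kind a spreadsheet export might produce.
sample='"widget","$12.50"'

# tr -d removes each character in the given set; here, " and $.
# Single quotes keep the shell from interpreting either character.
cleaned=$(printf '%s\n' "$sample" | tr -d '"$')
printf '%s\n' "$cleaned"
```

This is deliberately blunt: it also strips quotes that protect commas inside a field, so it only suits files whose values contain no embedded commas.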

Posted in Data Munging

Saving MySQL Query Results into csv

Say you have a mysql query such as select start_date, count(*), sum(impressions) as impr, sum( revenue) as revenue from adsense_analytics_days group by start_date order by start_date desc and you want to save the results into a csv file. MySQL makes … Continue reading

Posted in Data Munging

Plotting With Custom X Axis Labels in R — Part 5 in a Series

This is post #05 in a running series about plotting in R. There are a variety of ways to control how R creates x and y axis labels for plots. Let’s walk through the typical process of creating good labels … Continue reading

Posted in Data Munging, Plotting, R, Visualization