Howto Transform TSV to CSV, or Just Remove Tabs

Unfortunately, statistics and machine learning seem to degenerate into a giant mess of getting data from multiple sources, munging it together, transforming it, and formatting the output, even before you can get to the work proper. A common problem is taking tab separate value (tsv) files, perhaps produced as the output of a mysql or postgres query, and turning them into comma separated value (csv) files.

Here’s one method, using sed and pretty standard regexp syntax:

work earl$ head for_tyler.csv
day	referer	ip
2009-07-06	http://powertwitter.me/	111113333
2009-07-06	http://powertwitter.me/	111113333
2009-07-06	http://powertwitter.me/	111113333
2009-07-06	http://powertwitter.me/	111113333
2009-07-06	http://twitter.com/	111113333
2009-07-06	http://twitter.com/home	111113333
2009-07-06	http://twitter.com/home	111113333
2009-07-06	http://twitter.com/home	111113333
2009-07-06	http://twitter.com/home	111113333
work earl$
work earl$ head for_tyler.csv | sed "s/\t/,/g"
day,referer,ip
2009-07-06,http://powertwitter.me/,111113333
2009-07-06,http://powertwitter.me/,111113333
2009-07-06,http://powertwitter.me/,111113333
2009-07-06,http://powertwitter.me/,111113333
2009-07-06,http://twitter.com/,111113333
2009-07-06,http://twitter.com/home,111113333
2009-07-06,http://twitter.com/home,111113333
2009-07-06,http://twitter.com/home,111113333
2009-07-06,http://twitter.com/home,111113333
work earl$
work earl$

The key bit above is this: “s/\t/,/g”. That says turn every tab (\t) into a comma (,). If you instead preferred to just remove tabs from the file period, you could use sed on “s/\t//g”.

So, yet another thing I learned today: the version of sed that ships with MacOS, even through 10.5.7, doesn’t support special character sequences. If the above isn’t working for you, and instead is just replacing every t character in the file with a comma, then try this:

earl$ echo "   a       " | sed "s/\t/,/g"


Note that to type those tabs, you’ll have to hit ctrl-v (^V). If the output isn’t “,a,”, then you have to type literal tabs in your sed command. The \t works under reasonable versions of linux; you’ll have to use literal tabs under OS X. Bleh.

This entry was posted in Data Munging and tagged . Bookmark the permalink.

One Response to Howto Transform TSV to CSV, or Just Remove Tabs

  1. Pingback: Stochastic Nonsense | Writing MySQL Query Results to Disk

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>