Howto Swap the Order of Columns in a CSV or TSV File – Use awk

Sample file: tab separated

col1	col2	col3
val11	val12	val13
val21	val22	val23
val31	val32	val33

blog earl$ awk  '{FS="\t"; OFS=", "; print $1,$3,$2}' < input.tsv


In this case, FS is the field separator for the input and OFS is the field separator for the output. Thus if we wanted to go to eg tsv to tsv we would set both to "\t" (default for awk); csv to csv would be FS=', ' and OFS = ', ', etc. Of course, you can also replicated columns by repeating them, eg $1,$1,$3,$2,$3 would output 5 columns of which 2 are replicated. In fact, there are all sorts of neat tricks that one can play with sed.

Again, this is far easier in R, but sometimes either the overhead of loading data into R is too much (R will happily work with millions of rows, but I regularly end up working with tens or hundreds), or R isn't available, or you really want to automate a process (that also can be done inside R, but still).
Equivalent code in R would go something like:

> in <- read.csv(file='input.csv', header=T, sep=',')
>
> # permute the order of columns
> in <- in[, c(1,3,2)]
>
> write.csv(file='output.csv', row.names=F)


And of course, if you needed to replicate the columns in R, you could say

> # replicate columns as above with awk
> in <- in[, c(1,1,3,2,3)]

This entry was posted in Data Munging and tagged . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>