Author Archives: earl

Unique is broken in R

Are you kidding me?

$ R
> unique(1,1,2,3,4)
[1] 1

This was the source of yesterday’s nasty, hard-to-track-down bug. What you really want is unique on a vector, as in:

> unique(c(1,1,2,3,4))
[1] 1 2 3 4

I … Continue reading

Posted in Programming, Programming Languages Suck, R, R Tip, Suck | 1 Comment

Formatted numbers in Ruby

In C or C++, it can be a pain to get thousands separators in printf. In Ruby, it can be trivial, as long as you use the right libraries. If you have ActiveSupport installed (which I believe comes with Rails), … Continue reading
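
A minimal sketch of the idea in plain Ruby, for when no extra library is loaded. This is not necessarily what the full post does: the helper name with_thousands and its separator argument are made up for illustration, and the regex approach is just one common way to group digits.

```ruby
# Hypothetical helper (not from the original post): insert a separator
# between each group of three digits in the integer part of a number.
def with_thousands(number, separator = ",")
  # Split off any fractional part so that only the integer part is grouped.
  whole, fraction = number.to_s.split(".", 2)
  # Match every digit that is followed by one or more complete groups of
  # three digits running to the end of the integer part.
  whole = whole.gsub(/(\d)(?=(\d{3})+\z)/, "\\1#{separator}")
  fraction ? "#{whole}.#{fraction}" : whole
end

puts with_thousands(1234567)
puts with_thousands(-9876543.21)
puts with_thousands(999)
```

The sketch relies on to_s, so very large or very small floats that Ruby prints in scientific notation would need separate handling.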

Posted in Programming, Programming Languages Suck, Ruby | Leave a comment

Finding the sort order of an array in R or Ruby

Suppose you have an array that you’d like to sort by another array. A common use case: you have an array of somethings, and for each something you generate a score in, say, [0,1]. Now you’d like to … Continue reading
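
A rough Ruby sketch of the setup described above, assuming the goal is the list of indices that would sort the scores (similar in spirit to R's order, but zero-based). The helper name sort_order and the sample data are invented for illustration; the full post may do this differently.

```ruby
# Hypothetical helper: return the indices that would sort `values` in
# ascending order. Ties are broken by original position.
def sort_order(values)
  values.each_with_index
        .sort_by { |value, index| [value, index] }
        .map { |_, index| index }
end

items  = %w[apple banana cherry]
scores = [0.7, 0.1, 0.4]

order = sort_order(scores)              # indices of scores, smallest first
sorted_items = items.values_at(*order)  # reorder items by their scores

p order
p sorted_items
```

Keeping the order as a separate array means the same permutation can be applied to any number of parallel arrays.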

Posted in Programming, Programming Languages Suck, R, R Tip | Leave a comment

Getting the value of a variable from a string in R

It’s often convenient to use reflection to get the value of a variable from its name as a string. In R, you can use the get function to do this. In R:

blog $ R
> x = 3 … Continue reading

Posted in R, R Tip | Leave a comment

Windows still sucks, can’t read exfat formatted on a mac

Now that OS X Snow Leopard supports exfat / fat64 and Microsoft Windows theoretically does, you might naively assume that you can share external drives between Vista and Snow Leopard. This is true with one giant caveat: you can’t format … Continue reading

Posted in Suck | Leave a comment

Prepping the Reuters 21578 classification sample dataset

I’ve been playing around with some topic models and decided to look at the Reuters 21578 dataset. For your convenience, this dataset is stored as xml split between 20 files or so. And invalid xml at that. I prefer to … Continue reading

Posted in Classifiers, Programming, Programming Languages Suck | Leave a comment

Thousands Separator in printf in C++

I’ve unfortunately been writing some C++. It’s the crappiest language in the world. I just wasted 90 perfectly good minutes attempting to put thousands separators in numbers that I’m printf-ing. If you naively read the man pages, it looks … Continue reading

Posted in Programming, Programming Languages Suck | Leave a comment

Watching Lecture Videos on Your Computer

I’ve recently been watching some of the lecture videos available on videolectures.net. The site is a great resource, but often the lecturers speak too slowly. I really prefer to watch lecture videos at a higher speed, otherwise I lose focus, … Continue reading

Posted in Uncategorized | Leave a comment

Horizontal Paging of Greenplum or Postgres Queries

When using psql to query Greenplum or Postgres, query results that exceed the width of your terminal will wrap in a very annoying fashion. To get horizontal paging, set the environment variable PAGER: export PAGER='less -RSFX' then … Continue reading

Posted in Data Munging | Leave a comment

Interactive Plotting in R

There are many ways to compare univariate distributions; one of my favorites is violin plots. However, if you are only comparing two distributions, then the best solution is often a scatter plot. To that end, I’ve built some code that … Continue reading

Posted in Data Munging, Plotting, R, R Tip, Visualization | Leave a comment