moving!

This blog has moved to earlh.com/blog. Most of the posts are being redirected; please update your links. Also, there’s actual new content there!

Thank you.

Posted in Uncategorized | Leave a comment

Equifax are Scum Who Sell Your Email Address to Scammers

Only one company should ever have seen the highlighted email address. It’s also not a common word that a dictionary attack would find.

Posted in Uncategorized | 1 Comment

Useful tweaks for Hadoop on EMR

more ram for the workers: modify mapred-site.xml and add

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx3192m</value>
</property>

To push the changes to all the machines, use the scripts from the post on modifying the mapper or reducer count on a running EMR cluster.

Posted in emr, hadoop, Hive, Programming Languages Suck, yak shaving | Leave a comment

Modifying the Number of Mappers or Reducers on a Running EMR Cluster

Amazon EMR unfortunately doesn’t give you an easy way to change the number of mappers and reducers on a running cluster. To set them before booting the cluster, add

 --bootstrap-action="s3://elasticmapreduce/bootstrap-actions/configure-hadoop"  \
   --args "-m,mapred.tasktracker.map.tasks.maximum=4,-m,mapred.tasktracker.reduce.tasks.maximum=2"

as appropriate to the elastic-mapreduce.rb command.

For a running emr cluster, you can use the following scripts. Navigate to the conf directory; it will be in a path similar to:

/home/hadoop/.versions/1.0.3/conf

Edit mapred-site.xml and replace the value of either or both of

mapred.tasktracker.map.tasks.maximum

or

mapred.tasktracker.reduce.tasks.maximum

Then copy and paste these commands:

# distribute the file to all nodes
hadoop job -list-active-trackers | sed "s/^.*_//" | sed "s/:.*//" | xargs -t -I{} -P10 scp -o StrictHostKeyChecking=no  mapred-site.xml hadoop@{}:.versions/1.0.3/conf/

# bounce the tasktrackers on each node
hadoop job -list-active-trackers | sed "s/^.*_//" | sed "s/:.*//" | xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no hadoop@{}   sudo /etc/init.d/hadoop-tasktracker stop

# restart the jobtracker on the headnode
sudo /etc/init.d/hadoop-jobtracker stop

One way to verify this worked is on the jobtracker web page.
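
Before copying anything, it can help to check the hostname extraction on its own. The sketch below runs the same two sed expressions over a sample line. The line's shape is an assumption about what hadoop job -list-active-trackers prints on Hadoop 1.x, the hostname and port are made up, and tracker_hosts is a hypothetical helper name.

```shell
# Sample tracker name; the format is assumed and the host and port are invented.
sample='tracker_ip-10-0-0-12.ec2.internal:localhost/127.0.0.1:43210'

# Same two sed expressions as in the commands above:
# drop everything through the last underscore, then everything from the first colon.
tracker_hosts() {
  sed 's/^.*_//' | sed 's/:.*//'
}

printf '%s\n' "$sample" | tracker_hosts
```

If the output is a bare hostname, the scp and ssh pipelines should be targeting the right machines.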


Posted in bash, church of xargs, emr, hadoop | Leave a comment

Building lush on OSX Lion

If building lush2 errors out with this compilation error

g++ -L/opt/local/lib -DHAVE_CONFIG_H -DNO_DEBUG -Wall -O3 -mmmx -msse -I../include  -I/opt/local/include -I/opt/local/include/freetype2  -o lush2 at.o binary.o cref.o calls.o arith.o check_func.o date.o dh.o dump.o eval.o fileio.o fltlib.o fpu.o function.o event.o graphics.o htable.o idx1.o idx2.o idx3.o idx4.o index.o io.o list.o main.o math.o misc.o cmm.o module.o number.o oostruct.o regex.o storage.o string.o symbol.o toplevel.o user.o weakref.o ps_driver.o rng.o lisp_driver.o x11_driver.o unix.o   cpp.o -L/opt/local/lib -lXft -lSM -lICE -lX11 -liconv -lreadline -lcurses -lutil -ldl -lm  
Undefined symbols for architecture x86_64:
  "_FcNameParse", referenced from:
      _getfont in x11_driver.o
  "_FcPatternDestroy", referenced from:
      _getfont in x11_driver.o
  "_FcPatternGet", referenced from:
      _getfont in x11_driver.o
  "_FcPatternDel", referenced from:
      _getfont in x11_driver.o
  "_FcPatternAdd", referenced from:
      _getfont in x11_driver.o
  "_FcNameUnparse", referenced from:
      _getfont in x11_driver.o
ld: symbol(s) not found for architecture x86_64
collect2: ld returned 1 exit status
make[1]: *** [lush2] Error 1
make: *** [all] Error 2

Jason Aten was kind enough to fix this for Snow Leopard and later, as detailed in the lush mailing list archive. Grab Jason’s lush2 git repo from github.

Posted in yak shaving | Leave a comment

Make Private Mode Work in Safari

As I mentioned, private mode is broken in Safari. There is, however, a workaround:

  1. start safari
  2. start private mode
  3. browse secret websites!
  4. exit private mode
  5. close all your tabs and the window
  6. open a new window (this window will not be in private mode)
  7. close the new window

And it works! Amazon searches no longer persist! You can now exit Safari. This works on OS X 10.7.4 build 11E53 with Safari 5.1.7.

Alternately… just get Chrome. It works out of the box.

Posted in Uncategorized | 3 Comments

Quoting for Hacker News

I hate hand-quoting text for Hacker News posts. Mostly as a memo to myself, this command wraps stdin to 77 columns, preferentially breaking on spaces, inserts 3 leading spaces so that HN recognizes the text as a quote, then leaves the output in my copy buffer.

echo 'the stuff I want to quote here' | fold -w 77 -s | sed "s/^/   /" | pbcopy

Feel free to use pbpaste instead of echo.
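
For reuse, the wrapping and indenting steps can live in a small function, leaving the clipboard step to the caller. This is only a sketch: hnquote is a hypothetical name, and the sample text is filler.

```shell
# hnquote (hypothetical name): wrap stdin at 77 columns, breaking on spaces
# where possible, then indent every line by 3 spaces.
hnquote() {
  fold -w 77 -s | sed 's/^/   /'
}

text='This is a long line of filler text that runs well past seventy-seven columns so that fold has to break it into more than one line before sed indents each piece.'
printf '%s\n' "$text" | hnquote
```

On a Mac, pbpaste | hnquote | pbcopy would then quote whatever is in the copy buffer, in place.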

Posted in Uncategorized | Leave a comment

Howto Make find xargs grep Robust to Spaces in Filenames on a Mac

The unix pattern for filtering files with a predicate and then searching within them is find xargs grep. For example, to search every file whose filename contains notes for a line containing mysql:

$ find . -iname "*notes*" | xargs grep -i mysql

Unfortunately, on OSX this is not robust to spaces or quotes in filenames. Thus if you have a filename like

./Dropquest 2012/Captain's Logs/Chapter 1.txt

in your search path the typical find xargs grep invocation will terminate with the error

xargs: unterminated quote

The first thing to know is you can use the -t parameter in xargs to at least tell you which filename it’s dying on, but that’s of limited use in making the command work. Even using -I{} with xargs and grep to surround the filename with quotes doesn’t fix this.

$ find . -iname "*notes*" | xargs -I{} grep -i mysql "{}"

Many people must have run into this problem because there is a simple solution that all the tools understand: use nulls instead of newlines to delimit files.

$ find . -iname "*notes*" -print0 | xargs -I{} -0 grep -i mysql "{}"

and it works!
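
To see this on a throwaway tree, the sketch below creates a path containing both a space and an apostrophe, then runs the null-delimited pipeline over it. The directory and file names are illustrative, grep_notes is a hypothetical helper, and it assumes mktemp -d works without a template on your system.

```shell
# grep_notes (hypothetical name): list files under $1 whose name contains
# "notes" and whose content matches $2, ignoring case.
grep_notes() {
  find "$1" -iname "*notes*" -print0 | xargs -0 grep -il "$2"
}

# Throwaway tree with a space and an apostrophe in the path.
demo=$(mktemp -d)
mkdir -p "$demo/Captain's Logs"
printf 'remember the mysql password\n' > "$demo/Captain's Logs/my notes.txt"

grep_notes "$demo" mysql

rm -rf "$demo"
```

This version drops -I{} so that xargs hands grep many filenames per invocation. With -0 the names arrive intact either way.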

Posted in bash, OSX, Suck | Leave a comment

Transpose or Pivot from Bash

I recently had a set of data in rows that I wanted to put in columns, just like transpose does in Excel. Here’s a little Ruby script that will do it. Ideally, I’d extend this to take a -F argument to control what the script splits on, just like awk.

$ cat bin/transpose 
#!/usr/bin/ruby

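# exit when stdin is a terminal; otherwise the read below would block
exit if STDIN.tty?

lines = []
STDIN.each do |line|
   # split on commas and trim whitespace around each field
   lines << line.strip.split(',').map(&:strip)
end

# nothing to transpose on empty input
exit if lines.empty?

# zip the first row with the remaining rows to turn rows into columns
columns = lines.shift
columns = columns.zip(*lines)

columns.each do |column|
   puts column.join(', ')
end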

usage:

$ echo "col1,0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 0.999, 1.0
col2,0.0,1.0,1.0,1.0,2.0,2.0,2.0,4.0,5.0,7.0,9.0,14.0,22.0,47.0,85.0,258.0,1127.0,1834676.0" | transpose
col1, col2
0, 0.0
0.001, 1.0
0.005, 1.0
0.01, 1.0
0.05, 2.0
0.1, 2.0
0.2, 2.0
0.3, 4.0
0.4, 5.0
0.5, 7.0
0.6, 9.0
0.7, 14.0
0.8, 22.0
0.9, 47.0
0.95, 85.0
0.99, 258.0
0.999, 1127.0
1.0, 1834676.0
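
The -F idea can be sketched directly in awk rather than by extending the Ruby script. This is a swapped-in technique, not the script above: transpose_fs is a hypothetical name, and its first argument is passed straight to awk -F. Like the Ruby version, it takes the column count from the first row and trims whitespace around each field.

```shell
# transpose_fs (hypothetical name): transpose stdin, splitting on separator $1.
transpose_fs() {
  awk -F"$1" '
    NR == 1 { ncols = NF }              # the first row decides how many columns to emit
    {
      for (i = 1; i <= NF; i++) {
        v = $i
        gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim whitespace around the field
        cell[NR, i] = v
      }
    }
    END {
      for (i = 1; i <= ncols; i++) {
        line = cell[1, i]
        for (r = 2; r <= NR; r++) line = line ", " cell[r, i]
        print line
      }
    }'
}

printf 'col1;0; 0.5\ncol2;1.0;7.0\n' | transpose_fs ';'
```

Rows shorter than the first one come out with empty cells, which matches how the Ruby version joins nil values.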

Posted in bash, Data Munging | Leave a comment

Two Quotes About the Internet

“I’m not going to apologize for the cost,” Zimmermann told CNET, adding that the final price has not been set. “This is not Facebook. Our customers are customers. They’re not products. They’re not part of the inventory.”

Phil Zimmermann, creator of PGP.

“If you are not paying for it, you’re not the customer; you’re the product being sold.”

blue_beetle on MetaFilter, regarding yet another crappy Digg redesign

Posted in Online Advertising | Leave a comment