Tim Dysinger

Life & Tech on Kauai

Importing Enron Into CouchDB

I have been goofing around with CouchDB for about a year now. To do anything fun or interesting with it, you first need some data to play with. To solve this I imported the Enron email dataset into CouchDB so we can have a couple hundred thousand documents.

How? First I downloaded all the Enron data from Carnegie Mellon University’s Enron Email Data. Then I used the ‘mail trends’ project’s enron.py code to convert the loose files into a unix mbox format so it’s easily understood by code. Once we have the Enron data in a format we like, we can use the ruby script below to take the email and push it into CouchDB. (Make sure CouchDB is installed and running.) The code is as follows:

cat >rakefile.rb <<\THEEND
%w(time tmail find restclient json).each {|l| require l}

file_create('enron_mail_030204.tar.gz') do
  `curl -O http://download.srv.cs.cmu.edu/~enron/enron_mail_030204.tar.gz`
end

file_create('maildir' => 'enron_mail_030204.tar.gz') do
  `tar xzof enron_mail_030204.tar.gz`
end

desc('import the email to localhost couchdb')
task(:import => 'maildir') do
  RestClient.put('http://localhost:5984/enron', '') rescue nil
  Find.find('maildir') do |path|
    next if FileTest.directory?(path)
    begin
      txt = IO.read(path)
      msg = TMail::Mail.parse(txt)
      next if msg.date < @t = Time.parse("1999-01-01")
      attrs = msg.header.merge('to' => msg.to_addrs,
                               'cc' => msg.cc_addrs,
                               'bcc' => msg.bcc_addrs,
                               'body' => msg.body).reject {|k,v| v.to_s.empty?}
      RestClient.post('http://localhost:5984/enron',
                      attrs.to_json,
                      :content_type => 'application/json')
    rescue Interrupt
      exit(1)
    rescue Exception => ex
      puts "#{path} #{ex.inspect}"
    end
  end
end
THEEND

sudo gem install rake rest-client json tmail
rake -T
rake import
# .....wait for it.....
rake irb

This will take a while. Not long after the script starts you will see documents showing up in your CouchDB. A couple dozen emails are not properly formatted or won’t convert to JSON, but you’ll still end up with most of the emails in your CouchDB. Navigate to CouchDB’s Futon and start mappin’ and reducin’ :)
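
The document-building step of the import can be tried on its own without CouchDB or tmail. This is only a sketch: build_doc is a made-up helper name, and the sample headers and addresses are invented for illustration. It shows how blank values get dropped before the hash is turned into JSON.

```ruby
require 'json'

# Hypothetical helper (not in the rakefile above): merge the parsed headers
# with the address lists and body, then drop any value that is blank.
def build_doc(headers, to, cc, bcc, body)
  headers.merge('to' => to, 'cc' => cc, 'bcc' => bcc, 'body' => body)
         .reject { |_key, value| value.to_s.empty? }
end

# Invented sample data; nil stands in for a header the message does not have.
doc = build_doc({ 'subject' => 'Meeting', 'x-folder' => '' },
                ['a@example.com'], nil, nil, 'See you at 3.')
puts doc.to_json
```

One thing to keep in mind: on Ruby 1.9 and later an empty array stringifies as "[]", so only nil and empty strings are filtered out by this check.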

Using Amazon EC2 Metadata as a Simple DNS

I use the Amazon metadata for creating /etc/hosts and do this on a cron schedule. This does everything I need. Instead of fancy DynDNS tricks or having to run and manage an internal DNS server, I just have a ruby script that looks at the EC2 metadata to build /etc/hosts. It’s easy. To set it up yourself and try it, all you need are 3 easy steps.

Start each of your instances with a uniquely named key that matches what you want its internal hostname to be, such as “onion” or “potato” or whatever you want to call it.

Make sure you have ruby, rubygems and amazon-ec2 (rubygem) installed. Then create a ruby script in /usr/local/sbin/hosts that has the following:

#!/usr/bin/env ruby
%w(optparse rubygems EC2 resolv pp).each {|l| require l}
options = {}
parser = OptionParser.new do |p|
  p.banner = "Usage: hosts [options]"
  p.on("-a", "--access-key USER", "The user's AWS access key ID.") do |aki|
    options[:access_key_id] = aki
  end
  p.on("-s",
       "--secret-key PASSWORD",
       "The user's AWS secret access key.") do |sak|
    options[:secret_access_key] = sak
  end
  p.on_tail("-h", "--help", "Show this message") {
    puts(p)
    exit
  }
  p.parse!(ARGV) rescue puts(p)
end
if options.key?(:access_key_id) and options.key?(:secret_access_key)
  puts "127.0.0.1 localhost"
  EC2::Base.new(options).describe_instances.reservationSet.item.each do |r|
    r.instancesSet.item.each do |i|
      if i.instanceState.name =~ /running/
        puts(Resolv::DNS.new.getaddress(i.privateDnsName).to_s +
             " #{i.keyName}.ec2 #{i.keyName}")
      end
    end
  end
else
  puts(parser)
  exit(1)
end

Set up a cron job to update /etc/hosts as often as you like. I do it once per hour on all my machines:

0 * * * * /usr/local/sbin/hosts -a myaccess -s mysecret >/etc/hosts

All my machines have this ec2 security key + script + cron approach. I do not have to run dyndns or any private dns servers to keep track of all my internal server ip addresses. My /etc/hosts looks like the following on the three machines in the test cluster:

127.0.0.1 localhost
10.252.202.221 oahu.ec2 oahu
10.253.115.175 maui.ec2 maui
10.253.114.190 hawaii.ec2 hawaii
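
The formatting half of the script can be tried without an AWS account. The sketch below is not the amazon-ec2 API: Instance is a stand-in Struct, hosts_lines is a made-up helper, and the third machine is invented to show that non-running instances get skipped.

```ruby
# Hypothetical record standing in for one instance from describe_instances.
Instance = Struct.new(:key_name, :private_ip, :state)

# Build the lines of /etc/hosts: localhost first, then one line per running
# instance in the form "ip keyname.ec2 keyname".
def hosts_lines(instances)
  lines = ['127.0.0.1 localhost']
  instances.each do |i|
    next unless i.state == 'running'
    lines << "#{i.private_ip} #{i.key_name}.ec2 #{i.key_name}"
  end
  lines
end

cluster = [
  Instance.new('oahu', '10.252.202.221', 'running'),
  Instance.new('maui', '10.253.115.175', 'running'),
  Instance.new('lanai', '10.253.0.1', 'terminated') # invented example
]
puts hosts_lines(cluster)
```

Because the cron line redirects straight into /etc/hosts, a failed run can leave the file empty. Writing to a temporary file and renaming it afterwards would be one way around that.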

Rack: An API for Web Servers and Ruby Frameworks

In today’s ruby web application landscape, every framework developer is writing his/her own handlers for every server he/she wants to support. This results in semi-duplicate code, if not for the web-server developer then for the framework-developer. This is the pain-point that Rack aims to solve. Rack proposes “why not have some common ground?” Java did this with the Servlet API 10 years ago. Python did this with WSGI 5 years ago.

By leveraging Rack, framework developers and web-server developers gain access to one another without having to write special adapters. Today that’s WEBrick, Mongrel, CGI, Ebb, Fuzed & Thin for web-servers and Rails, Camping, Coset, Halcyon, Maveric, Merb, Racktools::SimpleApplication, Ramaze, Sinatra & Vintage for web-frameworks (this list will undoubtedly be outdated soon). Tomorrow every new web-server and web-framework that supports Rack can be used together. You’ll be able to pick and choose the best web-server for you without changing your favorite web-framework and vice-versa.

“What do Rails developers really stand to gain today by leveraging Rack?” It might be the ability to run several “rackable” applications side by side inside a single web-server instance. It might be the possibility to leverage or stack applications. You can intercept requests, modify them and pass them through to other handlers. You can also have multiple rackable applications sitting next to each other that comprise one user-facing application. Don’t like file uploads with Rails? Use another web framework or the Rack API directly to write it and place it alongside your Rails app in the same application. Want to use single-sign-on for 3 Rails apps? No problem. Rack makes it easy to tie apps together.

Rack Hello World

(gem install rack and mongrel first, then after firing up the example visit localhost:3000)

%w(rubygems rack).each {|l| require l}
Rack::Handler::Mongrel.run(
  lambda {|x| [301, {'Location' => 'http://rubyurl.com/g6L'},'']},
  :Port => 3000
)
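
The idea of intercepting a request and passing it through to another handler can be shown without installing anything, since a Rack application is just an object that responds to call(env) and returns a status, a headers hash and a body. In this sketch the AddHeader class, the X-Served-By header and the /greeting path are all invented for illustration, and the env hash is a bare minimum rather than a full Rack environment.

```ruby
# The innermost application: answers every request with a plain-text greeting.
hello = lambda do |env|
  [200, { 'Content-Type' => 'text/plain' }, ["Hello from #{env['PATH_INFO']}"]]
end

# Hypothetical middleware: wraps another app, lets it handle the request,
# then adds one header to whatever response comes back.
class AddHeader
  def initialize(app, name, value)
    @app = app
    @name = name
    @value = value
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers.merge(@name => @value), body]
  end
end

stack = AddHeader.new(hello, 'X-Served-By', 'demo')
status, headers, body = stack.call('PATH_INFO' => '/greeting')
puts status
puts headers.inspect
body.each { |chunk| puts chunk }
```

Since the wrapper has the same call interface as the app it wraps, wrappers can be nested as deep as you like, and any Rack handler can serve the outermost one.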

Karma Yoga in Software Engineering

I work in software development and it is a very competitive business. At times I have to catch myself, when I feel an emotion, and ask myself “Why?”. Why am I being competitive? Why am I seeking recognition? Why am I wanting control? Is my argument on the design the best for the team? Are my motivations the best for the project?

In reading about Karma Yoga, I realize that this is exactly what software developers need to do when writing software. Karma Yoga means “discipline of action” and is based on the teachings of the Bhagwat Geeta, a sacred Sanskrit scripture of Hinduism.

Karma Yoga is described as a way of acting, thinking and willing by which one does one’s duty without consideration of personal selfish desires, likes or dislikes. Acting without being attached to the fruits of one’s deeds. In software this is doing what needs to be done for the betterment of the project and team without attaching your ego and self-worth to the code you write or the contribution you make.

When this mindset is taken on by software developers, collaboration, camaraderie and team-work increase while tensions, egos, stress, competition and caustic attitudes decrease. It has to be consciously chosen, but this is something teams should strive for.

Creating Blank Git Branches

Most of the time in git you will be creating branches of your main project and working on them. What if you wanted to create a headless git branch called ‘documentation’? It doesn’t really deserve its own repository because it’s so closely related. The git project itself does this with documentation: its repository has separate branches for master, docs, man pages and so on. Here’s how you do it.

Go into your git project and type

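git symbolic-ref HEAD refs/heads/empty
# clear the index and working tree first (git clean -fdx also deletes untracked and ignored files)
rm .git/index
git clean -fdx
touch .gitignore
git add .gitignore
git commit -m 'Initial headless branch commit'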

That’s it - now you have a new branch ‘empty’.

Creating the Perfect Gentoo Amazon EC2 AMI (Image)

Update: I need to upgrade this for amazon ec2 2008-02-01 api.

I have been playing with Gentoo again. I hadn’t been an active Gentoo user since it pissed me off in an emerge -u world snafu in 2004. I created some Gentoo EC2 images and thought I would share them with you all.

I have recently stopped using Xen to create new images and started using Amazon EC2 AMIs to create new AMIs directly – “dog food”-style. The script below is an example of this. There is no need to have 32 & 64-bit Xen Dom0 machines around the house to get started creating custom AMIs. All you need is an Amazon EC2 account. Just fire up someone else’s Linux image and go to work creating a new AMI. I have been using Amazon’s Fedora 4 “developer” 32-bit “small” image to create a nice lean Gentoo image. Here is my script.

# Boot a developer image at EC2 && Login as root on the instance

# Move the /tmp dir to the big drive
mv /tmp /mnt && ln -sf /mnt/tmp /

# Bootstrap
mkdir /mnt/gentoo
wget -O - \
  http://gentoo.osuosl.org/releases/x86/current/stages/stage3-i686-2007.0.tar.bz2 | \
  tar xjC /mnt/gentoo
wget -O - http://gentoo.osuosl.org/snapshots/portage-latest.tar.bz2 | \
  tar xjC /mnt/gentoo/usr
wget -O - http://s3.amazonaws.com/ec2-downloads/linux-2.6.16-ec2.tgz | \
  tar xzC /mnt/gentoo/usr/src
zcat /proc/config >/mnt/gentoo/usr/src/linux-`uname -r`/.config

# FUSE module (has to be compiled with the same gcc as ec2's kernel)
cd /tmp
wget -O - \
  http://superb-west.dl.sourceforge.net/sourceforge/fuse/fuse-2.7.3.tar.gz | \
  tar xz
cd fuse-2.7.3
./configure --enable-kernel-module \
  --with-kernel=/mnt/gentoo/usr/src/linux-`uname -r`
cd kernel
make && make install
mkdir -p /mnt/gentoo/lib/modules
cp -r /lib/modules/`uname -r` /mnt/gentoo/lib/modules/

# Setup
cat /proc/mounts >/mnt/gentoo/etc/mtab
mount -o rbind /proc /mnt/gentoo/proc
mount -o rbind /dev /mnt/gentoo/dev
mount -o rbind /sys /mnt/gentoo/sys
cp /etc/resolv.conf /mnt/gentoo/etc

# Chroot
chroot /mnt/gentoo /bin/bash
env-update
source /etc/profile
export PS1="(image) $PS1"

# Modules / Kernel
depmod -a
modprobe loop
echo 'loop' >>/etc/modules.autoload.d/kernel-2.6
echo 'fuse' >>/etc/modules.autoload.d/kernel-2.6
cd /usr/src && ln -sf linux-`uname -r` linux

# Cleanup
cd /
rm -rf tmp && ln -sf var/tmp tmp
rm -rf opt && ln -sf usr/local opt
rm -rf boot

# Root
usermod -p \
  `dd if=/dev/urandom count=50 2> /dev/null | md5sum | cut -d " " -f1-1` \
  root

# Rebuild
cat >/etc/make.conf <<\EOF
CFLAGS="-O2 -march=i686 -pipe -mno-tls-direct-seg-refs"
CXXFLAGS="${CFLAGS}"
CHOST="i686-pc-linux-gnu"
MAKEOPTS="-j2"
EOF
emerge --sync
emerge -e world
emerge --update --newuse --deep world ; # are these both needed ^ <-
etc-update
emerge eix gentoolkit
emerge --depclean
revdep-rebuild

# Locale
cat >/etc/locale.gen <<\EOF
en_US ISO-8859-1
en_US.UTF-8 UTF-8
EOF
locale-gen

# Timezone
cp /usr/share/zoneinfo/GMT /etc/localtime
cat >>/etc/conf.d/clock <<\EOF
TIMEZONE="GMT"
EOF

# Mounts
cat >/etc/fstab <<\EOF
/dev/sda1 /        ext3  user_xattr          0 1
/dev/sda2 /mnt     ext3  user_xattr          0 2
/dev/sda3 swap     swap  sw                  0 0
shm       /dev/shm tmpfs nodev,nosuid,noexec 0 0
EOF

# TTY
perl -p -i -e 's/^c([^1])/\#c$1/g' /etc/inittab

# Network
emerge dhcpcd ddclient net-misc/ntp
rc-update add net.eth0 default
rc-update add sshd default
rc-update add ntpd default
cat >/etc/ssh/sshd_config <<\EOF
Protocol 2
StrictModes yes
MaxStartups 10:30:60
Ciphers aes256-cbc,aes256-ctr
PasswordAuthentication no
ChallengeResponseAuthentication no
Subsystem sftp /usr/lib/misc/sftp-server
UseDNS no
EOF

# Boot
cat >/etc/conf.d/local.start <<\EOF
# /etc/conf.d/local.start
# Root SSH Public Key
[ ! -e /root ] && cp -r /etc/skel /root
wget --timeout 15 -q -O - \
  http://169.254.169.254/2007-12-15/meta-data/public-keys/0/openssh-key >\
  /root/.ssh/authorized_keys
chmod -R go-rwsx /root
# Userdata Shell Script
wget --timeout 15 -q -O - http://169.254.169.254/2007-12-15/user-data | sh
EOF

# EC2 tools
emerge ruby curl unzip symlinks
cd /tmp
wget http://s3.amazonaws.com/ec2-downloads/ec2-ami-tools.zip
cd /usr/local
unzip /tmp/ec2-ami-tools.zip
ln -sf ec2* ec2-ami-tools
chmod -R go-rwsx ec2*
rm -rf /tmp/ec2*
# Recompile rsync (lutimes doesn't work with old ec2 kernel)
cd /tmp
wget -O - http://www.samba.org/ftp/rsync/src/rsync-2.6.9.tar.gz | \
  tar xz
cd rsync-2.6.9
perl -pi.bak -e 's/\blutimes\b//' ./configure
./configure --prefix=/usr/local/ec2-ami-tools
make
make install
cd ..
rm -rf rsync*

# Bundle
export AMAZON_USER_ID='FIXME put your user id here'
export AMAZON_ACCESS_KEY_ID='FIXME put your access key here'
export AMAZON_SECRET_ACCESS_KEY='FIXME put your secret access key here'
cat >/mnt/pk.pem <<\EOF
-----BEGIN PRIVATE KEY-----
FIXME: put your private key here
-----END PRIVATE KEY-----
EOF
cat >/mnt/cert.pem <<\EOF
-----BEGIN CERTIFICATE-----
FIXME: put your cert here
-----END CERTIFICATE-----
EOF
export EC2_PRIVATE_KEY=/mnt/pk.pem
export EC2_CERT=/mnt/cert.pem

cat >/usr/local/sbin/image <<\EOF
#!/bin/bash
export EC2_AMITOOL_HOME=/usr/local/ec2-ami-tools
PATH=$EC2_AMITOOL_HOME/bin:$PATH
BUNDLE=`date '+%y%m%d%H%M%S'`
ec2-bundle-vol -r i386 -u $AMAZON_USER_ID \
  -k $EC2_PRIVATE_KEY -c $EC2_CERT \
  -b -d /mnt -s 10000 --fstab /etc/fstab \
  -e /root -p $BUNDLE
ec2-upload-bundle -b $HOSTNAME -m /mnt/$BUNDLE.manifest.xml \
  -a $AMAZON_ACCESS_KEY_ID -s $AMAZON_SECRET_ACCESS_KEY
rm -rf /mnt/$BUNDLE* /mnt/img-mnt
EOF
chmod 700 /usr/local/sbin/image

export HOSTNAME=gentoo-i686
rm -rf /var/tmp/* /usr/portage/distfiles /usr/portage/packages
symlinks -crsdv /
image

# Register & make the ami public (on another machine)
ec2-register $HOSTNAME/$BUNDLE.manifest.xml
ec2-modify-image-attribute ami-xxxxxx --launch-permission -a all

# Below is an example of a boot script that you might pass in as
# "userdata" You would configure the hostname and dyndns and/or
# maybe puppet or cfengine

#!/bin/bash
# Hostname
echo 'HOSTNAME="fqdn.example.com"' >/etc/conf.d/hostname
/etc/init.d/hostname restart
echo '127.0.0.1 '`hostname -f`' '`hostname -s`' localhost' >/etc/hosts
echo 'search '`hostname -d` >/etc/resolv.conf
echo 'nameserver 172.16.0.23' >>/etc/resolv.conf
echo 'dhcp_eth0="release nodns nontp nonis"' >/etc/conf.d/net
/etc/init.d/net.eth0 restart
# DynDNS
cat >/etc/ddclient/ddclient.conf <<\EOF
daemon=300
syslog=yes
mail=root
mail-failure=root
ssl=yes
use=web, web=169.254.169.254/2007-12-15/meta-data/public-ipv4
protocol=dyndns2, server=members.dyndns.org, custom=yes, \
login=FIXME, password=FIXME \
EOF
hostname >>/etc/ddclient/ddclient.conf
/etc/init.d/ddclient start
rc-update add ddclient default

# After the new instance is booted, you may want to login and
# configure some basic tools or whatever

# Extras Tools
cat >>/etc/portage/package.keywords <<\EOF
dev-util/git
sys-fs/encfs
sys-fs/fuse
sys-fs/sshfs-fuse
EOF
emerge dev-util/git
emerge sys-fs/fuse sys-fs/encfs sys-fs/sshfs-fuse

Using Ruby to Control Lego Mindstorms NXT

Playing with my son on Lego NXT requires me to get Ruby in the mix just for fun. Here are the install notes that I used to get everything going.

cd /tmp

# Ruby Serial
svn export http://ruby-serialport.rubyforge.org/svn/trunk ruby-serial
cd ruby-serial
ruby extconf.rb
make
sudo make install
cd ..

# Install libusb
svn export https://libusb.svn.sourceforge.net/svnroot/libusb/trunk/libusb libusb
cd libusb
sh autogen.sh
./configure
make
sudo make install
cd ..

# Ruby USB
svn export svn://svn@svn.a-k-r.org/akr/ruby-usb/trunk ruby-usb
cd ruby-usb
ruby extconf.rb
make
sudo make install
cd ..

# Ruby NXT
gem install ruby-nxt

# Try it in IRB
require 'rubygems'
require 'nxt_comm'
comm = NXTComm.new
comm.connected?
comm.get_device_info
comm.get_firmware_version

Mounting Remote Servers as a Drive on OS X With Mac FUSE and SSHFS

A handy tip for all you Mac OS X users out there: Have servers to deal with over SSH? You can download and install MacFUSE and SSHFS. Once you have installed both, you can fire up the SSHFS app. SSHFS can mount a remote server as a local drive on your mac. Then you can edit files in place and drag and drop files to transfer securely to the remote server.

It’s pretty cool. Thanks to the FUSE team for writing it. Thanks to Google for porting it to the Mac as MacFUSE.

Happy 60th Birthday, Dad!

Just wanted to acknowledge my father as he turns 60 today. I really admire you and all of your accomplishments. You have always taken the time to talk with me and be a mad-scientist-mentor to me. My first gunpowder-bombs, tool-boxes, fishing-poles, camping-gear, snow-sleds, mini-bikes, tree-forts, computers and other fun stuffs came from you, dad. Thanks for teaching me that moving forward requires persistence and perspiration. Thanks for encouraging me on my way with many a computer-gift and a pat on the back for encouragement.

I love you, dad.

Ruby Lasagna

You have heard of Spaghetti code if you’ve done procedural or scripting-based programming. A similar thing can happen in Ruby with its polymorphism and super-dynamic behavior. Ruby can become a frustrating “Lasagna” sometimes with all these dynamic classes, dynamic instances and dynamic behavior. All this can be hard to debug and follow. It’s almost like too many lisp macros.

Chad Fowler recently wrote “I love the tricks you can do with Ruby. method_missing, const_missing, autoloading, and their friends make really powerful things possible. But they do so at a price. When something goes wrong in a piece of code that relies heavily on one of these tricks, it can be much much harder to track down. So the decision to use such a tool shouldn’t be taken lightly. These are power tools. Used effectively, really cool things can happen. Used incorrectly, you can easily find yourself limb-less and bloody. So when you decide to use one of these power tools, you have to ask yourself: is it worth the risk?”
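
Here is a small taste of the kind of trouble the quote is talking about. It is a sketch only: the Finder class and its find_by_ methods are made up for this example and are not from any library. A method_missing hook answers any call that starts with find_by_, which means a misspelled name is answered too.

```ruby
# Hypothetical class that answers any "find_by_<key>" call through method_missing.
class Finder
  def initialize(rows)
    @rows = rows
  end

  def method_missing(name, *args)
    if name.to_s.start_with?('find_by_')
      key = name.to_s.sub('find_by_', '').to_sym
      @rows.select { |row| row[key] == args.first }
    else
      super
    end
  end

  # Keep respond_to? honest about the methods handled dynamically.
  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?('find_by_') || super
  end
end

people = Finder.new([{ name: 'Ann', city: 'Lihue' }, { name: 'Bob', city: 'Hilo' }])
p people.find_by_city('Lihue')
# The typo below raises no NoMethodError; it quietly returns an empty list.
p people.find_by_ctiy('Lihue')
```

With an ordinary method definition the typo would blow up right at the call site. Here it turns into an empty result that may only get noticed much later, somewhere else in the program.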