Wednesday, December 22, 2010

Using Amazon EC2 to speed up matlab optimisation Part II: Setting up an Ubuntu EC2 instance to run the compiled Matlab code

In this tutorial (part 2 of 3), I go through the steps to set up an Amazon EC2 instance to run compiled Matlab code. You only need to go through this procedure the first time, to install your software. After that, you can just re-run your instance (many copies simultaneously, if you like).

Before running this, you will need to set up an Amazon AWS account (including giving them your credit card details to pay for this).

I mostly used the AWS management console because it is easy to use.

The first step is to log onto the AWS management console:
https://console.aws.amazon.com/s3/home


and select the EC2 tab. Under Region, select whichever region is closest to you, for the fastest performance (I chose Asia Pacific, in Singapore).

Before starting the instance, there is some work to do first!

You need to create a key pair: select "Key Pairs" at the bottom of the left menu, give it a name, then save the downloaded file somewhere you will remember (you will need it later!). The key ensures that only you can log into your server.
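ssh refuses to use a private key file that other users can read, so on Linux or OSX it is worth locking the downloaded key down straight away. A minimal sketch: protect_key is a hypothetical helper, and the path you pass it (e.g. ~/.ec2/singapore-key.pem, the name used in the sftp command below) is an assumption about where you saved the key.

```shell
#!/usr/bin/env bash
# Sketch: make the downloaded EC2 key readable by its owner only.
# ssh refuses private keys whose permissions are too open.
# The key path is an assumption, e.g. ~/.ec2/singapore-key.pem
protect_key() {
    local key="$1"
    if [ ! -f "$key" ]; then
        echo "No such key file: $key" >&2
        return 1
    fi
    chmod 400 "$key"   # owner: read only; group/others: nothing
    ls -l "$key"       # show the resulting permissions
}
```

For example: protect_key ~/.ec2/singapore-key.pem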

You also need to define some "security groups". These specify which ports are open, so that the outside world can communicate with the machine. Click on "Security groups" in the left menu, then "Create security group". You will need one for ssh. You can call it "ssh", description "ssh". At the bottom of the screen, select "ssh" from the options, and click on "save". In the source, you can put your computer's IP if you want to make sure only you can log onto the computer (I didn't bother).


I also defined another security group for the Matlab server. I decided (arbitrarily) to use ports 9000-9100 for my application. So repeat the process, but call it "Matlab listeners", description "Matlab listeners", and then at the bottom select "Custom", "TCP protocol", From port 9000, To port 9100, source 0.0.0.0/0 (i.e. the whole internet), and click on save.
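With one server per CPU on ports 9001, 9002, ... (as in the runservers script further down), the 9000-9100 range leaves room for up to 100 servers per instance. A small sketch to check that a given number of servers fits; ports_for_servers is a hypothetical helper, and the range values are the ones chosen above.

```shell
#!/usr/bin/env bash
# Sketch: list the ports that N servers would use and check that they
# all fall inside the range opened by the "Matlab listeners" group.
# Server i listens on 9000 + i, as in the runservers script.
RANGE_FROM=9000
RANGE_TO=9100

ports_for_servers() {
    local n="$1" i port
    for ((i = 1; i <= n; i++)); do
        port=$((RANGE_FROM + i))
        if ((port > RANGE_TO)); then
            echo "Port $port is outside $RANGE_FROM-$RANGE_TO" >&2
            return 1
        fi
        echo "$port"
    done
}
```

For an 8-core machine, ports_for_servers 8 lists 9001 to 9008.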



Now we are ready to select an "image". Under Images (in the left menu), select AMIs. AMIs are Amazon Machine Images. Under Viewing, I chose "64-bit", "Ubuntu" and, in the text field, "Lucid" (Ubuntu 10.04, the latest long-term support release). This will give you a list of images that other people have created and shared publicly. I chose to use Ubuntu because I am familiar with it, and I can easily install the same version on my computer to compile the code (and be confident that both are using the same libraries, etc.). An added bonus of using Ubuntu (or another open source OS) is that it is free, so there is nothing else to pay (apart from the AWS fees).

I selected ami-9c2957. Right-click on it and select "Launch instance".

In step 1, I selected an "extra large" instance. For now, it is not so important, but when you are actually using it for optimisation you probably want to work out the best trade-off between more machines and more cores per machine (and cost!).


For instance details (step 2), I just left the defaults.


For Create Key Pair (step 3), select the key pair that you created earlier.
For Security Groups (step 4), select "ssh" and "Matlab listeners".
At step 5, review and make sure everything is OK, then press "Launch". Congratulations, you have launched your first instance (may it be the first of many).

Now click on "Instances" in the left menu, and your instance should appear (it may take a little time to start up). Right-click on it and select "Connect". This gives you instructions on how to ssh to your server. Rather than connecting to root@XXXXX, connect instead to ubuntu@XXXXX, because the Ubuntu images don't allow you to log in as root. If you have Linux or OSX, you can ssh from any terminal window. If using Windows, try PuTTY (you will need PuTTYgen to convert the key to PuTTY's format).
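On Linux or OSX, an entry in ~/.ssh/config saves typing the key and user name each time; ssh, sftp and scp all read it. A sketch, assuming a made-up alias matlab-ec2 and the placeholder address from the sftp example below; add_ssh_host is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: add a Host entry to an ssh config file so that "ssh <alias>"
# logs in as the ubuntu user with the EC2 key.
# Alias, host name and key path are placeholders (assumptions).
add_ssh_host() {
    local config="$1" alias="$2" host="$3" key="$4"
    # Do nothing if an entry for this alias already exists
    if [ -f "$config" ] && grep -qx "Host $alias" "$config"; then
        return 0
    fi
    cat >> "$config" <<EOF
Host $alias
    HostName $host
    User ubuntu
    IdentityFile $key
EOF
}
```

For example, add_ssh_host ~/.ssh/config matlab-ec2 ec2-111-111-111-111.ap-southeast-1.compute.amazonaws.com ~/.ec2/singapore-key.pem, after which ssh matlab-ec2 (or sftp matlab-ec2) connects as ubuntu. The public address changes each time you launch a new instance, so the HostName needs updating.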

Once logged in, I updated the server (as the ubuntu user, this needs sudo):

sudo apt-get update
sudo apt-get upgrade


I then copied over the Matlab MCR installer (found on my local Ubuntu installation in /opt/MATHWORKS_R2010A/toolbox/compiler/deploy/glnxa64/MCRInstaller.bin) using sftp in another window; you could also use scp. Replace 111-111-111-111 with the address of your instance:

sftp -oIdentityFile=~/.ec2/singapore-key.pem ubuntu@ec2-111-111-111-111.ap-southeast-1.compute.amazonaws.com
sftp> put MCRInstaller.bin


The Matlab MCR is 200+ MB, so it takes a while. The MCR is needed to run compiled Matlab programs on computers that do not have Matlab installed.
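Because the file is large, it can be worth checking that it arrived intact before running it. One way, sketched here under the assumption that GNU md5sum is available on both machines (it is on Ubuntu), is to compare checksums; same_md5 is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: compare two "md5sum" outputs (local copy vs. uploaded copy).
# Only the first field (the hash) is compared; file names may differ.
same_md5() {
    local local_sum remote_sum
    local_sum=$(echo "$1" | awk '{print $1}')
    remote_sum=$(echo "$2" | awk '{print $1}')
    if [ -n "$local_sum" ] && [ "$local_sum" = "$remote_sum" ]; then
        echo "checksums match"
    else
        echo "checksums differ: upload again" >&2
        return 1
    fi
}
```

Run on your own computer, for example: same_md5 "$(md5sum MCRInstaller.bin)" "$(ssh -i ~/.ec2/singapore-key.pem ubuntu@ec2-111-111-111-111.ap-southeast-1.compute.amazonaws.com md5sum MCRInstaller.bin)"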

Run the installer (if the file is not executable after the transfer, first run chmod +x MCRInstaller.bin):

sudo ./MCRInstaller.bin -console

(The -console option runs it without the GUI.)

Press enter a few times (defaults for everything should be fine).

Install necessary software:
sudo apt-get install zip unzip ruby openssl libopenssl-ruby curl libxpm4 libxt6 libxmu6 libxp6

Download Amazon AMI tools:
curl http://s3.amazonaws.com/ec2-downloads/ec2-ami-tools.zip > ec2-ami-tools.zip

Install them:

mkdir ec2
cp ec2-ami-tools.zip ec2
cd ec2
unzip ec2-ami-tools.zip
ln -s ec2-ami-tools-* current


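Edit your .bashrc file (e.g. nano ~/.bashrc) and add the following to the end:

export EC2_AMITOOL_HOME=~/ec2/current
export PATH=${PATH}:~/ec2/current/bin

Then log out and back in (or run source ~/.bashrc) so that the new settings take effect.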
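If you repeat this setup (for example on a fresh instance), appending the lines blindly would duplicate them. A small sketch that adds each line only if it is not already present; add_ec2_env is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: append the EC2 AMI tools settings to a shell startup file,
# skipping lines that are already there, so re-running is harmless.
# Single quotes keep ~ and ${PATH} literal until .bashrc is sourced.
add_ec2_env() {
    local rc="$1" line
    touch "$rc"
    for line in \
        'export EC2_AMITOOL_HOME=~/ec2/current' \
        'export PATH=${PATH}:~/ec2/current/bin'; do
        # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
        grep -qxF "$line" "$rc" || echo "$line" >> "$rc"
    done
}
```

For example: add_ec2_env ~/.bashrc && source ~/.bashrc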


Also create a matlab directory to store the files:
mkdir matlab

So that the servers always run the latest version, I copied the server code into an Amazon S3 store; each time the servers are started, they download the latest version.
The easiest way to create a "bucket" is with the management console (under the S3 tab). I called mine "jasonfriedman.software".
Now upload the compiled Matlab code that was created in Part 1. In the management console there is an "Upload" button. I uploaded the two files needed, socket_server and run_socket_server.sh.
I made each of them public (right-click on the files) so that the EC2 instances can download them. If you then select Properties, you can get the URL of each file (which you will need).
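The runservers script below downloads the files from URLs of the form https://s3.amazonaws.com/<bucket>/<file>. A sketch that builds such a URL and checks that a file really is public by requesting it anonymously with curl (a private object is refused with HTTP 403 rather than 200). s3_url and check_public are hypothetical helpers; the bucket and file names in the usage line are the ones from this post.

```shell
#!/usr/bin/env bash
# Sketch: build a path-style S3 URL, as used in the runservers script.
s3_url() {
    local bucket="$1" key="$2"
    echo "https://s3.amazonaws.com/${bucket}/${key}"
}

# Succeeds only if an anonymous request for the URL returns HTTP 200.
check_public() {
    local code
    # -s: silent, -o /dev/null: discard the body, -w: print the status
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    [ "$code" = "200" ]
}
```

For example: check_public "$(s3_url jasonfriedman.software socket_server)" && echo "socket_server is public"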
Then I wrote a small Perl script that counts the number of processors and runs that number of servers. It looks at /proc/cpuinfo to count them; this is not very robust, but should do for Amazon EC2 instances. At the beginning, it also downloads the latest version of the servers from the S3 store (as this is also on AWS, the transfer is quick, and free if the bucket is in the same region as the instance). Write the script using your favourite text editor, save it as matlab/runservers, and make it executable (chmod +x matlab/runservers) so that it can be run directly at startup.


#!/usr/bin/perl -w
use strict;

# Download the latest versions of the server files from S3 and make them executable
system('wget https://s3.amazonaws.com/jasonfriedman.software/socket_server -O /home/ubuntu/matlab/socket_server');
system('wget https://s3.amazonaws.com/jasonfriedman.software/run_socket_server.sh -O /home/ubuntu/matlab/run_socket_server.sh');
system('chmod a+x /home/ubuntu/matlab/run_socket_server.sh /home/ubuntu/matlab/socket_server');

# Count the CPUs: each one has a line starting with "processor" in /proc/cpuinfo
my $numCPUs = `grep -c '^processor' /proc/cpuinfo`;
chomp($numCPUs);

print "There are $numCPUs CPUs\n";

# Now run a server for each of the CPUs, on ports 9001, 9002, ...
# (the trailing & runs each one in the background)
for (my $i = 1; $i <= $numCPUs; $i++) {
        my $port = $i + 9000;
        system("/home/ubuntu/matlab/run_socket_server.sh /opt/MATLAB/MATLAB_Compiler_Runtime/v713/ $port &");
}
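To test the counting and port logic without starting anything, the same steps can be sketched in shell as a dry run that prints the commands instead of running them. This is an illustration, not a replacement for the Perl script; plan_servers is a hypothetical name, and the paths are the ones used in the script above.

```shell
#!/usr/bin/env bash
# Sketch: dry run of the runservers logic. Counts lines starting with
# "processor" in a cpuinfo file (default /proc/cpuinfo) and prints one
# server command per CPU, on ports 9001, 9002, ...
MCR_DIR=/opt/MATLAB/MATLAB_Compiler_Runtime/v713/
SERVER=/home/ubuntu/matlab/run_socket_server.sh

plan_servers() {
    local cpuinfo="${1:-/proc/cpuinfo}" n i
    # grep -c prints 0 (and exits 1) when nothing matches
    n=$(grep -c '^processor' "$cpuinfo" || true)
    echo "There are $n CPUs" >&2
    for ((i = 1; i <= n; i++)); do
        echo "$SERVER $MCR_DIR $((9000 + i))"
    done
}
```

Running plan_servers on the instance should list one command per core.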


The next step is to make the instance run the Matlab servers by itself when it starts up. This will make it easy to start up many servers.

We do this by adding one line to /etc/rc.local, which is run at the end of each boot (you will need sudo to edit it):
/home/ubuntu/matlab/runservers
(put it on the line before the last line, exit 0)
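If you prefer to make this change from the command line (for example when scripting the setup), here is a sketch that inserts the line just before the final exit 0 and does nothing if it is already there. add_before_exit is a hypothetical helper; it assumes the file ends with an exit 0 line, as Ubuntu's default /etc/rc.local does (otherwise it appends the command at the end).

```shell
#!/usr/bin/env bash
# Sketch: insert a command just before the last "exit 0" line of an
# rc.local-style file, unless that exact line is already present.
add_before_exit() {
    local file="$1" cmd="$2" tmp
    grep -qxF "$cmd" "$file" && return 0   # already installed
    tmp=$(mktemp)
    awk -v cmd="$cmd" '
        { lines[NR] = $0; if ($0 == "exit 0") last = NR }
        END {
            for (i = 1; i <= NR; i++) {
                if (i == last) print cmd
                print lines[i]
            }
            if (!last) print cmd   # no "exit 0": append at the end
        }' "$file" > "$tmp"
    cat "$tmp" > "$file"   # overwrite in place, keeping owner and mode
    rm -f "$tmp"
}
```

Paste the function into a root shell (sudo -s) and run: add_before_exit /etc/rc.local /home/ubuntu/matlab/runservers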

The final step is to save the instance so that you don't need to go through all this installing each time. If your instance is EBS-backed, you can do it with the management console: just right-click on the instance and select "Create Image (AMI)". Next time, you can run this image as you left it (rather than running someone else's image). If not, instructions on how to do it are here: http://instantbadger.blogspot.com/2009/09/how-to-create-and-save-ami-image-from.html
This tutorial continues in Part 3.

Note: some of the instructions on this page were modified from:
http://robrohan.com/2009/01/30/saving-a-customised-linux-amazon-instance-ec2-and-s3/

4 comments:

  1. Hi Jason,

    Thank you for an interesting article! Have you tried the Techila solution? Techila presented it at Microsoft TechDays. The presentation slides can be found at http://cid-4f013ad0e7321227.office.live.com/self.aspx/TD2011fi/techdays2011-techila%5E_final.pdf Techila with EC2 or Azure gives MATLAB code a pretty nice speed-up; the presentation mentions a 45622% acceleration factor.

  2. This comment has been removed by a blog administrator.

  3. Hi Jason,
    Your post is awesome and very helpful to me. However, when I tried to install the MCR with `sudo ./MCRInstaller.bin -console`, it failed on my EC2 instance (via PuTTY), while the same thing succeeded on my Ubuntu VirtualBox machine.

    When running to the part `Extracting package...` the installation halted suddenly without any message to be output.

    Do you have any idea for me?
    Best regards,
    Nam

    PS. My AMI is ami-60582132 (Ubuntu 11.04 Natty)

  4. Hi Kome,

    Sorry, I'm not sure what is causing the problem, are you sure the installer is for the right version (32 or 64 bit)? Other than that I can only recommend you ask Mathworks for help.
