Noisy accumulation

Wednesday, February 1, 2012

How to decompose 2D trajectory data into submovements using matlab

There is significant evidence that movements are made up of elementary building blocks (see Flash & Hochner, 2005 for a review). When looking at arm reaching movements, or drawing movements, these building blocks are often called "submovements". If we assume that these submovements have a particular form, then we can decompose an arbitrary movement trajectory into its constituent submovements.

Here, I will assume that the submovements are minimum jerk submovements, that is, they minimize mean squared jerk (other options are available). These submovements are described as 5th order polynomials with 3 free parameters (which is the minimum possible) - amplitude, duration, and starting time. The equations for the position (relative to the starting point) and the velocity are given by:
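In the standard minimum-jerk form, writing \tau = (t - t_0)/D for normalized time (a symbol introduced here for brevity), these are:

```latex
x(t) - x(t_0) = A\left(10\tau^{3} - 15\tau^{4} + 6\tau^{5}\right), \qquad
\dot{x}(t) = \frac{A}{D}\left(30\tau^{2} - 60\tau^{3} + 30\tau^{4}\right), \qquad
\tau = \frac{t - t_0}{D}
```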

where D is the duration, A is the amplitude and t0 the starting time, and t0 < t < t0 + D. The Matlab function MJxy.m computes this velocity profile.
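A minimal sketch of such a velocity profile in Matlab follows. It is not the contents of MJxy.m: the names tau and mjvel and all sample values are made up for illustration, and it assumes the standard minimum-jerk polynomial with zero velocity outside t0 <= t <= t0 + D.

```matlab
% Hypothetical helper (not the author's MJxy.m): minimum-jerk velocity of one
% submovement with start time t0, duration D and amplitude A, evaluated at t.
tau   = @(t, t0, D) (t - t0) ./ D;                      % normalized time
mjvel = @(t, t0, D, A) (A ./ D) ...
    .* (30*tau(t,t0,D).^2 - 60*tau(t,t0,D).^3 + 30*tau(t,t0,D).^4) ...
    .* (t >= t0 & t <= t0 + D);                         % zero outside the submovement

% Illustrative values only
t  = 0:0.001:1;          % time in seconds
t0 = 0.2;  D = 0.5;      % start time and duration (s)
Ax = 0.1;  Ay = -0.05;   % x and y amplitudes

vx = mjvel(t, t0, D, Ax);
vy = mjvel(t, t0, D, Ay);

% Integrating the velocity should recover the amplitude
dx = trapz(t, vx);
dy = trapz(t, vy);
fprintf('Recovered amplitudes: %.4f, %.4f\n', dx, dy);
```

In 2D the same start time and duration are shared by both axes, and only the amplitude differs between x and y.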

We can have more than one submovement being executed at a time, in which case we just add the change of positions of both submovements. Our goal is to take a trajectory and decompose it into one or more submovements. To do this, we need to find the 3 parameters (or 4 if we are working in 2D, where we have an extra amplitude parameter) that describe each submovement (and decide how many submovements we need for a particular movement). To decide how good a reconstruction is, we need to define a cost, which is based on the difference between the observed trajectory and the reconstructed trajectory. Here we will use the mean squared error (MSE). Minimising the MSE is equivalent to minimising the root mean squared error (RMSE); we use the MSE for computational reasons. Once we have the cost, we need to vary the parameters to find the minimum cost (i.e., best fit).

Rather than just minimising the difference between the observed and predicted velocity profiles, we also include a term for the tangential velocity, which we calculate for each submovement and then sum. This is to prevent the algorithm finding two submovements which start at approximately the same time, but have approximately equal and opposite amplitudes. While the resultant velocity will be approximately zero (and so won't affect the MSE), from a theoretical point of view, these seem highly unlikely. So our cost function E is:

where Fxi and Fyi are the predicted ith submovement velocities in the x and y directions, and Gx and Gy are the observed x and y velocities. The cost function, in Matlab, can be found in the file calculateerrorMJxy.m. Due to the potentially large number of parameters (4 per submovement), the optimisation can be very slow. In order to help Matlab along, I calculated the gradient (partial derivatives of the error function) and the Hessian (second-order partial derivatives), which otherwise Matlab has to estimate numerically. These expressions are long and complicated, so I used the program maxima to find them (also provided are the maxima code, a perl script to turn the maxima output into legal Matlab code, and some code to check that there are no mistakes in calculating the gradient and Hessian).
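To make the role of the tangential velocity term concrete, here is a small sketch. The exact form of the cost is an assumption (it is not copied from calculateerrorMJxy.m): the mean over time of the squared x and y velocity errors plus the squared difference between the summed submovement speeds and the observed speed. The names tang and cost, and the bell-shaped test profile, are invented for illustration.

```matlab
% Assumed cost (NOT taken from calculateerrorMJxy.m): mean over time of the
% squared x and y velocity errors plus a squared tangential-velocity error.
% Fx, Fy: nSub-by-nT predicted submovement velocities; Gx, Gy: 1-by-nT observed.
tang = @(Fx, Fy) sum(sqrt(Fx.^2 + Fy.^2), 1);   % summed submovement speeds
cost = @(Fx, Fy, Gx, Gy) mean( ...
    (sum(Fx, 1) - Gx).^2 + (sum(Fy, 1) - Gy).^2 + ...
    (tang(Fx, Fy) - sqrt(Gx.^2 + Gy.^2)).^2);

% Illustrative data: the observed hand does not move at all
t  = linspace(0, 1, 101);
Gx = zeros(size(t));
Gy = zeros(size(t));

% Two equal and opposite bell-shaped submovements (made-up profile)
bell = sin(pi * t).^2;
Fx = [bell; -bell];
Fy = zeros(2, numel(t));

Evel  = mean((sum(Fx,1) - Gx).^2 + (sum(Fy,1) - Gy).^2);  % velocity terms only
Efull = cost(Fx, Fy, Gx, Gy);                             % with tangential term
fprintf('velocity-only cost %.4f, full cost %.4f\n', Evel, Efull);
```

The two opposite submovements cancel exactly in the velocity terms, so only the tangential term penalises this implausible solution.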

To minimise this function in Matlab, we can use fmincon (part of the Optimization toolbox). We use this rather than fminsearch so that we can set constraints on some of the variables, in particular:

0 <= t0 < tf - 0.167
0.167 < D < 1

These enforce that submovements need to have a duration of at least 167 ms. We also constrain the amplitudes to reasonable values. After running the optimization and getting back the parameters and the error, we need to decide how many submovements we want. I usually just use a threshold (0.02), and select the minimum number of submovements that give an error of less than 0.02. Before running the optimization, we need to select a starting "guess". Here, I randomly select a starting point within the legal ranges for each parameter. However, we may end up in a local minimum. To minimize the chance of this occurring, we start from 10 different starting points, as suggested in Rohrer & Hogan (2006).
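The bounds and the random starting points can be sketched as follows. This is an illustration rather than the code in decompose.m: the parameter ordering [t0 D Ax Ay] per submovement, the movement duration tf and the amplitude limit maxA are all assumptions.

```matlab
% Sketch under stated assumptions: parameters are ordered [t0 D Ax Ay] per
% submovement, and the amplitude limit maxA is a made-up placeholder.
tf     = 1.2;     % duration of the recorded movement in seconds (example value)
nSub   = 2;       % number of submovements to fit
nStart = 10;      % number of random starting points
minD   = 0.167;   % minimum submovement duration (s)
maxD   = 1;       % maximum submovement duration (s)
maxA   = 1;       % assumed amplitude limit

lb = repmat([0,         minD, -maxA, -maxA], 1, nSub);   % lower bounds
ub = repmat([tf - minD, maxD,  maxA,  maxA], 1, nSub);   % upper bounds

% Each row of starts is one random starting guess inside the bounds
starts = repmat(lb, nStart, 1) + rand(nStart, numel(lb)) .* repmat(ub - lb, nStart, 1);
```

Each row could then be passed as the starting point in a call such as fmincon(costfun, starts(k,:), [], [], [], [], lb, ub), keeping the result with the lowest cost. Here costfun stands for a handle to whatever cost function is being minimised.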

The function decompose.m performs the decomposition (calls fmincon, etc). I have provided an example (example.m) to demonstrate how to use the code (which needs a sample data file: sampledata.mat). The output of the example program is shown below:

In this case, two submovements were needed to get the reconstruction error below 0.02: a small submovement to the right (in green, starting at 378 ms with an amplitude of [0.088,0.125] and a duration of 345 ms), followed by another submovement at 551 ms (in red, with an amplitude of [-0.432,0.346] and a duration of 689 ms). Note that the numbers describing the submovements are taken from the best fit parameters.
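The threshold rule for choosing the number of submovements can be written in a few lines. The error values below are invented for illustration and are not the output of example.m.

```matlab
% Made-up best errors after fitting 1, 2, 3 and 4 submovements
errors    = [0.110 0.015 0.012 0.011];
threshold = 0.02;

% Smallest number of submovements whose error is below the threshold
nNeeded = find(errors < threshold, 1, 'first');
if isempty(nNeeded)
    fprintf('No fit reached the threshold; try more submovements.\n');
else
    fprintf('%d submovements needed (error %.3f)\n', nNeeded, errors(nNeeded));
end
```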

All files needed to run the demo can be downloaded here in a zip file.

Wednesday, December 22, 2010

Using Amazon EC2 to speed up matlab optimisation III: Getting it all working together, and some speed results

OK, this is the final part of the tutorial.
In part 1, you should have created the matlab server to listen for commands. In part 2, you should have started an EC2 instance, installed the necessary software, and saved it for future use.
In part 3, we will write a client in Matlab to send off optimisation jobs to the servers to run and get back the results.
I have implemented this as a matlab class, @socket_client.
The code is available to download.
The constructor (socket_client.m) just takes as an argument the location of the ssh key for accessing the machines. To use the default location, the constructor is run as follows:

s = socket_client();

The next step is to update the server list:

[s,servers] = updateserverlist(s);

The program updateserverlist.m calls the program findinstances.m, which uses the program ec2-describe-instances, which should have been installed as part of the EC2 API. In this way, all servers that have been started (as described in part 2) will be utilised.
Then a set of jobs can be constructed, for example:

% Set up 20 jobs
for k=1:20
   joblist(k).command = codes.decompose;
   joblist(k).arguments = {a(k).time,a(k).vel,[],[],[]};
end

Then finally they can be run:

[results,finishtimes]  = runjobs(s,joblist,1);

The runjobs code keeps track of what each server is doing. When a server finishes a job, runjobs saves the results and assigns that server the next job from the queue.
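The scheduling idea can be illustrated with a small simulation. This is not the code in runjobs.m: there are no sockets here, and the job durations and the number of servers are invented. It only shows how giving each job to the first server that becomes free spreads the work.

```matlab
% Simulation sketch (not the author's runjobs.m): each job goes to whichever
% server becomes free first. Durations are invented for illustration.
jobDurations = [5 3 8 2 4 6];   % seconds each job would take
nServers     = 2;

freeAt     = zeros(1, nServers);            % time at which each server is next idle
assignedTo = zeros(1, numel(jobDurations)); % server that ran each job
finishAt   = zeros(1, numel(jobDurations)); % completion time of each job

for k = 1:numel(jobDurations)
    [startTime, srv] = min(freeAt);          % first server to become free
    assignedTo(k) = srv;
    finishAt(k)   = startTime + jobDurations(k);
    freeAt(srv)   = finishAt(k);
end
```

With these numbers the six jobs finish after 17 seconds, compared with 28 seconds if they were run one after another on a single server.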
And that is it! I'll be happy to hear if you get this working yourself or have an alternative solution to this problem.

Using Amazon EC2 to speed up matlab optimisation Part II: Setting up an Ubuntu EC2 instance to run the compiled Matlab code

In this tutorial (part 2 of 3), I go through the steps to set up an Amazon EC2 instance to run compiled Matlab code. You need to go through this procedure the first time to install your software. Subsequent times, you can just re-run (many times simultaneously if you like) your instance.

Before running this, you will need to set up an Amazon AWS account (including giving them your credit card details to pay for this).

I have used mostly the AWS management console because it is easy to use.

The first step is to log onto the AWS management console:
https://console.aws.amazon.com/s3/home

and select the EC2 tab. Under region, select whichever region is closest to you (for fastest performance - I chose the Asia-Pacific one, in Singapore).

Before starting the instance, there is some work to do first!

You need to create a key: select "key-pairs" (on the bottom of the left menu), give it a name, then save it somewhere where you will remember (you will need it later!). The key ensures that only you can log into your server.

You also need to define some "security policies". These basically say which ports will be open so that the outside world can communicate with the machine. Click on "Security groups" on the left menu, then "Create security group". You will need one for ssh. You can call it "ssh", description "ssh". In the bottom of the screen, select "ssh" from the options, and click on "save". In the source, you can put your computer's IP if you want to make sure only you can log onto the computer (I didn't bother).

I also defined another security policy for the Matlab server. I decided (arbitrarily) to use ports 9000-9100 for my application. So repeat the process, but call it "Matlab listeners", description "Matlab listeners", and then down the bottom, select "Custom", "TCP protocol", From port 9000, To port 9100, source 0.0.0.0/0 (i.e. the whole internet) and click on save.

Now we are ready to select an "image". Under images (on the left menu), select AMIs. AMIs are Amazon Machine Images. Under viewing, I chose "64-bit", "Ubuntu" and in the text field "Lucid" (the name of the latest Ubuntu release). I chose to use Ubuntu because I am familiar with it, and I can easily install the same version on my computer to compile the code (and be confident that both are using the same libraries, etc). This will give you a list of images that other people have created and shared publicly. An added bonus of using Ubuntu (or another open source OS) is that it is free, so there is nothing else to pay (apart from the AWS fees).

I selected ami-9c2957. Click with the right mouse and select "Launch instance".

In step 1, I selected an "extra large". For now, it is not so important, but when you are actually using it for optimisation you probably want to work out the best trade off between more machines and more cores / machine (and cost!)

For instance details (step 2), I just left the defaults

For create key pair (step 3), select the key that you created earlier:

For security groups (step 4), select the "ssh" and "Matlab listeners"

At step 5, review and make sure everything is OK, then press "Launch". Congratulations, you have launched your first instance (may it be the first of many).

Now click on "instances" on the left menu, and your instance should appear (it may take a little time for it to start up). Right mouse click on it and press "Connect". It will give you instructions on how to ssh to your server. Rather than connecting to root@XXXXX, connect instead to ubuntu@XXXXX. This is because Ubuntu doesn't like you logging in as root. If you have linux / OSX, you can ssh from any terminal window. If using Windows, try PuTTY.

Once logged in, I updated the server:

sudo apt-get update

sudo apt-get upgrade

I then copied over the Matlab MCR: (found on my Ubuntu distribution in: /opt/MATHWORKS_R2010A/toolbox/compiler/deploy/glnxa64/MCRInstaller.bin)
using sftp (in another window), you could also use SCP (replace the 111-111-111-111 with the address of your instance)

sftp -oIdentityFile=~/.ec2/singapore-key.pem ubuntu@ec2-111-111-111-111.ap-southeast-1.compute.amazonaws.com

sftp> put MCRInstaller.bin

The Matlab MCR is 200+ MB, so it takes a while . . . The MCR is needed to run compiled Matlab programs on computers that do not have Matlab installed.

Run the installer:

sudo ./MCRInstaller.bin -console

(to run it without the gui)

Press enter a few times (defaults for everything should be fine).

Install necessary software:
sudo apt-get install zip unzip ruby openssl libopenssl-ruby curl libxpm4 libxt6 libxmu6 libxp6

Download Amazon AMI tools:
curl http://s3.amazonaws.com/ec2-downloads/ec2-ami-tools.zip > ec2-ami-tools.zip

Install them:



mkdir ec2

cp ec2-ami-tools.zip ec2

cd ec2

unzip ec2-ami-tools.zip

ln -s ec2-ami-tools-* current

edit .bashrc file (e.g. nano ~/.bashrc) and add to the end:



export EC2_AMITOOL_HOME=~/ec2/current

export PATH=${PATH}:~/ec2/current/bin

Also make a matlab directory in your home directory to store the files:
mkdir ~/matlab

In order that the servers will always run the latest version, I copied the server code into an Amazon S3 store, and when the servers are run, it will copy the latest version each time.
The easiest way to create a "bucket" is with the management console (under the S3 tab), I called mine "jasonfriedman.software".
Now copy the compiled Matlab code that was created in Part 1 into the bucket. Using the management console, there is an "upload" button. I uploaded the two files needed, socket_server and run_socket_server.sh.
I made each of them public (right click on the files) so that the EC2 instances can download them. If you then select properties, you can get the url of the file (which you will need).
Then I wrote a small perl script to count the number of processors and run that number of servers. It looks at /proc/cpuinfo to count the number; this is not very robust but should do for Amazon EC2 instances. At the beginning, it also downloads the latest version of the servers from the S3 store (as it is stored also on AWS, the transfer is quick and free). Write the script using your favourite text editor, put it in matlab/runservers, and make it executable (chmod a+x ~/matlab/runservers).



#!/usr/bin/perl -w
use strict;

# Download the latest version of the server from the S3 store
system('wget https://s3.amazonaws.com/jasonfriedman.software/socket_server -O /home/ubuntu/matlab/socket_server');
system('wget https://s3.amazonaws.com/jasonfriedman.software/run_socket_server.sh -O /home/ubuntu/matlab/run_socket_server.sh');
system('chmod a+x /home/ubuntu/matlab/run_socket_server.sh /home/ubuntu/matlab/socket_server');

# Count the "processor" entries in /proc/cpuinfo
my $numCPUs = `grep -c ^processor /proc/cpuinfo`;
chomp($numCPUs);
print "There are $numCPUs CPUs\n";

# Now run an instance for each of the CPUs, on ports 9001, 9002, ...
for (my $i = 1; $i <= $numCPUs; $i++) {
        my $port = $i + 9000;
        system("/home/ubuntu/matlab/run_socket_server.sh /opt/MATLAB/MATLAB_Compiler_Runtime/v713/ $port &");
}
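Since the script starts server i on port 9000 + i, a client needs the matching list of addresses and ports. A possible way to build that list in Matlab is sketched below. The variable names, the placeholder host name and the core count are illustrative assumptions, not part of the downloadable code.

```matlab
% Illustrative only: the host name and core count below are placeholders
hosts    = {'ec2-111-111-111-111.ap-southeast-1.compute.amazonaws.com'};
numCores = 4;          % cores on each instance (assumed equal)
basePort = 9000;       % runservers starts server i on port basePort + i

targets = struct('address', {}, 'port', {});
for h = 1:numel(hosts)
    for i = 1:numCores
        targets(end+1) = struct('address', hosts{h}, 'port', basePort + i); %#ok<AGROW>
    end
end
```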

The next step is to make the instance run the Matlab servers by itself when it starts up. This will make it easy to start up many servers.

We do this by adding one line to /etc/rc.local, which is run on each reboot:
/home/ubuntu/matlab/runservers
(put it one line before the last line (exit 0))

The final step is to save the instance so that you don't need to go through all this installing each time. If you have used EBS, you can do it with the management console - just right click on the instance and select "Create Image (AMI)". Then next time, you can run this image as you left it (rather than running someone else's image). If not, then instructions on how to do it are here:


http://instantbadger.blogspot.com/2009/09/how-to-create-and-save-ami-image-from.html

This tutorial continues in Part 3.

Note: some of the instructions on this page were modified from:
http://robrohan.com/2009/01/30/saving-a-customised-linux-amazon-instance-ec2-and-s3/

Using Amazon EC2 to speed up matlab optimisation I: Writing a socket interface in Matlab to send / receive the commands

The aim in this tutorial (part 1 of 3) is to have a small program running in Matlab on your computer, which will send off requests to a program (compiled Matlab code) running on an Amazon EC2 server or servers.

I am using sockets to do the communication, and in Matlab I use the free msocket toolbox to do this (you will need to download it and add it to your matlab path). The benefit of this toolbox is that it allows you to send matlab variables between machines. In this case, I send a matlab structure containing the command, and the parameters.

The architecture I use is to have a single server running on every core of the target machine. The server waits for a connection; once it receives one, it runs the desired program and returns the results.

The source code for the entire program can be found here:
socketserver.m. It will also require the program messagecodes.m

It specifies which port to listen on:



socket = mslisten(port);

Then there is an endless loop that waits for a connection:



% Keep listening until a connection is received
sock = -1;
while sock == -1
    sock = msaccept(socket,0.0000001);
    drawnow;
end

Once a connection has been accepted, a confirmation is returned to the creator:



m.accepted = 1;

mssend(sock,m);

and another loop is started to wait to receive commands:



success = -1;

while success<0
   [received,success] = msrecv(sock,0.0000001);
   drawnow;
end

I use a "switch" command to execute the appropriate command (in this example,
there is only one, but there is no reason not to have multiple possible commands).

In this case, it is executing the "decompose" command (an optimisation program I have written).
After running, it sends back the result in the rv variable:



switch received.command
    case {codes.decompose}
        % Unpack the arguments in the order the client packed them
        [time,vel,numsubmovements,method,algorithm] = deal(received.arguments{:});
        [rv.best,rv.bestresult,rv.bestfitresult] = ...
            decompose(time,vel,numsubmovements,method,algorithm);
        mssend(sock,rv);

Once a client has finished running the program, it can close the socket, which is dealt
with by the server as follows:



    case {codes.closesocket}
        msclose(sock);
        break;

The break causes it to leave the innermost while loop, and wait again for a new connection.
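The control flow of the command loop can be tried out without any sockets. In the sketch below, a cell array stands in for the messages that msrecv would deliver, and the numeric values in codes are placeholders (the real ones live in messagecodes.m). It only demonstrates that break inside the switch leaves the enclosing while loop.

```matlab
% Stand-in for messagecodes.m (the numeric values are made up)
codes.decompose   = 1;
codes.closesocket = 2;

% Stand-in for the stream of messages that msrecv would deliver
incoming = {struct('command', codes.decompose,   'arguments', {{1:3}}), ...
            struct('command', codes.closesocket, 'arguments', {{}})};

handled = {};          % log of what the loop did
k = 0;
while true
    k = k + 1;
    received = incoming{k};             % in the real server: msrecv(sock, ...)
    switch received.command
        case codes.decompose
            handled{end+1} = 'decompose';    % real server: run decompose, mssend result
        case codes.closesocket
            handled{end+1} = 'close';        % real server: msclose(sock)
            break;                           % leave the command loop
    end
end
```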

In order to use this code on Amazon EC2 (without a license server), it is necessary to first compile it. You will need a license for the Matlab compiler on the computer doing the compiling, but not on the one running the final program. Note that you will need to compile it on a machine similar to the one you plan to run it on (e.g., I compiled mine on a 64-bit Ubuntu machine). I installed Ubuntu as a VirtualBox image, as I don't have a "real" Ubuntu machine available.

Then from inside Matlab, it is as simple as running:



mcc -m socket_server

and Matlab will compile it for you. If this is the first time you have used mcc, you may have to answer some questions. Part 2 describes how to upload this server to EC2.

Now for the client. The client has to connect to the server:



sock = msconnect(address,port);

It then can send commands to the server:



m.command = codes.decompose;
m.arguments{1} = time;
m.arguments{2} = vel;
m.arguments{3} = numsubmovements;
m.arguments{4} = method;
m.arguments{5} = algorithm;
success = mssend(sock,m);

It then needs to wait for a result:



[thisrv,success] = msrecv(sock);

Then the return value can be used as desired.
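The packing on the client and the unpacking on the server have to agree on the order of the arguments. The following sketch checks that round trip locally, with no socket involved. The value of codes.decompose and the example inputs are placeholders.

```matlab
codes.decompose = 1;                      % placeholder value
time = 0:0.01:1;                          % example inputs
vel  = [sin(pi*time); cos(pi*time)];

% Client side: pack the command and its arguments
m.command   = codes.decompose;
m.arguments = {time, vel, [], [], []};

% Server side: unpack in the same order
[t2, v2, numsubmovements, method, algorithm] = deal(m.arguments{:});
```

Empty arguments are passed through unchanged, so the function on the server side is free to substitute its own defaults.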

Part 2 continues by explaining how to setup Amazon EC2 to run the server component.

Part 3 of the tutorial will describe an automated way to run many servers and collect the results.

Using Amazon EC2 to speed up matlab optimisation

I run lots of optimisation programs in Matlab as part of my research. One major problem is that they can be very slow, especially if you have lots of variables. One solution is to use more computer hardware to run the procedure faster. The more the better. Amazon offer their EC2 service, which allows you basically to rent computers by the hour. So rather than running your Matlab software on one computer, you can rent a lot of computers say for a few hours or a day and get the optimisation run much, much quicker.

I received an academic research grant from Amazon (Thanks!) which consisted of $3500 credit for their AWS services (including EC2). Mathworks have published a "white paper" on how to use EC2 with Matlab, but it relies on having available licenses for the instances of Matlab running on the EC2 servers, and those servers being able to access your license server. Here at MACCS the license server has a limited number of licenses, and they are behind the university firewall, so there is no way for the EC2 instances to use them.

My solution was to use the Matlab compiler to compile the optimisation part of my work into a stand-alone component. Then, I will get my computer running matlab to connect to the computer(s) running on EC2, send them commands, and get the results. I chose to do this using a socket interface.

These tutorials, split into three parts, will explain the process I went through (mostly so that I can remember how to do it next time!):