Archive for the ‘Programming’ Category


VLSCI news (my summer internship)

January 10, 2011

The Victorian Life Sciences Computation Initiative (VLSCI) is a Victorian state government-funded organisation providing grants of computer time for high performance life sciences computing. I have been fortunate enough to win a summer research internship with the partner IBM research collaboratory.

More soon.


Charm/Charm++ programming language

December 28, 2010

I have recently started developing a numerical solver using the Charm++ package. Charm++ is designed to abstract parallel computing from the typical MPI paradigm. The programming language is based on asynchronous communication and the construction of multiple objects per processor core (‘chares’). The system implements dynamic load balancing via the migration of chares across processors.

The paradigm itself is efficient: it powers some applications which scale to extreme levels (e.g. NAMD & OpenAtom) and it has a number of very useful features (visualisation, profiling & debugging). However, it can be confusing for a developer who is just starting out.

The manual on the Charm++ website is a great start, along with the examples within the charm source directory. Additional resources include the webcasts of tutorial lectures. However, the language desperately needs a clearly documented API to support development using Charm++.

A documented API would make development easier, especially when using associated libraries and working around their limitations (such as C++ STL support and barrier support).

More to come as I get further along.


mpi4py parallel IO example

September 23, 2010

For about 9 months I have been running Python jobs in parallel using mpi4py and NumPy. I had to write a new algorithm with MPI, so I decided to do the IO in parallel. mpi4py is lacking examples, so below is a small example of reading data in parallel. It is not pretty; however, it does work.

import mpi4py.MPI as MPI
import numpy as np
class Particle_parallel():
    """ Particle_parallel - distributed reading of x-y-z coordinates.

    Designed to split the vectors as evenly as possible except for rounding
    on the last processor.

    File format:
        32bit int :Data dimensions which should = 3
        32bit int :n_particles
        64bit float (n_particles) : x-coordinates
        64bit float (n_particles) : y-coordinates
        64bit float (n_particles) : z-coordinates
    """

    def __init__(self, file_name,comm):
        self.comm = comm
        self.rank = self.comm.Get_rank()
        self.size = self.comm.Get_size()
        self.data_type_size = 8
        self.mpi_file = MPI.File.Open(self.comm, file_name)
        self.data_dim = np.zeros(1, dtype = np.dtype('i4'))
        self.n_particles = np.zeros(1, dtype = np.dtype('i4'))
        self.file_name = file_name
        self.debug = True

    def info(self):
        """ Distribute the required information for reading to all ranks.

        Every rank must run this function.
        Each machine needs data_dim and n_particles.
        """
        # get info on all machines
        self.mpi_file.Read_all([self.data_dim, MPI.INT])
        self.mpi_file.Read_all([self.n_particles, MPI.INT])
        self.data_start = self.mpi_file.Get_position()

    def read(self):
        """ Read data and return this processor's part of the coordinates. """
        assert self.data_dim[0] != 0
        # First establish rank's vector sizes (integer ceiling division)
        n_particles = int(self.n_particles[0])
        default_size = -(-n_particles // self.size)
        # The last rank takes the remainder. This should not be a problem
        # unless default size is very small
        end_size = n_particles - (default_size * (self.size - 1))
        assert end_size >= 1
        if (self.rank == (self.size - 1)):
            self.proc_vector_size = end_size
        else:
            self.proc_vector_size = default_size
        # Create individual processor pointers (byte offsets into the file)
        x_start = int(self.data_start + self.rank * default_size *
                self.data_type_size)
        y_start = int(self.data_start + self.rank * default_size *
                self.data_type_size + n_particles *
                self.data_type_size * 1)
        z_start = int(self.data_start + self.rank * default_size *
                self.data_type_size + n_particles *
                self.data_type_size * 2)
        self.x_proc = np.zeros(self.proc_vector_size)
        self.y_proc = np.zeros(self.proc_vector_size)
        self.z_proc = np.zeros(self.proc_vector_size)
        # Seek to x
        self.mpi_file.Seek(x_start)
        if self.debug:
            print('MPI Read')
        self.mpi_file.Read([self.x_proc, MPI.DOUBLE])
        if self.debug:
            print('MPI Read done')
        # Seek to y
        self.mpi_file.Seek(y_start)
        self.mpi_file.Read([self.y_proc, MPI.DOUBLE])
        # Seek to z
        self.mpi_file.Seek(z_start)
        self.mpi_file.Read([self.z_proc, MPI.DOUBLE])
        return self.x_proc, self.y_proc, self.z_proc

    def Close(self):
        """ Close the shared MPI file handle. """
        self.mpi_file.Close()
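
The offset arithmetic used when reading is easier to check away from MPI. The following is a minimal sketch in plain Python and is not part of the class above. The function name rank_layout and its defaults are assumptions for illustration: an 8 byte header (the two 32bit ints in the file format) and 8 byte doubles.

```python
def rank_layout(n_particles, n_ranks, rank, data_start=8, item_size=8):
    """Return (count, x_offset, y_offset, z_offset) for one rank.

    Offsets are in bytes from the start of the file. Every rank except the
    last reads ceil(n_particles / n_ranks) values; the last rank reads
    whatever remains.
    """
    default_size = -(-n_particles // n_ranks)  # integer ceiling division
    end_size = n_particles - default_size * (n_ranks - 1)
    if end_size < 1:
        raise ValueError('too few particles for this many ranks')
    count = end_size if rank == n_ranks - 1 else default_size
    first = rank * default_size  # index of this rank's first particle
    # The x, y and z vectors are stored one after the other in the file
    offsets = tuple(data_start + (axis * n_particles + first) * item_size
                    for axis in range(3))
    return (count,) + offsets


if __name__ == '__main__':
    # Print the layout for 10 particles spread over 4 ranks
    for rank in range(4):
        print(rank, rank_layout(10, 4, rank))
```

Printing the layout for a small case like this is a quick way to confirm that the counts add up to n_particles before running on a cluster.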

Compiling OpenFOAM 1.7.x for OS X 10.6

September 16, 2010

This is a short guide for installing the Developer version of OpenFOAM for Snow Leopard. I have tried to include all details.

What you need:

A Mac with OS X 10.6 and approximately 10 GB of HD space.

Preliminary steps:

Install the OS X developer tools

Install git and GCC 4.3, 4.4 or 4.5 from either MacPorts or Fink (4.5 will only work with 1.7.x). I will presume 4.5.

Once this is done you may start building OpenFOAM. The Mac OS file system is not case sensitive by default. Therefore you need to make a case-sensitive disk image for OpenFOAM.

Open Disk Utility (/Applications/Utilities/Disk Utility)

Menu > File > New > Blank disk image ..

You may save the image wherever you want, however, name the image ‘OpenFOAM’.

Using the drop-down box change the format to ‘Mac OS Extended (Case sensitive)’

Change the size to at least 5 GB. You may increase this later if required.

Create the image and close Disk Utility.

To keep the installation nice and clean we are going to mount the image at $HOME/OpenFOAM

This is the default OpenFOAM install site which will make your life easier in the long run.

To do this add the following to your .bashrc file (if you don’t have one you will need to create one):

hdiutil attach "/path/to/your/disk_image.dmg" -mountpoint "$HOME/OpenFOAM" > /dev/null

This will mount the image when you first open the terminal from now on.
Also add the following, which sources the OpenFOAM bash files. This will create errors until you download OpenFOAM.

. $HOME/OpenFOAM/OpenFOAM-1.7.x/etc/bashrc

After you have saved your .bashrc file open a new window in the terminal. You will get the following error:

-bash: /Users/yourusername/OpenFOAM/OpenFOAM-1.7.x/etc/bashrc: No such file or directory
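
Before downloading anything, it may be worth confirming that the mounted volume really is case sensitive. The following is a minimal sketch in Python, assuming the image is mounted at $HOME/OpenFOAM as described above. The helper name is_case_sensitive is hypothetical and not part of OpenFOAM.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if `directory` appears to be on a case-sensitive file system."""
    # Create a temporary file whose name contains lower case letters, then
    # look for the same name in upper case. On a case-insensitive file
    # system the upper case name resolves to the same file.
    with tempfile.NamedTemporaryFile(prefix='casetest_', dir=directory) as handle:
        name = os.path.basename(handle.name)
        upper_path = os.path.join(directory, name.upper())
        return not os.path.exists(upper_path)


if __name__ == '__main__':
    target = os.path.expanduser('~/OpenFOAM')
    if os.path.isdir(target):
        print(target, 'case sensitive:', is_case_sensitive(target))
    else:
        print(target, 'is not mounted')
```

The temporary file is removed automatically when the check finishes, so nothing is left behind on the image.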

Download the following files:
The 1.7.1 Third party software pack.
The OpenFOAM 1.7.x patch by Bernhard Gschaider.
The third party patch by Bernhard Gschaider.

Check the thread for any updates.

Move these files to $HOME/OpenFOAM. Now we need to edit OpenFOAM-1.7.x-Mac_v2.patch. Open up the file in a text editor and check that the versions of gcc / g++ match what you have installed.
If you installed from MacPorts you will have gcc-mp-4.5 and g++-mp-4.5, whereas from Fink it is gcc-fsf-4.5 and g++-fsf-4.5. Search through the file for ‘-mp-’ and make sure the version and distribution strings match what you have installed.

At the terminal execute the following:

git clone git://
tar -xzf ThirdParty-1.7.1.gtgz
mv ThirdParty-1.7.1 ThirdParty-1.7.x
cd ./ThirdParty-1.7.x
patch -p1 <../ThirdParty-1.7-Mac.patch
cd ../OpenFOAM-1.7.x
patch -p1 <../OpenFOAM-1.7.x-Mac_v2.patch
. $HOME/OpenFOAM/OpenFOAM-1.7.x/etc/bashrc

This should give you a working OpenFOAM distribution with a few exceptions:
foamToTec360 does not work.
paraFoam does not work. To address this do the following:
Download and install the Paraview application.
In your case directories ‘touch’ the foam file,
e.g. in a case called ‘isofoam_case’:

touch isofoam_case.foam

Open this file with the binary install of Paraview.



Python and VTK

September 8, 2010

I have recently been working on using data gathered in vitro as the geometric basis for some computational fluid dynamics (CFD) simulations I am running. The simulations are solved using OpenFOAM; therefore, I import the geometry as a series of .STL files.

The idea is that the data provided to me will be able to describe where the solid–fluid boundary is. From this I should be able to generate an .STL surface. The most reliable way (I have observed) to do this, given the data I am provided, is to generate a volume such that solid and fluid phases are distinguishable. This allows an iso-surface (and from this an .STL) to be generated.

I can employ Tecplot or Paraview to do this assuming I have an appropriate data file. Rather than painstakingly duplicate the VTK data format IO for paraview I decided to use the VTK python bindings and generate the files, and later the contours, myself.

VTK is an excellent tool. The Python bindings are comprehensive and, despite the package size, I managed to get things moving without too much trouble. The interface to NumPy arrays allows it to interface nicely with any Python-based calculations I had. The error messages were informative and the Doxygen documentation has decent descriptions for many classes. All of the classes even have help available in the interpreter. This is somewhat hidden (you need to use dir to get the available functions and ask for help for each of them individually).

The downsides: Python examples are sparse compared with C++ / Tcl and some of the classes have very similar functions with slightly unpredictable behaviour, for example:

# vtk_data is of type vtk.vtkImageData()

im_FFT = vtk.vtkImageFFT()



The examples (and to some extent the book) presume that you have a compatible data file to start with. There are no examples of how to bring in large quantities of data from another part of a program.

Recommendations: Get yourself a copy of the VTK book for the first few days of working with VTK. It introduces concepts in a straightforward manner and increased my understanding substantially. After you are familiar with VTK it is not required.

Next project: Tecplot (.plt) to .vt* converter. I have a very limited version working; however, it requires work to be robust.


Binary data and Python: Just use NumPy!

April 6, 2010

To post-process some CFD data, I have manipulated binary files generated for Tecplot with Python. The challenge here is how to import large vectors of binary numbers into NumPy ndarrays while processing binary metadata.

This task appears straightforward as shown below:

import numpy as np
import struct

file_in = 'strange_binary_format.dat'
fd = open(file_in,'rb')
# buffer data -- this is bulky
buffer = fd.read()
# read 1000 doubles from the buffer from byte "position" forward
position = 0
no_of_doubles = 1000
read_format = str(no_of_doubles) + 'd'
read_size = struct.calcsize(read_format)
# put data into numpy arrays . . this is very slow and memory intensive
# might be due to struct.unpack returning as a tuple of floats
numpy_data =np.array(struct.unpack(read_format, buffer[position:
            (position + read_size)]))

However, this method is inefficient (and may even cause memory leaks!). The struct.unpack function returns a tuple of 1000 individual floats, resulting in significant overhead. The result is both memory intensive and slow. A later attempt is shown below:

import numpy as np
import struct

file_in = 'strange_binary_format.dat'
fd = open(file_in,'rb')
position = 0
no_of_doubles = 1000
# move to position in file
fd.seek(position, 0)

# straight to numpy data (no buffering) 
numpy_data = np.fromfile(fd, dtype = np.dtype('d'), count = no_of_doubles)

The NumPy function fromfile is significantly more efficient in terms of both time and memory.
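
To tie this back to the original problem (binary metadata followed by large vectors), here is a minimal sketch that uses struct for a small header and np.fromfile for the bulk data. The file layout (two 32 bit integers followed by doubles), the file name and the function names are invented for illustration; this is not the Tecplot format.

```python
import os
import struct
import tempfile

import numpy as np

HEADER_FORMAT = '=2i'  # hypothetical header: version and number of doubles


def write_demo_file(path, values):
    """Write the hypothetical format: two int32 header fields, then doubles."""
    data = np.asarray(values, dtype=np.dtype('d'))
    with open(path, 'wb') as fd:
        fd.write(struct.pack(HEADER_FORMAT, 1, data.size))
        fd.write(data.tobytes())


def read_demo_file(path):
    """Read the header with struct and the bulk data with np.fromfile."""
    with open(path, 'rb') as fd:
        header = fd.read(struct.calcsize(HEADER_FORMAT))
        version, count = struct.unpack(HEADER_FORMAT, header)
        # fd is now positioned at the first double, so no seek is needed
        data = np.fromfile(fd, dtype=np.dtype('d'), count=count)
    return version, data


if __name__ == '__main__':
    demo_path = os.path.join(tempfile.mkdtemp(), 'demo_binary_format.dat')
    write_demo_file(demo_path, np.linspace(0.0, 1.0, 1000))
    version, data = read_demo_file(demo_path)
    print(version, data.shape)
```

The struct module stays useful for the handful of metadata fields, where the tuple overhead does not matter, while the large vectors never pass through Python floats.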

From this experience I have a rule for numerically intensive computing with Python: NumPy / SciPy functions will almost always be faster!


Python for scientific computing

March 27, 2010

I am currently undertaking a Ph.D. where I am researching blood flow using an academic computational fluid dynamics (CFD) code (Viper). As in many numerical investigations, the pre-processing and post-processing are as important as the algorithm itself. Up until early this year MATLAB was my language of choice for processing data; however, I have recently embraced Python (particularly NumPy) as a fantastic alternative.

Matlab is a nice clean language for dealing with vectors; however, even with the alternative of Octave, the vendor lock-in becomes terrible when you want to scale up your code for production runs on clusters. Python, in contrast, is completely free, and scientific computation is implemented by several fantastic open source packages such as SciPy and NumPy, which easily duplicate the core functionality of Matlab in a pythonic, object-oriented fashion. Additionally, unlike Matlab, the overhead of starting an interpreter is small and Python is almost universally available on *nix systems.

Yet the numerical capacity of Python is not the primary reason I have come to like the language. That reason is the core features of the language, such as:

  • Clean code and indentation based control
  • Fantastic support for file I/O
  • Huge standard library including integrated support for debugging
  • A fantastic community with plenty of practical resources and superb documentation

All told, the transition (despite some moments of pain) has sharpened my programming and extended my abilities appreciably (including writing my first decent MPI code).