Archive for April, 2010


Failing Bourne/Bash line continuations

April 8, 2010

When writing code, it is often necessary to use line continuations to keep the line width reasonable. The Unix shell is not my favourite programming environment; however, I wanted to wrap an extremely long line in a colleague's script.

For example:

#!/bin/bash
# Without line continuation
my extremely long code line featuring many long file names and directories
# With line continuation
my extremely long code line featuring \
many long file names and directories

This seems relatively simple; however, I ran into some difficulty. One error was simple: the line continuation sequence is \ followed immediately by an EOL character, and it is easy to accidentally leave whitespace between the \ and the EOL character, which breaks the continuation.

The second problem is one that many resources fail to mention: the newline must be a Unix EOL character (see Newline). Unfortunately the script had first been edited under Windows, leaving DOS-style CRLF line endings, so the \ followed by CRLF was not recognised as a line continuation.

Solution:
Run the script through dos2unix or a similar tool.
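
If dos2unix is not available, a similar clean-up can be done with a few lines of Python. This is only a minimal sketch, and the filename my_script.sh is a placeholder rather than anything from the original script:

# Minimal sketch of a dos2unix-style clean-up; 'my_script.sh' is a placeholder
file_name = 'my_script.sh'

with open(file_name, 'rb') as fd:
    data = fd.read()

# Windows EOLs are \r\n; stripping the \r leaves Unix newlines, so the
# trailing \ is once again followed directly by the EOL character
if b'\r' in data:
    with open(file_name, 'wb') as fd:
        fd.write(data.replace(b'\r', b''))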


Binary data and Python: Just use NumPy!

April 6, 2010

To post-process some CFD data, I have been manipulating binary files generated for Tecplot using Python. The challenge is importing large vectors of binary numbers into NumPy ndarrays while also processing the binary metadata.

This task appears straightforward, as shown below:

import numpy as np
import struct

file_in = 'strange_binary_format.dat'
fd = open(file_in,'rb')
# buffer data -- this is bulky
buffer = fd.read()
# read 1000 doubles from the buffer from byte "position" forward
position = 0
no_of_doubles = 1000
read_format = str(no_of_doubles) + 'd'
read_size = struct.calcsize(read_format)
# put data into numpy arrays . . this is very slow and memory intensive
# might be due to struct.unpack returning as a tuple of floats
numpy_data = np.array(struct.unpack(read_format,
                      buffer[position:position + read_size]))

However, this method is inefficient (and may even cause memory leaks!). The struct.unpack function returns a tuple of 1000 individual Python floats, which carries significant overhead; the result is both memory intensive and slow. A later attempt is shown below:

import numpy as np
import struct

file_in = 'strange_binary_format.dat'
fd = open(file_in,'rb')
position = 0
no_of_doubles = 1000
# move to position in file
fd.seek(position,0)

# straight to numpy data (no buffering) 
numpy_data = np.fromfile(fd, dtype = np.dtype('d'), count = no_of_doubles)

The NumPy function fromfile is significantly more efficient in terms of both time and memory.
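
As an aside, when the raw bytes have already been read into a buffer (as in the first attempt above), NumPy can interpret them in place as well. The following is only a sketch, reusing the file name and parameters from the snippets above; note that frombuffer returns a read-only view of the immutable bytes:

import numpy as np

file_in = 'strange_binary_format.dat'
position = 0
no_of_doubles = 1000

# read the raw bytes once, then let NumPy interpret them in place --
# this avoids both the struct.unpack tuple and an extra copy
fd = open(file_in, 'rb')
buffer = fd.read()

numpy_data = np.frombuffer(buffer, dtype=np.dtype('d'),
                           count=no_of_doubles, offset=position)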

From this experience I have a rule for numerically intensive computing with Python: the NumPy / SciPy functions will almost always be faster than rolling your own with the standard library!
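
To check that rule on a particular machine, a rough timing comparison of the two approaches above can be put together with timeit. Again, this is only a sketch, assuming the same file name and number of doubles as before:

import struct
import timeit
import numpy as np

file_in = 'strange_binary_format.dat'
no_of_doubles = 1000
read_format = str(no_of_doubles) + 'd'
read_size = struct.calcsize(read_format)

def with_struct():
    # buffer the bytes, unpack to a tuple and copy into an ndarray
    fd = open(file_in, 'rb')
    data = np.array(struct.unpack(read_format, fd.read(read_size)))
    fd.close()
    return data

def with_fromfile():
    # let NumPy read the doubles straight from the file descriptor
    fd = open(file_in, 'rb')
    data = np.fromfile(fd, dtype=np.dtype('d'), count=no_of_doubles)
    fd.close()
    return data

print(timeit.timeit(with_struct, number=100))
print(timeit.timeit(with_fromfile, number=100))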