Tuesday, June 24, 2008

Python and CUDA

CUDA is Nvidia's API for leveraging the power of the GPU for parallel processing. The CUDA API is written in C and can be daunting to use. This how-to shows how to access this powerful API from your Python code using PyCuda.

First install PyCuda. You can fetch the latest package from http://pypi.python.org/pypi/pycuda.
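If you want to check that the install worked, importing the driver module is enough; it will raise an ImportError if either PyCuda or the CUDA driver library cannot be found.

import pycuda.driver as cuda   # fails with ImportError if PyCuda or the CUDA driver library is missing
print "PyCuda imported successfully"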

Before you can use CUDA you must initialize the device, the same way you would in a C program.


import pycuda.driver as cuda

# Initialize the driver and make sure at least one CUDA device is present.
cuda.init()
assert cuda.Device.count() >= 1

# Grab the first device and create a context on it.
cudaDev = cuda.Device(0)
cudaCTX = cudaDev.make_context()
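
As a side note, if you don't need control over which device or context you use, PyCuda also ships a convenience module, pycuda.autoinit, that performs this initialization for you on import (and cleans the context up when the program exits). The rest of the examples work the same way with either approach.

import pycuda.autoinit         # picks a device and creates a context automatically
import pycuda.driver as cuda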


The basic methodology of a CUDA program is to copy data from system memory to device memory, perform the processing on the GPU, and then copy the results back from the device to the system. PyCuda provides facilities for each of these steps.

First, let's create a numpy array of data, allocate space for it on the device, and copy it over:

import numpy

# The kernel below expects single-precision floats.
a = numpy.random.randn(4, 4)
a = a.astype(numpy.float32)

# Allocate device memory and copy the array from the host to the device.
a_gpu = cuda.mem_alloc(a.size * a.dtype.itemsize)
cuda.memcpy_htod(a_gpu, a)
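
As an aside, PyCuda's gpuarray module can wrap the allocation and the copy in a single call. The sketch below is just an alternative convenience and isn't used in the rest of this example; the explicit mem_alloc/memcpy_htod route above maps more directly onto the C API.

import pycuda.gpuarray as gpuarray

a_dev = gpuarray.to_gpu(a)    # allocate device memory and copy the host array in one step
a_back = a_dev.get()          # copy the data back into a new numpy array on the host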



Now that our data is on the device, we need to instruct the GPU to execute our kernel. In CUDA terms, a kernel is the code that actually runs on the GPU. PyCuda requires that you write the kernel in C and pass it to the device.

For example, here is a kernel that adds one to the value of each element:


mod = cuda.SourceModule("""
__global__ void addOne(float *a)
{
    int idx = threadIdx.x + threadIdx.y * 4;
    a[idx] += 1;
}
""")



Now tell the device to execute our kernel. The block=(4, 4, 1) argument launches a single 4x4 block of threads, one per array element; inside the kernel each thread uses threadIdx.x and threadIdx.y to work out which element it owns.

# Look up the compiled kernel by name and launch it on a single 4x4 block of threads.
func = mod.get_function("addOne")
func(a_gpu, block=(4, 4, 1))


Lastly we copy the contents from the device back to system memory and print the results.


# Allocate a host array to receive the result and copy it back from the device.
a_addOne = numpy.empty_like(a)
cuda.memcpy_dtoh(a_addOne, a_gpu)

print a_addOne
print a
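
Each element of a_addOne should now be exactly one greater than the corresponding element of a. As a final shortcut, the driver module also provides argument handlers (cuda.In, cuda.Out and cuda.InOut) that perform the host/device copies around the kernel launch for you. Here is a rough sketch of the same computation using cuda.InOut with the addOne kernel from above:

b = numpy.random.randn(4, 4).astype(numpy.float32)

# InOut copies b to the device before the launch and back to the host afterwards.
func(cuda.InOut(b), block=(4, 4, 1))
print b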




