Wednesday, April 14, 2010

A naive GPU acceleration approach and results

In the CPU algorithm, the key calculation step can be seen as three nested loops:

Loop 1 (N times, where N is the number of pixels in the output picture)
{
    Loop 2 (M times, where M is the number of pixels in the input texture)
    {
        Loop 3 (c times, where c is the size of the neighborhood)
        {
            simple multiplication and addition instructions (O(1))
        }
    }
}
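
To make the loop structure concrete, here is a minimal C++ sketch of loops 2 and 3 for one output pixel (loop 1 would simply call bestMatch for every output pixel). The Image struct, the function names, the square (2r+1) x (2r+1) neighborhood, the wrap-around border handling and the sum-of-squared-differences distance are all my assumptions for illustration, not necessarily what the original CPU code does.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical grayscale image: row-major floats with wrap-around addressing.
struct Image {
    int w, h;
    std::vector<float> px;
    float at(int x, int y) const {
        x = ((x % w) + w) % w;  // wrap x into [0, w)
        y = ((y % h) + h) % h;  // wrap y into [0, h)
        return px[static_cast<std::size_t>(y) * w + x];
    }
};

// Loop 3: sum of squared differences over a (2r+1) x (2r+1) neighborhood.
float neighborhoodDistance(const Image& out, int ox, int oy,
                           const Image& in, int ix, int iy, int r) {
    float d = 0.0f;
    for (int dy = -r; dy <= r; ++dy) {
        for (int dx = -r; dx <= r; ++dx) {
            float diff = out.at(ox + dx, oy + dy) - in.at(ix + dx, iy + dy);
            d += diff * diff;  // one multiplication and one addition
        }
    }
    return d;
}

// Loop 2: full search over all M input pixels.
// Returns the row-major index of the input pixel with the closest neighborhood.
int bestMatch(const Image& out, int ox, int oy, const Image& in, int r) {
    int best = -1;
    float bestD = std::numeric_limits<float>::max();
    for (int iy = 0; iy < in.h; ++iy) {
        for (int ix = 0; ix < in.w; ++ix) {
            float d = neighborhoodDistance(out, ox, oy, in, ix, iy, r);
            if (d < bestD) {
                bestD = d;
                best = iy * in.w + ix;
            }
        }
    }
    return best;
}
```

The total cost is O(N * M * c), which is why the full search in loop 2 dominates the running time.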

Since loop 2 is a full search for the nearest neighbor, I simply moved that entire loop onto the GPU. I store the input texture patch in device memory and, for a given output pixel, compute the distances between its neighborhood and the neighborhoods of all the input texture pixels in parallel.
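
The decomposition can be sketched as follows. This is plain C++ rather than actual GPU code, so it runs anywhere: distanceThread plays the role of one GPU thread handling one input pixel, and the sequential for loop in nearestNeighbor stands in for the kernel launch. The function names, the flat array layout (all M input neighborhoods gathered back to back) and the squared-difference distance are assumptions made for illustration, not the code I actually ran.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Work of one (hypothetical) GPU thread: compare the fixed output neighborhood
// against the neighborhood of candidate input pixel `tid`.
// outNbr: c values of the output pixel's neighborhood.
// inNbrs: M input neighborhoods stored back to back (M * c floats),
//         as they might sit in device memory.
void distanceThread(std::size_t tid, const float* outNbr, const float* inNbrs,
                    std::size_t c, float* dist) {
    float d = 0.0f;
    for (std::size_t k = 0; k < c; ++k) {
        float diff = outNbr[k] - inNbrs[tid * c + k];
        d += diff * diff;
    }
    dist[tid] = d;  // each thread writes only its own slot, so no synchronization
}

// Host-side stand-in for "launch M threads, then reduce to the arg-min".
// Assumes outNbr is non-empty and inNbrs.size() is a multiple of outNbr.size().
std::size_t nearestNeighbor(const std::vector<float>& outNbr,
                            const std::vector<float>& inNbrs) {
    const std::size_t c = outNbr.size();
    const std::size_t m = inNbrs.size() / c;
    std::vector<float> dist(m);
    // On the GPU these m iterations would run concurrently, one per thread.
    for (std::size_t tid = 0; tid < m; ++tid) {
        distanceThread(tid, outNbr.data(), inNbrs.data(), c, dist.data());
    }
    // Reduction step: index of the smallest distance.
    return static_cast<std::size_t>(
        std::min_element(dist.begin(), dist.end()) - dist.begin());
}
```

Note that in this scheme loop 1 still runs one output pixel at a time, so only the M-way search is parallel.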

The performance improved noticeably, as the following table shows:


We can see from the performance comparison table that the GPU rocks, but it's still slow. That result pushes me to look for a better GPU method to accelerate the search.
