Wednesday, April 14, 2010

Get Stuck in constructing K-D tree in GPU

I'm still working on that...

Keep going!

A naive GPU accelaration way and results

In the CPU algorithm, the key calculation step can be seen as 3 loops:

Loop 1 ( N times, where N is the number of pixels in the out put picture)
{
     Loop 2 ( M times, where M is the number of pixels the input texture have)
    {
         Loop3 (c times, where c is the size of the neighbor)
        {
            simple instructions of multiplication and addition. (O(1))
        }
    }
}

Considered that the loop2 is a full search to find the nearest neighbor, I simply drag all the loop into GPU. I save the input texture patch into the device memory, and calculate the distances betweent a specific output pixel's neighbor and the pixels' neighbors in input texture parallelly.

The performance has been improved obviously as following table:


 We can see GPU rocks from the performance comparison table, but it's still slow. That result pushes me to look for a better GPU method to accelarate.

CPU synthesis method results

I have implemented the CPU method (most naive one, single solution, full search mode) and have some synthesis result as following:
And the results seems to be pretty nice. The noise like pixels on the first several rows and columns are caused by the noise initialization.

Also, I got some bad results and analyzed why they look bad.
 
This one looks terrible because I chose a wrong size of L-shape neighborhood when I was synthesizing. The proper size should be chosen according to the features of the texture. Take the bad result as an example, the distance between to centers of the circles is about 26 pixels, and the neighbor size I chose is 15, at times it chose a pure black neighborhood as the best matched one, and it never converges. In the "good results" figures, I chose the neighborhood size as 30, and it gave me a satisfying result.













This one looks disgusting, and what's surprising me is it used the same parameters as the one in the "good results" figures. I figured it's influenced by the initialize noise, some noise make it not converge, maybe I can fix this by using another initialize mode, I will do it in parallel with implementing the GPU methods.
 
 
 
 
 
 
 
 

Something about openCV

OpenCV (Open Source Computer Vision) is a library of programming functions for real time computer vision.  I wanna use some data structures and methods in it to make the program simpler to write.

OpenCV is released under a BSD license, it is free for both academic and commercial use. The library has >500 optimized algorithms (see figure below). It is used around the world, has >2M downloads and >40K people in the user group.

In case I forget, I move the installation & configuration guild here:


  1. Create a temporary directory, which we denote as <cmake_binary_dir>, where you want to put the generated Makefiles, VisualStudio, Xcode or Eclipse etc. project files as well the object files and output binaries. You can do it using CMake GUI.

    1. If you use CMake GUI, execute "Configure" to do the initial configuration, then adjust any options, then press "Configure" again and then press "Generate". For the best performance, and if your compiler and the platform supports it, it is recommended to turn on SSE2 optimization (on Mac and Windows it is turned on by default) and OpenMP support. Also, if you want to build Python wrappers, samples or the reference manual in PDF, you should explicitly turn it on.

    2. If you are using command line, enter the <cmake_binary_dir> and type

      • cmake [<some optional parameters...>] <path to the OpenCV source directory>
        For example, if you downloaded the project to ~/projects/opencv, you can do the following:
        cd ~/projects/opencv # the directory containing INSTALL, CMakeLists.txt etc.
            mkdir release
            cd release
            cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D BUILD_PYTHON_SUPPORT=ON ..
        
      • that will generate makefiles for the Release configuration and will put them in ~/projects/opencv/release directory.
      • Another example for Windows users (assuming the .exe extracted files to C:\OpenCV2.1\)
        cd C:\OpenCV2.1 # the directory containing INSTALL, CMakeLists.txt etc.
            mkdir release
            cd release
            cmake -D:CMAKE_BUILD_TYPE=RELEASE C:\OpenCV2.1

      • Note that the use of the colon after the -D is required on Windows Vista (include if cmake is giving errors on other Windows distro's as well). If you are using MS Visual Studio cmake exits with an error message "error PRJ0003: Error spawning 'cmd.exe', make sure that you have the correct VC++ Executable Directories set.

  2. Using TBB. If you have TBB installed (see the Prerequisites), turn on USE_TBB option, adjust the TBB include and library paths if needed. You should see the following message in the CMake output:

    • USE TBB:  YES

  3. Using IPP. If you have IPP installed on your machine and want to build OpenCV with IPP support, check if IPP is turned on in CMake configuration options and if it was detected. First, look at the CMake output. If it prints

    • USE IPP:   NO

    • It means that IPP was not turned on or it was not detected. In this case turn it on (USE_IPP=ON) and pass the correct path to IPP shared libraries (IPP_PATH=<...>), like in the example below. While OpenCV currently uses static IPP libraries, it derives their path from the supplied path to the shared/dynamic IPP libraries.
      cmake -D:CMAKE_BUILD_TYPE=RELEASE -D:USE_IPP=ON -D:IPP_PATH="C:\Program Files\Intel\IPP\6.1.031\ia32\bin" C:\OpenCV2.1
    • (It's also easy to do the same using CMake GUI)
    • If you did everything right, you will see the following in the CMake output:
      USE IPP:   <ipp_path>
    • If there are multiple IPP versions installed on the machine (not necessarily all of them are in the system path) and you want to use the particular one, different from what CMake has found, just specify the correct IPP_PATH.


    1. If you generated project files for VisualStudio, Xcode, Eclipse etc., run the respective IDE, open the OpenCV top-level project/solution/workspace etc. and build it.
    2. If you generated makefiles, then enter the created temporary directory and run make/nmake utility. Then you can optionally run "sudo make install" (Linux, MacOSX).
  4. If you did not run "make install", you should let your system know where to find the generated libraries.

    1. In Windows you should add <cmake_binary_dir>/bin/debug and <cmake_binary_dir>/bin/release

      • to the system path (My Computer--[Right button click]-->Properties->Advanced->Environment Variables->Path).

    2. In Linux you should add <cmake_binary_dir>/lib[/debug|/release] to /etc/ld.so.conf or to LD_LIBRARY_PATH, e.g.:

      • export LD_LIBRARY_PATH=~/projects/opencv/release/lib:$LD_LIBRARY_PATH
            sudo ldconfig

    3. In MacOSX you should add <cmake_binary_dir>/lib[/debug|/release] to DYLD_LIBRARY_PATH.
  5. Note that the step is not needed to run OpenCV samples, because:
    1. on Windows both OpenCV DLLs and the samples are placed into the same directory
    2. on Linux/MacOSX CMake embeds the correct library paths into the executables




Introduce the main algorithm I will use

The main algorithm I will use refers to this paper

<Fast texture synthesis using tree-structured vector quantization
Li-Yi Wei, Marc Levoy (SIGGRAPH '00)>

The reasons why I'd like to use this algorithm are as following:

1. This algorithm is a general one which can deal with most of the input texture patches.
2. It obtains nice results.
3. It's simple to implement. (LOL)
4. The speed of this algorithm is really slow and has potential to improve.

The algorithm can be illustrated using the following figure.


(a) is the input texture and (b)-(d) show different synthesis stages of the output image. Pixels in the output image are assigned in a raster scan ordering. The value of each output pixel p is determined by comparing its spatial neighborhood with all neighborhoods in the input texture. The input pixel with the most similar neighborhood will be assigned to the corresponding output pixel. Neighborhoods crossing the output image boundaries (shown in (b) and (d)) are handled toroidally. Although the output image starts as a random noise, only the last few rows and columns of the noise are actually used. For clarity, we present the unused noise pixels as black. (b) synthesizing the first pixel, (c) synthesizing the middle pixel, (d) synthesizing the last pixel.

Here's the pseudo code of the algorithm.

Research timeline

There are 6 weeks from Mar 19th. to Apr 30th.

My schedule will follow the following Gantt Chart:







Project Milestones

1. Completed all background reading (Before Mar 27th)
2. Proposed the CPU Software framework is functioning (Before April 10th.)
3. Proposed the GPU Software framework is functioning (Before April 17th.)
4. Simple base case example works to prove proposed soft-ware is functioning well (Before April 24th)
5. Collect all necessary data and images (Before April 30th)

Project Final Deliverables

1. A decent implementation of the classic texture synthesis algorithm working in CPU.
2. A GPU accelerated texture synthesis program.
3. A report includes the comparison of CPU & GPU results.

Project Future Tasks

If I had 6 extra months to go with this project, I will:

1. Extend this texturing method into 3D textures;
2. Research on some other texture synthesis methods like Procedural noise using sparse Gabor convolution;
3. Compare the pro’s and con’s between GPU accelerated L-shape neighbor matching algorithm and other texture synthesis algorithms.

Project Proposal

The goal of this project is to accelerate an classic algorithm in GPU. The analysis model and sampling method will be unchanged, the only changed place is in the core function in the algorithm which is called FindBestMatch.

Anticipated Approach

Implement the Wei-Levoy’s L-shape neighbor matching algorithm[WM00]. This algorithm can be divided into 3 phases: Phase1: initialize the output texture with noise. Phase2: For each unsynthesized pixel in the output texture, make a L-shape neighbor for it, compare this neigh-bor with every neighbor of the pixels in the input texture. Phase3: Pick the best matched pixel in the input texture to the output texture.

We can accelerate it in phase1 and phase2: phase1: we can use a quick noise function in GPU, in-stead of using the CPU initialize method. Although this approach won’t affect the synthesis speed that much. phase2: we can set up a K-D tree of all input pixels’ neighbors in GPU.[ZHWG08], and compared the L-shape neighbors parallel.

Complete both the algorithm and its GPU acceleration method, compare the results.

Target Platforms

Operating System: Microsoft Windows;
Hardware requirement: NVIDIA Graphics card with CUDA supported.

Evaluation Criteria

Suppose I have implemented my approach, and its fully functioning. I will set up a table to compare the synthesis speed of the original CPU implementation and GPU acce-leration which is simple and nature. The synthesis results of the two methods are supposed to be the same, but if there are anything different, I will point out to show the different output texture here and try to explain.

Reference

[WM00] WEI, L.-Y., AND LEVOY, M. 2000. Fast texture synthesis using tree-structured vector quantization. Proceedings of SIGGRAPH 2000 (July), 479–488. ISBN 1-58113-208-5.

[ZHWG08] Kun Zhou , Qiming Hou , Rui Wang , Baining Guo, Real-time KD-tree construc-tion on graphics hardware, ACM SIGGRAPH Asia 2008 papers, December 10-13, 2008, Singapore