Making a digital human usually involves a lot of image processing. The images are normally larger than 3000 x 5000 pixels and there are around 10 of them, so combining them takes anywhere from a few seconds to a few minutes.
When I implement something, I usually get the logic working as quickly as possible and then optimize it. When I measured it, my image processing logic took around 2 seconds.
for ( int y = 0; y < height; ++y ) {
    for ( int x = 0; x < width; ++x ) {
        // processing logic here
    }
}
<<Traditional 2D image processing driver code>>
My processing logic was simple: grab pixels from different textures, combine them, and write the result to a specific texture. Since each output pixel only depends on the source images, the processing could be parallelized, so I changed it and profiled.
The original logic took 2 seconds and the new parallelized logic took 0.3 seconds, which is roughly 7 times faster!
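For context, the per-pixel work looked roughly like the sketch below. The buffer names (srcA, srcB, dst), the Pixel struct, and the alpha blend are hypothetical stand-ins for my real combine step; the important property is that each destination pixel only reads from the source images, so iterations do not depend on each other.

struct Pixel { unsigned char r, g, b, a; };

// Sequential combine sketch: blend srcB over srcA and write into dst (all hypothetical RGBA8 buffers).
void CombineImages(const Pixel* srcA, const Pixel* srcB, Pixel* dst, int width, int height)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int i = y * width + x;
            const float alpha = srcB[i].a / 255.0f;
            dst[i].r = static_cast<unsigned char>(srcA[i].r * (1.0f - alpha) + srcB[i].r * alpha);
            dst[i].g = static_cast<unsigned char>(srcA[i].g * (1.0f - alpha) + srcB[i].g * alpha);
            dst[i].b = static_cast<unsigned char>(srcA[i].b * (1.0f - alpha) + srcB[i].b * alpha);
            dst[i].a = 255;
        }
    }
}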
How to change it
First of all, you should include ppl.h.
#include <ppl.h>
parallel_for is the function we are going to use (it lives in the concurrency namespace).
concurrency::parallel_for(0, 100, [](int value) {
    // processing logic here; this block runs in parallel and the order of value is not determined.
});
The lambda is called 100 times and the order is not determined, as the comment above says. Traditional 2D image processing uses x and y to access a pixel, but parallel_for hands us a 1D value, so we use a mapping function (this is also a traditional conversion trick in the game industry).
const int linearCount = height * width;
concurrency::parallel_for(0, linearCount, [&](int linearIndex) {
    int y = linearIndex / width;
    int x = linearIndex % width;
    // processing logic here.
});
Using the / and % operators, we can get x and y back from the 1D value, and the whole loop can be parallelized.
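Putting it together, here is a sketch of the earlier combine rewritten with parallel_for. The buffer names and the blend are still hypothetical; the only structural change is that the two nested loops become one parallel_for over the linear pixel count, and each iteration writes to its own dst[i], so there is no data race.

#include <ppl.h>

struct Pixel { unsigned char r, g, b, a; };

void CombineImagesParallel(const Pixel* srcA, const Pixel* srcB, Pixel* dst, int width, int height)
{
    const int linearCount = height * width;
    concurrency::parallel_for(0, linearCount, [&](int linearIndex) {
        // Recover 2D coordinates from the 1D index; linearIndex could be used directly here,
        // but x and y are shown because real pixel logic usually needs them.
        const int y = linearIndex / width;
        const int x = linearIndex % width;
        const int i = y * width + x; // same as linearIndex

        const float alpha = srcB[i].a / 255.0f;
        dst[i].r = static_cast<unsigned char>(srcA[i].r * (1.0f - alpha) + srcB[i].r * alpha);
        dst[i].g = static_cast<unsigned char>(srcA[i].g * (1.0f - alpha) + srcB[i].g * alpha);
        dst[i].b = static_cast<unsigned char>(srcA[i].b * (1.0f - alpha) + srcB[i].b * alpha);
        dst[i].a = 255;
    });
}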
Try applying it to your project!
It is a very simple and easy way to improve performance :)
Another tip
Measuring elapsed time in recent C++ is really easy.
#include <chrono>

auto t1 = std::chrono::high_resolution_clock::now();
// logic
auto t2 = std::chrono::high_resolution_clock::now();
auto t = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count(); // elapsed time in milliseconds
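Here is a minimal, self-contained sketch of how I use it; the vector and the doubling loop just stand in for the real per-pixel work:

#include <chrono>
#include <iostream>
#include <ppl.h>
#include <vector>

int main()
{
    std::vector<int> data(1000000, 1);

    auto t1 = std::chrono::high_resolution_clock::now();
    concurrency::parallel_for(0, static_cast<int>(data.size()), [&](int i) {
        data[i] *= 2; // stand-in for real per-pixel work
    });
    auto t2 = std::chrono::high_resolution_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    std::cout << "parallel_for took " << ms << " ms\n";
    return 0;
}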