Skip to content

Change compareSubImg_thr_sse2 to use the intrinsic method _mm_sad_epu8#99

Open
gabilan wants to merge 1 commit into
georgmartius:masterfrom
gabilan:abs_opt
Open

Change compareSubImg_thr_sse2 to use the intrinsic method _mm_sad_epu8#99
gabilan wants to merge 1 commit into
georgmartius:masterfrom
gabilan:abs_opt

Conversation

@gabilan
Copy link
Copy Markdown
Contributor

@gabilan gabilan commented Nov 9, 2020

Change for how SAD (Sum of Absolute Differences) is computed in the SSE optimized path. There is a built-in intrinsic for computing SAD (PSADBW). By switching to the intrinsic, we see improvement in motion estimation/detection performance by 2x. All current tests pass -- maybe more tests are needed.

…8 for computing SAD. Improves motion estimation/detection performance by 2x.
@georgmartius
Copy link
Copy Markdown
Owner

Hi,
wow, this is interesting.
We really need to make sure this is correct. It looks fantastic if we can reduce the code to that small amount. Can you create more tests where the corner cases are tested, for instance different field sizes. I just want to make sure it really does the right thing.

@gabilan
Copy link
Copy Markdown
Contributor Author

gabilan commented Nov 9, 2020

I don't think it's necessary to explore field size testing because the field size is forced to be a multiple of 16. The code below is in the two places where field->size is determined. I am a bit concerned about test coverage -- but only because this method accounts for the majority of CPU time.

#if defined(USE_SSE2) || defined(USE_SSE2_ASM)
  fieldSize     = (fieldSize / 16 + 1) * 16;
  fieldSizeFine = (fieldSizeFine / 16 + 1) * 16;
#endif

@georgmartius
Copy link
Copy Markdown
Owner

Let me check the code myself. Give me a few days, I am very busy at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants