parallelCanny(const Mat &_dx, const Mat &_dy, Mat &_map, std::deque<uchar*> &borderPeaksParallel, int _low, int _high, bool _L2gradient) : src(_dx), src2(_dy), map(_map), _borderPeaksParallel(borderPeaksParallel), low(_low), high(_high), aperture_size(0), L2gradient(_L2gradient) { #if CV_SIMD128 haveSIMD = hasSIMD128(); if(haveSIMD) _map.create(src.rows + 2, (int)alignSize((size_t)(src.cols + CV_MALLOC_SIMD128 + 1), CV_MALLOC_SIMD128), CV_8UC1); else #endif _map.create(src.rows + 2, src.cols + 2, CV_8UC1); map = _map; map.row(0).setTo(1); map.row(src.rows + 1).setTo(1); mapstep = map.cols; needGradient = false; cn = src.channels(); }
void FAST_t(InputArray _img, std::vector<KeyPoint>& keypoints, int threshold, bool nonmax_suppression) { Mat img = _img.getMat(); const int K = patternSize/2, N = patternSize + K + 1; int i, j, k, pixel[25]; makeOffsets(pixel, (int)img.step, patternSize); #if CV_SIMD128 const int quarterPatternSize = patternSize/4; v_uint8x16 delta = v_setall_u8(0x80), t = v_setall_u8((char)threshold), K16 = v_setall_u8((char)K); bool hasSimd = hasSIMD128(); #if CV_TRY_AVX2 Ptr<opt_AVX2::FAST_t_patternSize16_AVX2> fast_t_impl_avx2; if(CV_CPU_HAS_SUPPORT_AVX2) fast_t_impl_avx2 = opt_AVX2::FAST_t_patternSize16_AVX2::getImpl(img.cols, threshold, nonmax_suppression, pixel); #endif #endif keypoints.clear(); threshold = std::min(std::max(threshold, 0), 255); uchar threshold_tab[512]; for( i = -255; i <= 255; i++ ) threshold_tab[i+255] = (uchar)(i < -threshold ? 1 : i > threshold ? 2 : 0); AutoBuffer<uchar> _buf((img.cols+16)*3*(sizeof(int) + sizeof(uchar)) + 128); uchar* buf[3]; buf[0] = _buf.data(); buf[1] = buf[0] + img.cols; buf[2] = buf[1] + img.cols; int* cpbuf[3]; cpbuf[0] = (int*)alignPtr(buf[2] + img.cols, sizeof(int)) + 1; cpbuf[1] = cpbuf[0] + img.cols + 1; cpbuf[2] = cpbuf[1] + img.cols + 1; memset(buf[0], 0, img.cols*3); for(i = 3; i < img.rows-2; i++) { const uchar* ptr = img.ptr<uchar>(i) + 3; uchar* curr = buf[(i - 3)%3]; int* cornerpos = cpbuf[(i - 3)%3]; memset(curr, 0, img.cols); int ncorners = 0; if( i < img.rows - 3 ) { j = 3; #if CV_SIMD128 if( hasSimd ) { if( patternSize == 16 ) { #if CV_TRY_AVX2 if (fast_t_impl_avx2) fast_t_impl_avx2->process(j, ptr, curr, cornerpos, ncorners); #endif //vz if (j <= (img.cols - 27)) //it doesn't make sense using vectors for less than 8 elements { for (; j < img.cols - 16 - 3; j += 16, ptr += 16) { v_uint8x16 v = v_load(ptr); v_int8x16 v0 = v_reinterpret_as_s8((v + t) ^ delta); v_int8x16 v1 = v_reinterpret_as_s8((v - t) ^ delta); v_int8x16 x0 = v_reinterpret_as_s8(v_sub_wrap(v_load(ptr + pixel[0]), delta)); v_int8x16 x1 = v_reinterpret_as_s8(v_sub_wrap(v_load(ptr + pixel[quarterPatternSize]), delta)); v_int8x16 x2 = v_reinterpret_as_s8(v_sub_wrap(v_load(ptr + pixel[2*quarterPatternSize]), delta)); v_int8x16 x3 = v_reinterpret_as_s8(v_sub_wrap(v_load(ptr + pixel[3*quarterPatternSize]), delta)); v_int8x16 m0, m1; m0 = (v0 < x0) & (v0 < x1); m1 = (x0 < v1) & (x1 < v1); m0 = m0 | ((v0 < x1) & (v0 < x2)); m1 = m1 | ((x1 < v1) & (x2 < v1)); m0 = m0 | ((v0 < x2) & (v0 < x3)); m1 = m1 | ((x2 < v1) & (x3 < v1)); m0 = m0 | ((v0 < x3) & (v0 < x0)); m1 = m1 | ((x3 < v1) & (x0 < v1)); m0 = m0 | m1; int mask = v_signmask(m0); if( mask == 0 ) continue; if( (mask & 255) == 0 ) { j -= 8; ptr -= 8; continue; } v_int8x16 c0 = v_setzero_s8(); v_int8x16 c1 = v_setzero_s8(); v_uint8x16 max0 = v_setzero_u8(); v_uint8x16 max1 = v_setzero_u8(); for( k = 0; k < N; k++ ) { v_int8x16 x = v_reinterpret_as_s8(v_load((ptr + pixel[k])) ^ delta); m0 = v0 < x; m1 = x < v1; c0 = v_sub_wrap(c0, m0) & m0; c1 = v_sub_wrap(c1, m1) & m1; max0 = v_max(max0, v_reinterpret_as_u8(c0)); max1 = v_max(max1, v_reinterpret_as_u8(c1)); } max0 = v_max(max0, max1); int m = v_signmask(K16 < max0); for( k = 0; m > 0 && k < 16; k++, m >>= 1 ) { if(m & 1) { cornerpos[ncorners++] = j+k; if(nonmax_suppression) curr[j+k] = (uchar)cornerScore<patternSize>(ptr+k, pixel, threshold); } } } } } } #endif for( ; j < img.cols - 3; j++, ptr++ ) { int v = ptr[0]; const uchar* tab = &threshold_tab[0] - v + 255; int d = tab[ptr[pixel[0]]] | tab[ptr[pixel[8]]]; if( d == 0 ) continue; d &= tab[ptr[pixel[2]]] | tab[ptr[pixel[10]]]; d &= tab[ptr[pixel[4]]] | tab[ptr[pixel[12]]]; d &= tab[ptr[pixel[6]]] | tab[ptr[pixel[14]]]; if( d == 0 ) continue; d &= tab[ptr[pixel[1]]] | tab[ptr[pixel[9]]]; d &= tab[ptr[pixel[3]]] | tab[ptr[pixel[11]]]; d &= tab[ptr[pixel[5]]] | tab[ptr[pixel[13]]]; d &= tab[ptr[pixel[7]]] | tab[ptr[pixel[15]]]; if( d & 1 ) { int vt = v - threshold, count = 0; for( k = 0; k < N; k++ ) { int x = ptr[pixel[k]]; if(x < vt) { if( ++count > K ) { cornerpos[ncorners++] = j; if(nonmax_suppression) curr[j] = (uchar)cornerScore<patternSize>(ptr, pixel, threshold); break; } } else count = 0; } } if( d & 2 ) { int vt = v + threshold, count = 0; for( k = 0; k < N; k++ ) { int x = ptr[pixel[k]]; if(x > vt) { if( ++count > K ) { cornerpos[ncorners++] = j; if(nonmax_suppression) curr[j] = (uchar)cornerScore<patternSize>(ptr, pixel, threshold); break; } } else count = 0; } } } } cornerpos[-1] = ncorners; if( i == 3 ) continue; const uchar* prev = buf[(i - 4 + 3)%3]; const uchar* pprev = buf[(i - 5 + 3)%3]; cornerpos = cpbuf[(i - 4 + 3)%3]; ncorners = cornerpos[-1]; for( k = 0; k < ncorners; k++ ) { j = cornerpos[k]; int score = prev[j]; if( !nonmax_suppression || (score > prev[j+1] && score > prev[j-1] && score > pprev[j-1] && score > pprev[j] && score > pprev[j+1] && score > curr[j-1] && score > curr[j] && score > curr[j+1]) ) { keypoints.push_back(KeyPoint((float)j, (float)(i-1), 7.f, -1, (float)score)); } } }
void spatialGradient( InputArray _src, OutputArray _dx, OutputArray _dy, int ksize, int borderType ) { CV_INSTRUMENT_REGION() // Prepare InputArray src Mat src = _src.getMat(); CV_Assert( !src.empty() ); CV_Assert( src.type() == CV_8UC1 ); CV_Assert( borderType == BORDER_DEFAULT || borderType == BORDER_REPLICATE ); // Prepare OutputArrays dx, dy _dx.create( src.size(), CV_16SC1 ); _dy.create( src.size(), CV_16SC1 ); Mat dx = _dx.getMat(), dy = _dy.getMat(); // TODO: Allow for other kernel sizes CV_Assert(ksize == 3); // Get dimensions const int H = src.rows, W = src.cols; // Row, column indices int i = 0, j = 0; // Handle border types int i_top = 0, // Case for H == 1 && W == 1 && BORDER_REPLICATE i_bottom = H - 1, j_offl = 0, // j offset from 0th pixel to reach -1st pixel j_offr = 0; // j offset from W-1th pixel to reach Wth pixel if ( borderType == BORDER_DEFAULT ) // Equiv. to BORDER_REFLECT_101 { if ( H > 1 ) { i_top = 1; i_bottom = H - 2; } if ( W > 1 ) { j_offl = 1; j_offr = -1; } } // Pointer to row vectors uchar *p_src, *c_src, *n_src; // previous, current, next row short *c_dx, *c_dy; int i_start = 0; int j_start = 0; #if CV_SIMD128 && CV_SSE2 if(hasSIMD128()) { uchar *m_src; short *n_dx, *n_dy; // Characters in variable names have the following meanings: // u: unsigned char // s: signed int // // [row][column] // m: offset -1 // n: offset 0 // p: offset 1 // Example: umn is offset -1 in row and offset 0 in column for ( i = 0; i < H - 1; i += 2 ) { if ( i == 0 ) p_src = src.ptr<uchar>(i_top); else p_src = src.ptr<uchar>(i-1); c_src = src.ptr<uchar>(i); n_src = src.ptr<uchar>(i+1); if ( i == H - 2 ) m_src = src.ptr<uchar>(i_bottom); else m_src = src.ptr<uchar>(i+2); c_dx = dx.ptr<short>(i); c_dy = dy.ptr<short>(i); n_dx = dx.ptr<short>(i+1); n_dy = dy.ptr<short>(i+1); v_uint8x16 v_select_m = v_uint8x16(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF); // Process rest of columns 16-column chunks at a time for ( j = 1; j < W - 16; j += 16 ) { // Load top row for 3x3 Sobel filter v_uint8x16 v_um = v_load(&p_src[j-1]); v_uint8x16 v_up = v_load(&p_src[j+1]); // TODO: Replace _mm_slli_si128 with hal method v_uint8x16 v_un = v_select(v_select_m, v_uint8x16(_mm_slli_si128(v_up.val, 1)), v_uint8x16(_mm_srli_si128(v_um.val, 1))); v_uint16x8 v_um1, v_um2, v_un1, v_un2, v_up1, v_up2; v_expand(v_um, v_um1, v_um2); v_expand(v_un, v_un1, v_un2); v_expand(v_up, v_up1, v_up2); v_int16x8 v_s1m1 = v_reinterpret_as_s16(v_um1); v_int16x8 v_s1m2 = v_reinterpret_as_s16(v_um2); v_int16x8 v_s1n1 = v_reinterpret_as_s16(v_un1); v_int16x8 v_s1n2 = v_reinterpret_as_s16(v_un2); v_int16x8 v_s1p1 = v_reinterpret_as_s16(v_up1); v_int16x8 v_s1p2 = v_reinterpret_as_s16(v_up2); // Load second row for 3x3 Sobel filter v_um = v_load(&c_src[j-1]); v_up = v_load(&c_src[j+1]); // TODO: Replace _mm_slli_si128 with hal method v_un = v_select(v_select_m, v_uint8x16(_mm_slli_si128(v_up.val, 1)), v_uint8x16(_mm_srli_si128(v_um.val, 1))); v_expand(v_um, v_um1, v_um2); v_expand(v_un, v_un1, v_un2); v_expand(v_up, v_up1, v_up2); v_int16x8 v_s2m1 = v_reinterpret_as_s16(v_um1); v_int16x8 v_s2m2 = v_reinterpret_as_s16(v_um2); v_int16x8 v_s2n1 = v_reinterpret_as_s16(v_un1); v_int16x8 v_s2n2 = v_reinterpret_as_s16(v_un2); v_int16x8 v_s2p1 = v_reinterpret_as_s16(v_up1); v_int16x8 v_s2p2 = v_reinterpret_as_s16(v_up2); // Load third row for 3x3 Sobel filter v_um = v_load(&n_src[j-1]); v_up = v_load(&n_src[j+1]); // TODO: Replace _mm_slli_si128 with hal method v_un = v_select(v_select_m, v_uint8x16(_mm_slli_si128(v_up.val, 1)), v_uint8x16(_mm_srli_si128(v_um.val, 1))); v_expand(v_um, v_um1, v_um2); v_expand(v_un, v_un1, v_un2); v_expand(v_up, v_up1, v_up2); v_int16x8 v_s3m1 = v_reinterpret_as_s16(v_um1); v_int16x8 v_s3m2 = v_reinterpret_as_s16(v_um2); v_int16x8 v_s3n1 = v_reinterpret_as_s16(v_un1); v_int16x8 v_s3n2 = v_reinterpret_as_s16(v_un2); v_int16x8 v_s3p1 = v_reinterpret_as_s16(v_up1); v_int16x8 v_s3p2 = v_reinterpret_as_s16(v_up2); // dx & dy for rows 1, 2, 3 v_int16x8 v_sdx1, v_sdy1; spatialGradientKernel<v_int16x8>( v_sdx1, v_sdy1, v_s1m1, v_s1n1, v_s1p1, v_s2m1, v_s2p1, v_s3m1, v_s3n1, v_s3p1 ); v_int16x8 v_sdx2, v_sdy2; spatialGradientKernel<v_int16x8>( v_sdx2, v_sdy2, v_s1m2, v_s1n2, v_s1p2, v_s2m2, v_s2p2, v_s3m2, v_s3n2, v_s3p2 ); // Store v_store(&c_dx[j], v_sdx1); v_store(&c_dx[j+8], v_sdx2); v_store(&c_dy[j], v_sdy1); v_store(&c_dy[j+8], v_sdy2); // Load fourth row for 3x3 Sobel filter v_um = v_load(&m_src[j-1]); v_up = v_load(&m_src[j+1]); // TODO: Replace _mm_slli_si128 with hal method v_un = v_select(v_select_m, v_uint8x16(_mm_slli_si128(v_up.val, 1)), v_uint8x16(_mm_srli_si128(v_um.val, 1))); v_expand(v_um, v_um1, v_um2); v_expand(v_un, v_un1, v_un2); v_expand(v_up, v_up1, v_up2); v_int16x8 v_s4m1 = v_reinterpret_as_s16(v_um1); v_int16x8 v_s4m2 = v_reinterpret_as_s16(v_um2); v_int16x8 v_s4n1 = v_reinterpret_as_s16(v_un1); v_int16x8 v_s4n2 = v_reinterpret_as_s16(v_un2); v_int16x8 v_s4p1 = v_reinterpret_as_s16(v_up1); v_int16x8 v_s4p2 = v_reinterpret_as_s16(v_up2); // dx & dy for rows 2, 3, 4 spatialGradientKernel<v_int16x8>( v_sdx1, v_sdy1, v_s2m1, v_s2n1, v_s2p1, v_s3m1, v_s3p1, v_s4m1, v_s4n1, v_s4p1 ); spatialGradientKernel<v_int16x8>( v_sdx2, v_sdy2, v_s2m2, v_s2n2, v_s2p2, v_s3m2, v_s3p2, v_s4m2, v_s4n2, v_s4p2 ); // Store v_store(&n_dx[j], v_sdx1); v_store(&n_dx[j+8], v_sdx2); v_store(&n_dy[j], v_sdy1); v_store(&n_dy[j+8], v_sdy2); } } } i_start = i; j_start = j; #endif int j_p, j_n; uchar v00, v01, v02, v10, v11, v12, v20, v21, v22; for ( i = 0; i < H; i++ ) { if ( i == 0 ) p_src = src.ptr<uchar>(i_top); else p_src = src.ptr<uchar>(i-1); c_src = src.ptr<uchar>(i); if ( i == H - 1 ) n_src = src.ptr<uchar>(i_bottom); else n_src = src.ptr<uchar>(i+1); c_dx = dx.ptr<short>(i); c_dy = dy.ptr<short>(i); // Process left-most column j = 0; j_p = j + j_offl; j_n = 1; if ( j_n >= W ) j_n = j + j_offr; v00 = p_src[j_p]; v01 = p_src[j]; v02 = p_src[j_n]; v10 = c_src[j_p]; v11 = c_src[j]; v12 = c_src[j_n]; v20 = n_src[j_p]; v21 = n_src[j]; v22 = n_src[j_n]; spatialGradientKernel<short>( c_dx[0], c_dy[0], v00, v01, v02, v10, v12, v20, v21, v22 ); v00 = v01; v10 = v11; v20 = v21; v01 = v02; v11 = v12; v21 = v22; // Process middle columns j = i >= i_start ? 1 : j_start; j_p = j - 1; v00 = p_src[j_p]; v01 = p_src[j]; v10 = c_src[j_p]; v11 = c_src[j]; v20 = n_src[j_p]; v21 = n_src[j]; for ( ; j < W - 1; j++ ) { // Get values for next column j_n = j + 1; v02 = p_src[j_n]; v12 = c_src[j_n]; v22 = n_src[j_n]; spatialGradientKernel<short>( c_dx[j], c_dy[j], v00, v01, v02, v10, v12, v20, v21, v22 ); // Move values back one column for next iteration v00 = v01; v10 = v11; v20 = v21; v01 = v02; v11 = v12; v21 = v22; } // Process right-most column if ( j < W ) { j_n = j + j_offr; v02 = p_src[j_n]; v12 = c_src[j_n]; v22 = n_src[j_n]; spatialGradientKernel<short>( c_dx[j], c_dy[j], v00, v01, v02, v10, v12, v20, v21, v22 ); } } }