GitHub - jialinding/ee180lab2

Makefile
README
baxter.avi
main.cpp
pc.cpp
pc.h
sobel_alg.h
sobel_calc.cpp
sobel_calc_intrinsics.cpp
sobel_mt.cpp
sobel_st.cpp


David Pan (napdivad), Jialin Ding (jding09)
group18

Optimizations:
1) We added compiler flags to the Makefile to encourage auto-vectorization. The flags we added were ‘-mfpu=neon -O3 -ftree-vectorize -funsafe-math-optimizations -c’. With these flags, the single-threaded implementation sped up to around 20 fps.

2) We restructured the code to encourage the compiler to vectorize, primarily in the grayScale and sobelCalc functions. We used const local variables as the end conditions of our loops, which lets the compiler determine the trip count of each loop. We also added ‘& ~3’ to the end condition, which rounds the bound down to a multiple of 4 and so tells the compiler that the trip count is divisible by 4. In our key loops we used local arrays instead of the Mat inputs, which also raised the fps for reasons we did not fully determine. We hypothesize that the compiler can reason about plain local arrays more easily than about accesses through OpenCV objects such as Mat, and is therefore more likely to vectorize them. All key loops were vectorized except the loop that calculates the x-convolution in sobelCalc. It was unclear what prevented that loop from vectorizing, since it is structurally very similar to the loop that calculates the y-convolution. We tried reordering the additions and multiplications in each loop, but we were unable to determine the cause of the discrepancy. With these changes, the single-threaded implementation sped up to 39 fps.
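
The loop shape described in 2) can be sketched roughly as follows. This is an illustrative sketch, not the code in sobel_calc.cpp: the function name grayScaleFlat, the helper toGray, the fixed-point weights 29/150/77 (an approximation of the usual BGR luma weights) and the use of the __restrict compiler extension are all assumptions. The scalar tail loop is added here so that rounding the bound down does not drop pixels.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: fixed-point luma from one BGR pixel (weights sum to 256).
static inline uint8_t toGray(uint8_t b, uint8_t g, uint8_t r) {
    return static_cast<uint8_t>((29 * b + 150 * g + 77 * r) >> 8);
}

// Hypothetical stand-in for grayScale, working on plain arrays instead of Mat.
// bgr holds rows*cols interleaved B,G,R bytes; gray receives rows*cols bytes.
void grayScaleFlat(const uint8_t* __restrict bgr, uint8_t* __restrict gray,
                   int rows, int cols) {
    assert(bgr != nullptr && gray != nullptr && rows >= 0 && cols >= 0);

    // Const local bounds: the trip counts are fixed before the loops start.
    const int total = rows * cols;
    const int vecEnd = total & ~3;  // round down to a multiple of 4

    // Main loop: trip count is a multiple of 4, a candidate for vectorization.
    for (int i = 0; i < vecEnd; ++i) {
        gray[i] = toGray(bgr[3 * i], bgr[3 * i + 1], bgr[3 * i + 2]);
    }

    // Scalar tail: at most 3 leftover pixels.
    for (int i = vecEnd; i < total; ++i) {
        gray[i] = toGray(bgr[3 * i], bgr[3 * i + 1], bgr[3 * i + 2]);
    }
}
```

Whether a given compiler actually vectorizes the main loop depends on the target and flags; GCC can report its decisions with -fopt-info-vec.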

3) We also tried using intrinsics in grayScale and sobelCalc. Our attempt is in the attached file sobel_calc_intrinsics.cpp. However, it did not produce a significant increase in fps, likely because our implementation was inefficient. We could have tried restructuring the code to perform fewer loads and stores or to improve locality. In the end, we found that pure code restructuring gave higher fps than intrinsics.
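
For readers unfamiliar with the approach in 3), here is a minimal sketch of a grayscale conversion written with NEON intrinsics. It is not the contents of sobel_calc_intrinsics.cpp: the function name grayScaleIntrinsics and the fixed-point weights 29/150/77 are assumptions, and the NEON path is guarded so the same file falls back to a scalar loop on targets without NEON.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical example: convert numPixels interleaved BGR pixels to grayscale.
void grayScaleIntrinsics(const uint8_t* bgr, uint8_t* gray, int numPixels) {
    assert(bgr != nullptr && gray != nullptr && numPixels >= 0);
    int i = 0;

#if defined(__ARM_NEON)
    // Assumed fixed-point weights for B, G, R (they sum to 256).
    const uint8x8_t wB = vdup_n_u8(29);
    const uint8x8_t wG = vdup_n_u8(150);
    const uint8x8_t wR = vdup_n_u8(77);

    // Process 8 pixels per iteration.
    for (; i + 8 <= numPixels; i += 8) {
        // One load de-interleaves 24 bytes into separate B, G and R vectors.
        const uint8x8x3_t px = vld3_u8(bgr + 3 * i);
        // Widening multiply-accumulate into 16-bit lanes (max 255*256 fits).
        uint16x8_t acc = vmull_u8(px.val[0], wB);
        acc = vmlal_u8(acc, px.val[1], wG);
        acc = vmlal_u8(acc, px.val[2], wR);
        // Shift right by 8 (divide by 256), narrow to 8 bits, store 8 results.
        vst1_u8(gray + i, vshrn_n_u16(acc, 8));
    }
#endif

    // Scalar path: leftover pixels, or every pixel when NEON is unavailable.
    for (; i < numPixels; ++i) {
        gray[i] = static_cast<uint8_t>(
            (29 * bgr[3 * i] + 150 * bgr[3 * i + 1] + 77 * bgr[3 * i + 2]) >> 8);
    }
}
```

The de-interleaving load is one way to cut down on loads and stores, since each iteration reads the three channels of 8 pixels with a single instruction and writes 8 results with one store.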

4) We implemented multithreading by giving half of each image to each of two threads. The two threads call grayScale and sobelCalc in parallel, each processing its own half of the image. To synchronize the threads, we created two barriers: one for grayScale and one for sobelCalc. The threads synchronize four times per frame: before calling grayScale, after returning from grayScale, before calling sobelCalc, and after returning from sobelCalc. With this change, the multi-threaded implementation reached 56 fps.
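
The two-thread, two-barrier scheme in 4) might look roughly like the sketch below. It is not the code in sobel_mt.cpp: the README does not say which threading or barrier API was used, so this version assumes std::thread and a small hand-written Barrier class, and grayRows, gradRows and processFrame are hypothetical stand-ins (gradRows computes only a simple vertical difference, not the full Sobel operator).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier: wait() returns once `count` threads have arrived.
class Barrier {
public:
    explicit Barrier(int count) : threshold_(count), waiting_(0), generation_(0) {}
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        const unsigned gen = generation_;
        if (++waiting_ == threshold_) {  // last arrival releases everyone
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    const int threshold_;
    int waiting_;
    unsigned generation_;
};

// Stage 1 stand-in: grayscale for rows [rowBegin, rowEnd).
void grayRows(const std::vector<uint8_t>& bgr, std::vector<uint8_t>& gray,
              int cols, int rowBegin, int rowEnd) {
    for (int i = rowBegin * cols; i < rowEnd * cols; ++i)
        gray[i] = static_cast<uint8_t>(
            (29 * bgr[3 * i] + 150 * bgr[3 * i + 1] + 77 * bgr[3 * i + 2]) >> 8);
}

// Stage 2 stand-in: vertical difference; reads the rows above and below,
// so rows next to the split need grayscale data from the other thread.
void gradRows(const std::vector<uint8_t>& gray, std::vector<uint8_t>& out,
              int rows, int cols, int rowBegin, int rowEnd) {
    for (int r = rowBegin; r < rowEnd; ++r) {
        if (r == 0 || r == rows - 1) continue;  // border rows stay 0
        for (int c = 0; c < cols; ++c)
            out[r * cols + c] = static_cast<uint8_t>(
                std::abs(int(gray[(r + 1) * cols + c]) - int(gray[(r - 1) * cols + c])));
    }
}

// Process one frame with two threads, each owning half of the rows.
std::vector<uint8_t> processFrame(const std::vector<uint8_t>& bgr, int rows, int cols) {
    assert(rows >= 2 && cols > 0);
    assert(bgr.size() == static_cast<std::size_t>(rows) * cols * 3);
    std::vector<uint8_t> gray(static_cast<std::size_t>(rows) * cols);
    std::vector<uint8_t> out(gray.size(), 0);
    Barrier grayBarrier(2), sobelBarrier(2);

    auto worker = [&](int id) {
        const int begin = (id == 0) ? 0 : rows / 2;
        const int end = (id == 0) ? rows / 2 : rows;
        grayBarrier.wait();                        // sync 1: before grayscale
        grayRows(bgr, gray, cols, begin, end);
        grayBarrier.wait();                        // sync 2: all of gray is written
        sobelBarrier.wait();                       // sync 3: before the gradient stage
        gradRows(gray, out, rows, cols, begin, end);
        sobelBarrier.wait();                       // sync 4: all of out is written
    };

    std::thread t0(worker, 0), t1(worker, 1);
    t0.join();
    t1.join();
    return out;
}
```

In this sketch the second and third synchronization points are adjacent, so a single wait between the two stages would be enough; four are kept only to mirror the description. With GCC or Clang, building it may require the -pthread flag.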
