SIMD popcount

Sample programs for my article http://0x80.pl/articles/sse-popcount.html

Introduction

The original subdirectory contains code from 2008; it is 32-bit and GCC-centric. The root directory contains newer C++11 code, written with intrinsics and tested on a 64-bit machine.

As usual, type make to compile the programs; then you can invoke:

verify/verify_avx/verify_avx2 --- tests whether all non-lookup implementations count bits properly.
speed/speed_avx/speed_avx2 --- benchmarks the different implementations of the popcount procedure; run the program without arguments to print the help text listing all options.
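The kind of check a verify program performs can be sketched as below. This is only an illustration, not the repository's verify.cpp: the function type popcount_fn, the helper names and the test buffer are all assumptions made for the example. A candidate implementation is compared against a deliberately simple reference on buffers of every length from 0 to 256 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical signature; the repository's real function type may differ.
using popcount_fn = std::uint64_t (*)(const std::uint8_t* data, std::size_t size);

// Reference: inspect one bit at a time. Slow, but easy to trust.
std::uint64_t popcount_reference(const std::uint8_t* data, std::size_t size) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < size; i++) {
        for (std::uint8_t b = data[i]; b != 0; b >>= 1) {
            total += b & 1u;
        }
    }
    return total;
}

// Example candidate (an assumption, not one of the listed procedures):
// each iteration of the inner loop clears the lowest set bit.
std::uint64_t popcount_candidate(const std::uint8_t* data, std::size_t size) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < size; i++) {
        for (unsigned v = data[i]; v != 0; v &= v - 1) {
            total++;
        }
    }
    return total;
}

// Returns true when the candidate agrees with the reference on every
// prefix (length 0..256) of a buffer holding all 256 byte values.
bool verify(popcount_fn candidate) {
    assert(candidate != nullptr);
    std::uint8_t buffer[256];
    for (std::size_t i = 0; i < 256; i++) {
        buffer[i] = static_cast<std::uint8_t>(i);
    }
    for (std::size_t size = 0; size <= 256; size++) {
        if (candidate(buffer, size) != popcount_reference(buffer, size)) {
            return false;
        }
    }
    return true;
}
```

Calling verify(popcount_candidate) would be expected to return true; a real test would likely also use random data and larger buffers so that unrolled and SIMD code paths are exercised.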

You can also run make run, make run_avx or make run_avx2 to run speed for all available implementations compiled for the given architecture (more or less: -msse, -mavx, -mavx2).

Available implementations in the new version

procedure	description
lookup-8	lookup in std::uint8_t[256] LUT
lookup-64	lookup in std::uint64_t[256] LUT
bit-parallel	naive bit-parallel method
bit-parallel-optimized	slightly improved bit-parallel method
bit-parallel-mul	bit-parallel with fewer instructions
harley-seal	Harley-Seal popcount (4th iteration)
sse-bit-parallel	SSE implementation of bit-parallel-optimized (unrolled)
sse-bit-parallel-original	SSE implementation of bit-parallel-optimized
sse-bit-parallel-better	SSE implementation of bit-parallel with fewer instructions
sse-harley-seal	SSE implementation of Harley-Seal
sse-lookup	SSSE3 variant using pshufb instruction (unrolled)
sse-lookup-original	SSSE3 variant using pshufb instruction
avx2-lookup	AVX2 variant using pshufb instruction (unrolled)
avx2-lookup-original	AVX2 variant using pshufb instruction
avx2-harley-seal	AVX2 implementation of Harley-Seal
cpu	CPU instruction popcnt (64-bit variant)
sse-cpu	load data with SSE, then count bits using popcnt
avx2-cpu	load data with AVX2, then count bits using popcnt
builtin-popcnt	builtin for popcnt
builtin-popcnt32	builtin for popcnt (32-bit variant)
builtin-popcnt-unrolled	unrolled builtin-popcnt
builtin-popcnt-unrolled32	unrolled builtin-popcnt32
builtin-popcnt-unrolled-errata	unrolled builtin-popcnt avoiding false-dependency
builtin-popcnt-unrolled-errata-manual	unrolled builtin-popcnt avoiding false-dependency (assembly code)
builtin-popcnt-movdq	builtin-popcnt where data is loaded via SSE registers
builtin-popcnt-movdq-unrolled	builtin-popcnt-movdq unrolled
builtin-popcnt-movdq-unrolled_manual	builtin-popcnt-movdq unrolled (assembly code)
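To give a feel for the scalar entries in the table, here is a minimal sketch of three of the ideas: a 256-entry byte lookup table, the naive bit-parallel reduction, and the multiplication-based variant. These are textbook formulations written for this README, not code copied from the repository; all function names are hypothetical, and the wrapper assumes the buffer size is a multiple of 8 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Lookup idea: a 256-entry table holding the bit count of every byte value.
struct Lut8 {
    std::uint8_t count[256];
    Lut8() {
        count[0] = 0;
        for (unsigned i = 1; i < 256; i++) {
            count[i] = static_cast<std::uint8_t>(count[i >> 1] + (i & 1u));
        }
    }
};

std::uint64_t popcount_lookup8(const std::uint8_t* data, std::size_t size) {
    static const Lut8 lut;
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < size; i++) {
        total += lut.count[data[i]];
    }
    return total;
}

// Naive bit-parallel idea: add neighbouring 1-, 2-, 4-, ... bit fields
// until a single 64-bit sum remains.
inline std::uint64_t popcount_word_bit_parallel(std::uint64_t x) {
    x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x & 0x0f0f0f0f0f0f0f0fULL) + ((x >> 4) & 0x0f0f0f0f0f0f0f0fULL);
    x = (x & 0x00ff00ff00ff00ffULL) + ((x >> 8) & 0x00ff00ff00ff00ffULL);
    x = (x & 0x0000ffff0000ffffULL) + ((x >> 16) & 0x0000ffff0000ffffULL);
    x = (x & 0x00000000ffffffffULL) + (x >> 32);
    return x;
}

// Fewer instructions: stop at per-byte counts, then sum the eight bytes
// with one multiplication; the total lands in the top byte.
inline std::uint64_t popcount_word_mul(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (x * 0x0101010101010101ULL) >> 56;
}

// Buffer wrapper; assumes size is a multiple of 8 bytes (a simplification).
std::uint64_t popcount_bit_parallel_mul(const std::uint8_t* data, std::size_t size) {
    assert(size % 8 == 0);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < size; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, data + i, sizeof(word));  // avoids unaligned access
        total += popcount_word_mul(word);
    }
    return total;
}
```

The SSE and AVX2 entries apply the same ideas to 128-bit and 256-bit registers; for instance, the pshufb-based variants replace the byte table with a 16-entry nibble table held in a vector register.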

Acknowledgments

Kim Walisch (@kimwalisch) wrote the scalar Harley-Seal implementation.
Simon Lindholm (@simonlindholm) added unrolled versions of procedures.
Dan Luu (@danluu) agreed to include his procedures (builtin-*) in this project. More details are in Dan's article "Hand coded assembly beats intrinsics in speed and simplicity".

