kmeans-gpu

A baseline K-Means program on GPU. This is probably the fastest of the open-source CUDA k-means implementations I could find online at the time of implementation.

Basic version (master branch): CUDA kernel (a sketch of these steps follows the list)

  1. Compute the distance between each point and the cluster centroids, and find the nearest centroid;
  2. For each centroid, count the points assigned to it and sum up their coordinates;
  3. The averaging (a final division) is done outside the kernel.
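A minimal sketch of these steps (assumed data layout, one thread per point; names and signatures are illustrative, not the repository's actual kernel):

    #include <cfloat>

    // Steps 1-2 on the device: assign each point to its nearest centroid and
    // accumulate per-centroid counts and coordinate sums with atomics.
    __global__ void assign_and_accumulate(const float *points, const float *centroids,
                                          float *sums, int *counts, int *membership,
                                          int n, int dim, int k) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        int best = 0;
        float best_dist = FLT_MAX;
        for (int c = 0; c < k; ++c) {                 // step 1: nearest centroid
            float dist = 0.0f;
            for (int d = 0; d < dim; ++d) {
                float diff = points[i * dim + d] - centroids[c * dim + d];
                dist += diff * diff;
            }
            if (dist < best_dist) { best_dist = dist; best = c; }
        }
        membership[i] = best;

        atomicAdd(&counts[best], 1);                  // step 2: per-centroid count
        for (int d = 0; d < dim; ++d)                 // ... and coordinate sums
            atomicAdd(&sums[best * dim + d], points[i * dim + d]);
    }

    // Step 3 on the host: divide the sums by the counts to get the new centroids.
    void average_centroids(float *sums, const int *counts, int k, int dim) {
        for (int c = 0; c < k; ++c)
            if (counts[c] > 0)
                for (int d = 0; d < dim; ++d)
                    sums[c * dim + d] /= counts[c];
    }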

GPU architecture-related optimizations:

  1. Coalesced memory access;
  2. Shared memory -- load the centroids into shared memory in tiles;
  3. Prefetching;
  • If SYNCOUNT is defined in the Makefile, __syncthreads_count() is used in place of an atomic summation (see the sketches after this list).
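A sketch of the shared-memory tiling idea (tile size, layout, and kernel name are assumptions, not the repository's code):

    #include <cfloat>

    #define TILE 64  // centroids staged per tile (assumed value)

    // Launch with dynamic shared memory of TILE * dim * sizeof(float) bytes.
    __global__ void nearest_centroid_tiled(const float *points, const float *centroids,
                                           int *membership, int n, int dim, int k) {
        extern __shared__ float s_cent[];
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int best = 0;
        float best_dist = FLT_MAX;

        for (int base = 0; base < k; base += TILE) {
            int tile = min(TILE, k - base);
            // Cooperatively stage one tile of centroids into shared memory.
            for (int idx = threadIdx.x; idx < tile * dim; idx += blockDim.x)
                s_cent[idx] = centroids[base * dim + idx];
            __syncthreads();

            if (i < n) {
                for (int c = 0; c < tile; ++c) {
                    float dist = 0.0f;
                    for (int d = 0; d < dim; ++d) {
                        float diff = points[i * dim + d] - s_cent[c * dim + d];
                        dist += diff * diff;
                    }
                    if (dist < best_dist) { best_dist = dist; best = base + c; }
                }
            }
            __syncthreads();   // wait before the next tile overwrites s_cent
        }
        if (i < n) membership[i] = best;
    }

And one possible use of __syncthreads_count(): count, per block, how many threads changed membership, so only one atomicAdd() per block is needed instead of one per thread (the repository may apply the intrinsic differently):

    __global__ void count_membership_changes(const int *old_membership,
                                             const int *new_membership,
                                             int *delta, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int changed = (i < n) && (old_membership[i] != new_membership[i]);

        // All threads in the block reach this call; it returns the number of
        // threads whose predicate was non-zero.
        int block_changed = __syncthreads_count(changed);

        if (threadIdx.x == 0)
            atomicAdd(delta, block_changed);   // one atomic per block
    }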

Versions and branches:

  • master: Pure hand-coded version without cuBLAS; dist = (x_i - c_i)^2 is computed directly;
  • v2: Expands (x_i - c_i)^2 = x_i^2 + c_i^2 - 2*x_i*c_i and uses cublasSgemm() and cublasSnrm2() to compute each term (see the sketch after this list); cublasSnrm2() turned out to be slow;
  • v3: Uses the diagonal of (x^T * x), computed by cublasSgemm(), in place of cublasSnrm2(); limited by the number of points;
  • v4: Self-written vec_norm() and sharding of the vectors into smaller cublasSgemm() calls, removing v3's size limit.
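A sketch of computing the cross term -2*x.c of the expanded squared distance with cublasSgemm() (data layouts and the helper name are assumptions, not the repository's code):

    #include <cublas_v2.h>

    // d_points:    n x dim, row-major;  d_centroids: k x dim, row-major.
    // d_cross:     n x k,   row-major, receives -2 * points * centroids^T.
    void cross_term(cublasHandle_t handle, const float *d_points,
                    const float *d_centroids, float *d_cross,
                    int n, int dim, int k) {
        const float alpha = -2.0f, beta = 0.0f;
        // cuBLAS is column-major: a row-major n x dim buffer is a column-major
        // dim x n matrix, so compute centroids * points^T as a k x n column-major
        // result, which is exactly the n x k row-major cross-term matrix.
        cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                    k, n, dim,
                    &alpha,
                    d_centroids, dim,   // dim x k column-major view, transposed
                    d_points, dim,      // dim x n column-major view
                    &beta,
                    d_cross, k);        // k x n column-major == n x k row-major
        // The full squared distance adds ||x_p||^2 and ||c_c||^2 to each entry;
        // since only the argmin over c is needed, ||x_p||^2 can even be dropped.
    }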

Usage:

./kmeans-gpu [switches] -i filename -n num_clusters
    -i filename    : file containing data to be clustered
    -n num_clusters: number of clusters (K must be > 1)
    -t threshold   : threshold value (default 0.0010)
    -c iteration   : stop after this many iterations
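For example, using the switches above (the data file name is hypothetical):

    ./kmeans-gpu -i points.txt -n 8 -t 0.001 -c 100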
