GitHub - weeitb/HPCWordCloud: Parallelized Word Cloud algorithm

-----------------------Repository Structure-------------------------------------

HPCWordCloud/                 Top-level folder containing the repository.
    hashmap/                  Source code for the hashmap implementation.
        src/                  C source files for the hashmap.
        include/              C header files for the hashmap.
        test/                 Code for testing the hashmap independently.
    sequential/               Code for running the word count sequentially.
        src/Main.c            Sequential entry point. Timing is performed
                              here.
        run-sequential.bash   Bash file for submitting the sequential job
                              to LSF on the Discovery Cluster.
        run-results/          Output files of runs performed for sequential
                              timing measurements.
        Makefile              Build script for the sequential configuration
                              of word counting.
    resources/                Sample text to parse.
        bibleChapters/        Text files for measuring run time.
    openmp/                   Source code for the OpenMP implementation.
        src/                  C source files for the OpenMP utility functions
                              that use the OpenMP hashmap code. Includes
                              Main.c, the OpenMP entry point to the
                              algorithm, which contains the timing
                              calculations.
        include/              C header files for the OpenMP utility functions.
        test/                 Code for testing the utility functions
                              independently.
        run-openmp.bash       Bash file for submitting the parallel job to
                              LSF on the Discovery Cluster.
        run-results/          Output files of runs performed for parallel
                              timing measurements.
        Makefile              Build script for the OpenMP parallel
                              configuration of word counting.
    mpi/                      Source code for the MPI implementation.
        src/                  C source files for the utility functions that
                              use the parallel MPI hashmap code. Includes
                              Main.c, the MPI entry point to the algorithm,
                              which contains the timing calculations.
        include/              C header files for the utility functions.
        test/                 Code for testing the utility functions
                              independently.
        run-mpi.bash          Bash file for submitting the parallel job to
                              LSF on the Discovery Cluster.
        run-results/          Output files of runs performed for parallel
                              timing measurements.
        Makefile              Build script for the MPI parallel configuration
                              of word counting.
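
For orientation, the kind of structure the hashmap/ folder provides can be
sketched as a chained hash table that maps each word to a count. This is a
minimal illustration only, not the repository's code: every name below
(WordMap, wordmap_add, and so on) is hypothetical, and the djb2 hash is an
assumed choice. The nbuckets argument plays the role of the
<hashmap array size> run-time argument.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types and names; not taken from the repository. */
typedef struct Entry {
    char *word;
    unsigned count;
    struct Entry *next;          /* next entry in the same bucket */
} Entry;

typedef struct {
    Entry **buckets;
    size_t nbuckets;             /* corresponds to <hashmap array size> */
} WordMap;

/* djb2 string hash (an assumed, commonly used choice). */
static unsigned long hash_word(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

WordMap *wordmap_create(size_t nbuckets) {
    assert(nbuckets > 0);
    WordMap *m = malloc(sizeof *m);
    if (!m) return NULL;
    m->buckets = calloc(nbuckets, sizeof *m->buckets);
    if (!m->buckets) { free(m); return NULL; }
    m->nbuckets = nbuckets;
    return m;
}

/* Increment the count for word, inserting it on first sight.
   Returns 0 on success, -1 if memory allocation fails. */
int wordmap_add(WordMap *m, const char *word) {
    size_t i = hash_word(word) % m->nbuckets;
    for (Entry *e = m->buckets[i]; e; e = e->next)
        if (strcmp(e->word, word) == 0) { e->count++; return 0; }
    Entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->word = malloc(strlen(word) + 1);
    if (!e->word) { free(e); return -1; }
    strcpy(e->word, word);
    e->count = 1;
    e->next = m->buckets[i];     /* push onto the front of the chain */
    m->buckets[i] = e;
    return 0;
}

/* Return the count stored for word, or 0 if it was never added. */
unsigned wordmap_get(const WordMap *m, const char *word) {
    size_t i = hash_word(word) % m->nbuckets;
    for (const Entry *e = m->buckets[i]; e; e = e->next)
        if (strcmp(e->word, word) == 0) return e->count;
    return 0;
}

/* Split text on non-letters, lowercase each word, and count it.
   Words longer than 63 letters are truncated to 63. */
int wordmap_count_text(WordMap *m, const char *text) {
    char buf[64];
    size_t n = 0;
    for (;; text++) {
        unsigned char c = (unsigned char)*text;
        if (isalpha(c)) {
            if (n < sizeof buf - 1) buf[n++] = (char)tolower(c);
        } else {
            if (n > 0) {
                buf[n] = '\0';
                if (wordmap_add(m, buf) != 0) return -1;
                n = 0;
            }
            if (c == '\0') break;
        }
    }
    return 0;
}

void wordmap_free(WordMap *m) {
    for (size_t i = 0; i < m->nbuckets; i++) {
        Entry *e = m->buckets[i];
        while (e) {
            Entry *next = e->next;
            free(e->word);
            free(e);
            e = next;
        }
    }
    free(m->buckets);
    free(m);
}
```

With chaining, a smaller array size still gives correct counts but longer
chains per bucket, which is one reason run time may depend on the array size
passed on the command line.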

----------------------------Running the source code-----------------------------
SEQUENTIAL:
The following steps describe how to run the sequential code to reproduce
results in the report.
[1.] Download the Project Update Sequential Code release for the stable branch of
the codeline for this project submission at:
https://github.com/weeitb/HPCWordCloud
[2.] Enter the directory HPCWordCloud/sequential
[3.] Call 'make'
[4.] The code can then be run from the command line or by submitting a job
through the batch script. To run it from the command line (after requesting
an interactive node), call:
./sequential <path to directory of texts> good|bad <hashmap array size>

A specific example being:
./sequential ../resources/bibleChapters/ good 1000
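
The three arguments in the example above could be validated along the
following lines. This is a sketch under assumptions, not the repository's
Main.c: RunConfig and parse_run_args are hypothetical names, and because this
readme does not say what "good" and "bad" select, the sketch only records
which one was given.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three command-line arguments. */
typedef struct {
    const char *text_dir;    /* <path to directory of texts> */
    int use_good;            /* 1 for "good", 0 for "bad" */
    size_t map_size;         /* <hashmap array size> */
} RunConfig;

/* Parse argv as: program <dir> good|bad <size>.
   Returns 0 on success, -1 on malformed input. */
int parse_run_args(int argc, char **argv, RunConfig *cfg) {
    if (argc != 4) return -1;
    cfg->text_dir = argv[1];

    if (strcmp(argv[2], "good") == 0) cfg->use_good = 1;
    else if (strcmp(argv[2], "bad") == 0) cfg->use_good = 0;
    else return -1;

    /* Require a leading digit: strtoul would otherwise accept "-5". */
    if (argv[3][0] < '0' || argv[3][0] > '9') return -1;
    char *end = NULL;
    errno = 0;
    unsigned long n = strtoul(argv[3], &end, 10);
    if (errno != 0 || *end != '\0' || n == 0) return -1;
    cfg->map_size = (size_t)n;
    return 0;
}
```

Rejecting a zero or non-numeric size up front avoids allocating an empty
bucket array later in the run.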

Alternatively, the code can be rerun by issuing a batch job through the following
steps:
[i] Edit lines 7 and 8 of run-sequential.bash to set the working directory
to the directory in which the script is located.
[ii] Submit a job to the Discovery cluster queue through the command:
'bsub < run-sequential.bash'
[iii] Output logs will be written in the current directory to output_log

OpenMP:
The following steps describe how to run the OpenMP code to reproduce results
in the report.
[1.] Download the Project Update Parallel 1 Code release for the stable branch
of the codeline for this project submission at:
https://github.com/weeitb/HPCWordCloud
[2.] Enter the directory HPCWordCloud/openmp
[3.] Call 'make'
[4.] The code can then be run from the command line or by submitting a job
through the batch script. To run it from the command line (after requesting
an interactive node), call:
./openmp <path to directory of texts> good|bad <hashmap array size>

A specific example being:
./openmp ../resources/bibleChapters/ good 2000
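
As a rough picture of how OpenMP can split word counting across files, the
sketch below gives each loop iteration one text buffer and combines the
per-thread totals with a reduction. It is an illustration under assumptions,
not the code in openmp/src: the function names are hypothetical, and it
computes only a total word count, whereas the repository's code counts
per-word frequencies in a hashmap.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the words (runs of letters) in one NUL-terminated buffer. */
static long count_words(const char *text) {
    long n = 0;
    int in_word = 0;
    for (; *text; text++) {
        if (isalpha((unsigned char)*text)) {
            if (!in_word) { n++; in_word = 1; }
        } else {
            in_word = 0;
        }
    }
    return n;
}

/* Total word count over nfiles buffers. Iterations are independent, so
   each thread keeps a private total that OpenMP sums at the end. */
long count_words_parallel(const char *const *texts, int nfiles) {
    long total = 0;
    #pragma omp parallel for reduction(+:total) schedule(dynamic)
    for (int i = 0; i < nfiles; i++)
        total += count_words(texts[i]);
    return total;
}
```

With GCC, the pragma takes effect when the file is compiled with -fopenmp;
without that flag the loop simply runs serially and returns the same total.
The dynamic schedule is an assumed choice that helps when the text files
differ in length.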

Alternatively, the code can be rerun by issuing a batch job through the
following steps:
[i] Edit lines 7 and 8 of run-openmp.bash to set the working directory to
the directory in which the script is located.
[ii] Submit a job to the Discovery cluster queue through the command:
'bsub < run-openmp.bash'
[iii] Output logs will be written in the current directory to output_log

MPI:
The following steps describe how to run the MPI code to reproduce results shown
in the report.
[1.] Download the Project Update Parallel 2 Code release for the stable branch
of the codeline for this project submission at:
https://github.com/weeitb/HPCWordCloud
[2.] Enter the directory HPCWordCloud/mpi
[3.] Call 'make'
[4.] The code can then be run by submitting a job through the batch script,
following the steps below:
[i] Edit lines 7 and 8 of run-mpi.bash to set the working directory to the
directory in which the script is located.
[ii] Submit a job to the Discovery cluster queue through the command:
'bsub < run-mpi.bash'
[iii] Output logs will be written in the current directory to output_file

OPENMP + MPI:
The following section describes the repository structure for the files
relevant to OpenMP+MPI. Refer to the OpenMP and MPI sections above for details
on that code, which is used as the base for the hybrid implementation.

HPCWordCloud/                 Top-level folder containing the repository.
    openmp-mpi/               Source code for the OpenMP+MPI implementation.
        src/Main.c            OpenMP+MPI entry point.
        run-openmp-mpi.bash   Bash file for submitting the parallel job to
                              LSF on the Discovery Cluster.
        run-results/          Output files of runs performed for parallel
                              timing measurements.
        Makefile              Build script for the OpenMP+MPI parallel
                              configuration of word counting.

Running source code:
The following steps describe how to run the OpenMP+MPI code to reproduce
results shown in the report.

[1.] Download the Project Update Parallel 3 Code release for the stable branch
of the codeline for this project submission at:
https://github.com/weeitb/HPCWordCloud
[2.] Enter the directory HPCWordCloud/openmp-mpi
[3.] Call 'make'
[4.] The code can then be run by submitting a job through the batch script,
following the steps below:
[i] Edit lines 7 and 8 of run-openmp-mpi.bash to set the working directory to
the directory in which the script is located.
[ii] Submit a job to the Discovery cluster queue through the command:
'bsub < run-openmp-mpi.bash'
[iii] Output logs will be written in the current directory to output_file
