# -*- Mode: sh; sh-basic-offset:2 ; indent-tabs-mode:nil -*-
#
# Copyright (c) 2014-2016 Los Alamos National Security, LLC.  All rights
#                         reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#

HIO Readme
==========

Last updated 2016-04-06

See file NEWS for a description of changes to hio.  Note that this README
refers to various LANL clusters that have been used for testing HIO. Using
HIO in other environments may require some adjustments.


Building
--------

HIO builds via a standard autoconf/automake build.  So, to build:

1) Untar the distribution
2) cd to the root of the tarball
3) Load any needed compiler and MPI modules
4) ./configure
5) make

Additional generally useful make targets include clean and docs.  make docs
builds the HIO API document, but it requires doxygen and various LaTeX
packages to run, so you may prefer to use the prebuilt document distributed
in file design/libhio_api.pdf.

Our target build environments include gcc with Open MPI on Mac OS X for unit
testing, and the gcc, Intel and Cray compilers with Open MPI or Cray MPI on
LANL Cray systems and TOSS clusters.

Included with HIO is a build script named hiobuild.  It will perform all of
the above steps in one invocation.  The HIO development team uses it to launch
builds on remote systems.  You may find it useful; a typical invocation might 
look like:

./hiobuild -c -s PrgEnv-intel PrgEnv-gnu

hiobuild will also create a small script named hiobuild.modules.bash that
can be sourced to recreate the module environment used for the build.


Testing
-------

HIO's tests are in the test subdirectory.  There is a simple API test named
test01 which can also serve as a coding example.  Additionally, other tests
are named run02, run03, etc.  These tests are able to run in a variety of
environments:

1) On Mac OS X for unit testing
2) On a non-DataWarp cluster in interactive or batch mode
3) On one of the Trinity systems with DataWarp in interactive or batch mode

run02 and run03 are N-N and N-1 tests, respectively.  Help for the options
can be displayed by invoking either test with -h.  These tests use a common
script named run_setup to process options and establish the testing
environment.  They invoke hio via a program named xexec, which is driven by
command strings contained in the runxx test scripts.

A typical usage to submit a test DataWarp batch job on the small LANL test system
named buffy might look like:

cd <tarball>/test
./run02 -s m -r 32 -n 2 -b 

Options used:
  -s m    ---> Size medium (200 MB per rank)
  -r 32   ---> Use 32 ranks
  -n 2    ---> Use 2 nodes
  -b      ---> Submit a batch job  
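
The options above are simple single-letter flags.  As an illustration only
(not the actual run_setup code, whose internals may differ), options of this
shape could be handled with getopts; the function and variable names here
are hypothetical:

```shell
# Hypothetical sketch of option handling in the style of the tests above.
parse_opts() {
  local size='' ranks='' nodes='' batch=no
  OPTIND=1                       # reset so the function can be called repeatedly
  while getopts 's:r:n:b' opt "$@"; do
    case "$opt" in
      s) size="$OPTARG" ;;       # -s: test size (e.g. t/s/m/l/x/y/z)
      r) ranks="$OPTARG" ;;      # -r: number of ranks
      n) nodes="$OPTARG" ;;      # -n: number of nodes
      b) batch=yes ;;            # -b: submit as a batch job
      *) return 1 ;;
    esac
  done
  echo "size=$size ranks=$ranks nodes=$nodes batch=$batch"
}

parse_opts -s m -r 32 -n 2 -b
```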

The runxx tests will use the hiobuild.modules.bash files saved by hiobuild
(if available) to reestablish the same module environment used at build
time.
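
That reuse step can be sketched as follows; this is an illustrative helper,
not code taken from the test scripts:

```shell
# Illustrative sketch: restore the module environment saved by hiobuild,
# if the saved script (file name as described above) is present.
load_build_modules() {
  local script="${1:-hiobuild.modules.bash}"
  if [ -r "$script" ]; then
    . "$script"                  # re-executes the saved module load commands
    echo "modules restored from $script"
  else
    echo "no saved module environment ($script not found)"
  fi
}
```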

A multi-job submission script, run_combo, is available to facilitate running
a large number of tests with one command.  A typical usage for a fairly
thorough test on a large system like Trinity might look like:

run_combo -t ./run02 ./run03 ./run12 -s x y z -n 32 64 128 256 512 1024 -p 32 -b

This will submit 54 jobs (3 x 3 x 6) with all combinations of the specified
tests and parameters.  The job scripts and output will be in the test/run
subdirectory.
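
The 54-job figure is simply the product of the lengths of the three
parameter lists in the command above; a quick sanity check of that
arithmetic:

```shell
# 3 tests (run02 run03 run12) x 3 sizes (x y z) x 6 node counts = 54 jobs
tests=3; sizes=3; node_counts=6
total=$((tests * sizes * node_counts))
echo "$total"
```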


Simple DataWarp Test Job
------------------------

The HIO source contains a script test/dw_simple_sub.sh that will submit a
simple, small scale test job on a system with Moab/DataWarp integration.  See
the comments in the file for instructions and a more detailed description.


Step by step procedure for building and running HIO tests on LANL system Trinity
--------------------------------------------------------------------------------

This procedure is accurate as of 2016-03-02 with HIO.1.2.0.4.

1) Get the distribution tarball libhio-1.2.0.1.tar.gz from one of the following:
   a) tr-login1:~cornell/hio/libhio-1.2.0.1.tar.gz
   b) yellow /usr/projects/hio/user/rel/libhio-1.2.0.1.tar.gz
   c) By request from Cornell Wright - cornell@lanl.gov
   d) Download from Github << need location >>

2) Untar

3) cd <dir>/libhio-1.2       ( <dir> is where you untarred HIO )

4) ./hiobuild -cf -s PrgEnv-cray,PrgEnv-gnu -l craype-haswell 

   At the end of the build you will see:

   nid00070 ====[HIOBUILD_RESULT_START]===()===========================================
   nid00070 buildhio : Checking /cray_home/cornell/libhio/libhio-1.2/hiobuild.out for build problems
   41:configure: WARNING: using cross tools not prefixed with host triplet
   268:Warning:
   nid00070 buildhio : Checking for build target files
   nid00070 buildhio : Build errors found, see above.
   nid00070 ====[HIOBUILD_RESULT_END]===()=============================================

   Ideally, the two warning messages would not be present, but at the moment, they can be ignored.

5) cd test

6) ./run_combo -t ./run02 ./run03 ./run12 ./run20 -s z y x -n 1024 512 256 128 64 32 16 -p 32 -b

   This will create 84 job scripts in the libhio-1.2/test/run directory and submit the jobs.
   Msub messages are in the corresponding .jobid files in the same directory. Job output is
   directed to corresponding .out files.  The number and mix of jobs is controlled by the
   parameters. Issue run_combo -h for more information.

7) After the jobs complete, issue the following:

   grep -c "RESULT: SUCCESS" run/*.out

   If all jobs ran OK, grep should show 84 files, each with a count of 1, like this:

   cornell@tr-login1:~/pgm/hio/tr-gnu/libhio-1.2/test> grep -c "RESULT: SUCCESS" run/*.out
   run/job.20160108.080917.out:1
   run/job.20160108.080927.out:1
   run/job.20160108.080936.out:1
   run/job.20160108.081422.out:1
     . . . .
   run/job.20160108.082133.out:1
   run/job.20160108.082141.out:1

   Investigate any missing job output or counts of 0.
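
   The manual grep above can be wrapped in a small checker that flags any
   output file with a count other than 1.  This is an illustrative helper,
   not part of the distribution:

```shell
# Illustrative helper: verify each *.out file in a directory contains
# exactly one "RESULT: SUCCESS" line, reporting any that do not.
check_results() {
  local dir="$1" fails=0 f
  for f in "$dir"/*.out; do
    [ -e "$f" ] || { echo "no .out files in $dir"; return 1; }
    if [ "$(grep -c 'RESULT: SUCCESS' "$f")" -ne 1 ]; then
      echo "FAIL: $f"
      fails=$((fails + 1))
    fi
  done
  echo "$fails failing files"
  [ "$fails" -eq 0 ]
}
```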

8) Resources for better understanding and/or modifying these procedures:

   libhio-1.2/README
   libhio-1.2/hiobuild -h
   libhio-1.2/test/run_combo -h
   libhio-1.2/test/run_setup -h
   libhio-1.2/test/run02, run03, run12, run20
   libhio-1.2/test/xexec -h
   libhio-1.2/design/libhio_api.pdf

9) Additional test commands; check the results in the same way as above.

   Very simple small single job Moab/DataWarp test:

     ./run02 -s t -n 1 -r 1 -b

   Alternate multi job test suitable for small test system Gadget:

     ./run_combo -t ./run02 ./run03 ./run12 ./run20 -s t s m l -n 1 2 4 1 2 4 -p 32 -b

   An additional many-job submission contention test:

     ./run90 -p 5 -s t -n 1 -b

     This test submits two jobs that each submit two additional jobs.  Job
     submission continues until the -p parameter is exhausted.  So, the
     total number of jobs is given by (p^2) - 2.  Be cautious about increasing
     the -p parameter.  Since this is only a job submission test, the normal
     scan for RESULT: SUCCESS is not applicable.  Simply wait for the queue to
     empty and look for the expected number of .sh and .out files in the run
     directory.  If there are any .sh files without corresponding .out files,
     look for errors via checkjob -v on the job IDs in the .jobid file.
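
      Taking the formula above at face value, the expected job count for the
      -p 5 invocation works out as:

```shell
# Expected number of jobs for ./run90 -p 5, per the (p^2) - 2 formula above
p=5
echo $((p * p - 2))
```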

--- End of README ---

About
-----

libhio is a library intended for writing data to hierarchical data store systems.