COMBINE-lab/staden-io_lib

			IO_LIB VERSION 1.14.8
			=====================

Io_lib is a library of file reading and writing code that provides a
general-purpose trace file (and Experiment File) reading interface. The
programmer simply calls (e.g.) read_reading to create a "Read" C
structure with the data loaded into memory. It has been compiled and
tested on a variety of Unix systems, MacOS X and MS Windows.

The subdirectories here contain the io_lib code, which supports the
following file formats:

	SCF trace files
	ABI trace files
	ALF trace files
	ZTR trace files
	SFF trace archives
	SRF trace archives
	Experiment files
	Plain text files
	SAM/BAM sequence files
	CRAM sequence files

These link together to form a single "libstaden-read" library supporting
all the file formats via a single read_reading (or fread_reading or
mfread_reading) function call and analogous write_reading functions
too. See the file include/Read.h for the generic 'Read' structure.

See the CHANGES file for a summary of older updates, or the ChangeLog
for the full details.

Version 1.14.8 (22nd April 2016)
--------------

* SAM: Small speed up to record parsing.

* CRAM: Scramble now has -p and -P options to control whether to
  force the BAM auxiliary sizes (8 vs 16 vs 32-bit integer quantities)
  rather than reducing them to the smallest size required, and whether
  to preserve the order of auxiliary tags including RG, NM and MD.

  This latter option requires storing these values verbatim instead of
  regenerating them on-the-fly, but note this only preserves tag order
  with Scramble / Htslib.  Htsjdk will still produce these fields out
  of order.

* CRAM: data is no longer stored in the CORE block, permitting greater
  flexibility in choosing which fields to decode.  (This change is
  also mirrored in htslib and htsjdk.)

* CRAM: ref.fai files listing references in a different order from the
  @SQ headers should now work correctly.

* CRAM: the required-fields parameter no longer forces quality
  decoding when sequence is requested.

* CRAM: More robustness / safety checks during decoding; itf8 bounds
  checks, running out of memory, bounds checks in BETA codec, and
  more.

* CRAM auto-generated read names are consistent regardless of range
  queries.  They also now match those produced by htslib.

* A few compiler warnings in cram_dump / cram_size have gone away.
  Many small CRAM code tweaks to aid comparisons to htslib.  It should
  also be easier to build under Microsoft Visual Studio (although no
  project file is provided).

* CRAM: the rANS codec should now be slightly faster at decoding.

* CRAM bug fix: removed a potential (but unobserved) bug whereby 8-bit
  quantities stored as 16-bit values in BAM could be converted
  incorrectly within CRAM.

* SAM bug fix: no longer complains about an "unknown" sort order.



Building
========


Zlib
----

This code makes heavy use of the Deflate algorithm, assuming a Zlib
interface.  The native Zlib bundled with most systems is now rather
old and better optimised versions exist for certain platforms
(e.g. using the SSE instructions on Intel and AMD CPUs).

Therefore the --with-zlib=/path/to/zlib configure option may be used
to point to a different Zlib.  I have tested it with the vanilla zlib,
Intel's zlib and CloudFlare's Zlib.  Of the three it appears the
CloudFlare one has the quickest implementation, but mileage may vary
depending on OS and CPU.  

CloudFlare: https://github.com/cloudflare/zlib
Intel:      https://github.com/jtkukunas/zlib
Zlib-ng:    https://github.com/Dead2/zlib-ng

The Zlib-ng one needs configuring with --zlib-compat and when you
build Io_lib you will need to define -DWITH_GZFILEOP too.  It also
doesn't work well when used in conjunction with LD_PRELOAD. Therefore
I wouldn't recommend it for now.

If you are using the CloudFlare implementation and your CloudFlare zlib
was built with PCLMUL support, you may also want to disable the CRC
implementation in this code, as theirs is faster. Otherwise, the CRC
here is quicker than Zlib's own version. To build io_lib with the
internal CRC code disabled, use ./configure --disable-own-crc (or
CFLAGS=-UIOLIB_CRC).


Linux
-----

We use the GNU autoconf build mechanism.

To build:

1. ./configure

"./configure --help" will give a list of the options for GNU autoconf. For
modifying the compiler options or flags you may wish to redefine the CC or
CFLAGS variable.

Eg (in sh or bash):
   CC=cc CFLAGS=-g ./configure

2. make (or gmake)

This will build the sources.

CFLAGS may also be changed at build time using (eg):
    make 'CFLAGS=-g ...'

3. make install

The default installation location is /usr/local/bin and /usr/local/lib. These
can be changed with the --prefix option to "configure".


Windows
-------

Under Microsoft Windows we recommend the use of MSYS and MINGW as a
build environment.

These contain enough tools to build using the configure script as per
Linux. Visit http://sourceforge.net/projects/mingw/files/ and
download and install the Automated MinGW Installer (eg MinGW-5.1.4.exe),
the MSYS Base System (eg MSYS-1.0.11.exe) and the MSYS Supplementary
Tools (eg msysDTK-1.0.1.exe).

If you wish to use Microsoft Visual Studio you may need to add the
MSVC_includes subdirectory to your C include search path.  This
adds several missing header files (eg unistd.h and sys/time.h) needed
to build this software.  We do not have a MSVC project file available.

In this case you will also need to copy io_lib/os.h.in to io_lib/os.h
and either remove the @SET_ENDIAN@ and adjacent @ lines (as these are
normally filled out for you by autoconf) or add -DNO_AUTOCONF to your
compiler options.

The code should also build cleanly under a cross-compiler.  We use
    ./configure \
            --host=x86_64-w64-mingw32 \
            --prefix=$DIST \
            --with-io_lib=$DIST \
            --with-tcl=$DIST/lib \
            --with-tk=$DIST/lib \
            --with-tklib=$DIST/lib/tklib0.5 \
            --with-zlib=$DIST \
            LDFLAGS=-L$DIST/lib

with $DIST being pre-populated with already built and installed 3rd
party dependencies, some from MSYS mentioned above.



MacOS X
-------

The configure script should work by default, but if you are attempting
to build fat (universal) binaries to work on both i386 and ppc targets,
you'll need to disable dependency tracking, i.e.:

    CFLAGS="-arch i386 -arch ppc" LDFLAGS="-arch i386 -arch ppc" \
      ../configure --disable-dependency-tracking

About

GitHub clone of SVN repo svn://svn.code.sf.net/p/staden/code/io_lib (cloned by http://svn2github.com/)
