Skip to content

hzhang1234/staden-io_lib

 
 

Repository files navigation

			IO_LIB VERSION 1.13.7
			=====================

Io_lib is a library of file reading and writing code to provide a general
purpose trace file (and Experiment File) reading interface. The programmer
simply calls the (eg) read_reading to create a "Read" C structure with the
data loaded into memory. It has been compiled and tested on a variety
of unix systems, MacOS X and MS Windows.

The directories below here contain the io_lib code. These support the
following file formats:

	SCF trace files
	ABI trace files
	ALF trace files
	ZTR trace files
	SFF trace archives
	SRF trace archives
	Experiment files
	Plain text files
	SAM/BAM sequence files
	CRAM sequence files

These link together to form a single "libstaden-read" library supporting
all the file formats via a single read_reading (or fread_reading or
mfread_reading) function call and analogous write_reading functions
too. See the file include/Read.h for the generic 'Read' structure.

What's new in 1.13.7
====================

Bug fixes for CRAM, in particular scram_flagstat when running on
Cramtools output. Also made some speed ups to the multi-threading
support when running under I/O bound conditions.


What's new in 1.13.6
====================

Several more CRAM and BAM bug fixes.  CRAM has also had a major
restructuring to use more external blocks for data series, potentially
improving compression ratios.  This will become more important once
v3.0 is finalised, but it still helps v2.1 CRAM format a bit too.

This change also allows for decoding to be faster when only needing a
few columns, such as with scram_flagstat.


What's new in 1.13.5
====================

Two bug fixes to CRAM involving computation of MD5sums for both the
@SQ line and also the slice headers. See the CHANGES or ChangeLog file
for details.


What's new in 1.13.4
====================

The CRAM specification has updated to version 2.1 and now comes with
EOF blocks. This is now the default output version.  Scramble now also
has a -B option to perform the Illumina lossy 8 way quality-binning.

CRAM version 3.0 is under discussion and scramble contains some
highly experimental options to deal with this (-J for rANS /
arithmetic coding, block level CRC32s), but these are disabled by
default and should not be used except for research purposes.

Also fixed a few bugs elsewhere, most notably in BAM decoding and
index_tar.


What's new in 1.13.3
====================

Another bug fix release, primarily focused around CRAM support. The
most significant fixes here are to multi-threading (do not use
threading in 1.13.2) and handling of fetching reference sequences.

Improved robustness of code too, in particular when facing broken data.


What's new in 1.13.2
====================

This release has various improvements and bug fixes to CRAM support,
in addition to experimental multi-threading code for reading and
writing BAM/CRAM (but not SAM). By default this is not used, but use
the -t option in scramble to enable it.

Multi-threading scales well with BAM reading and writing, but CRAM
currently has diminishing returns after 4 or 5 threads.


What's new in 1.13.1
====================

This is primarily a bug fix release over 1.13.0 with all changes being
in the SAM/BAM/CRAM support.

The main new feature is the ability to store unsorted data and to
permit non-reference based encoding (albeit not very efficiently).
There is also now support for finding references by a colon separated
search path (REF_PATH environment variable), which may contain URLs in
the same manner that TRACE_PATH does.  We can also locally cache files
accessed by the MD5 sum to the REF_CACHE environment variable. 

See the CHANGES file for full details.


What's new in 1.13.0
====================

The library has acquired functions for reading and writing SAM, BAM
and CRAM formats along with a command line tool for converting between
them named "scramble".

See the scramble man page for more information.

At the time of writing the CRAM-2.0 specification is in draft form
with the official release two months away. This code is subject to any
changes that may occur between the current and official CRAM release.
Note that it should also be considered beta quality, given a relative
lack of testing and real-world experience with CRAM.

The command line tools (scramble) are unlikely to change
substantially, but the sam, bam cram and "scram" API may undergo
considerable changes.

So please consider the sam/bam/cram code as beta with the rest of
io_lib as a stable release.


Older comments
==============

In 1.12.x saw various improvements to building and linking,
specifically on Fedora and MacOS X plus the use of libtool to create
dynamic libraries.  The library name is now libstaden-read.so too, as
this was already renamed within Debian.

We removed illumina2srf and srf2illumina in this release too (they
have their own package on SourceForge now).

In 1.11.x the SRF support was added. The SRF v1.3 format specification can
be found here:

http://www.bcgsc.ca/pipermail/ssrformat/attachments/20071209/b0f865a0/ShortSequenceFormatDec9th_v_1_3-0001.doc

The ZTR specification changes involve adding some new compression
types (the general purpose XRLE2 plus some more solexa specific TSHIT
and QSHIFT methods), a region chunk (REGN) to indicate the location of
paired-end data stored in a single trace, improved meta-data support
for SMP4/SAMP chunks including specifying the baseline (OFFS meta-data
tag) and various minor tweaks. There's still a few questions in the
ZTR format itself (pending feedback), but what is implemented
currently is also what has been described in the docs/ZTR_format
file.

Finally the directory layout has been greatly simplified with the
merging of all the format directories into a single "io_lib"
directory and the programs utilising it remaining in the "progs"
subdirectory.


Building
========

Linux
-----

We use the GNU autoconf build mechanism.

To build:

1. ./configure

"./configure --help" will give a list of the options for GNU autoconf. For
modifying the compiler options or flags you may wish to redefine the CC or
CFLAGS variable.

Eg (in sh or bash):
   CC=cc CFLAGS=-g ./configure

2. make (or gmake)

This will build the sources.

CFLAGS may also be changed a build time using (eg):
    make 'CFLAGS=-g ...'

3. make install

The default installation location is /usr/local/bin and /usr/local/lib. These
can be changed with the --prefix option to "configure".

Windows
-------

Under Microsoft Windows we recommend the use of MSYS and MINGW as a
build environment.

These contain enough tools to build using the configure script as per
Linux. Visit http://sourceforge.net/projects/mingw/files/ and
download/install Automated MinGW Installer (eg MinGW-5.1.4.exe), MSYS
Base System (eg MSYS-1.0.11.exe) and MSYS Supplementary Tools (eg
msysDTK-1.0.1.exe).


MacOS X
-------

The configure script should work by default, but if you are attempting
to build FAT binaries to work on both i386 and ppc targets you'll need
to disable dependency tracking. Ie:

    CFLAGS="-arch i386 -arch ppc" LDFLAGS="-arch i386 -arch ppc" \
      ../configure --disable-dependency-tracking

About

GitHub clone of SVN repo svn://svn.code.sf.net/p/staden/code/io_lib (cloned by http://svn2github.com/)

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C 68.3%
  • Roff 18.7%
  • Shell 12.4%
  • Other 0.6%