Arachne: A Whole Genome Assembly Software Package

Arachne is a tool for assembling genome sequences from whole genome shotgun reads, mostly in forward-reverse pairs obtained by sequencing clone ends.

As input, Arachne expects the base calls and associated quality scores of each read (as produced by most base-calling software, such as PHRED), as well as ancillary information about each read (in a standard format described herein).
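To make the per-read input concrete, here is a minimal Python sketch of a read that carries base calls and one quality score per base. The `Read` class, its field names, and the helper `phred_to_error_prob` are hypothetical and are not part of Arachne. The sketch does not model Arachne's actual file formats or the ancillary information. It only assumes the usual PHRED convention, in which a quality score Q corresponds to an error probability of 10^(-Q/10).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    """Hypothetical container for one shotgun read (not Arachne's own type)."""
    name: str
    bases: str          # base calls, e.g. "ACGT"
    quals: List[int]    # one PHRED-style quality score per base

    def __post_init__(self) -> None:
        # Every base call needs exactly one quality score.
        if len(self.bases) != len(self.quals):
            raise ValueError("bases and quals must have the same length")


def phred_to_error_prob(q: int) -> float:
    """Convert a PHRED-style quality score to an estimated error probability."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    read = Read(name="read_1", bases="ACGT", quals=[20, 30, 30, 10])
    for base, q in zip(read.bases, read.quals):
        print(base, q, phred_to_error_prob(q))
```

Under this convention, a score of 20 means roughly a 1 in 100 chance that the base call is wrong, and a score of 30 means roughly 1 in 1000.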

As output, Arachne produces a list of supercontigs ("scaffolds"). Each supercontig consists of an ordered list of contigs, all forward-oriented, together with estimates of the gaps between consecutive contigs. Base calls and quality scores are provided for each contig, along with the approximate locations of the reads that were used to build it. Arachne also produces a summary and brief analysis of the assembly.
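The output structure described above can be pictured with a small Python sketch. The class names `Contig` and `Supercontig`, their fields, and the `estimated_length` method are hypothetical and chosen only for illustration. They do not reflect Arachne's output file formats, and read placements are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contig:
    """Hypothetical contig: forward-oriented bases with per-base quality scores."""
    name: str
    bases: str
    quals: List[int]


@dataclass
class Supercontig:
    """Hypothetical scaffold: ordered contigs plus one gap estimate per adjacent pair."""
    name: str
    contigs: List[Contig]
    gaps: List[int]  # gaps[i] is the estimated gap between contigs[i] and contigs[i + 1]

    def __post_init__(self) -> None:
        # n ordered contigs have n - 1 gaps between them (0 gaps for an empty scaffold).
        expected = max(len(self.contigs) - 1, 0)
        if len(self.gaps) != expected:
            raise ValueError(f"expected {expected} gap estimates, got {len(self.gaps)}")

    def estimated_length(self) -> int:
        """Total contig bases plus the estimated gap sizes."""
        return sum(len(c.bases) for c in self.contigs) + sum(self.gaps)


if __name__ == "__main__":
    scaffold = Supercontig(
        name="supercontig_0",
        contigs=[
            Contig("contig_0", "ACGT", [30, 30, 30, 30]),
            Contig("contig_1", "GG", [20, 20]),
        ],
        gaps=[100],
    )
    print(scaffold.estimated_length())
```

Storing the gaps as a separate list, one shorter than the contig list, keeps the ordering of contigs and the estimates between them explicit.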

Many of Arachne's algorithms are described in "ARACHNE: A Whole-Genome Shotgun Assembler", Genome Research, January 2002, and "Whole-Genome Sequence Assembly for Mammalian Genomes: ARACHNE 2", Genome Research, January 2003.

We recommend using the current code rather than the 2.0.1 release, because the code has been greatly improved since that release.

Partial list of system requirements:

  1. Unix or Linux
  2. gcc 4.1.1
  3. libxerces
  4. TeX