1. Introduction
GOBOND is a stand-alone scaffolding program designed to handle complicated genomes with greater accuracy. It serves as a bridge for joining data from multiple platforms: pre-assembled contigs from one sequencing platform can be oriented and linked using paired-end/mate-pair reads from other platforms.
2. System requirements

Linux (64-bit)

Java version 1.6.0 or above

For SOLiD reads (cfa2fq): gcc 4.5.0 or above

Memory: 20GB or above

3. Download

GOBOND package (Version: 1.04)

GOBOND test data (Human chromosome 14)

4. Installation

tar -vxf GOBOND.tar.gz

sh install

Add the ‘bin’ directory to the ‘PATH’ environment variable

5. Command and usage
GOBOND -i config -o out_dir
 
-------------------
config file's format
-------------------
contig=/directory/contig_file_name >> contig file
repeat=int >> contigs shorter than int are marked as repeats (default: 200)
cpu=int >> number of CPUs to use (default: 1)
[lib]
reads1=/directory/reads1_fastq_file >> Solexa: reads1; SOLiD: R3 (.fq)
reads2=/directory/reads2_fastq_file >> Solexa: reads2; SOLiD: F3 (.fq)
  Reads in reads1 and reads2 should be longer than 30; filter out shorter reads beforehand. Read IDs in the two files should be paired.
cut=int >> link cutoff(default:5)
ins=int >> insert length (default: calculated by program)
  Usually, leave this unset and use the default. If the lib's insert length is too large for the program to calculate, set it in this config file.
min=int >> min insert length(default:0)
max=int >> max insert length(default:50000)
rpt=int >> filter links inside repeat regions [0: do not filter, 1: filter] (default: 1)
type=int >> read type(default:0)
  '0': Solexa paired-end reads, sequence not reversed (--> <--)
  '1': Solexa mate-pair reads, sequence reversed (<-- -->)
  '2': SOLiD mate-pair reads (--> -->)
[lib]
.
.
.
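
The layout above (global key=value lines followed by one or more [lib] sections) can be checked before a long run. The following is a minimal Python sketch, not part of GOBOND: the function names parse_config and check_lib, the checks themselves, and the mp_1.fq/mp_2.fq paths are illustrative assumptions. It assumes the real config file contains plain key=value lines without the ">>" annotations used in the description above.

```python
def parse_config(text):
    """Split a GOBOND-style config into global settings and a list of [lib] sections."""
    global_settings = {}
    libs = []
    current = global_settings          # keys before the first [lib] are global
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "[lib]":
            current = {}               # start a new library section
            libs.append(current)
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        current[key.strip()] = value.strip()
    return global_settings, libs


def check_lib(lib):
    """Return a list of problems in one [lib] section (hypothetical sanity checks)."""
    problems = []
    for key in ("reads1", "reads2"):
        if key not in lib:
            problems.append(f"missing {key}")
    lo = int(lib.get("min", 0))        # documented default: 0
    hi = int(lib.get("max", 50000))    # documented default: 50000
    if lo > hi:
        problems.append(f"min ({lo}) is larger than max ({hi})")
    if lib.get("type", "0") not in {"0", "1", "2"}:
        problems.append(f"unknown read type {lib['type']!r}")
    return problems


# Placeholder paths; the second library has min and max swapped on purpose.
example = """\
contig=/directory/contig_file_name
cpu=4
[lib]
reads1=/directory/reads1_fastq_file
reads2=/directory/reads2_fastq_file
min=0
max=400
type=0
[lib]
reads1=/directory/mp_1.fq
reads2=/directory/mp_2.fq
min=4000
max=500
type=1
"""

settings, libs = parse_config(example)
for number, lib in enumerate(libs, 1):
    print(f"lib{number}:", check_lib(lib) or "ok")
```

All values are kept as strings so that the parsed config can be written back unchanged; only min and max are converted for the comparison.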
6. Output files

1, mapping/: reads mapping information.

2, coverage.txt

        C1667 245 0.99 0.99 unique
        C1529 134 0.48 0 unique
        C1663 238 4.61 0 repeat
        C1523 133 4.55 2 unique
        C1665 238 6.87 4 repeat
        C1773 1225 0.91 1 unique
Column1: contig_name
Column2: contig_length
Column3: normalized reads mapping coverage
Column4: round of coverage
Column5: whether the contig is classified as unique or repeat
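
As an illustration of the five-column layout, the sketch below reads coverage.txt rows in Python and lists the contigs labelled as repeats. It is not shipped with GOBOND; the CoverageRow type and its field names are assumptions chosen for this example, and the input is the sample shown above.

```python
from collections import namedtuple

# Field names are illustrative; the file itself has no header line.
CoverageRow = namedtuple("CoverageRow", "name length coverage rounded label")


def parse_coverage(text):
    """Parse whitespace-separated coverage.txt rows into CoverageRow tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue                   # skip blank or unexpected lines
        name, length, coverage, rounded, label = fields
        rows.append(CoverageRow(name, int(length), float(coverage), float(rounded), label))
    return rows


sample = """\
        C1667 245 0.99 0.99 unique
        C1529 134 0.48 0 unique
        C1663 238 4.61 0 repeat
        C1523 133 4.55 2 unique
        C1665 238 6.87 4 repeat
        C1773 1225 0.91 1 unique
"""

rows = parse_coverage(sample)
repeats = [row.name for row in rows if row.label == "repeat"]
print(repeats)  # ['C1663', 'C1665']
```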

3, lib1_graph

        :{-+=44}:18
        :{-+=6}:10
        :{+-=14}:173
        :{--=10}:222
        :{-+=13}:199
        :{--=53}:127

Edge information for lib1:

        :<-+=44>:18

-+ is the orientation between the two contigs, 44 is the number of links, and 18 is the gap between the two contigs.
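
The edge fields described above can be pulled apart with a regular expression. This is a rough Python sketch under stated assumptions: the contig names are not visible in the sample lines, so only the bracketed part and the trailing number are read, and both bracket styles that appear in this section ({...} and <...>) are accepted. The name parse_edge and the dictionary keys are hypothetical.

```python
import re

# Two orientation signs, '=', the link count, a closing bracket, ':' and the gap.
EDGE_RE = re.compile(r"[{<]([+-])([+-])=(\d+)[}>]:(-?\d+)")


def parse_edge(line):
    """Return orientation, link count and gap for one edge line, or None if it does not match."""
    match = EDGE_RE.search(line)
    if match is None:
        return None
    return {
        "orientation": match.group(1) + match.group(2),
        "links": int(match.group(3)),
        "gap": int(match.group(4)),
    }


sample = """\
:{-+=44}:18
:{-+=6}:10
:{+-=14}:173
:{--=10}:222
:{-+=13}:199
:{--=53}:127
"""

edges = [parse_edge(line) for line in sample.splitlines()]
best = max(edges, key=lambda edge: edge["links"])
print(best)  # {'orientation': '--', 'links': 53, 'gap': 127}
```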

4, lib1_filteredGraph

The lib1 graph after the edges between contigs have been filtered.

5, scaffold_1.txt

        >scaffold_1
        C1309 +
        >scaffold_2
        scaffold7 + 48
        C1419 - 4
        C1773 + 920
        C1751 + 173
        C1513 - 905
        scaffold2 - 3
        C2003 - 36
        C1399 - 49
        scaffold13 - 2
        C1949 - 860
        scaffold14 -
        >scaffold_3
        C1345 +

A line beginning with “>” gives a scaffold’s name.

The other lines list the members of that scaffold:

Column1: contig name

Column2: the contig's direction (+ or -)

Column3: the gap between this contig and the next one (absent on the last line of a scaffold)
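
The scaffold listing can be loaded into a simple structure for downstream checks. The Python sketch below is an illustration only: parse_scaffolds is a hypothetical helper, it assumes whitespace-separated columns as in the sample, and it stores a missing gap (last member of a scaffold) as None. The input is a shortened copy of the sample above.

```python
def parse_scaffolds(text):
    """Map each scaffold name to a list of (member, direction, gap) tuples."""
    scaffolds = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = scaffolds.setdefault(line[1:], [])
            continue
        if current is None:
            raise ValueError("member line found before any '>' header")
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"expected at least name and direction: {line!r}")
        gap = int(fields[2]) if len(fields) > 2 else None   # last member has no gap
        current.append((fields[0], fields[1], gap))
    return scaffolds


sample = """\
>scaffold_1
C1309 +
>scaffold_2
scaffold7 + 48
C1419 - 4
C1773 + 920
scaffold14 -
>scaffold_3
C1345 +
"""

scaffolds = parse_scaffolds(sample)
for name, members in scaffolds.items():
    total_gap = sum(gap for _, _, gap in members if gap is not None)
    print(name, len(members), total_gap)
```

Keeping the gap as None rather than 0 preserves the difference between "no following contig" and a gap of zero.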

6, scaffold_1.fa

        The scaffold sequences after scaffolding with lib1.

7, insert_2.txt

        The scaffold file produced by using lib2 to insert small contigs into the scaffolds of scaffold_1.txt. The format is the same as scaffold_1.txt.

8, insert_2.fa

        The sequences corresponding to insert_2.txt.

9, scaffold_2.txt

        Same format as scaffold_1.txt.

10, scaffold_2.fa

        The sequences corresponding to scaffold_2.txt.

11, scaffolds.txt

        The final scaffolds after scaffolding with all libs and laying back repeats.

12, scaffolds.fa

        The final sequences corresponding to scaffolds.txt.

7. Example

nohup GOBOND -i config -o . > log &

config file content:

contig=/share_bio/panfs_bio2/bioinformatics/chilj/genome/fly/abyss_k41/best.fa

cpu=30

[lib]

reads1=/share_bio/panfs_bio2/bioinformatics/chilj/genome/fly/raw/s_1_1.fq

reads2=/share_bio/panfs_bio2/bioinformatics/chilj/genome/fly/raw/s_1_2.fq

min=0

max=400

type=0

[lib]

reads1=/share_bio/panfs_bio2/bioinformatics/chilj/genome/fly/raw/s_2_1.fq

reads2=/share_bio/panfs_bio2/bioinformatics/chilj/genome/fly/raw/s_2_2.fq

min=200

max=600

type=0

[lib]

reads1=/share_bio/panfs_bio2/bioinformatics/chilj/genome/fly/raw/s_3_1.fq

reads2=/share_bio/panfs_bio2/bioinformatics/chilj/genome/fly/raw/s_3_2.fq

min=300

max=700

type=0

[lib]

reads1=/share_bio/panfs_bio2/bioinformatics/chilj/genome/fly/2k/s_4_1.fq

reads2=/share_bio/panfs_bio2/bioinformatics/chilj/genome/fly/2k/s_4_2.fq

min=500

max=4000

type=1
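
When many libraries are involved, a config like the one above can be generated instead of typed by hand. The following Python sketch is one possible way to do that and is not part of GOBOND: render_config is a hypothetical helper, the /data/fly/... paths are placeholders, and the key order simply follows the order used in the config format description.

```python
LIB_KEYS = ("reads1", "reads2", "cut", "ins", "min", "max", "rpt", "type")


def render_config(contig, libs, cpu=1):
    """Build config text: global keys first, then one [lib] section per library."""
    lines = [f"contig={contig}", f"cpu={cpu}"]
    for lib in libs:
        lines.append("[lib]")
        for key in LIB_KEYS:
            if key in lib:             # omitted keys fall back to program defaults
                lines.append(f"{key}={lib[key]}")
    return "\n".join(lines) + "\n"


# Placeholder paths for one paired-end and one mate-pair library.
libs = [
    {"reads1": "/data/fly/s_1_1.fq", "reads2": "/data/fly/s_1_2.fq",
     "min": 0, "max": 400, "type": 0},
    {"reads1": "/data/fly/s_4_1.fq", "reads2": "/data/fly/s_4_2.fq",
     "min": 500, "max": 4000, "type": 1},
]

config_text = render_config("/data/fly/best.fa", libs, cpu=30)
print(config_text)
```

The resulting text can be written to a file and passed to GOBOND with -i as in the command above.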

©BIG 2012, Beijing Institute of Genomics, Chinese Academy of Sciences
No.7 Beitucheng West Road, Chaoyang District, Beijing 100029, PR China