BSA4Yeast

1) Step-by-step instructions.

2) Format specification.

1) Please specify your input data file for uploading.

2) Please wait until your input file has been uploaded.

3) Check the uploaded files.

4) Set up your experimental design.

5.1) Select and define parameters for .fastq files.

5.2) Select and define parameters for .bam files.

5.3) Select and define parameters for .map files.

6) Check your settings and click the "Calculate G'" button.

7) Check progress.

8) Check job status.

9) Optionally receive an e-mail notification when the analysis is complete.

10) Result files can be downloaded.

11) Each analysis can be manually stopped.

12) Stopped analyses will be displayed as "Failed".

13) Each analysis can be manually deleted.

14) On the web page, the "Example" tab lets the user either run an example dataset with default settings or download an example dataset and run the analysis with user-defined settings.

FASTQ format

The FASTQ file format stores nucleotide sequences and their associated quality scores in a human-readable text format. In short, a typical FASTQ file uses the following four lines to encode a sequence: 1) Line 1 starts with the "@" character followed by a sequence identifier and an optional description. 2) Line 2 contains the original sequence. 3) Line 3 starts with a "+" character and optionally repeats the sequence identifier and description from line 1. 4) Line 4 encodes the quality values of the sequence in line 2, one ASCII character per base, where each character's ASCII code represents an integer value. FASTQ files are often compressed with gzip (GNU zip, an open-source file compression program), in which case the file name carries an additional .gz extension.
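The four-line layout described above can be sketched as a small Python reader. This is a minimal illustration, not the parser used by BSA4Yeast: the names `FastqRecord`, `open_fastq`, and `read_fastq` are hypothetical, and the quality offset of 33 (the common "Phred+33" encoding) is an assumption, since the encoding depends on how the file was produced.

```python
import gzip
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str       # text after "@" on line 1 (identifier plus optional description)
    sequence: str         # line 2
    qualities: List[int]  # line 4 decoded to integer quality values


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed (.gz) FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record.

    `offset` is the ASCII offset of the quality encoding (assumed 33 here).
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or a trailing empty line)
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])
```

A file would then be read with `for record in read_fastq(open_fastq(path)): ...`, where `path` is the user's own .fastq or .fastq.gz file.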

BAM format

The binary alignment map (BAM) file format stores the reads from a DNA or RNA sequencing run together with their alignments to a reference sequence. It is a binary, losslessly compressed version of the text-based sequence alignment map (SAM) format and therefore provides the same data in a more compact representation. More specifically, a BAM file consists of the following header and alignment sections: 1) The header describes general features of the complete file, e.g. the sample name, the reference sequence lengths, and the alignment method. 2) The alignment section includes, for each read, the read name, read sequence and quality, information about the alignment, and potential custom tags. The alignment information comprises the chromosome, start coordinate, alignment quality, and a match descriptor that is encoded as a text string.
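To make the header section more concrete, the following Python sketch reads only the header of a BAM file using the standard library. It assumes the layout given in the SAM/BAM specification (magic bytes, header text, then a list of reference names and lengths, all little-endian) and relies on BAM's block compression being readable by Python's `gzip` module. The function name `read_bam_header` is hypothetical; in practice a dedicated tool such as samtools or pysam would normally be used instead.

```python
import gzip
import struct
from typing import BinaryIO, List, Tuple


def read_bam_header(stream: BinaryIO) -> Tuple[str, List[Tuple[str, int]]]:
    """Return (header text, [(reference name, length), ...]) from a BAM stream."""
    with gzip.open(stream, "rb") as bam:
        # The first four decompressed bytes identify the file as BAM.
        if bam.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (bad magic bytes)")
        # Plain-text SAM header, prefixed by its length in bytes.
        (text_length,) = struct.unpack("<I", bam.read(4))
        header_text = bam.read(text_length).decode("utf-8").rstrip("\x00")
        # Reference sequences (e.g. chromosomes): NUL-terminated name plus length.
        (n_references,) = struct.unpack("<I", bam.read(4))
        references = []
        for _ in range(n_references):
            (name_length,) = struct.unpack("<I", bam.read(4))
            name = bam.read(name_length).rstrip(b"\x00").decode("ascii")
            (ref_length,) = struct.unpack("<I", bam.read(4))
            references.append((name, ref_length))
    return header_text, references
```

With a file on disk, this would be called as `read_bam_header(open(path, "rb"))`. The alignment records that follow the header are not decoded by this sketch.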

MAP format

The map file format is the standard input format for the calculation of the G' statistic used to evaluate Bulk Segregant Analysis results. It is a text-based format with the following four columns: the 1st column contains the name of the chromosome, the 2nd column specifies the coordinate of the described genetic marker, the 3rd column lists the number of alleles that come from the 1st parent line, and the 4th column contains the number of alleles from the 2nd parent line. A simple example map file is shown below:

Please note that when a map file is used as input, only the calculation of the G' statistic is possible; the exonic variant annotation and the scoring of variant functional impacts are not available.
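As a hedged illustration of the four-column layout described above, the following Python sketch parses map-file lines into records. It assumes that columns are separated by whitespace (tabs or spaces), which the description does not state explicitly. The names `MapRecord` and `read_map` are hypothetical, and the per-marker allele frequency is included only as a simple derived value; it is not the G' statistic computed by the tool.

```python
from typing import Iterable, Iterator, NamedTuple


class MapRecord(NamedTuple):
    chromosome: str     # column 1: chromosome name
    position: int       # column 2: marker coordinate
    parent1_count: int  # column 3: alleles from the 1st parent line
    parent2_count: int  # column 4: alleles from the 2nd parent line

    @property
    def parent1_frequency(self) -> float:
        """Fraction of observed alleles that come from the 1st parent line."""
        total = self.parent1_count + self.parent2_count
        return self.parent1_count / total if total else float("nan")


def read_map(lines: Iterable[str]) -> Iterator[MapRecord]:
    """Parse whitespace-separated map-file lines, skipping blank lines."""
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 4:
            raise ValueError(f"line {number}: expected 4 columns, got {len(fields)}")
        chromosome, position, count1, count2 = fields
        yield MapRecord(chromosome, int(position), int(count1), int(count2))
```

For instance, a made-up line such as `chrI 1500 12 8` would yield a record with 12 alleles from the 1st parent and 8 from the 2nd at position 1500 of chrI.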

LENGTH format

The length file format is a text file listing the chromosome lengths (in bp, one chromosome per line). It is used together with a corresponding map file (see above) when the user aims to plot the results of a Bulk Segregant QTL Analysis. The order of the chromosomes in the length file should be the same as the order in the map file. A simple example length file is shown below:
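Separately from the example file, the reason the chromosome order must match can be sketched in Python: lengths are paired with chromosomes purely by position, for example to compute where each chromosome starts on a genome-wide plot axis. This is an assumption-laden sketch, not the tool's plotting code. The names `read_lengths` and `chromosome_offsets` are hypothetical, and the parser assumes that the length is the last whitespace-separated value on each line.

```python
from itertools import accumulate
from typing import Dict, Iterable, List


def read_lengths(lines: Iterable[str]) -> List[int]:
    """Read one chromosome length (in bp) per non-empty line."""
    lengths = []
    for line in lines:
        fields = line.split()
        if fields:
            # Assumption: the length is the last value on the line.
            lengths.append(int(fields[-1]))
    return lengths


def chromosome_offsets(chromosomes: List[str], lengths: List[int]) -> Dict[str, int]:
    """Map each chromosome to its cumulative start offset on a genome-wide axis.

    `chromosomes` lists the chromosome names in map-file order; `lengths` must
    follow the same order, because the two are paired by position only.
    """
    if len(chromosomes) != len(lengths):
        raise ValueError("map file and length file list different numbers of chromosomes")
    starts = [0] + list(accumulate(lengths))[:-1]
    return dict(zip(chromosomes, starts))
```

A marker at position `p` on chromosome `c` would then be drawn at `offsets[c] + p`. If the two files were ordered differently, every marker would be shifted by the wrong offset without any error being raised.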