iReckon version 1.0.8

Requirements:
- iReckon works on Linux. You need at least java-1.6 and the latest version of BWA installed and added to the PATH.
- For large datasets and genomes, expect substantial memory use and running time (usually around 16G and 24 hours on 8 processors for human RNA-Seq with 60M read pairs). The working directory (output directory) should have enough disk space (>100G for the previous example).
- Make sure the coordinates (chromosome names, reference) are the same for the alignment file (.bam), the annotation file and the reference.

Usage:
  java -Xmx15000M -jar IReckon-1.0.8.jar bam_file reference_genome annotations -1 reads_file [options]

  bam_file         : Alignment file in BAM format. Must be indexed (bam.bai).
                     Should contain split-read alignments (like TopHat output).
  reference_genome : In FASTA format. It must be the same reference used for creating the bam_file.
  annotations      : Known transcript annotations and/or known start/end sites.
                     Can be preprocessed with FormatTools or Savant Genome Browser (FormatTools is available with Savant).
                     Preprocessing can also be done automatically by iReckon (tested with UCSC tab-delimited refGene files, but should work with 12-column BED format as well).
                     GTF and other formats are not accepted.

Output:
  The iReckon output is a description of the found isoforms and their quantities, contained in the file output_dir/result.gtf

Options:
  -1 reads_file    : The RNA-Seq reads file in FASTA or FASTQ format.
  -2 reads_file2   : The read mates file. If -2 is not used, the reads in reads_file must be listed pair by pair.
  -o output_dir    : Select the output directory. Make sure it has enough disk space.
  -m method        : The tool used to align reads to isoforms. 0 for Shrimp, 2 for BWA. Default=2.
  -n nb_thread     : Set the number of threads.
  -b bias          : Select the magnitude of the gene-borders bias correction. 0=none, 1=weak, 2=strong. Default=0.
                     Similar to Effective Isoform Length Correction (Cufflinks). Works only for isoforms longer than 200 nt.
  -d               : Use simultaneous duplicate removal.
  -q               : Quick re-run. Useful if you have recently run iReckon on the same data, in the same output directory, with different options.
  -ign             : Ignore splice junctions that do not start/end at a known splicing border site.
                     No effect with -q. Default=false.
  -novel           : Enable/disable novel isoform discovery. 0=Disabled, 1=Enabled.
                     No effect with -q. Default=Enabled.
  -nbi nbIso_Max   : Maximum number of correctly constructed isoforms per gene. If this number is exceeded, heuristics are applied to remove rare junctions in order to reduce the complexity.
                     No effect with -q. Default=100.
  -minrec nbrec    : Minimum number of records aligned to a gene for the gene to be studied.
                     No effect with -q. Default=10.
  -maxref refsize  : Maximum size (in GB) of a reference file to be indexed by BWA. If the transcriptome is bigger, it will be split.
                     Use higher values when enough RAM is available. (RAM usage is determined by the BWA indexing algorithm: linear with reference size.)
                     No effect with -q. Default=6.
  -chr chr-Name    : Comma-separated list of the chromosomes to be investigated by iReckon.
                     Note that the computed FPKM values are normalized by the number of reads mapped to those chromosomes.
                     No effect with -q.
  -start int       : Used with -chr and -end to specify the location studied by iReckon.
                     No effect unless exactly one chromosome is specified (-chr).
                     No effect with -q. Default=1.
  -end int         : Used with -chr and -start to specify the location studied by iReckon.
                     No effect unless exactly one chromosome is specified (-chr).
                     No effect with -q. Default=End_of_Chromosome.

Example:
  java -Xmx15000M -jar IReckon-1.0.8.jar alignment.bam reference.fa.savant hg19.refGene.gz -1 reads.fastq -o /disk/ireckon_output/ -d -n 8 -b 2 -nbi 100 > logs.txt
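
Pre-flight check (sketch):
  Since a run can take many hours, it may be worth checking the requirements listed above before starting. The following is a minimal sketch, not part of iReckon: the helper names (missing_tools, has_bam_index, ireckon_cmd, preflight) are hypothetical, and the index is assumed to sit next to the BAM file as <bam_file>.bai. The command builder only prints a paired-end command line using the options documented above; it does not run anything.

```shell
#!/bin/sh
# Pre-flight sketch for an iReckon run.
# All helper names here are hypothetical; they are not shipped with iReckon.

# Print each required tool that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Succeed if an index file exists next to the BAM file.
# Assumption: the index is named "<bam_file>.bai" (e.g. alignment.bam.bai).
has_bam_index() {
    [ -f "$1.bai" ]
}

# Print (do not execute) the command line for a paired-end run.
# Arguments: bam reference annotations reads1 reads2 output_dir threads
# Note: paths containing spaces are not quoted in the printed command.
ireckon_cmd() {
    echo "java -Xmx15000M -jar IReckon-1.0.8.jar $1 $2 $3 -1 $4 -2 $5 -o $6 -n $7"
}

# Report problems for a given BAM file; returns non-zero if anything is missing.
preflight() {
    status=0
    missing=$(missing_tools java bwa)
    if [ -n "$missing" ]; then
        echo "Not on PATH: $missing" >&2
        status=1
    fi
    if ! has_bam_index "$1"; then
        echo "No index found at $1.bai" >&2
        status=1
    fi
    return $status
}
```

  After sourcing the file, "preflight alignment.bam" reports missing tools or a missing index, and "ireckon_cmd alignment.bam reference.fa hg19.refGene.gz reads_1.fastq reads_2.fastq /disk/ireckon_output/ 8" prints a command that can be reviewed before it is launched. The file names in this usage note are placeholders.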