inGAP-sv: structural variation detection and visualization
We developed an integrative next-generation genome analysis pipeline (inGAP), which uses a Bayesian approach to detect single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels). inGAP has been applied to a number of genome projects involving bacteria, yeast, plants and mammals. Here we extend this pipeline to identify and visualize large structural variations (SVs), including insertions, deletions, inversions and translocations.
1. What can inGAP-sv do?
• Refine short-read alignments by re-aligning reads around a putative SV.
• Detect large SVs using paired-end sequencing reads.
• Visualize SAM-formatted alignments and SVs.
2. How does inGAP-sv identify SVs?
• Classify mapped paired-end reads as normally or anomalously mapped.
• Detect "gapped regions", i.e., regions not covered by any normally mapped read pair.
• Detect SVs from combinations of anomalously mapped pairs. The SV quality score takes into account read quality, mapping quality, and the ratio of supporting paired-end reads to the average mapping density.
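The classification and gap-detection steps above can be sketched in Python as follows. This is a minimal illustration, not inGAP-sv's implementation: it assumes a forward-reverse (FR) paired-end library whose normal insert sizes fall in a user-chosen window [min_insert, max_insert], reads only a few SAM mandatory fields, and uses hypothetical category names (`insert_too_large`, `same_orientation`, etc.) and function names (`classify_pair`, `pair_span`, `gapped_regions`). Only the FLAG bit values come from the SAM specification.

```python
from dataclasses import dataclass
from typing import List, Tuple

# SAM FLAG bits, as defined in the SAM format specification
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20


@dataclass
class ReadAlignment:
    """Subset of SAM mandatory fields needed to classify one read of a pair."""
    flag: int
    rname: str
    pos: int     # 1-based leftmost mapping position of this read
    rnext: str   # reference of the mate; '=' means same as rname
    pnext: int   # 1-based leftmost mapping position of the mate
    tlen: int    # observed template length (signed)


def classify_pair(aln: ReadAlignment, min_insert: int, max_insert: int) -> str:
    """Assign a hypothetical mapping type, assuming an FR library."""
    if not aln.flag & PAIRED:
        return "unpaired"
    if aln.flag & (UNMAPPED | MATE_UNMAPPED):
        return "one_end_unmapped"
    if aln.rnext not in ("=", aln.rname):
        return "interchromosomal"        # possible translocation
    read_rev = bool(aln.flag & REVERSE)
    mate_rev = bool(aln.flag & MATE_REVERSE)
    if read_rev == mate_rev:
        return "same_orientation"        # possible inversion
    # In a normal FR pair the leftmost read is on the forward strand.
    self_is_leftmost = aln.pos <= aln.pnext
    if self_is_leftmost == read_rev:
        return "everted"                 # reads point away from each other
    size = abs(aln.tlen)
    if size > max_insert:
        return "insert_too_large"        # possible deletion in the sample
    if size < min_insert:
        return "insert_too_small"        # possible insertion in the sample
    return "normal"


def pair_span(aln: ReadAlignment) -> Tuple[int, int]:
    """1-based inclusive reference interval spanned by a same-reference pair."""
    start = min(aln.pos, aln.pnext)
    return (start, start + abs(aln.tlen) - 1)


def gapped_regions(normal_spans: List[Tuple[int, int]], ref_len: int,
                   min_gap: int = 1) -> List[Tuple[int, int]]:
    """Intervals of [1, ref_len] not covered by any normal pair span."""
    gaps = []
    next_uncovered = 1                   # first position not yet covered
    for start, end in sorted(normal_spans):
        if start > next_uncovered and start - next_uncovered >= min_gap:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len and ref_len - next_uncovered + 1 >= min_gap:
        gaps.append((next_uncovered, ref_len))
    return gaps
```

For example, several `insert_too_large` pairs clustered around one gapped region would suggest a deletion there; how inGAP-sv actually combines these signals and computes SV quality is not shown in this sketch.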
3. How do I get started?
• inGAP-sv requires two input files: a FASTA-formatted reference sequence and a SAM-formatted alignment file.
• An annotation file for the reference sequence in PTT format is optional.
• A demo application is preloaded in inGAP-sv.
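Both required inputs are plain-text formats. The sketch below shows one way to load them in Python; it is not inGAP-sv's own parser, it assumes uncompressed text SAM (not BAM), it keeps only the 11 mandatory SAM columns plus the raw optional tags, and the function names `read_fasta` and `read_sam` are hypothetical.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO

def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Load a FASTA file into {sequence name: sequence}."""
    chunks: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # name = first word of the header
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {seq_name: "".join(parts) for seq_name, parts in chunks.items()}

# The 11 mandatory SAM columns, in order
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")
INT_FIELDS = ("flag", "pos", "mapq", "pnext", "tlen")

def read_sam(handle: TextIO) -> Iterator[dict]:
    """Yield each alignment line as a dict of its mandatory fields."""
    for line in handle:
        if not line.strip() or line.startswith("@"):
            continue                     # skip blank and header lines
        cols = line.rstrip("\n").split("\t")
        record = dict(zip(SAM_FIELDS, cols[:11]))
        for key in INT_FIELDS:
            record[key] = int(record[key])
        record["tags"] = cols[11:]       # optional TAG:TYPE:VALUE columns
        yield record

# Toy in-memory inputs (hypothetical data, for illustration only)
reference = read_fasta(io.StringIO(">chr1 toy\nACGT\nTTGA\n"))
alignments = list(read_sam(io.StringIO(
    "@SQ\tSN:chr1\tLN:8\n"
    "r1\t99\tchr1\t1\t60\t4M\t=\t5\t8\tACGT\tIIII\tNM:i:0\n")))
```

For large real datasets a dedicated library such as pysam would usually be preferable to hand-written parsing.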
4. How does inGAP-sv differ from other SV tools?
• Most current SV tools can detect only very short indels (e.g., 1-10 bp); inGAP-sv and a few others (e.g., BreakDancer) work well with large SVs (>100 bp).
• inGAP-sv is a one-stop SV tool: users can identify, visualize, annotate and manually edit SVs in a single program.
• Unlike command-line SV tools, inGAP-sv displays the paired reads supporting each candidate SV, so users can inspect candidates visually, which can significantly reduce the false discovery rate.
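The annotation step is not described in detail here. One plausible approach, sketched below under stated assumptions, is an interval-overlap lookup of each SV against the optional PTT file. The sketch assumes the NCBI PTT layout (three header lines, then tab-separated rows whose first column is a `start..end` location, second the strand and fifth the gene name); the function names `read_ptt` and `genes_overlapping` and the toy genes are hypothetical, not part of inGAP-sv.

```python
import io
from typing import List, TextIO, Tuple

Gene = Tuple[int, int, str, str]         # (start, end, strand, gene name)

def read_ptt(handle: TextIO) -> List[Gene]:
    """Parse a PTT protein table, skipping its three header lines."""
    genes: List[Gene] = []
    for i, line in enumerate(handle):
        if i < 3 or not line.strip():
            continue                     # title, protein count, column names
        cols = line.rstrip("\n").split("\t")
        start, end = (int(x) for x in cols[0].split(".."))
        genes.append((start, end, cols[1], cols[4]))
    return genes

def genes_overlapping(sv_start: int, sv_end: int, genes: List[Gene]) -> List[str]:
    """Names of genes whose interval overlaps [sv_start, sv_end] (inclusive)."""
    return [name for start, end, _strand, name in genes
            if start <= sv_end and sv_start <= end]

# Toy table with hypothetical genes, for illustration only
toy_ptt = ("Toy genome\n2 proteins\n"
           "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
           "190..255\t+\t21\t1\tgeneA\tX0001\t-\t-\thypothetical protein\n"
           "337..2799\t-\t820\t2\tgeneB\tX0002\t-\t-\thypothetical protein\n")
genes = read_ptt(io.StringIO(toy_ptt))
```

A linear scan is enough for a bacterial-sized gene table; for large genomes an interval tree or a sorted-array binary search would scale better.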
5. How well does inGAP-sv perform?
• We first tested inGAP-sv on simulated data containing large SVs (100-1,000 bp) from the Yoruba genome (NA18507). inGAP-sv identified 75%-90% of large indels and >85% of inversions with high accuracy. A detailed evaluation is in progress.
• We also applied inGAP-sv to an Arabidopsis thaliana genome re-sequencing project, in which it identified 815 insertions and 1,000 deletions. Comparing these indels with the Monsanto A. thaliana assembly, we found that 78% of the deletions were covered by Monsanto contigs, of which 99% were correct; 71% of the insertions were covered by Monsanto contigs, of which 96% were correct.
• inGAP-sv supports parallel computing.
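For scale, the Arabidopsis percentages above correspond roughly to the following counts. Because the reported percentages are rounded, these derived counts are approximate and are not figures reported by the authors.

```latex
\begin{aligned}
\text{deletions covered}  &\approx 0.78 \times 1000 = 780,        & \text{correct} &\approx 0.99 \times 780 \approx 772,\\
\text{insertions covered} &\approx 0.71 \times 815 \approx 579,   & \text{correct} &\approx 0.96 \times 579 \approx 556.
\end{aligned}
```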
6. How do I get inGAP-sv?
• The latest version of inGAP can be downloaded from http://sourceforge.net/projects/ingap/. Binaries are provided for Windows, Linux and Mac OS X.
• A quick manual is available at http://schuster-33.bx.psu.edu/shared/manual.pdf.
[Figure: Main functions and workflow of inGAP-sv]
[Figure: Deletions detected by inGAP-sv]
[Figure: Insertions detected by inGAP-sv]
[Figure: Inversions detected by inGAP-sv]
[Figure: Schematic view of SVs]