Featured Literature

Bioinformatics

Repeat- and Error-Aware Comparison of Deletions

Wittler, R., Marschall, T., Schonhuth, A., Makinen, V.

Motivation: The number of reported genetic variants is growing rapidly, driven by the ever-faster accumulation of next-generation sequencing data. A major issue is comparability. Standards that address the combined problem of inaccurately predicted breakpoints and repeat-induced ambiguities are missing. This decisively lowers the quality of "consensus" callsets and hampers the removal of duplicate entries in variant databases, which can have deleterious effects in downstream analyses.
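
To illustrate what a repeat-induced ambiguity looks like, the following minimal sketch enumerates all start positions at which a deletion of a given length produces the same resulting sequence. The toy reference string, the 0-based coordinates, and the function names `apply_deletion` and `equivalent_starts` are assumptions made for this illustration only; they are not taken from the authors' implementation.

```python
def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return ref with `length` bases removed, beginning at 0-based `start`."""
    return ref[:start] + ref[start + length:]


def equivalent_starts(ref: str, start: int, length: int) -> list:
    """List every start position whose deletion of `length` bases yields
    the same sequence as the deletion at `start` (brute force, toy scale)."""
    target = apply_deletion(ref, start, length)
    return [
        s
        for s in range(len(ref) - length + 1)
        if apply_deletion(ref, s, length) == target
    ]


# Hypothetical reference containing the tandem repeat (AC)x4.
ref = "GGACACACACTT"
print(equivalent_starts(ref, 2, 2))  # [2, 3, 4, 5, 6, 7, 8]
```

Inside the repeat, seven different coordinate pairs describe one and the same deletion, so a comparison based on coordinates alone would treat them as distinct calls.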

Results: We introduce a sound framework for comparison of deletions that captures both tool-induced inaccuracies and repeat-induced ambiguities. We present a maximum matching algorithm that outputs virtual duplicates among two sets of predictions/annotations. We demonstrate that our approach is clearly superior to ad hoc criteria, such as overlap, and that it can substantially reduce the redundancy among callsets. We also identify large numbers of duplicate entries in the Database of Genomic Variants, which underscores the immediate relevance of our approach.
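
The idea of pairing calls from two sets by maximum matching can be sketched as follows. This is not the authors' algorithm: the abstract does not specify the compatibility criterion, so the sketch substitutes a simple breakpoint tolerance (`tol`) as a stand-in, represents each deletion as a hypothetical `(start, end)` tuple, and uses a textbook augmenting-path bipartite matching.

```python
from typing import List, Tuple

Deletion = Tuple[int, int]  # (start, end); hypothetical representation


def compatible(a: Deletion, b: Deletion, tol: int) -> bool:
    """Stand-in criterion (an assumption): both breakpoints lie within `tol`."""
    return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol


def max_matching(calls_a: List[Deletion], calls_b: List[Deletion],
                 tol: int = 10) -> List[Tuple[int, int]]:
    """Return a maximum set of index pairs (i, j) such that calls_a[i] and
    calls_b[j] are compatible and each call is used at most once."""
    match_b = [-1] * len(calls_b)  # match_b[j] = index in calls_a, or -1

    def try_augment(i: int, seen: set) -> bool:
        # Try to match calls_a[i], re-routing earlier matches if necessary.
        for j, b in enumerate(calls_b):
            if j not in seen and compatible(calls_a[i], b, tol):
                seen.add(j)
                if match_b[j] == -1 or try_augment(match_b[j], seen):
                    match_b[j] = i
                    return True
        return False

    for i in range(len(calls_a)):
        try_augment(i, set())
    return sorted((i, j) for j, i in enumerate(match_b) if i != -1)


calls_a = [(100, 200), (105, 205), (500, 600)]
calls_b = [(104, 204), (95, 195), (900, 1000)]
print(max_matching(calls_a, calls_b, tol=5))  # [(0, 1), (1, 0)]
```

In this toy input, greedily pairing the first call of `calls_a` with the first compatible call of `calls_b` would leave the second call unmatched, whereas the matching finds two pairs. The recursive search is meant for small examples; large callsets would call for an iterative or library-based matching routine.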

Availability: Implementation is open source and available from https://bitbucket.org/readdi/readdi

Contact: roland.wittler@uni-bielefeld.de, t.marschall@mpi-inf.mpg.de