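Dissertation
Ferramenta baseada em cuckoo filter para remoção de redundância em dados de sequenciadores de segunda geração (NGS - next generation sequencing) [Cuckoo filter-based tool for removing redundancy in data from second-generation sequencers (NGS - next generation sequencing)]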
Main author: | GAIA, Antonio Sérgio Cruz |
---|---|
Degree: | Dissertation |
Language: | Portuguese |
Published in: | Universidade Federal do Pará, 2020 |
Subjects: | |
Online access: | http://repositorio.ufpa.br:8080/jspui/handle/2011/12798 |
Abstract: |
---|
Second-generation sequencing platforms, also known as NGS (Next Generation Sequencing), produce large amounts of data, and processing these data is complex and computationally expensive. These platforms generate duplicated reads, which originate in the preparation of the genomic library and are introduced during the PCR (Polymerase Chain Reaction) amplification stage. This redundancy can increase the computational requirements and processing time of subsequent analyses (for instance, de novo assembly). To reduce the computational cost of these analyses, the duplicated reads must be removed from the data set of the sequenced organism. In this work, we present NGSReadsTreatment, a computational tool that removes duplicated reads from paired-end or single-end data sets. The input for NGSReadsTreatment consists of reads from any sequencing platform, with equal or differing read lengths. Its engine uses a Cuckoo Filter, a probabilistic data structure, to identify and remove redundant reads. Duplicates are identified by comparing the reads with one another, so nothing is required beyond the set of reads itself. The tool was validated on real and simulated data sets and compared against other redundancy-removal tools to assess its efficiency. In all tests performed, NGSReadsTreatment produced the best results, both in the number of redundant reads removed and in memory usage. Developed in Java, NGSReadsTreatment is compatible with UNIX/Linux and Windows operating systems and has a version with a graphical interface to facilitate its use. |
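
To make the idea of cuckoo-filter deduplication concrete, the following is a minimal Python sketch of how such a filter can drop repeated single-end reads. It is not the NGSReadsTreatment code (which the abstract says is written in Java). The class `CuckooFilter`, the function `dedup_reads`, the hash functions, and every parameter (bucket count, bucket size, fingerprint width, kick limit) are assumptions chosen for illustration.

```python
import hashlib
import random


class CuckooFilter:
    """Illustrative cuckoo filter: short fingerprints kept in small buckets."""

    def __init__(self, num_buckets=1 << 16, bucket_size=4, fp_bytes=2,
                 max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate index
        # in range and makes it its own inverse.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.fp_bytes = fp_bytes
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _hash(self, data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        return hashlib.md5(item).digest()[:self.fp_bytes]

    def _alt_index(self, index, fp):
        # Partial-key cuckoo hashing: the other candidate bucket depends
        # only on the current bucket and the fingerprint.
        return (index ^ self._hash(fp)) % self.num_buckets

    def _indices(self, item, fp):
        i1 = self._hash(item) % self.num_buckets
        return i1, self._alt_index(i1, fp)

    def contains(self, item):
        fp = self._fingerprint(item)
        i1, i2 = self._indices(item, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def insert(self, item):
        fp = self._fingerprint(item)
        i1, i2 = self._indices(item, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return
        # Both candidate buckets are full: evict a resident fingerprint
        # and move it to its alternate bucket, repeating as needed.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return
        raise RuntimeError("cuckoo filter is full; allocate more buckets")


def dedup_reads(reads, cf):
    """Keep the first occurrence of each read sequence, in input order."""
    unique = []
    for read in reads:
        key = read.encode("ascii")
        if cf.contains(key):
            continue  # probably seen before (false positives are possible)
        cf.insert(key)
        unique.append(read)
    return unique


reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC", "TTGA"]
print(dedup_reads(reads, CuckooFilter()))
```

Because the filter stores only a short fingerprint per read, memory use stays small, at the cost of a low false-positive rate: a distinct read can occasionally be mistaken for a duplicate and dropped. A read that was inserted successfully is always found again, so exact duplicates are never kept twice.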