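Thesis
Estratégias evolucionárias para otimização no tratamento de dados ausentes por imputação múltipla de dados [Evolutionary strategies for optimization in the treatment of missing data by multiple data imputation]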
Main author: | LOBATO, Fábio Manoel França |
---|---|
Degree: | Thesis |
Language: | Portuguese |
Published in: | Universidade Federal do Pará, 2017 |
Subjects: | |
Online access: | http://repositorio.ufpa.br/jspui/handle/2011/7267 |
Abstract: |
---|
The data analysis process includes the acquisition and organization of information in order
to obtain knowledge from it, bringing scientific advances in various fields as well as
providing competitive advantages to corporations. In this context, a ubiquitous problem in
the area deserves attention: missing data, since most data analysis techniques cannot
deal satisfactorily with it, which negatively impacts the final results. In order
to avoid the harmful effects of missing data, several studies have been proposed in the areas
of statistical analysis and machine learning, especially on Multiple Data Imputation,
which consists of replacing the missing data with plausible values. This methodology
can be seen as a combinatorial optimization problem, where the goal is to find candidate
values to substitute the missing ones in order to reduce the bias imposed by this issue. Metaheuristics,
in particular methods based on evolutionary computing, have been successfully
applied to combinatorial optimization problems. Despite the recent advances in this area,
some shortcomings can still be perceived in the modeling of imputation methods based on evolutionary
computing. Aiming to fill these gaps in the literature, this thesis presents a description of
multiple data imputation as a combinatorial optimization problem and proposes imputation
methods based on evolutionary computing. In addition, due to the limitations found in the
methods presented in the recent literature, and the need to adopt different evaluation
measures to assess the performance of imputation methods, a multi-objective genetic
algorithm for data imputation in the pattern classification context is also proposed. This method
proves to be flexible regarding data types and avoids complete-case analysis. Because
of the flexibility of the proposed approach, it can also be used in other scenarios, such
as unsupervised learning, multi-label classification and time series analysis. |
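
To make the idea of imputation as a combinatorial optimization problem concrete, the following is a minimal sketch and not the method proposed in the thesis. It assumes a small numeric table with `None` marking missing cells, draws candidate values from the observed values of the same column, and uses a hypothetical single-objective cost (distortion of each column's mean and variance) in place of the multi-objective evaluation the abstract mentions. The names `impute_ga` and `mean_var` and all parameter defaults are illustrative assumptions.

```python
import random


def mean_var(xs):
    """Return the mean and population variance of a non-empty list."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)


def impute_ga(rows, pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    """Fill None cells with a simple genetic algorithm (illustrative only).

    Assumes every column has at least one observed value and pop_size >= 2.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    # Candidate pool per column: the values actually observed in that column.
    pools = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    # One gene per missing cell, in a fixed order.
    missing = [(i, j) for i, r in enumerate(rows)
               for j in range(n_cols) if r[j] is None]
    targets = [mean_var(p) for p in pools]

    def fill(chrom):
        data = [list(r) for r in rows]  # copy, so the input is not modified
        for (i, j), value in zip(missing, chrom):
            data[i][j] = value
        return data

    def cost(chrom):
        # Hypothetical objective: how far the completed columns drift from
        # the mean and variance of the observed values.
        data = fill(chrom)
        total = 0.0
        for j in range(n_cols):
            m, v = mean_var([r[j] for r in data])
            total += abs(m - targets[j][0]) + abs(v - targets[j][1])
        return total

    population = [[rng.choice(pools[j]) for _, j in missing]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]  # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(elite), rng.choice(elite)
            # Uniform crossover, then per-gene mutation from the column pool.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [rng.choice(pools[j]) if rng.random() < mutation_rate else g
                     for g, (_, j) in zip(child, missing)]
            children.append(child)
        population = elite + children
    return fill(min(population, key=cost))


data = [
    [1.0, 10.0],
    [2.0, None],
    [None, 30.0],
    [4.0, 40.0],
]
completed = impute_ga(data)
```

Restricting candidates to observed values keeps the search space finite, which is what makes the problem combinatorial in this sketch. A multi-objective variant would replace the single `cost` with several measures (for example, classification performance on the completed data) and a Pareto-based selection step.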