Dissertation

Data imputation based on particle swarm optimization considering the main mechanisms of missing data

Main author: DIAS, Lilian de Jesus Chaves
Degree: Dissertation
Language: Portuguese
Published in: Universidade Federal do Pará 2014
Subjects:
Online access: http://repositorio.ufpa.br/jspui/handle/2011/4617
Abstract:
During the knowledge discovery in databases process, some problems may arise; for example, some instances of an attribute may be missing. Such an issue can harm the final results of the process, since it directly affects the quality of the data to which a machine learning algorithm may be applied. The literature offers several proposals to address this problem; among them is data imputation, which estimates a plausible value to fill in for the missing one. A review of research on missing value treatment raised two observations: few studies use synthetic datasets that simulate the main mechanisms of missingness, and there is a tendency to use bioinspired algorithms to treat missing values. Given this scenario, the present dissertation analyzes an imputation method based on particle swarm optimization, an underexplored approach, and applies it to synthetic datasets generated according to the main mechanisms of missingness: MAR, MCAR and NMAR. The results obtained when comparing the algorithm against different configurations of itself and against two other treatments known in the area (KNNImpute and SVMImpute) are promising for its use as a missing value treatment, since the bioinspired method reached the best values in the majority of the experiments.