Dissertação

Otimização do processo de aprendizagem da estrutura gráfica de Redes Bayesianas em BigData

Automation at data management and analysis has been a crucial factor for companies which need efficient solutions in an each more competitive corporate world. The explosion of the volume information, which has remained increasing in recent years, has demanded more and more commitment to seek strateg...

ver descrição completa

Autor principal: FRANÇA, Arilene Santos de
Grau: Dissertação
Idioma: por
Publicado em: Universidade Federal do Pará 2014
Assuntos:
Acesso em linha: http://repositorio.ufpa.br/jspui/handle/2011/5608
Resumo:
Automation at data management and analysis has been a crucial factor for companies which need efficient solutions in an each more competitive corporate world. The explosion of the volume information, which has remained increasing in recent years, has demanded more and more commitment to seek strategies to manage and, especially, to extract valuable strategic informations from the use of data mining algorithms, which commonly need to perform exhausting queries at the database in order to obtain statistics that solve or optimize the parameters of the model of knowledge discovery selected; process which requires intensive computing to perform calculations and frequent access to the database. Given the effectiveness of uncertainty treatment, Bayesian networks have been widely used for this process, however, as the amount of data (records and/or attributes) increases, it becomes even more costly and time consuming to extract relevant information in a knowledge base. The goal of this work is to propose a new approach to optimization of the Bayesian Network structure learning in the context of BigData, by using the MapReduce process, in order to improve the processing time. To that end, it was generated a new methodology that includes the creation of an Intermediary Database, containing all the necessary probabilities to the calculations of the network structure. Through the analyzes presented at this work, it is shown that the combination of the proposed methodology with the MapReduce process is a good alternative to solve the scalability problem of the search frequency steps of K2 algorithm and, as a result, to reduce the response time generation of the network.