Undergraduate Thesis (Trabalho de Conclusão de Curso)

Data Cleaner Service - Web Service para apoiar na limpeza de dados (Data Cleaner Service: a web service to support data cleaning)


Main author: Picanço, Gabriel Rodrigues
Degree: Undergraduate thesis (Trabalho de Conclusão de Curso)
Language: Portuguese (por)
Published: Brazil, 2023
Subjects:
Online access: http://repositorio.ifam.edu.br/jspui/handle/4321/1355
Abstract:
The volume of data and information has grown markedly in recent years. Incorrect data produces incorrect information, which can significantly affect the management of organizations and people's lives. Organizations need reliable and timely information for effective business decision-making, so it is important to invest in solutions that ensure data quality. Data cleaning techniques identify and correct non-compliant values according to defined rules and actions. They are widely used in data preparation for BI (Business Intelligence) and Data Science processes, where they contribute to the quality of the results. Several data cleaning tools are available on the market (e.g., Excel, OpenRefine and Data Wrangler). Each has its own characteristics and functions, and most allow users to write scripts that carry out cleaning tasks. These scripts can take considerable time and effort to write and can be difficult to reuse across different tools. This work developed a solution based on a web service (Data Cleaner Service) that integrates with other tools (e.g., web applications) and cleans data by reusing scripts (e.g., Python scripts). To demonstrate the solution, applications were developed using scripts written with the pandas library, together with a web application that consumes the service. The solution is expected to support data cleaning tasks in several areas (e.g., finance, sales and health) by reducing the effort and time they require, promoting the exchange of experience between users and developers, and contributing to the generation of effective information for decision-making.
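
The idea of cleaning by "defined rules and actions" can be illustrated with a minimal sketch. It is not the implementation from the thesis: the names `Rule`, `clean_records` and `RULES`, the sample columns, and the choice of plain Python instead of pandas are all assumptions made only for illustration.

```python
from typing import Any, Callable

# A rule pairs a column with a compliance check and a corrective action.
# All names here are hypothetical; they are not taken from the thesis.
Rule = tuple[str, Callable[[Any], bool], Callable[[Any], Any]]


def clean_records(records: list[dict], rules: list[Rule]) -> list[dict]:
    """Return cleaned copies of the records, leaving the input untouched."""
    cleaned = []
    for record in records:
        row = dict(record)  # shallow copy so the original record is preserved
        for column, is_compliant, action in rules:
            value = row.get(column)
            if not is_compliant(value):
                # Apply the corrective action only to non-compliant values.
                row[column] = action(value)
        cleaned.append(row)
    return cleaned


# Example rule set (assumed): trim whitespace in names, blank out
# implausible ages.
RULES: list[Rule] = [
    (
        "name",
        lambda v: isinstance(v, str) and v == v.strip(),
        lambda v: v.strip() if isinstance(v, str) else None,
    ),
    (
        "age",
        lambda v: isinstance(v, int) and 0 <= v <= 120,
        lambda v: None,
    ),
]

sample = [{"name": "  Ana ", "age": 34}, {"name": "Rui", "age": -5}]
result = clean_records(sample, RULES)
```

Keeping the rules as data, separate from the function that applies them, is one way such a script could be reused by different callers, for example behind a web service endpoint.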