Thesis

Data science and machine learning applied to the study of epidemiological variables of leprosy in the Amazon

Main author: FALCÃO, Igor Wenner Silva
Degree: Thesis
Language: por
Published: Universidade Federal do Pará, 2025
Subjects:
Online access: https://repositorio.ufpa.br/jspui/handle/2011/16790
Abstract:
Leprosy is a significant public health problem that largely affects low-income populations. Although the World Health Organization (WHO) establishes guidelines for diagnosis, prevention, and treatment, disease detection faces limitations that often result in late or inaccurate diagnoses, leading to serious neurological complications and multidrug-resistant cases. Early diagnosis is therefore essential to reduce the burden of this disease. Machine learning has been widely used in several areas of science and industry, and especially in health, where it plays an essential role in the analysis and processing of large volumes of data. In this context, this thesis investigates the application of a model based on Data Science and Machine Learning to characterize the clinical profile of possible leprosy cases in the Amazon Region and thereby support the early diagnosis and treatment of patients under medical follow-up. The work uses clinical data of patients from a non-public dataset collected between 2015 and 2020 in the North Region of Brazil. The thesis proposes a learning model that identifies groups clinically affected by the disease using Clustering and Random Forest techniques. The proposed model proved effective at estimating the probability of individuals being ill, achieving an accuracy of 90.39% in the performance evaluation and identifying a probability of 83.46% of an individual being ill, considering a set of epidemiological and non-generic variables. This approach offers a promising outlook for the future of health, enabling the formulation of effective strategies for the early identification of possible cases.
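
Illustrative sketch (not part of the catalog record or the thesis): the abstract names Clustering and Random Forest but does not say how they are combined. The example below shows one plausible arrangement, assuming scikit-learn and NumPy are available. The data are synthetic, because the thesis dataset is non-public. The feature columns, the label rule, the choice of k-means with three clusters, and the use of the cluster id as an extra classifier feature are all assumptions made for illustration, and the numbers it prints have no relation to the 90.39% and 83.46% figures reported above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the non-public clinical dataset (hypothetical features).
n_samples = 600
X = rng.normal(size=(n_samples, 5))
# Hypothetical label: "ill" depends on two features plus noise.
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1 (assumed): cluster standardized records into groups with similar profiles.
scaler = StandardScaler().fit(X_train)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(scaler.transform(X_train))
train_clusters = kmeans.predict(scaler.transform(X_train))
test_clusters = kmeans.predict(scaler.transform(X_test))

# Step 2 (assumed): train a random forest on the features plus the cluster id.
X_train_aug = np.column_stack([X_train, train_clusters])
X_test_aug = np.column_stack([X_test, test_clusters])
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train_aug, y_train)

# Held-out accuracy and per-record predicted probability of the "ill" class.
accuracy = accuracy_score(y_test, forest.predict(X_test_aug))
p_ill = forest.predict_proba(X_test_aug)[:, 1]
print(f"accuracy: {accuracy:.3f}")
print(f"mean predicted probability of illness: {p_ill.mean():.3f}")
```

The scaler and the clustering model are fitted on the training split only, so the held-out accuracy is not inflated by information from the test records.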