
Mineração de dados educacionais: um estudo sobre os dados socioeconômicos na educação na base de dados do INEP


Main author: SANTOS, Aurea Milene Teixeira Barbosa dos
Degree: Dissertation
Language: Portuguese
Published: Universidade Federal do Pará 2019
Online access: http://repositorio.ufpa.br/jspui/handle/2011/11265
This work investigates the profiles of third-year high school students in Brazil's public and private school networks in order to identify which out-of-school and in-school factors influence good performance in the National High School Examination (ENEM). Two case studies were carried out. The first used socioeconomic records containing tens of thousands of samples from students who took the exam, divided by Brazilian region, which made it possible to analyze the influential socioeconomic (out-of-school) factors in each region. The second analyzed attributes related to the school infrastructure offered by public (state) high schools in the state of Pará. For this study, each score a student obtained in the ENEM was linked to the school census database, both for 2016; the census details the conditions of the high school attended by each student who took the exam. To reach the proposed objective, both case studies were submitted to the Knowledge Discovery in Databases process in the form of educational data mining (EDM). In the EDM process, principal component analysis (PCA) was used in the preprocessing stage to reduce the number of variables without losing the information provided by the full set. With this technique the number of variables analyzed fell from 43 to 22 in the first case study and from 39 to 9 in the second, retaining proportions of 0.8226 and 0.9099 of the information, respectively. PCA was applied to enable the other technique used in the research, Bayesian networks, in the data mining stage. Bayesian networks were chosen because they support reasoning under uncertainty, especially about causes and effects, based on the relationships among variables and their probabilities of occurrence.
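The variable reduction described above can be sketched as follows. This is a minimal illustration, not the dissertation's code: it assumes the reported proportions are cumulative explained-variance ratios, and the function name `components_for_variance`, the synthetic data, and the 0.82 threshold are all hypothetical. It picks the smallest number of principal components whose cumulative explained variance reaches a chosen threshold.

```python
import numpy as np

def components_for_variance(X, threshold):
    """Return (k, cumulative): k is the smallest number of principal
    components whose cumulative explained-variance ratio reaches threshold."""
    Xc = X - X.mean(axis=0)                  # center each variable
    # Squared singular values of the centered data are proportional
    # to the variance captured by each principal component.
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative

# Synthetic stand-in data (an assumption, not INEP records):
# 500 rows, 10 variables driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(500, 10))

k, cumulative = components_for_variance(X, 0.82)
print(k, round(float(cumulative[k - 1]), 4))
```

In practice the variables would usually be standardized first when they are on different scales; the sketch only centers them to stay short.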
Another inherent advantage of Bayesian networks is their structure, which makes the representation and the results easy to understand and gives domain specialists and users a basis for more in-depth analysis of the subject described by the data. The results showed that the methodology and techniques were successful. The research provided a national analysis of third-year high school students in Brazil; the work states that no other study performs this analysis at the national level using ENEM data. Socioeconomic variables showed strong influence, and the type of school attended (public, private or federal) stood out as a direct factor in student performance. Associated with this variable are family income, whether the student dropped out or failed a grade in elementary school, whether the student has access to a computer and the internet at home, and the shift in which the student attended high school. From these variables it was possible to perform inferences and analyze the probabilistic behavior of students' scores with respect to each one. In the analysis of how school infrastructure influences the performance of public school students in Pará, the library and science laboratory variables stood out. Looking only at the state of Pará, more than 80% of public school students performed poorly, scoring 450 or below in the ENEM, even though the two variables were identified as influential in their schools.
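The kind of inference mentioned above, the probability of a score band given observed variables, can be illustrated with a relative-frequency estimate of a conditional distribution. This is only a sketch of a single conditional-probability query, not a full Bayesian network (there is no structure learning or propagation), and the function `conditional_distribution`, the variable names, and the eight toy records are invented for illustration rather than taken from the INEP data.

```python
from collections import Counter

def conditional_distribution(records, target, evidence):
    """Estimate P(target | evidence) by relative frequency.

    records:  list of dicts, one per student
    evidence: dict mapping variable name -> observed value
    """
    matching = [r for r in records
                if all(r[name] == value for name, value in evidence.items())]
    counts = Counter(r[target] for r in matching)
    total = sum(counts.values())
    if total == 0:
        return {}                      # no record matches the evidence
    return {value: n / total for value, n in counts.items()}

# Toy records (hypothetical): school type, home internet access, score band.
rows = [
    ("public", "no", "low"), ("public", "no", "low"),
    ("public", "yes", "low"), ("public", "yes", "high"),
    ("private", "yes", "high"), ("private", "yes", "high"),
    ("private", "no", "low"), ("private", "yes", "high"),
]
records = [dict(zip(("school_type", "internet", "score_band"), r)) for r in rows]

given_public = conditional_distribution(
    records, "score_band", {"school_type": "public"})
given_private_net = conditional_distribution(
    records, "score_band", {"school_type": "private", "internet": "yes"})
print(given_public)        # {'low': 0.75, 'high': 0.25}
print(given_private_net)   # {'high': 1.0}
```

A Bayesian network stores tables of this kind for each variable given its parents, which is what allows the effect of each variable on the score distribution to be examined.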