Article

A four-stage cascade methodology for classifying NCM codes using NLP techniques

Main author: PINHEIRO, Pedro Luiz Braga
Degree: Article
Published in: 2023
Subjects:
Online access: https://bdm.ufpa.br:8443/jspui/handle/prefix/5010
Abstract:
This work develops a process for classifying the product descriptions found in electronic invoices (NF-e). The classification is based on the 8 digits of the Mercosur Common Nomenclature (NCM), separated into four parts: Chapter, Position, Subheading, and Item/Subitem. It was performed using the Support Vector Machine (SVM) algorithm and the Naive Bayes algorithm together with Natural Language Processing (NLP) techniques, applied to a database of 340,000 different products. The data were divided into 80% for training and 20% for testing, and an accuracy of 90% was obtained for a total of 98 classes.
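
The four-part separation of the NCM code described in the abstract can be illustrated with a minimal Python sketch. It assumes each part is a two-digit group (digits 1-2, 3-4, 5-6, and 7-8) and that a cascade would predict progressively longer prefixes of the code, one per stage. The names `NCMParts`, `split_ncm`, and `cascade_targets` are hypothetical and are not taken from the article, which may organise its labels differently.

```python
from typing import List, NamedTuple


class NCMParts(NamedTuple):
    chapter: str       # digits 1-2
    position: str      # digits 3-4
    subheading: str    # digits 5-6
    item_subitem: str  # digits 7-8


def split_ncm(code: str) -> NCMParts:
    """Split an 8-digit NCM code into four two-digit parts.

    Dots are ignored, so "0901.21.00" and "09012100" are equivalent.
    The two-digit grouping is an assumption made for illustration.
    """
    digits = code.replace(".", "").strip()
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 8 digits, got {code!r}")
    return NCMParts(digits[0:2], digits[2:4], digits[4:6], digits[6:8])


def cascade_targets(code: str) -> List[str]:
    """Return the cumulative prefixes a four-stage cascade could predict.

    Stage 1 targets the chapter, stage 2 the chapter plus position, and
    so on until the full 8-digit code (an assumed design, not the
    article's confirmed one).
    """
    prefixes: List[str] = []
    prefix = ""
    for part in split_ncm(code):
        prefix += part
        prefixes.append(prefix)
    return prefixes
```

For example, `split_ncm("0901.21.00")` yields the parts `09`, `01`, `21`, and `00`, and `cascade_targets` turns the same code into one target label per stage.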