/img alt="Imagem da capa" class="recordcover" src="""/>
Dissertação
Melhorando o desempenho da técnica de clusterização hierárquica single linkage utilizando a metaheurística GRASP
The problem of clustering (grouping) consists of, from a database, group the elements so that more queries are in the same cluster (group) and less similar elements are different clusters. There are several ways to accomplish these groupings. One of the most popular is the hierarchical, where a h...
Autor principal: | Ribeiro Filho, Napoleão Póvoa |
---|---|
Grau: | Dissertação |
Idioma: | pt_BR |
Publicado em: |
Universidade Federal do Tocantins
2018
|
Assuntos: | |
Acesso em linha: |
http://hdl.handle.net/11612/974 |
Resumo: |
---|
The problem of clustering (grouping) consists of, from a database, group the elements so
that more queries are in the same cluster (group) and less similar elements are different
clusters. There are several ways to accomplish these groupings. One of the most popular is
the hierarchical, where a hierarchical relationships between the elements is created. There
are several methods of analyzing the similarity between elements in the clustering problem.
The most common among them is the single linkage method, which brings together the
elements that are experiencing less apart. To apply the technique in question, distance
matrix is the input used. This grouping process generates the end an inverted tree known
as dendrogram. The cophenetic correlation coefficient (ccc), obtained after the construction
of the dendrogram is a measure used to evaluate the consistency of the clusters generated
and indicates how faithful he is in relation to the original data. Thus, a dendrogram gives
more consistent clusters when the ccc is closer to one (1). The clustering problem in all
its aspects, including hierarchical clustering (object of study in this work), belongs to
the class of NP-complete problems. Therefore, it is common to use heuristics for efficient
solutions to this problem. In order to generate dendrograms that result in better ccc, it is
proposed in this paper a new algorithm that uses the concepts of GRASP metaheuristic.
It is also objective of this work to implement such a solution in parallel computing in a
computer cluster, thus working with arrays larger. Tests were conducted to confirm the
performance of the proposed algorithm, comparing the results with those generated by
the software R. |