Clustering and classifying text documents: a revisit to tagging integration methods

Date
2013
Authors
Cunha, E
Álvaro Figueira
Mealha, O
Abstract
In this paper we analyze and discuss two methods that build on traditional k-means for document clustering and integrate social tags into the process. The first integrates tags directly into the Vector Space Model; the second uses tags to select the initial seeds. We created a predictive model for the impact of tag integration in both models, and compared the two methods using the traditional k-means++ and the novel k-C algorithm. To compare the results, we propose a new internal measure that allows cluster compactness to be computed. The experimental results indicate that the careful selection of seeds in the k-C algorithm yields better results than those obtained with k-means++, with and without tag integration.
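As a rough illustration of the two integration ideas named in the abstract, the sketch below adds weighted tag terms to term-count vectors and, separately, derives initial seeds from the most frequent tags before running a plain k-means loop. It is a minimal sketch under stated assumptions, not the authors' k-C algorithm or their proposed measure: the function names, the `tag_weight` parameter, the tag-centroid seeding rule, the toy documents, and the compactness score (mean squared distance to the assigned centroid) are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter


def vectorize(docs, tags, tag_weight=2.0):
    """Term-count vectors; each tag is added as an extra, weighted term (assumed scheme)."""
    vocab = sorted({w for d in docs for w in d.split()} | {t for ts in tags for t in ts})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for text, doc_tags in zip(docs, tags):
        v = [0.0] * len(vocab)
        for w in text.split():
            v[index[w]] += 1.0
        for t in doc_tags:
            v[index[t]] += tag_weight
        vectors.append(v)
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vs):
    return [sum(col) / len(vs) for col in zip(*vs)]


def seeds_from_tags(vectors, tags, k):
    """Hypothetical seeding rule: one seed per frequent tag, the centroid of its documents."""
    counts = Counter(t for ts in tags for t in ts)
    top = [t for t, _ in counts.most_common(k)]
    return [mean([v for v, ts in zip(vectors, tags) if t in ts]) for t in top]


def kmeans(vectors, seeds, max_iters=50):
    """Plain Lloyd iterations starting from the given seeds."""
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return labels, centroids


def compactness(vectors, labels, centroids):
    """Assumed internal score: mean squared distance to the assigned centroid (lower is tighter)."""
    return sum(sq_dist(v, centroids[l]) for v, l in zip(vectors, labels)) / len(vectors)


# Toy data, invented for illustration only.
docs = ["cats purr softly", "dogs bark loudly", "cats chase mice", "dogs chase cats"]
tags = [["pets-cat"], ["pets-dog"], ["pets-cat"], ["pets-dog"]]

vectors = vectorize(docs, tags)
labels, centroids = kmeans(vectors, seeds_from_tags(vectors, tags, k=2))
score = compactness(vectors, labels, centroids)
print(labels, score)
```

Setting `tag_weight=0.0` gives the tag-free baseline vectors, and swapping `seeds_from_tags` for any other seeding routine keeps the rest of the loop unchanged, which makes it straightforward to compare the same compactness score across configurations.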