Please use this identifier to cite or link to this item:
http://repositorio.inesctec.pt/handle/123456789/5835
Title: | Clustering and classifying text documents: a revisit to tagging integration methods |
Authors: | Cunha, E; Álvaro Figueira; Mealha, O |
Issue Date: | 2013 |
Abstract: | In this paper we analyze and discuss two methods that are based on traditional k-means document clustering and that integrate social tags into the process. The first integrates tags directly into a Vector Space Model; the second uses tags to select the initial seeds. We created a predictive model for the impact of tag integration in both models, and compared the two methods using the traditional k-means++ and the novel k-C algorithm. To compare the results, we propose a new internal measure that allows the computation of cluster compactness. The experimental results indicate that careful selection of seeds in the k-C algorithm yields better results than those obtained with k-means++, both with and without integration of tags. |
URI: | http://repositorio.inesctec.pt/handle/123456789/5835 http://dx.doi.org/10.5220/0004545201600168 |
metadata.dc.type: | conferenceObject Publication |
Appears in Collections: | CRACS - Articles in International Conferences |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
P-008-GVR.pdf | | 901.58 kB | Adobe PDF |