Title: Term frequency dynamics in collaborative articles
Authors: Sérgio Nunes; Cristina Ribeiro; Gabriel David
Keywords: Document Dynamics; Term Frequency; Wikipedia
Issue Date: 2010
Abstract: Documents on the World Wide Web are dynamic entities. Mainstream information retrieval systems and techniques are primarily focused on the latest version of a document, generally ignoring its evolution over time. In this work, we study the term frequency dynamics in web documents over their lifespan. We use Wikipedia as a document collection because it is a broad and public resource and, more importantly, because it provides access to the complete revision history of each document. We investigate the progression of similarity values over two projection variables, namely revision order and revision date. Based on this investigation, we find that term frequency in encyclopedic documents (i.e., comprehensive and focused on a single topic) exhibits a rapid and steady progression towards the document’s current version. The content in early versions quickly becomes very similar to the present version of the document.
Type: conferenceObject
Appears in Collections: CSIG - Indexed Articles in Conferences

Files in This Item:
File: PS-06901.pdf
Size: 362.03 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.