|Title:||Term frequency dynamics in collaborative articles|
|Keywords:||Document Dynamics;Term Frequency;Wikipedia|
|Abstract:||Documents on the World Wide Web are dynamic entities. Mainstream information retrieval systems and techniques focus primarily on the latest version of a document, generally ignoring its evolution over time. In this work, we study term frequency dynamics in web documents over their lifespan. We use Wikipedia as a document collection because it is a broad and public resource and, more importantly, because it provides access to the complete revision history of each document. We investigate the progression of similarity values over two projection variables, namely revision order and revision date. Based on this investigation, we find that term frequency in encyclopedic documents – i.e. documents that are comprehensive and focused on a single topic – exhibits a rapid and steady progression towards the document's current version. The content of early versions quickly becomes very similar to the present version of the document.|
|Appears in Collections:||CSIG - Indexed Articles in Conferences|
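
The analysis described in the abstract, similarity of each revision to the current version projected over revision order, can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the tokenization used, so cosine similarity over raw term counts and a simple alphanumeric tokenizer are assumptions here, and the function names (`term_frequencies`, `cosine_similarity`, `similarity_progression`) are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase alphanumeric tokens in one revision (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two term frequency vectors (assumed measure)."""
    # Counter returns 0 for terms missing from tf_b, so only shared terms contribute.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_progression(revisions):
    """Similarity of every revision to the last one, in revision order."""
    current = term_frequencies(revisions[-1])
    return [cosine_similarity(term_frequencies(r), current) for r in revisions]


# Toy revision history (made-up text, oldest first), not data from the paper.
revisions = [
    "Lisbon is a city",
    "Lisbon is a city in Portugal",
    "Lisbon is the capital city of Portugal",
]
for order, score in enumerate(similarity_progression(revisions), start=1):
    print(order, round(score, 3))
```

Projecting over revision date instead of revision order would only change the x-axis: each score would be paired with the revision's timestamp rather than its index.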