The Documental Repository comprises a large and varied collection of documents, including informational documents from all 13 centres. The repository is organized into Communities and Collections: the former correspond to organic entities of INESC TEC, and the latter comprise the outputs of each community, organized by document type (articles in international journals, articles in international conferences, PhD theses, and other documentation).
Browsing Documental Repository by Subject "Document Dynamics"
Sérgio Nunes; Cristina Ribeiro; Gabriel David
Documents on the World Wide Web are dynamic entities. Mainstream information retrieval systems and techniques are primarily focused on the latest version of a document, generally ignoring its evolution over time. In this work, we study the term frequency dynamics in web documents over their lifespan. We use Wikipedia as a document collection because it is a broad and public resource and, more importantly, because it provides access to the complete revision history of each document. We investigate the progression of similarity values over two projection variables, namely revision order and revision date. Based on this investigation, we find that term frequency in encyclopedic documents (i.e., comprehensive and focused on a single topic) exhibits a rapid and steady progression towards the document’s current version. The content in early versions quickly becomes very similar to the present version of the document.
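To make the idea of tracking similarity over revision order concrete, the following is a minimal sketch rather than the authors' method. The abstract does not state which similarity measure or tokenization was used, so cosine similarity over raw term counts and a simple alphanumeric tokenizer are assumptions here, and the function names and sample revisions are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase alphanumeric tokens (assumed tokenization)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two term-frequency vectors (assumed measure)."""
    # Counter returns 0 for terms missing from tf_b.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_to_current(revisions):
    """Similarity of every revision to the last one, in revision order."""
    if not revisions:
        return []
    current = term_frequencies(revisions[-1])
    return [cosine_similarity(term_frequencies(r), current) for r in revisions]


# Hypothetical revision history of a single document, oldest first.
revisions = [
    "Porto is a city",
    "Porto is a city in Portugal",
    "Porto is a coastal city in Portugal",
]
for order, sim in enumerate(similarity_to_current(revisions), start=1):
    print(f"revision {order}: similarity to current = {sim:.3f}")
```

Plotting these values against revision order, or against each revision's timestamp, would give the two projections mentioned in the abstract.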