Please use this identifier to cite or link to this item: http://repositorio.inesctec.pt/handle/123456789/5331
Title: Sampling massive streaming call graphs
Authors: Shazia Tabassum
João Gama
Issue Date: 2016
Abstract: The problem of analyzing massive graph streams in real time grows with the size of the streams. Sampling techniques have been used to analyze these streams in real time. However, several questions remain difficult to answer: Which structures do sampling techniques preserve well as a stream evolves? Which sampling techniques yield proper estimates for directed and weighted graphs? Which techniques have the lowest time complexity? In this work, we answer these questions by comparing and analyzing evolutionary samples of such graph streams. We evaluate sequential sampling techniques by comparing the structural metrics computed from their samples. We also present a biased version of reservoir sampling, which gives better comparative results in our scenario. We carried out rigorous experiments over a massive stream of 300 million calls made by 11 million anonymous subscribers over 31 days, evaluating both node-based and edge-based sampling methods. We compare the samples generated by sequential algorithms such as the Space-Saving algorithm for finding top-K items, reservoir sampling, and the biased version of reservoir sampling. Our overall results and observations show that edge-based samples perform well in our scenario. We also compare the degree distributions and biases of the evolutionary samples. © 2016 ACM.
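The abstract names two standard sequential stream algorithms: reservoir sampling and the Space-Saving algorithm for top-K items. The sketch below is a generic textbook illustration of both, applied to a call stream modeled as (caller, callee) pairs. It is not the authors' implementation and does not reproduce their biased reservoir variant. The function names `reservoir_sample`, `space_saving`, and `top_k`, and the edge representation, are hypothetical choices made for illustration.

```python
import random
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical representation: one call = one directed edge (caller, callee).
Edge = Tuple[Hashable, Hashable]


def reservoir_sample(stream: Iterable[Edge], k: int, seed: int = 0) -> List[Edge]:
    """Edge-based uniform sample of k edges from a stream of unknown length.

    Standard reservoir sampling (Algorithm R): the i-th item (0-indexed)
    is kept with probability k / (i + 1).
    """
    rng = random.Random(seed)
    reservoir: List[Edge] = []
    for i, edge in enumerate(stream):
        if i < k:
            reservoir.append(edge)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # uniform over i + 1 positions
            if j < k:
                reservoir[j] = edge         # evict a random resident edge
    return reservoir


def space_saving(stream: Iterable[Hashable], m: int) -> Dict[Hashable, int]:
    """Space-Saving with at most m counters; returns approximate counts."""
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < m:
            counters[item] = 1
        else:
            # Replace the item with the smallest count; the newcomer
            # inherits that count plus one (an overestimate bound).
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


def top_k(counters: Dict[Hashable, int], k: int) -> List[Tuple[Hashable, int]]:
    """Return the k monitored items with the highest counts."""
    return sorted(counters.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy call stream (hypothetical data, not from the paper's dataset).
    calls = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b"), ("d", "a")]
    print(reservoir_sample(calls, k=3, seed=1))
    counts = space_saving((caller for caller, _ in calls), m=2)
    print(top_k(counts, 1))  # [('a', 3)]
```

Sampling edges rather than nodes keeps each call as the unit of selection, which matches the edge-based setting the abstract reports as performing well; the fixed seed is only there to make the toy run reproducible.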
URI: http://repositorio.inesctec.pt/handle/123456789/5331
http://dx.doi.org/10.1145/2851613.2851654
Type: conferenceObject; Publication
Appears in Collections: LIAAD - Articles in International Conferences

Files in This Item:
P-00K-H7X.pdf (704.66 kB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.