Health Twitter Big Bata Management with Hadoop Framework

Thumbnail Image
Date
2015
Authors
Cunha,J
Silva,C
Mário João Antunes
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Social media advancements and the rapid increase in volume and complexity of data generated by Internet services are becoming challenging not only technologically, but also in terms of application areas. Performance and availability of data processing are critical factors that need to be evaluated since conventional data processing mechanisms may not provide adequate support. Apache Hadoop with Mahout is a framework to storage and process data at large-scale, including different tools to distribute processing. It has been considered an effective tool currently used by both small and large businesses and corporations, like Google and Facebook, but also public and private healthcare institutions. Given its recent emergence and the increasing complexity of the associated technological issues, a variety of holistic framework solutions have been put forward for each specific application. In this work, we propose a generic functional architecture with Apache Hadoop framework and Mahout for handling, storing and analyzing big data that can be used in different scenarios. To demonstrate its value, we will show its features, advantages and applications on health Twitter data. We show that big health social data can generate important information, valuable both for common users and practitioners. Preliminary results of data analysis on Twitter health data using Apache Hadoop demonstrate the potential of the combination of these technologies. © 2015 The Authors. Published by Elsevier B.V.
Description
Keywords
Citation