Efficient Deduplication in a Distributed Primary Storage Infrastructure

dc.contributor.author João Tiago Paulo en
dc.contributor.author José Orlando Pereira en
dc.date.accessioned 2017-12-18T16:18:18Z
dc.date.available 2017-12-18T16:18:18Z
dc.date.issued 2016 en
dc.description.abstract A large amount of duplicate data typically exists across volumes of virtual machines in cloud computing infrastructures. Deduplication allows reclaiming these duplicates while improving the cost-effectiveness of large-scale multitenant infrastructures. However, traditional archival and backup deduplication systems impose prohibitive storage overhead for virtual machines hosting latency-sensitive applications. Primary deduplication systems reduce such penalty but rely on special cluster filesystems, centralized components, or restrictive workload assumptions. Also, some of these systems reduce storage overhead by confining deduplication to off-peak periods that may be scarce in a cloud environment. We present DEDIS, a dependable and fully decentralized system that performs cluster-wide off-line deduplication of virtual machines' primary volumes. DEDIS works on top of any unsophisticated storage backend, centralized or distributed, as long as it exports a basic shared block device interface. Also, DEDIS does not rely on data locality assumptions and incorporates novel optimizations for reducing deduplication overhead and increasing its reliability. The evaluation of an open-source prototype shows that minimal I/O overhead is achievable even when deduplication and intensive storage I/O are executed simultaneously. Also, our design scales out and allows collocating DEDIS components and virtual machines on the same servers, thus sparing the need for additional hardware. en
dc.identifier.uri http://repositorio.inesctec.pt/handle/123456789/4218
dc.identifier.uri http://dx.doi.org/10.1145/2876509 en
dc.language eng en
dc.relation 5621 en
dc.relation 5602 en
dc.rights info:eu-repo/semantics/openAccess en
dc.title Efficient Deduplication in a Distributed Primary Storage Infrastructure en
dc.type article en
dc.type Publication en
Files
Original bundle
Name: P-00K-HDX.pdf
Size: 925.26 KB
Format: Adobe Portable Document Format