A Case for Partitioned Bloom Filters

Paulo Sérgio Almeida

A Case for Partitioned Bloom Filters

dc.contributor.author	Paulo Sérgio Almeida	en
dc.contributor.other	5607	en
dc.date.accessioned	2024-02-05T16:38:04Z
dc.date.available	2024-02-05T16:38:04Z
dc.date.issued	2023	en
dc.description.abstract	In a partitioned Bloom Filter (PBF) the bit vector is split into disjoint parts, one per hash function. Contrary to hardware designs, where they prevail, software implementations mostly ignore PBFs, considering them worse than standard Bloom filters (SBF), due to the slightly larger false positive rate (FPR). In this paper, by performing an in-depth analysis, first we show that the FPR advantage of SBFs is smaller than thought; more importantly, by deriving the per-element FPR, we show that SBFs have weak spots in the domain: elements that test as false positives much more frequently than expected. This is relevant in scenarios where an element is tested against many filters. Moreover, SBFs are prone to exhibit extremely weak spots if naive double hashing is used, something occurring in mainstream libraries. PBFs exhibit a uniform distribution of the FPR over the domain, with no weak spots, even using naive double hashing. Finally, we survey scenarios beyond set membership testing, identifying many advantages of having disjoint parts, in designs using SIMD techniques, for filter size reduction, test of set disjointness, and duplicate detection in streams. PBFs are better, and should replace SBFs, in general purpose libraries and as the base for novel designs.	en
dc.identifier	P-00Y-K4H	en
dc.identifier.uri	https://repositorio.inesctec.pt/handle/123456789/14818
dc.language	eng	en
dc.rights	info:eu-repo/semantics/openAccess	en
dc.title	A Case for Partitioned Bloom Filters	en
dc.type		en
dc.type	Publication	en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: P-00Y-K4H.pdf
Size:: 630.24 KB
Format:: Adobe Portable Document Format
Description:

Download

Collections

HASLab - Indexed Articles in Journals