HASLab - Indexed Articles in Journals
Permanent URI for this collection
Browse
Browsing HASLab - Indexed Articles in Journals by Author "5607"
Results Per Page
Sort Options
-
ItemApproaches to Conflict-free Replicated Data Types( 2025) Paulo Sérgio Almeida ; 5607Conflict-free Replicated Data Types (CRDTs) allow optimistic replication in a principled way. Different replicas can proceed independently, being available even under network partitions and always converging deterministically: Replicas that have received the same updates will have equivalent state, even if received in different orders. After a historical tour of the evolution from sequential data types to CRDTs, we present in detail the two main approaches to CRDTs, operation-based and state-based, including two important variations, the pure operation-based and the delta-state based. Intended for prospective CRDT researchers and designers, this article provides solid coverage of the essential concepts, clarifying some misconceptions that frequently occur, but also presents some novel insights gained from considerable experience in designing both specific CRDTs and approaches to CRDTs.
-
ItemApproaches to Conflict-free Replicated Data Types( 2025) Paulo Sérgio Almeida ; 5607Conflict-free Replicated Data Types (CRDTs) allow optimistic replication in a principled way. Different replicas can proceed independently, being available even under network partitions and always converging deterministically: Replicas that have received the same updates will have equivalent state, even if received in different orders. After a historical tour of the evolution from sequential data types to CRDTs, we present in detail the two main approaches to CRDTs, operation-based and state-based, including two important variations, the pure operation-based and the delta-state based. Intended for prospective CRDT researchers and designers, this article provides solid coverage of the essential concepts, clarifying some misconceptions that frequently occur, but also presents some novel insights gained from considerable experience in designing both specific CRDTs and approaches to CRDTs.
-
ItemA Case for Partitioned Bloom Filters( 2023) Paulo Sérgio Almeida ; 5607In a partitioned Bloom Filter (PBF) the bit vector is split into disjoint parts, one per hash function. Contrary to hardware designs, where they prevail, software implementations mostly ignore PBFs, considering them worse than standard Bloom filters (SBF), due to the slightly larger false positive rate (FPR). In this paper, by performing an in-depth analysis, first we show that the FPR advantage of SBFs is smaller than thought; more importantly, by deriving the per-element FPR, we show that SBFs have weak spots in the domain: elements that test as false positives much more frequently than expected. This is relevant in scenarios where an element is tested against many filters. Moreover, SBFs are prone to exhibit extremely weak spots if naive double hashing is used, something occurring in mainstream libraries. PBFs exhibit a uniform distribution of the FPR over the domain, with no weak spots, even using naive double hashing. Finally, we survey scenarios beyond set membership testing, identifying many advantages of having disjoint parts, in designs using SIMD techniques, for filter size reduction, test of set disjointness, and duplicate detection in streams. PBFs are better, and should replace SBFs, in general purpose libraries and as the base for novel designs.
-
ItemA Case for Partitioned Bloom Filters( 2022) Paulo Sérgio Almeida ; 5607
-
ItemA Case for Partitioned Bloom Filters( 2023) Paulo Sérgio Almeida ; 5607In a partitioned Bloom Filter (PBF) the bit vector is split into disjoint parts, one per hash function. Contrary to hardware designs, where they prevail, software implementations mostly ignore PBFs, considering them worse than standard Bloom filters (SBF), due to the slightly larger false positive rate (FPR). In this paper, by performing an in-depth analysis, first we show that the FPR advantage of SBFs is smaller than thought; more importantly, by deriving the per-element FPR, we show that SBFs have weak spots in the domain: elements that test as false positives much more frequently than expected. This is relevant in scenarios where an element is tested against many filters. Moreover, SBFs are prone to exhibit extremely weak spots if naive double hashing is used, something occurring in mainstream libraries. PBFs exhibit a uniform distribution of the FPR over the domain, with no weak spots, even using naive double hashing. Finally, we survey scenarios beyond set membership testing, identifying many advantages of having disjoint parts, in designs using SIMD techniques, for filter size reduction, test of set disjointness, and duplicate detection in streams. PBFs are better, and should replace SBFs, in general purpose libraries and as the base for novel designs.
-
ItemDelta State Replicated Data Types( 2018) Paulo Sérgio Almeida ; Ali Shoker ; Carlos Baquero ; 5607 ; 6172 ; 5596