Your default hash table should be open-addressed, using Robin Hood linear probing with backward-shift deletion. When prioritizing deterministic performance over memory efficiency, two-way chaining is also a good choice. Code for this article may be found on GitHub.

Most people first encounter hash tables implemented using separate chaining, a model simple to understand and analyze mathematically. In separate chaining, a hash function is used to map each key to one of \(K\) buckets. Each bucket holds a linked list, so to retrieve a key, one simply traverses its corresponding bucket. Given a hash function drawn from a universal family, inserting \(N\) keys into a table with \(K\) buckets results in an expected bucket size of \(\frac{N}{K}\).

I make no claims regarding the quality or robustness of this hash function (squirrel3), but observe that it's cheap, it produces the expected number of collisions in power-of-two tables, and it passes smhasher when applied bytewise.

- Time to look up each element by following a Sattolo cycle (2-4x).

Several less interesting benchmarks were also run, all of which exhibited identical performance across equal-memory open-addressed tables and much worse performance for separate chaining.

To get our bearings, let's first implement a simple separate chaining table and benchmark it at various load factors. We won't be able to directly compare these results against the open-addressed tables, but we can get a sense of what we're up against. These are respectable results, especially lookups at 100%: the CPU is able to hide most of the memory latency despite the double indirection. Insertions and erasures, on the other hand, are pretty slow: they involve allocation. Technically, the reported memory amplification is an underestimate, as it does not include heap allocator overhead. However, we should not directly compare memory amplification against the open-addressed tables, as storing larger values would improve separate chaining's ratio.

Running these benchmarks on std::unordered_map produces roughly the same results as the 100% column, though with slower insertions and faster clears. Using squirrel3 with std::unordered_map additionally makes insertions slightly slower and lookups slightly faster. However, these results should not be considered canonical, since they depend on your standard library distribution (here, Microsoft's).

Linear Probing

Next, let's implement the simplest type of open addressing: linear probing. In the following diagrams, assume that each letter hashes to its alphabetical index.

Insert: starting from the key's index, place the key in the first empty or tombstone bucket.
Lookup: starting from the key's index, probe buckets in order until finding the key, reaching an empty non-tombstone bucket, or exhausting the table.
Erase: look up the key and replace it with a tombstone.

The results are surprisingly good, considering the simplicity. For existing keys, we're already beating the separately chained tables, and with lower memory overhead! Of course, there are also some obvious problems: looking up missing keys takes much longer, and the worst-case probe lengths are quite scary, even at 50% load factor. But, we can do much better.
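The expected bucket size quoted above follows from linearity of expectation: under a universal hash family, each key lands in any fixed bucket \(b\) with probability \(\frac{1}{K}\), so

\[
\mathbb{E}[\,|b|\,] \;=\; \sum_{i=1}^{N} \Pr[h(k_i) = b] \;=\; N \cdot \frac{1}{K} \;=\; \frac{N}{K}.
\]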