Cuckoo hashing is named after a type of bird which lays its eggs in other birds’ nests, and whose chicks push out the other eggs from the nest after they hatch. Cuckoo hashing uses an array of size n, allowing up to n items to be stored in the hash table. For simplicity, we only consider storing keys in the hash table and ignore any associated data.
We first describe the insert operation, which is the most complex. Each key k can be stored into one of t > 1 locations in the table, where t is generally a small number such as 2. These locations are given by h1(k), h2(k), ..., ht(k), where h1, h2, ..., ht are fixed hash functions. To insert k, we try to store it in location h1(k). If h1(k) was previously empty, then we are done. However, if there was a key k1 previously stored in h1(k), then we need to evict k1 and store it elsewhere in the table. In particular, suppose h1(k) = hi(k1); that is, location h1(k) is the i'th location where k1 can be stored. Then we store k1 in the next possible location hi+1(k1); if i + 1 > t, we wrap around and store k1 at location h1(k1). Then, similar to before, we check for the new location for k1 was previously empty. If so, we are done. Otherwise, if the location was previously occupied by a key k2, and if this location is the i2'th possible location for k2, i.e. hi2(k2), we evict k2 and store it at its next possible location hi2+1(k2) (wrapping around if necessary). Storing k2 may cause another eviction, requiring us to repeat the previous procedure.
Eventually, the insertion operation either terminates with an insertion into an empty location, or leads to a long chain of evictions. Worse still, the chain can sometimes cycle back on itself and cause an infinite loop. In order to avoid overly long eviction chains, we place an upper bound M on (log n) on the length of a chain. If an insertion causes more than M evictions, we declare a failure using the current hash functions h1, ..., ht. We then restart the insertion process by picking a new set of hash functions of h1', ..., ht', and trying to insert all the keys using the new functions. That is, we rebuild the entire hash table using a new set of hash functions. We repeat this process until all the keys are successfully inserted using some set of hash functions. An illustration of the insertion process can be seen at the Wikipedia page Cuckoo hashing.
please make sure CUDA and OpenMP environment is properly set. This software is ordinary.
make
The script is set.
./task1
./task2
./task3
./task4
will execute the following tasks respectively:
-
Insertsion Test Create a hash table of size 2^25 in GPU global memory, where each table entry stores a 32-bit integer. Insert a set of 2^s random integer keys into the hash table, for s = 10,11, ... 24.
-
Lookup Test Insert a set S of 2^24 random keys into a hash table of size 2^25, then perform lookups for the following sets of keys S0, ..., S10. Each set Si should contain 2^24 keys, where (100 - 10i) percent of the keys are randomly chosen from S, and the remainder are random 32-bit keys. For example, S0 should contain only random keys from S, while S5 should 50% random keys from S, and 50% completely random keys.
-
Capacity Test Fix a set of n = 2^24 random keys, and measure the time to insert the keys into hash tables of sizes 1.1n, 1.2n, ..., 2n. Also, measure the insertion times for hash tables of sizes 1.01n, 1.02n and 1.05n. Terminate the experiment if it takes too long and report the time used.
-
Eviction Test Using n = 2^24 random keys and a hash table of size 1.4n, experiment with different bounds on the maximum length of an eviction chain before restarting. Which bound gives the best running time for constructing the hash table? Note however you are not required to find the optimal bound.
I run the benchmark on OS: Ubuntu 20.04.4 LTS x8664, with Intel Xeon Gold 6342 (96t/48c) @ 3.500GHz CPU. I ran my benchmark on NVIDIA RTX 3090.
mops means Million Operations per Second. One operation means one insertion or lookup.
Fig 1. Insertsion test when t=2 (t is the number of hash function used)
Fig 2. Insertsion test when t=3 (t is the number of hash function used)
Fig 3. Lookup test when t=2 (t is the number of hash function used) Red point is invalid because the table wasn't constructed successfully.
Fig 4. Lookup test when t=3 (t is the number of hash function used) Red point is invalid because the table wasn't constructed successfully.
Fig 5. Capacity test when t=3 (t is the number of hash function used).
Fig 6. Capacity test when t=3 (t is the number of hash function used).
Fig 7. Capacity test when t=3 (t is the number of hash function used). The result t=2 is meaningless because all is invalid accroding to the lookup tests.






