There is already a lot of work published, that details numerous established ways to implement hash tables. The WiKiPedia article on the broad class of hash-tables, is already a good starting point to become familiar with them.
But just to make this interesting, I undertook the mental exercise of designing a hash table, which when grown, does not require that the contents of the original hash table be rearranged in any way, but in which the expansion is achieved, by simply appending zeroes to the original hash table, and then changing one of the parameters, with which the new, bigger hash table is accessed.
My exercise led me to the premise, that each size of my hash table be a power of two, and it seems reasonable to start with 2^16 entries, aka 65536 entries, giving the hash table an initial Power Level (L1) of 16. There would be a pointer I name P to a hash-table position, and two basic Operations to arrive at P:
- The key can be hashed, by multiplying by the highest prime number smaller than 2^L1 – This would be the prime number 65521 – within the modulus of 2^L1, and yielding Position P0.
- The current Position can have added the number A, which is intentionally some power of two lower than L1, within the modulus of 2^L, successively yielding P1, P2, P3, etc…
Because the original multiplier is a prime number lower than 2^L1, and the latter, the initial modulus of the hash table, a power of two, consecutive key-values will only lead to a repetition in P0 after 2^L key-values. But, because actual keys are essentially random, a repetition of P0 is possible by chance, and the resolution of the resulting hash-table collision is the main subject in the design of any hash table.
By default, each position of the hash table contains a memory address, which is either NULL, meaning that the position is empty, or which points to an openly-addressable data-structure, from which the exact key can be retrieved again. When this retrieved exact key does not match the exact key used to perform the lookup, then Operation (2) above needs to be performed on P, and the lookup-attempt repeated.
But, because A is a power of 2 which also fits inside the modulus 2^L, we know that the values of P will repeat themselves exactly, and how small A is, will determine how many lookup-attempts can be undertaken, before P repeats, and this will also determine how many entries can be found at any one bucket. Effectively, if A = 2^14, then 2^L / A = 2^2, so that a series of 4 positions would form the maximum bucket-size. If none of those entries is NULL, the bucket is full, and if an attempt is made to insert a new entry to a full bucket, the hash-table must be expanded.
Finding the value for a key will predictably require, that all the positions following from one value of P0 be read, even if some of them were NULL, until an iteration of P reveals the original key.
It is a trivial fact that eventually, some of the positions in this series will have been written to by other buckets, because keys will be random again, thus leading to (their own values of P) which coincide with (current nA + P0) . But, because the exact key will be retrieved from non-NULL addresses before those are taken to have arisen from the key being searched for, those positions will merely reflect a performance-loss, not an accuracy-loss.
Because it is only legal, for 1 key to lead to a maximum of 1 value, as is the case with all hash tables, before an existing key can be set to a new value, the old key must be found and be deleted. In order for this type of hash table to enforce that policy, its own operation to insert a key would need to be preceded by an operation to retrieve it, if only to return the appropriate error-code upon a success in retrieving it. During this precursory scan, the existence and position of the first NULL address can also be stored, so that the actual insertion of the new key can take place in one step.
Any defined operation to delete a key would follow according to the same logic – It must be found first, and if not found, cause the appropriate error-code to be returned. Any hash table with more than one entry for the same key is corrupted, but can conceivably be cleaned, if the program that uses it sends multiple commands to delete the same key, or one command to purge it…
Growing the Hash Table