c - Efficient hash function wrt. string concatenation? -
i'm experimenting info structures in toy language, , since lot of text processing i'd string operations fast. i'm storing strings immutable ropes cached lengths , hashes. goal hash function performed when rope's leaf node instantiated. there existing hash functions out there can this?
my rope construction looks this:
#define hashlen 128 #define hashmul 2 typedef unsigned long hashtype; typedef struct rope rope; // k&r2 hash function hashtype hash(const char *s) { hashtype h = 0; while (*s != '\0') { h = h * hashmul + *s++; } homecoming h; // won't modded until actual table access performed } struct rope { unsigned weight; hashtype hash; rope *left; // if null, indicates r->right char * void *right; // if r->left non-null, r->right rope * };
and concatenation function looks this:
// integer exponentiation--by squaring hashtype iexp(hashtype k, unsigned n); rope *newrope(const char *s) { rope *r = (rope *) malloc(sizeof(rope)); r->weight = strlen(s); r->hash = hash(s); r->left = null; r->right = (void *) s; homecoming r; } rope *ropecat(rope *left, rope *right) { rope *r = (rope *) malloc(sizeof(rope)); r->weight = left->weight + right->weight; r->hash = left->hash * iexp(hashmul, right->weight) + right->hash; r->left = left; r->right = (void *) right; homecoming r; }
a larger prime number improve hashmul
(k&r2 uses 31, i've heard 33 "best"), i'm concerned rapid growth of hash function long strings, might wrap around ulong_max
, break hash mechanism, chose number might wrap more smoothly. setting hashmul
2 lets left->hash * iexp(hashmul->hash, right->weight)
rewritten left->hash << right->weight
, , setting hashlen
128 or binary number lets me utilize r->hash & (hashlen-1)
instead of r->hash % hashlen
.
so that's outline of i'm trying do. i'm looking hash function that's fast, sort-of distributive in constant time, , has reasonable distribution. naive k&r2 hash function work fine here, or should utilize different hashmul
, or should hash function entirely?
c string performance data-structures hash
No comments:
Post a Comment