HASHTABLE(3) HASHTABLE(3) NAME hashtable - keyed sets and maps SYNOPSIS #include "hashtable.h" #include "hashtable.c" struct hashtable *hashtable_new(unsigned size); void hashtable_delete(struct hashtable *table); struct hashtable *hashtable_initialise(struct hashtable *table); struct hashtable *hashtable_release(struct hashtable *table); struct hashtable_entry *hashtable_insert(struct hashtable *table, char *key, unsigned length); struct hashtable_entry *hashtable_find(struct hashtable *table, char *key); DESCRIPTION These functions provide hash tables that use the method of coalescing lists [1]. They are particularly suitable for building symbol tables. The hashtable_new() function allocates and returns a pointer to an empty hash table structure that can contain size entries. When the structure is no longer needed, the hashtable_delete() function can be used to destroy it. Programs that want to allocate a hash table struc- ture in automatic or static storage can use hashtable_initialise() to prepare it for use and subsequently hashtable_release() to reclaim any resources it has used. (The functions hashtable_new() and hashtable_delete() perform this initialisation and reclamation implic- itly.) The hashtable_insert() function creates a new entry in table with the given key, growing the table to make room if necessary. It does not check if an entry with the key already exists in the table. The hashtable_find() function searches table for an entry with the given key and returns a pointer to it, or null if no such entry exists. The length argument specifies the number of bytes in the key. The default type of key is char * and the length should not include any terminaing nul character. Two data structures are declared by the header file. Several members of these structures are potentially useful to the client, as follows. struct hashtable_entry { char *hashtable_entry_key; unsigned hashtable_entry_length; }; struct hashtable { unsigned int hashtable_capacity; struct hashtable_entry **hashtable_entries; }; The hashtable_entry_key and hashtable_entry_length are copies of those provided to the hashtable_insert function at the time when the entry was created. The hashtable_capacity gives the size of (one more than the maximum valid index into) hashtable_entries. Each element of hashtable_entries is either null or a pointer to a hashtable_entry structure. The client must never modify the contents of these structures directly, with the exception of any additional members that were added to them explicitly (see the next section). TYPICAL USE AND BASIC CUSTOMISATION Programs that wish only to manage a set of unique strings can just include hashtable.h to declare the API, and add hashtable.c to the list of source files. The types and functions described in this manual will then be available. Simple customisation is possible by defining preprocessor macros to change default types and names. HASHTABLE_NAME The tag given to the hashtable structure, and the prefix prepended to the hash table functions. The default is 'hash- able'. HASHTABLE_ENTRY_NAME The tag given to the hashtable entry structure. The default is 'hashable_entry'. HASHTABLE_MEMBERS Additional members to be added to the hashtable structure. They are placed at the beginning of the structure and initialised to zero. HASHTABLE_ENTRY_MEMBERS Additional members to be added to the hashtable entry structure. They are placed at the beginning of the structure and will be initialised to zero. These symbols affect the declarations made in hashtable.h and the implementations of associated functions in hashtable.c. The same set of definitions must be in effect when including the header file and compiling the source file. The easiest way to ensure this is to include hashtable.c in the program after the necessary definitions. For example, the following complete program tallies the number of occurences of each unique word passed on the command line. #define HASHTABLE_NAME SymbolTable #define HASHTABLE_ENTRY_NAME Symbol #define HASHTABLE_ENTRY_MEMBERS int tally; #include "hashtable.c" #include <stdio.h> struct Symbol *lookup(struct SymbolTable *t, char *key) { unsigned int len= strlen(key); struct Symbol *found= SymbolTable_find(t, key, len); if (!found) found= SymbolTable_insert(t, key, len); return found; } int main(int argc, char **argv) { unsigned int i; struct SymbolTable *symbols= SymbolTable_new(argc * 2); while (--argc) { ++argv; struct Symbol *symbol= lookup(symbols, *argv); symbol->tally++; } for (i= 0; i < symbols->hashtable_capacity; ++i) { struct Symbol *s; if ((s= symbols->hashtable_entries[i])) printf("%7i %s\n", s->tally, s->hashtable_entry_key); } SymbolTable_delete(); return 0; } ADVANCED CUSTOMISATION The following preprocessor symbols affect only the implementation in hashtable.c, unless otherwise noted. HASHTABLE_KEY_T The type of a key. The default type is 'char *'. If the key type is not 'pointer to integer' then appropriate copy, compare and hash functions should also be specified (see below). Note that this symbol affects both hashtable.c and hashtable.h. HASHTABLE_KEY_COPY The function used to copy a key. The default is value is 'strdup'. Note that the implemenation will reclaim the storage for copied keys by calling free(), so the copy function should use malloc() (or one of its cousins) to allocate memory for copied keys. HASHTABLE_KEY_COMPARE The function used to compare two keys. This function should return non-zero if the two keys are equal. The default value is '!strcmp'. HASHTABLE_FRACTION Controls when a table grows to make room for new keys. It is specified as a fraction of the total capacity of the table. When this fractional capacity is exceeded, the table is grown by HASHTABLE_FACTOR (see below). The default value is '1/2'. Note that the fractional value must not be enclosed in parentheses, otherwise the compiler will round down to zero and the table will grow every time a key is inserted. HASHTABLE_FACTOR The factor by which the capacity of the table is increased every time it is grown. The default value is '3'. HASHTABLE_HASH_FUNCTION The name of the function that generates a numeric hash for a key. The signature of this function must be unsigned int HASHTABLE_HASH_FUNCTION ( HASHTABLE_KEY_T key, unsigned int length ) and it must always return the same value when given equivalent keys. The symbols that affect hashtable.h are undefined automatically at the end of that file. In the unlikely event they are useful beyond that file, defining HASHTABLE_KEEP_DEFINITIONS will preserve them. All sym- bols affecting either hashtable.h or hashtable.c are unconditionally undefined at the end of hashtable.c. This permits the C file to be included more than once in a parent source file, to create implementa- tions for multiple table types. RETURN VALUE The hashtable_new() function returns a pointer to the newly-allocated hash table, or null if memory could not be allocated. The hashtable_initiliase(), and hashtable_release() functions return table. The hashtable_insert() function returns a pointer to the newly-created table entry, or null if required memory cannot be allocated. The hashtable_find() function returns a pointer to the existing table entry corresponding to key, or null if the key does not exist in the table. CAVEATS AND BUGS There is no way to remove a key from a table. You should not feel upset about this. The implementation trusts the client not to free any of the pointers returned by hashable_insert() and hashable_find(). The contents of the table may be rearranged during insertion. Pointers returned from hashable_insert() and hashable_find() are therefore invalid after (and iteration is unstable over) insertions. SEE ALSO hcreate(3) [1] Donald Knuth, The Art of Computer Programming, Algorithm 6.4C. The latest version of this software and documentation: http://piumarta.com/software/hashtable AUTHOR The software and this manual page were written by Ian Piumarta (first- name at last-name dot com). Please send bug reports and suggestions for improvements to the author at the above address. HASHTABLE(3)