Module src/hash/hash-generate-simple.c

vim:sw=4:sts=4 Tool to generate a memory efficient hash table for a given input set. (C) by Wolfgang Oertl 2005, 2008 Terminology: - hash table: a data set that maps keys to data. - entry: a key/data pair - hash value: a 32 bit integer computed from a hash key. - hash size: the number of buckets. - bucket: consists of zero or more entries. - collision: multiple key/data pairs in one bucket. Limitations: - The max. size of a value is 2^12-1 bytes (4095 bytes) - The combined size of all values is limited to 2^20 bytes (1 MB) - keys cannot contain NUL bytes (this could be fixed easily; data can have NUL bytes). Version history: 2005-07-18 first version 2005-07-19 better command line parsing, many options, help. 2008-03-03 more comments 2008-08-27 reworked into a Lua library component. No options, which were unused anyway, instead use sensible defaults. 2008-09-12 new bucket layout and chaining mechanism. Hash algorithm for inserts: - compute the hash value from the key - calculate the bucket number from the hash value - store the entry into the bucket (linked list) - when done, use empty buckets as overflows - output the data in the bucket order - output the buckets Algorithm for lookup: - compute the hash value from the key - calculate the bucket number from the hash value - get the bucket contents. If it is an overflow bucket -> not found - compare the hash value (part). If equal -> found - while a "next overflow bucket" is set, go there and repeat at prev step - no more overflow buckets: not found. Layout of a bucket (64 bits in total): Bits Content --------------------------------------- 1 set if this is an overflow bucket 15 high bits of the hash value 16 next overflow bucket (0=none; 1-based) 12 data length 20 data offset

Functions

local _bits_required (value) Compute how many bits are required to store the given value.
local _build_hash_table (L, ifile) Build the hash table with the data from the input file.
local _count_lines (f) Determine the number of lines in the file.
local _distribute (L) A hash table has been created.
local _output_buckets (L, ofile) Output the array of hash buckets, each one with a hash value, data offset, and an overflow bucket number.
local _output_values (L, ofile) For each bucket, output the value, and store the offsets into the buckets.
local _preprocess_line (s, key_len, data_ptr, data_len, line) Each line consists of a key and a data string, which are comma separated.
local _show_statistics () Show some statistics
generate_hash_simple (L, datafile_name, prefix1, ofname) Generate a hash table to map the key,value pairs found in the specified datafile, and write a compileable C file to the output file.
local special_strlen (s) This is a sort of strlen, but detects \ooo escape sequences and counts them as just one byte.


Functions

local _bits_required (value)
Compute how many bits are required to store the given value.

Parameters

  • value:
In file: src/hash/hash-generate-simple.c line 175
local _build_hash_table (L, ifile)
Build the hash table with the data from the input file.

Parameters

  • L:
  • ifile:
In file: src/hash/hash-generate-simple.c line 191
local _count_lines (f)
Determine the number of lines in the file.

Parameters

  • f:
In file: src/hash/hash-generate-simple.c line 467
local _distribute (L)
A hash table has been created. There are now empty buckets as well as overfilled buckets (i.e. more than one entry). Distribute entries into the empty buckets.

Parameters

  • L:
In file: src/hash/hash-generate-simple.c line 240
local _output_buckets (L, ofile)
Output the array of hash buckets, each one with a hash value, data offset, and an overflow bucket number.

Parameters

  • L:
  • ofile:
In file: src/hash/hash-generate-simple.c line 358
local _output_values (L, ofile)
For each bucket, output the value, and store the offsets into the buckets. Duplicates of values can appear. Store just one copy and use the same value in multiple buckets. Keep a sorted list of already seen values; do a binary search for each new value. New values are appended and qsort is called.

Parameters

  • L:
  • ofile:
In file: src/hash/hash-generate-simple.c line 299
local _preprocess_line (s, key_len, data_ptr, data_len, line)
Each line consists of a key and a data string, which are comma separated. Both key and data could contain non-printable characters encoded as \nnn; they are written as-is in the output, but for offset calculation the byte count required for it will be needed.

Parameters

  • s: Input line (starting with the key)
  • key_len: (output) Byte count of the key
  • data_ptr: (output) Start of data part of the line
  • data_len: (output) Byte count of the data
  • line:

Return value:

0 on success, -1 otherwise In file: src/hash/hash-generate-simple.c line 132
local _show_statistics ()
Show some statistics In file: src/hash/hash-generate-simple.c line 396
generate_hash_simple (L, datafile_name, prefix1, ofname)
Generate a hash table to map the key,value pairs found in the specified datafile, and write a compileable C file to the output file.

Parameters

  • L:
  • datafile_name:
  • prefix1:
  • ofname:
In file: src/hash/hash-generate-simple.c line 502
local special_strlen (s)
This is a sort of strlen, but detects \ooo escape sequences and counts them as just one byte.

Parameters

  • s:

Return value:

Length of the string In file: src/hash/hash-generate-simple.c line 102

Valid XHTML 1.0!