You are viewing okeuday

Michael Truog - Erlang Trie Data Structure
Erlang Trie Data Structure
I created a trie implementation that can outperform the standard Erlang dict implementation (20% faster lookups) while providing the same dict module interface functions. A trie data structure is a form of radix sort invented by Edward Fredkin. My implementation is a PATRICIA trie, since it stores the key suffix on a leaf node (PATRICIA - Practical Algorithm to Retrieve Information Coded in Alphanumeric, D.R.Morrison (1968)), rather than creating extra deeply nested nodes with only single children nodes. The trie implementation also can provide a way of implementing Burstsort which is quickest for sorting strings, though the performance might degrade due to Erlang being a functional language (since mutable state will not be cache-efficient like it would otherwise be in an imperative language).

The trie implementation does have both the "foldl" function and the "foldr" function for alphabetical traversals of the trie. The "map" function currently does a traversal in reverse alphabetical order. The changes to the trie data structure are likely to be slow, since the trie structure relies on tuples (setting a tuple is slow). The lookups on a trie data structure are regarded as the most important operations.

My test of the trie implementation lookup speed relied on the locally installed wordlist (98569 words that have an average of 9.5 characters). The same operations were performed on the trie data structure and the dict data structure, 10 times for both the HiPE installation and the non-HiPE. Unfortunately, the HiPE results were slower than the non-HiPE runs, possibly because my usage of integers is an uncommon case. For 10 HiPE runs, the trie took 541655.2 µs and the dict took 623698.5 µs for a 1.15 speedup factor and a 13.2 % improvement. For 10 non-HiPE runs, the trie took 505989.0 µs and the dict took 635995.5 µs for a 1.26 speedup factor and a 20.4 % improvement. The wordlist was fully stored and then fully accessed, using "is_key", "fetch", and "find". The benchmark results (from the test_speed function) are from Linux version 2.6.32-21, Ubuntu 10.04 with an Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz.

(Updated the HiPE results on 1/2/2011 after a proper Erlang compilation, i.e., when --enable-hipe fails on configure, there is no error)
1 comment or Leave a comment
From: zoarre Date: January 6th, 2011 02:24 pm (UTC) (Link)


well done. :)

1 comment or Leave a comment