Michael Truog (okeuday) wrote,
Michael Truog

Erlang Trie Data Structure

I created a trie implementation that can outperform the standard Erlang dict implementation (20% faster lookups) while providing the same dict module interface functions. A trie data structure is a form of radix sort invented by Edward Fredkin. My implementation is a PATRICIA trie, since it stores the key suffix on a leaf node (PATRICIA - Practical Algorithm to Retrieve Information Coded in Alphanumeric, D.R.Morrison (1968)), rather than creating extra deeply nested nodes with only single children nodes. The trie implementation also can provide a way of implementing Burstsort which is quickest for sorting strings, though the performance might degrade due to Erlang being a functional language (since mutable state will not be cache-efficient like it would otherwise be in an imperative language).

The trie implementation does have both the "foldl" function and the "foldr" function for alphabetical traversals of the trie. The "map" function currently does a traversal in reverse alphabetical order. The changes to the trie data structure are likely to be slow, since the trie structure relies on tuples (setting a tuple is slow). The lookups on a trie data structure are regarded as the most important operations.

My test of the trie implementation lookup speed relied on the locally installed wordlist (98569 words that have an average of 9.5 characters). The same operations were performed on the trie data structure and the dict data structure, 10 times for both the HiPE installation and the non-HiPE. Unfortunately, the HiPE results were slower than the non-HiPE runs, possibly because my usage of integers is an uncommon case. For 10 HiPE runs, the trie took 541655.2 µs and the dict took 623698.5 µs for a 1.15 speedup factor and a 13.2 % improvement. For 10 non-HiPE runs, the trie took 505989.0 µs and the dict took 635995.5 µs for a 1.26 speedup factor and a 20.4 % improvement. The wordlist was fully stored and then fully accessed, using "is_key", "fetch", and "find". The benchmark results (from the test_speed function) are from Linux version 2.6.32-21, Ubuntu 10.04 with an Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz.

(Updated the HiPE results on 1/2/2011 after a proper Erlang compilation, i.e., when --enable-hipe fails on configure, there is no error)

  • trie retest

    The older trie (speed) testing didn't experiment with Erlang compilation options and used Erlang R14B01 (newest, at the time). To examine how…

  • v4 UUIDs

    After updating the Erlang uuid implementation to fix various bugs, I added uuid:get_v4_urandom/0 function to utilize the 2006 Wichmann-Hill…

  • A Better Erlang Priority Queue

    The other priority queue implementations were meant to avoid the O(N) task of finding a priority within the implementation used by both Riak and…

  • Post a new comment


    default userpic

    Your reply will be screened

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
  • 1 comment