trie retest

The older trie (speed) testing didn't experiment with Erlang compilation options and used Erlang R14B01 (newest, at the time). To examine how performance has changed, I wanted to rerun the tests with Erlang R15B03 on newer hardware. I have tested various HiPE optimization levels (Erlang configuration: --enable-threads --enable-smp-support --enable-kernel-poll --enable-hipe --enable-native-libs) and running without optimization (Erlang configuration: --enable-threads --enable-smp-support --enable-kernel-poll --disable-hipe). The hardware for these tests is: Core i7 2670QM 2.2GHz 4 cores, 8 hyper-threads, L2:4×256KB L3:6MB RAM:8GB:DDR3-1333MHz, Sandy Bridge-HE-4 (Socket G2), Linux 3.2.0-29-generic x86_64, Ubuntu 12.04.1 LTS. To understand the Erlang compilation options for each optimization level, look at the makefile.

The older results and the summary showed similar performance between the trie and the process dictionary with larger tries (50000 elements), for the trie:find/2 operation (shown as "get" in the results). The new testing only deals with N as 1000, 10000, and 100000 instead of 1000, 10000, and 50000, to look at orders of magnitude difference. When doing the new testing, the smaller tries (1000 elements) showed similar performance between the trie and the process dictionary (i.e., the ideal usage changed). The difference became more dramatic with HiPE optimization level 1, such that the trie outperformed the process dictionary for N as 1000 and 10000. Outperforming the Erlang process dictionary is very surprising and atypical since the process dictionary is normally the fastest storage available, but it is normally avoided since it eliminates the source code's referential transparency making the code harder to test, maintain, and understand.

Using the trie data structure instead of the process dictionary allows you to provide a scope for your data, since an Erlang process only has a single shared process dictionary (i.e., one process dictionary per Erlang process). The trie data structure also allows the state to be saved as Erlang terms, so it can easily be put in ets, a database, or sent between processes.

While it is unclear if HiPE is used within production Erlang systems (by people that need stability), it is good to see that the trie can benefit from the simplest optimizations available to Erlang (HiPE optimization level 1). The exact source of the trie performance change between R14B01 and R15B03 is also unclear because of how many changes influence the Erlang runtime system, but the trend from larger tries being faster to smaller tries being faster helps to make the trie data structure more applicable to common tasks, as the best solution for storing associative lookup data with string keys (where a string is a list of integers, not a binary). Feel free to try the string_key test within your own software development environment.

make OPTIMIZE=0
erl -noshell -pz ebin -s run test -s init stop
TEST string_key
N == 1000 (10 runs)
              aadict get:      431.8 us (  1.4), set:     1116.5 us (  1.5)
                dict get:      443.7 us (  1.5), set:     1791.7 us (  2.4)
   ets (ordered_set) get:      791.9 us (  2.6), set:     1497.6 us (  2.0)
           ets (set) get:      364.4 us (  1.2), set:      890.3 us (  1.2)
ets x10 read (ordere get:      655.3 us (  2.2)
  ets x10 read (set) get:      593.7 us (  2.0)
            gb_trees get:      459.4 us (  1.5), set:     1518.7 us (  2.0)
              hashtl get:      915.7 us (  3.0), set:     1270.2 us (  1.7)
  process dictionary get:      316.6 us (  1.0), set:      741.5 us (  1.0)
              rbdict get:      420.0 us (  1.4), set:      940.5 us (  1.3)
                trie get:      302.9 us (  1.0), set:      978.1 us (  1.3)
N == 10000 (10 runs)
              aadict get:     8903.2 us (  1.4), set:    20652.3 us (  2.0)
                dict get:     8144.9 us (  1.3), set:    26353.0 us (  2.6)
   ets (ordered_set) get:     9488.3 us (  1.5), set:    12515.1 us (  1.2)
           ets (set) get:     8149.0 us (  1.3), set:    10223.1 us (  1.0)
ets x10 read (ordere get:     7549.7 us (  1.2)
  ets x10 read (set) get:     6636.2 us (  1.1)
            gb_trees get:     9630.4 us (  1.6), set:    24495.7 us (  2.4)
              hashtl get:     9724.2 us (  1.6), set:    16015.6 us (  1.6)
  process dictionary get:     6178.1 us (  1.0), set:    14000.4 us (  1.4)
              rbdict get:     8522.1 us (  1.4), set:    18301.5 us (  1.8)
                trie get:     7215.9 us (  1.2), set:    18760.3 us (  1.8)
N == 100000 (10 runs)
              aadict get:   217818.2 us (  2.9), set:   363481.8 us (  3.1)
                dict get:   120559.7 us (  1.6), set:   702588.1 us (  6.0)
   ets (ordered_set) get:   140960.1 us (  1.9), set:   160380.2 us (  1.4)
           ets (set) get:    92217.0 us (  1.2), set:   117812.8 us (  1.0)
ets x10 read (ordere get:    94970.5 us (  1.3)
  ets x10 read (set) get:    77114.3 us (  1.0)
            gb_trees get:   240592.4 us (  3.2), set:   413380.8 us (  3.5)
              hashtl get:   123689.9 us (  1.7), set:   162366.1 us (  1.4)
  process dictionary get:    74274.1 us (  1.0), set:   145834.0 us (  1.2)
              rbdict get:   216569.4 us (  2.9), set:   329517.9 us (  2.8)
                trie get:   128545.4 us (  1.7), set:   469294.1 us (  4.0)

make OPTIMIZE=1
erl -noshell -pz ebin -s run test -s init stop
TEST string_key
N == 1000 (10 runs)
              aadict get:      364.6 us (  2.5), set:      682.8 us (  1.2)
                dict get:      338.6 us (  2.3), set:     1243.5 us (  2.3)
   ets (ordered_set) get:      575.0 us (  3.9), set:      795.2 us (  1.4)
           ets (set) get:      344.1 us (  2.3), set:      572.9 us (  1.0)
ets x10 read (ordere get:      569.3 us (  3.8)
  ets x10 read (set) get:      675.6 us (  4.6)
            gb_trees get:      404.0 us (  2.7), set:     1024.3 us (  1.9)
              hashtl get:      488.2 us (  3.3), set:      571.5 us (  1.0)
  process dictionary get:      228.3 us (  1.5), set:      783.6 us (  1.4)
              rbdict get:      350.9 us (  2.4), set:      551.9 us (  1.0)
                trie get:      148.2 us (  1.0), set:      637.4 us (  1.2)
N == 10000 (10 runs)
              aadict get:     8243.8 us (  1.5), set:    14899.0 us (  1.3)
                dict get:     7905.4 us (  1.5), set:    22639.6 us (  2.0)
   ets (ordered_set) get:     9647.8 us (  1.8), set:    12273.3 us (  1.1)
           ets (set) get:     8741.0 us (  1.6), set:    11131.5 us (  1.0)
ets x10 read (ordere get:     8152.5 us (  1.5)
  ets x10 read (set) get:     7511.2 us (  1.4)
            gb_trees get:     8854.4 us (  1.6), set:    17573.0 us (  1.6)
              hashtl get:     8872.8 us (  1.6), set:    14108.0 us (  1.3)
  process dictionary get:     6474.1 us (  1.2), set:    15250.0 us (  1.4)
              rbdict get:     7876.7 us (  1.5), set:    12443.8 us (  1.1)
                trie get:     5395.0 us (  1.0), set:    13855.8 us (  1.2)
N == 100000 (10 runs)
              aadict get:   203331.7 us (  3.2), set:   290739.7 us (  2.8)
                dict get:   115243.7 us (  1.8), set:   636958.5 us (  6.0)
   ets (ordered_set) get:   138594.7 us (  2.1), set:   156941.5 us (  1.5)
           ets (set) get:    83570.1 us (  1.3), set:   105417.7 us (  1.0)
ets x10 read (ordere get:    95338.7 us (  1.5)
  ets x10 read (set) get:    68988.9 us (  1.1)
            gb_trees get:   228740.4 us (  3.5), set:   334276.2 us (  3.2)
              hashtl get:   106109.2 us (  1.6), set:   119801.5 us (  1.1)
  process dictionary get:    64525.8 us (  1.0), set:   137005.9 us (  1.3)
              rbdict get:   199559.6 us (  3.1), set:   257816.9 us (  2.4)
                trie get:    95711.2 us (  1.5), set:   426041.9 us (  4.0)

make OPTIMIZE=2
erl -noshell -pz ebin -s run test -s init stop
TEST string_key
N == 1000 (10 runs)
              aadict get:      473.4 us (  3.3), set:      763.9 us (  1.4)
                dict get:      435.0 us (  3.0), set:     1547.7 us (  2.9)
   ets (ordered_set) get:      574.4 us (  4.0), set:      805.5 us (  1.5)
           ets (set) get:      339.6 us (  2.3), set:      589.4 us (  1.1)
ets x10 read (ordere get:      623.7 us (  4.3)
  ets x10 read (set) get:      585.3 us (  4.0)
            gb_trees get:      408.5 us (  2.8), set:     1014.8 us (  1.9)
              hashtl get:      492.9 us (  3.4), set:      597.2 us (  1.1)
  process dictionary get:      222.7 us (  1.5), set:      791.6 us (  1.5)
              rbdict get:      352.2 us (  2.4), set:      540.2 us (  1.0)
                trie get:      144.9 us (  1.0), set:      775.0 us (  1.4)
N == 10000 (10 runs)
              aadict get:     8240.3 us (  1.6), set:    14352.5 us (  1.4)
                dict get:     7820.1 us (  1.5), set:    22096.4 us (  2.1)
   ets (ordered_set) get:     9841.4 us (  1.9), set:    12463.4 us (  1.2)
           ets (set) get:     8539.8 us (  1.6), set:    10536.7 us (  1.0)
ets x10 read (ordere get:     8252.2 us (  1.6)
  ets x10 read (set) get:     7503.8 us (  1.4)
            gb_trees get:     8956.5 us (  1.7), set:    17869.8 us (  1.7)
              hashtl get:     8733.6 us (  1.7), set:    15034.7 us (  1.4)
  process dictionary get:     6167.1 us (  1.2), set:    14606.7 us (  1.4)
              rbdict get:     7936.7 us (  1.5), set:    12769.4 us (  1.2)
                trie get:     5220.3 us (  1.0), set:    14063.3 us (  1.3)
N == 100000 (10 runs)
              aadict get:   213701.7 us (  3.1), set:   299393.6 us (  2.7)
                dict get:   118862.2 us (  1.7), set:   637653.9 us (  5.7)
   ets (ordered_set) get:   131884.9 us (  1.9), set:   151626.0 us (  1.4)
           ets (set) get:    90851.1 us (  1.3), set:   112281.9 us (  1.0)
ets x10 read (ordere get:    90603.8 us (  1.3)
  ets x10 read (set) get:    74600.5 us (  1.1)
            gb_trees get:   234143.5 us (  3.4), set:   337062.8 us (  3.0)
              hashtl get:    99279.3 us (  1.4), set:   118642.3 us (  1.1)
  process dictionary get:    69277.4 us (  1.0), set:   141672.6 us (  1.3)
              rbdict get:   206904.5 us (  3.0), set:   265790.1 us (  2.4)
                trie get:   100507.5 us (  1.5), set:   437189.1 us (  3.9)

make OPTIMIZE=3
erl -noshell -pz ebin -s run test -s init stop
TEST string_key
N == 1000 (10 runs)
              aadict get:      465.3 us (  2.6), set:      790.3 us (  1.4)
                dict get:      424.2 us (  2.3), set:     1523.1 us (  2.7)
   ets (ordered_set) get:      569.2 us (  3.1), set:      798.2 us (  1.4)
           ets (set) get:      333.5 us (  1.8), set:      567.9 us (  1.0)
ets x10 read (ordere get:      598.3 us (  3.3)
  ets x10 read (set) get:      615.6 us (  3.4)
            gb_trees get:      509.1 us (  2.8), set:     1186.2 us (  2.1)
              hashtl get:      464.7 us (  2.6), set:      585.9 us (  1.0)
  process dictionary get:      224.2 us (  1.2), set:      817.1 us (  1.4)
              rbdict get:      449.9 us (  2.5), set:      639.7 us (  1.1)
                trie get:      181.9 us (  1.0), set:      839.4 us (  1.5)
N == 10000 (10 runs)
              aadict get:     8498.7 us (  1.6), set:    15190.0 us (  1.4)
                dict get:     7812.6 us (  1.5), set:    21764.6 us (  2.1)
   ets (ordered_set) get:     9503.9 us (  1.8), set:    12374.8 us (  1.2)
           ets (set) get:     8636.0 us (  1.7), set:    10541.9 us (  1.0)
ets x10 read (ordere get:     8148.1 us (  1.6)
  ets x10 read (set) get:     7765.6 us (  1.5)
            gb_trees get:     8986.5 us (  1.7), set:    17331.9 us (  1.6)
              hashtl get:     9372.1 us (  1.8), set:    15798.2 us (  1.5)
  process dictionary get:     6215.1 us (  1.2), set:    14557.3 us (  1.4)
              rbdict get:     7963.2 us (  1.5), set:    12940.2 us (  1.2)
                trie get:     5199.3 us (  1.0), set:    14114.5 us (  1.3)
N == 100000 (10 runs)
              aadict get:   203343.0 us (  3.0), set:   302129.4 us (  2.7)
                dict get:   109768.0 us (  1.6), set:   610699.3 us (  5.4)
   ets (ordered_set) get:   126994.1 us (  1.9), set:   144943.7 us (  1.3)
           ets (set) get:    87807.5 us (  1.3), set:   113148.6 us (  1.0)
ets x10 read (ordere get:    81821.7 us (  1.2)
  ets x10 read (set) get:    72250.2 us (  1.1)
            gb_trees get:   228222.0 us (  3.3), set:   336378.5 us (  3.0)
              hashtl get:    90128.9 us (  1.3), set:   128015.4 us (  1.1)
  process dictionary get:    68322.9 us (  1.0), set:   141096.2 us (  1.2)
              rbdict get:   196716.5 us (  2.9), set:   266625.2 us (  2.4)
                trie get:    99331.1 us (  1.5), set:   422425.9 us (  3.7)

make OPTIMIZE=4
erl -noshell -pz ebin -s run test -s init stop
TEST string_key
N == 1000 (10 runs)
              aadict get:      364.5 us (  2.4), set:      681.8 us (  1.1)
                dict get:      341.7 us (  2.3), set:     1184.9 us (  1.8)
   ets (ordered_set) get:      586.1 us (  3.9), set:      867.0 us (  1.3)
           ets (set) get:      336.7 us (  2.2), set:      648.8 us (  1.0)
ets x10 read (ordere get:      567.0 us (  3.8)
  ets x10 read (set) get:      559.8 us (  3.7)
            gb_trees get:      507.6 us (  3.4), set:     1226.4 us (  1.9)
              hashtl get:     1015.4 us (  6.7), set:     2740.5 us (  4.2)
  process dictionary get:      226.9 us (  1.5), set:      731.0 us (  1.1)
              rbdict get:      461.8 us (  3.1), set:      658.7 us (  1.0)
                trie get:      150.8 us (  1.0), set:      757.3 us (  1.2)
N == 10000 (10 runs)
              aadict get:     7712.4 us (  1.5), set:    15373.6 us (  1.5)
                dict get:     7464.2 us (  1.4), set:    21264.0 us (  2.1)
   ets (ordered_set) get:     8903.2 us (  1.7), set:    11940.7 us (  1.2)
           ets (set) get:     8213.1 us (  1.6), set:    10315.6 us (  1.0)
ets x10 read (ordere get:     7535.7 us (  1.4)
  ets x10 read (set) get:     6949.5 us (  1.3)
            gb_trees get:     8625.9 us (  1.6), set:    17152.0 us (  1.7)
              hashtl get:    15289.7 us (  2.9), set:    30531.0 us (  3.0)
  process dictionary get:     5988.8 us (  1.1), set:    14283.2 us (  1.4)
              rbdict get:     7701.0 us (  1.5), set:    14704.0 us (  1.4)
                trie get:     5275.3 us (  1.0), set:    14217.7 us (  1.4)
N == 100000 (10 runs)
              aadict get:   198179.8 us (  2.9), set:   292331.5 us (  2.7)
                dict get:   109865.6 us (  1.6), set:   552816.3 us (  5.0)
   ets (ordered_set) get:   135377.6 us (  2.0), set:   155204.8 us (  1.4)
           ets (set) get:    88240.8 us (  1.3), set:   110088.4 us (  1.0)
ets x10 read (ordere get:    97416.2 us (  1.4)
  ets x10 read (set) get:    74857.3 us (  1.1)
            gb_trees get:   221469.0 us (  3.2), set:   338019.2 us (  3.1)
              hashtl get:   101314.6 us (  1.5), set:   212521.6 us (  1.9)
  process dictionary get:    68286.0 us (  1.0), set:   139829.3 us (  1.3)
              rbdict get:   194049.9 us (  2.8), set:   263348.2 us (  2.4)
                trie get:    94791.4 us (  1.4), set:   399552.5 us (  3.6)

v4 UUIDs

After updating the Erlang uuid implementation to fix various bugs, I added uuid:get_v4_urandom/0 function to utilize the 2006 Wichmann-Hill algorithm for generating uniform pseudo-random numbers. The main reason to utilize this algorithm, is for the 124 maximum bits of randomness when compared with the 45 maximum bits of randomness available with the normal Erlang OTP random module. Not just the amount of pseudo-randomness is better, but also the period is increased, to be more competitive with the Erlang OTP crypto module usage (i.e., OpenSSL usage). There are likely to be better pseudo-randomness generators available, but this particular choice seemed like a good fit for Erlang, due to Erlang's native big integer support. The main goal is to provide the pseudo-randomness within pure Erlang source code, without the need for C/C++ integration. I consider the uuid:get_v4_urandom/0 function to be experimental, but I look forward to its use after more testing. The tests show both the speed of pseudo-randomness and uuid creation, to see how the various functions compare. The benchmark was ran (with R14B04, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu) (i.e., "the older dependencies" machine):
TEST uuid_creation
N == 100000 (10 runs)
                  v1 get:    79487.3 us (  1.0)
                  v3 get:   131336.8 us (  1.7)
    v4 crypto strong get:   201665.8 us (  2.5)
      v4 crypto weak get:   201813.2 us (  2.5)
    v4 random bigint get:   209130.1 us (  2.6)
    v4 random native get:   322730.1 us (  4.1)
  v4 random_wh06_int get:   154083.6 us (  1.9)
                  v5 get:   152191.5 us (  1.9)

TEST pseudo_randomness
N == 10000 (10 runs)
crypto:rand_uniform/ get:    61465.2 us ( 27.1)
        erlang:now/0 get:     2781.2 us (  1.2)
      os:timestamp/0 get:     2264.4 us (  1.0)
    random:uniform/1 get:     6971.9 us (  3.1)
random_wh06_int:unif get:    13464.4 us (  5.9)

A Better Erlang Priority Queue

The other priority queue implementations were meant to avoid the O(N) task of finding a priority within the implementation used by both Riak and RabbitMQ. The previous results showed that "priority_queue" is best for using 1 priority, "pqueue" is best for using 2 to 41 priorities, and "pqueue3" is best for using 64 or more priorities. Both the "pqueue" data structure and the "pqueue3" data structure benefited from O(1) lookup time within an Erlang tuple. However, the "pqueue" data structure has a static limitation of 41 priorities, whereas the "pqueue3" data structure supports an arbitrary number of priorities (at creation time). To merge the best of both data structures, I created the "pqueue4" data structure, which has a static limitation of 257 priorities but is implemented in a way that is similar to "pqueue".

The "pqueue4" data structure supports the priorities -128 (high) to +128 (low) and compares favorably with both "pqueue" and "pqueue3". The "pqueue4" data structure is faster than the "pqueue3" data structure, but is slightly slower than the "pqueue" data structure (due to its size). The "pqueue4" provides the best performance if 257 priorities need to be supported, and the latency will never grow based on the size of the priority queue (like it does with the "priority_queue" and "pqueue2" data structures). The benchmark was ran (with R14B04, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

TEST pqueue_priority0
N == 1000000 (10 runs)
              pqueue get:   449060.6 µs (  1.7), set:   363759.6 µs (  2.3)
             pqueue2 get:   354847.0 µs (  1.4), set:   413158.5 µs (  2.6)
             pqueue3 get:   867085.4 µs (  3.4), set:   815489.8 µs (  5.2)
             pqueue4 get:   540231.5 µs (  2.1), set:   466625.8 µs (  3.0)
      priority_queue get:   257257.6 µs (  1.0), set:   156641.0 µs (  1.0)
TEST pqueue_priorities2
N == 1000000 (10 runs)
           2x pqueue get:   457418.0 µs (  1.3), set:   456008.8 µs (  1.0)
          2x pqueue2 get:   357993.1 µs (  1.0), set:   534914.7 µs (  1.2)
          2x pqueue3 get:   879511.6 µs (  2.5), set:   768901.8 µs (  1.7)
          2x pqueue4 get:   590378.9 µs (  1.6), set:   530402.6 µs (  1.2)
   2x priority_queue get:   428915.3 µs (  1.2), set:   519009.0 µs (  1.1)
TEST pqueue_priorities41
N == 1000000 (10 runs)
           41 pqueue get:   482050.6 µs (  1.5), set:   485457.7 µs (  1.0)
          41 pqueue2 get:   329309.2 µs (  1.0), set:  2489879.6 µs (  5.1)
          41 pqueue3 get:   928454.4 µs (  2.8), set:   834174.2 µs (  1.7)
          41 pqueue4 get:   682014.1 µs (  2.1), set:   646907.9 µs (  1.3)
   41 priority_queue get:   398687.1 µs (  1.2), set:  1600208.7 µs (  3.3)
TEST pqueue_priorities64
N == 1000000 (10 runs)
          64 pqueue2 get:   326512.2 µs (  1.0), set:  3582432.8 µs (  5.5)
          64 pqueue3 get:   875909.0 µs (  2.7), set:   827656.1 µs (  1.3)
          64 pqueue4 get:   668160.2 µs (  2.0), set:   647776.4 µs (  1.0)
   64 priority_queue get:   395522.9 µs (  1.2), set:  2238489.8 µs (  3.5)

An Erlang Priority Queue (continued)

After looking at various priority queue implementation possibilities and creating 3 separate implementations, it became necessary to test the data structures for correctness. Jesper Louis Andersen added PropEr integration, which provided randomized test cases to determine correctness, based on a priority queue model. Then, it became clear that the efficiency testing needed to focus on separate cases for different priority usage patterns. The main cases are usage of: a single priority, 2 priorities, 41 priorities, and 64 priorities, where all the priorities are evenly distributed with random, concurrent usage. The results indicate that "priority_queue" is best for using 1 priority, "pqueue" is best for using 2 to 41 priorities, and "pqueue3" is best for using 64 or more priorities. The benchmark was ran (with R14B04, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

TEST pqueue_priority0
N == 1000000 (10 runs)
              pqueue get:   444087.9 µs (  1.8), set:   378655.3 µs (  2.3)
             pqueue2 get:   346286.3 µs (  1.4), set:   407807.4 µs (  2.4)
             pqueue3 get:   877157.6 µs (  3.6), set:   810117.1 µs (  4.8)
      priority_queue get:   245788.8 µs (  1.0), set:   168242.6 µs (  1.0)
TEST pqueue_priorities2
N == 1000000 (10 runs)
           2x pqueue get:   437337.3 µs (  1.2), set:   430425.9 µs (  1.0)
          2x pqueue2 get:   363276.3 µs (  1.0), set:   514642.0 µs (  1.2)
          2x pqueue3 get:   871478.7 µs (  2.4), set:   759261.5 µs (  1.8)
   2x priority_queue get:   419643.7 µs (  1.2), set:   501706.6 µs (  1.2)
TEST pqueue_priorities41
N == 1000000 (10 runs)
           41 pqueue get:   538690.6 µs (  1.6), set:   509626.7 µs (  1.0)
          41 pqueue2 get:   336863.6 µs (  1.0), set:  2518901.4 µs (  4.9)
          41 pqueue3 get:   906567.4 µs (  2.7), set:   816174.1 µs (  1.6)
   41 priority_queue get:   400770.3 µs (  1.2), set:  1676239.8 µs (  3.3)
TEST pqueue_priorities64
N == 1000000 (10 runs)
          64 pqueue2 get:   337312.1 µs (  1.0), set:  3721587.8 µs (  4.3)
          64 pqueue3 get:   923038.7 µs (  2.7), set:   863387.5 µs (  1.0)
   64 priority_queue get:   399396.4 µs (  1.2), set:  2178070.5 µs (  2.5)

An Erlang Priority Queue

Functional languages don't have mutable data (in the traditional sense) so certain data structures become more difficult to implement. A priority queue is normally implemented with a heap, however, after looking at heap implementations within Chris Okasaki's book ("Purely Functional Data Structures"), it didn't seem correct to use an Erlang heap data structure to implement a priority queue.

The previous Erlang priority queue data structure is used within both Riak and RabbitMQ (see priority_queue.erl). However, the "in" (i.e., "append") operation seemed quite slow because of its usage of a list of tuples to store the separate queues for each priority.

So, I created a priority queue implementation called "pqueue" that relies on tuple storage for individual priorities. However, because I am using tuple storage, I needed to use a static range of priorities. I chose the common UNIX operating system priority range for nice (-20 (high) to 20 (low)). I was able to obtain fast "in" operations at the slight expense of the "out" operations, as shown below. The benchmark was ran (with R14B02, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

TEST run_priority_queue
N == 1000000 (10 runs)
              pqueue get:   475747.2 µs (  1.3), set:   504762.3 µs (  1.0)
      priority_queue get:   372439.6 µs (  1.0), set:  1466042.1 µs (  2.9)

Global State

I have updated results for the erlbench Erlang data structures tests for string and integer associative data storage. These results include both the trie data structure and the statically-sized hash table data structure (hashtl) I have created for more efficient string associative data access. My main justification for creating efficient Erlang data structures is to make Erlang source code more scalable by reducing its dependency on global state (ETS).

The process dictionary is globally accessible with trickery (erlang:process_info/1) and is not used in testable source code, so the process dictionary is considered to be as bad as global state. The reason the process dictionary is bad is because its scope is undefined, since it is attached to the lifetime of the Erlang process (not to mention that it complicates the fault-tolerance of any stored data). The process dictionary is a single dictionary shared with all source code the Erlang process executes, so external source code can make its contents undefined also!

ETS is a more typical way to store global state as key/value pairs. With concurrency (operations that are not atomic) global state depends on locking to maintain consistency and it is the data locking that limits scalability. Erlang's main strength is its avoidance of global data locks, so it is only natural to pursue data structures that avoid global data.

(from string_key test below)
N == 64000 (10 runs)
              aadict get:   233343.8 µs (  5.4), set:   440130.2 µs (  2.6)
                dict get:   137424.3 µs (  3.2), set:   871894.4 µs (  5.2)
   ets (ordered_set) get:   177797.4 µs (  4.1), set:   211416.5 µs (  1.3)
           ets (set) get:   114235.0 µs (  2.6), set:   169276.2 µs (  1.0)
ets x10 read (ordere get:   169068.3 µs (  3.9)
  ets x10 read (set) get:   134240.8 µs (  3.1)
            gb_trees get:   255160.6 µs (  5.9), set:   488567.1 µs (  2.9)
              hashtl get:   129433.1 µs (  3.0), set:   210779.3 µs (  1.3)
  process dictionary get:    77023.9 µs (  1.8), set:   166122.5 µs (  1.0)
              rbdict get:   230949.9 µs (  5.3), set:   372831.9 µs (  2.2)
                trie get:    43435.1 µs (  1.0), set:   379677.6 µs (  2.3)


Just looking at string lookup results for a relatively large number of strings (64000) to hold in RAM shows that the trie string "get" is the fastest and the hashtl "set" is tied for second fastest (with both ETS and the process dictionary being the fastest alternative). You could also say that for accessing 64000 common strings, the trie and the hashtl source code are currently the fastest local string associative data storage in Erlang, if you accept that the scope of the process dictionary is non-local.

The ETS test shows the extra latency that occurs with a string "get" operation on both set and ordered_set with read_concurrency set to true. The latency changes by a factor of 0.951 for the ordered_set and a factor of 1.175 for the set when the number of processes doing concurrent read operations increases from 1 to 10. So the scalability benefit of using the trie (ignoring its efficiency) rather than ETS is obvious for the set but the ordered_set requires testing with more concurrent reads to show scalability benefits.

The benchmark results are from Linux version 2.6.32-21, Ubuntu 10.04 with an Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz and non-HiPE Erlang R14B02:

TEST string_key
N == 50 (10 runs)
              aadict get:       18.5 µs (  1.1), set:       61.5 µs (  1.2)
                dict get:       32.1 µs (  1.9), set:      119.4 µs (  2.4)
   ets (ordered_set) get:       30.6 µs (  1.8), set:       50.0 µs (  1.0)
           ets (set) get:       31.1 µs (  1.8), set:       53.9 µs (  1.1)
ets x10 read (ordere get:       39.8 µs (  2.4)
  ets x10 read (set) get:       41.5 µs (  2.5)
            gb_trees get:       22.9 µs (  1.4), set:       92.3 µs (  1.8)
              hashtl get:       27.4 µs (  1.6), set:       58.7 µs (  1.2)
  process dictionary get:       19.5 µs (  1.2), set:       74.1 µs (  1.5)
              rbdict get:       18.2 µs (  1.1), set:       52.6 µs (  1.1)
                trie get:       16.9 µs (  1.0), set:      108.0 µs (  2.2)
N == 100 (10 runs)
              aadict get:       43.4 µs (  1.3), set:      147.1 µs (  1.5)
                dict get:       68.8 µs (  2.0), set:      251.4 µs (  2.5)
   ets (ordered_set) get:       65.4 µs (  1.9), set:       99.6 µs (  1.0)
           ets (set) get:       59.2 µs (  1.7), set:      100.8 µs (  1.0)
ets x10 read (ordere get:       99.6 µs (  2.9)
  ets x10 read (set) get:       77.7 µs (  2.3)
            gb_trees get:       51.3 µs (  1.5), set:      200.4 µs (  2.0)
              hashtl get:       54.5 µs (  1.6), set:      116.7 µs (  1.2)
  process dictionary get:       38.5 µs (  1.1), set:      224.8 µs (  2.3)
              rbdict get:       43.4 µs (  1.3), set:      111.2 µs (  1.1)
                trie get:       33.9 µs (  1.0), set:      210.2 µs (  2.1)
N == 250 (10 runs)
              aadict get:      135.5 µs (  1.5), set:      430.9 µs (  1.9)
                dict get:      178.5 µs (  1.9), set:      709.3 µs (  3.1)
   ets (ordered_set) get:      167.5 µs (  1.8), set:      311.4 µs (  1.4)
           ets (set) get:      148.8 µs (  1.6), set:      250.1 µs (  1.1)
ets x10 read (ordere get:      266.5 µs (  2.9)
  ets x10 read (set) get:      270.7 µs (  2.9)
            gb_trees get:      155.3 µs (  1.7), set:      685.9 µs (  3.0)
              hashtl get:      161.6 µs (  1.7), set:      229.0 µs (  1.0)
  process dictionary get:       92.6 µs (  1.0), set:      296.4 µs (  1.3)
              rbdict get:      143.2 µs (  1.5), set:      334.2 µs (  1.5)
                trie get:       95.3 µs (  1.0), set:      507.2 µs (  2.2)
N == 500 (10 runs)
              aadict get:      332.5 µs (  1.7), set:     1090.7 µs (  1.9)
                dict get:      364.6 µs (  1.9), set:     1440.2 µs (  2.5)
   ets (ordered_set) get:      410.8 µs (  2.1), set:     1466.0 µs (  2.5)
           ets (set) get:      302.6 µs (  1.6), set:      586.5 µs (  1.0)
ets x10 read (ordere get:      573.1 µs (  3.0)
  ets x10 read (set) get:      494.3 µs (  2.6)
            gb_trees get:      365.2 µs (  1.9), set:     1334.5 µs (  2.3)
              hashtl get:      388.7 µs (  2.0), set:      800.4 µs (  1.4)
  process dictionary get:      191.6 µs (  1.0), set:      718.8 µs (  1.2)
              rbdict get:      330.7 µs (  1.7), set:      755.5 µs (  1.3)
                trie get:      220.4 µs (  1.2), set:      981.8 µs (  1.7)
N == 1000 (10 runs)
              aadict get:      798.7 µs (  1.1), set:     2322.7 µs (  1.5)
                dict get:     1093.8 µs (  1.5), set:     3336.9 µs (  2.1)
   ets (ordered_set) get:     1163.9 µs (  1.6), set:     2343.4 µs (  1.5)
           ets (set) get:      976.2 µs (  1.4), set:     1561.4 µs (  1.0)
ets x10 read (ordere get:     1071.3 µs (  1.5)
  ets x10 read (set) get:     1118.5 µs (  1.6)
            gb_trees get:      927.4 µs (  1.3), set:     3017.3 µs (  1.9)
              hashtl get:     1346.6 µs (  1.9), set:     1754.8 µs (  1.1)
  process dictionary get:      747.5 µs (  1.1), set:     1771.5 µs (  1.1)
              rbdict get:      786.5 µs (  1.1), set:     1806.9 µs (  1.2)
                trie get:      709.1 µs (  1.0), set:     2468.1 µs (  1.6)
N == 2000 (10 runs)
              aadict get:     2196.6 µs (  1.2), set:     5800.6 µs (  1.6)
                dict get:     2980.3 µs (  1.6), set:     7976.8 µs (  2.2)
   ets (ordered_set) get:     3098.8 µs (  1.6), set:     4165.5 µs (  1.1)
           ets (set) get:     2966.1 µs (  1.6), set:     4373.5 µs (  1.2)
ets x10 read (ordere get:     2640.6 µs (  1.4)
  ets x10 read (set) get:     2850.6 µs (  1.5)
            gb_trees get:     2320.3 µs (  1.2), set:     6872.1 µs (  1.9)
              hashtl get:     3581.9 µs (  1.9), set:     3662.4 µs (  1.0)
  process dictionary get:     2389.5 µs (  1.3), set:     4164.2 µs (  1.1)
              rbdict get:     2076.3 µs (  1.1), set:     4576.0 µs (  1.2)
                trie get:     1907.3 µs (  1.0), set:     5780.9 µs (  1.6)
N == 4000 (10 runs)
              aadict get:     5877.5 µs (  1.3), set:    13809.0 µs (  1.7)
                dict get:     6491.3 µs (  1.5), set:    18633.8 µs (  2.4)
   ets (ordered_set) get:     7127.5 µs (  1.6), set:     9145.6 µs (  1.2)
           ets (set) get:     6153.2 µs (  1.4), set:     7919.1 µs (  1.0)
ets x10 read (ordere get:     8061.6 µs (  1.8)
  ets x10 read (set) get:     7786.1 µs (  1.8)
            gb_trees get:     6424.2 µs (  1.5), set:    15693.2 µs (  2.0)
              hashtl get:     7054.5 µs (  1.6), set:     8605.5 µs (  1.1)
  process dictionary get:     4938.2 µs (  1.1), set:     9578.4 µs (  1.2)
              rbdict get:     5788.3 µs (  1.3), set:    11456.6 µs (  1.4)
                trie get:     4376.9 µs (  1.0), set:    13757.2 µs (  1.7)
N == 8000 (10 runs)
              aadict get:    15730.6 µs (  1.8), set:    34177.0 µs (  2.1)
                dict get:    14520.8 µs (  1.6), set:    44817.0 µs (  2.7)
   ets (ordered_set) get:    16009.9 µs (  1.8), set:    19875.4 µs (  1.2)
           ets (set) get:    13122.3 µs (  1.5), set:    16436.0 µs (  1.0)
ets x10 read (ordere get:    19029.0 µs (  2.1)
  ets x10 read (set) get:    16993.7 µs (  1.9)
            gb_trees get:    17208.8 µs (  1.9), set:    36673.0 µs (  2.2)
              hashtl get:    15175.1 µs (  1.7), set:    16599.3 µs (  1.0)
  process dictionary get:    10157.7 µs (  1.1), set:    21110.6 µs (  1.3)
              rbdict get:    15758.7 µs (  1.8), set:    28566.6 µs (  1.7)
                trie get:     8940.0 µs (  1.0), set:    28293.8 µs (  1.7)
N == 16000 (10 runs)
              aadict get:    41375.3 µs (  2.0), set:    82925.0 µs (  2.2)
                dict get:    29842.0 µs (  1.4), set:   108119.5 µs (  2.9)
   ets (ordered_set) get:    36909.9 µs (  1.8), set:    43162.0 µs (  1.2)
           ets (set) get:    28240.8 µs (  1.4), set:    37495.3 µs (  1.0)
ets x10 read (ordere get:    40634.9 µs (  2.0)
  ets x10 read (set) get:    35608.5 µs (  1.7)
            gb_trees get:    44022.7 µs (  2.1), set:    93643.0 µs (  2.5)
              hashtl get:    37419.0 µs (  1.8), set:    48790.5 µs (  1.3)
  process dictionary get:    20684.6 µs (  1.0), set:    43798.3 µs (  1.2)
              rbdict get:    40916.6 µs (  2.0), set:    71371.2 µs (  1.9)
                trie get:    21225.2 µs (  1.0), set:    69109.4 µs (  1.8)
N == 32000 (10 runs)
              aadict get:    97193.5 µs (  2.3), set:   188024.4 µs (  2.3)
                dict get:    64191.5 µs (  1.5), set:   300668.6 µs (  3.6)
   ets (ordered_set) get:    83813.1 µs (  2.0), set:   100045.7 µs (  1.2)
           ets (set) get:    60364.4 µs (  1.4), set:    83030.4 µs (  1.0)
ets x10 read (ordere get:    86087.9 µs (  2.0)
  ets x10 read (set) get:    74093.9 µs (  1.8)
            gb_trees get:   107121.1 µs (  2.5), set:   214457.9 µs (  2.6)
              hashtl get:    68200.3 µs (  1.6), set:   112145.8 µs (  1.4)
  process dictionary get:    42152.3 µs (  1.0), set:    93514.7 µs (  1.1)
              rbdict get:    96665.2 µs (  2.3), set:   160379.7 µs (  1.9)
                trie get:    54760.0 µs (  1.3), set:   213854.1 µs (  2.6)
N == 64000 (10 runs)
              aadict get:   233343.8 µs (  5.4), set:   440130.2 µs (  2.6)
                dict get:   137424.3 µs (  3.2), set:   871894.4 µs (  5.2)
   ets (ordered_set) get:   177797.4 µs (  4.1), set:   211416.5 µs (  1.3)
           ets (set) get:   114235.0 µs (  2.6), set:   169276.2 µs (  1.0)
ets x10 read (ordere get:   169068.3 µs (  3.9)
  ets x10 read (set) get:   134240.8 µs (  3.1)
            gb_trees get:   255160.6 µs (  5.9), set:   488567.1 µs (  2.9)
              hashtl get:   129433.1 µs (  3.0), set:   210779.3 µs (  1.3)
  process dictionary get:    77023.9 µs (  1.8), set:   166122.5 µs (  1.0)
              rbdict get:   230949.9 µs (  5.3), set:   372831.9 µs (  2.2)
                trie get:    43435.1 µs (  1.0), set:   379677.6 µs (  2.3)
TEST integer_key
N == 50 (10 runs)
              aadict get:       16.2 µs (  3.2), set:       54.9 µs (  5.8)
     array (dynamic) get:       15.0 µs (  2.9), set:       25.5 µs (  2.7)
       array (fixed) get:       15.3 µs (  3.0), set:       26.5 µs (  2.8)
                dict get:       22.2 µs (  4.4), set:       51.0 µs (  5.4)
   ets (ordered_set) get:       21.9 µs (  4.3), set:       29.1 µs (  3.1)
           ets (set) get:       16.3 µs (  3.2), set:       25.8 µs (  2.7)
ets x10 read (ordere get:       20.2 µs (  4.0)
  ets x10 read (set) get:       16.8 µs (  3.3)
            gb_trees get:       17.3 µs (  3.4), set:      104.8 µs ( 11.1)
              hashtl get:       16.6 µs (  3.3), set:       38.6 µs (  4.1)
  process dictionary get:        8.2 µs (  1.6), set:        9.4 µs (  1.0)
              rbdict get:       15.8 µs (  3.1), set:       43.8 µs (  4.7)
               tuple get:        5.1 µs (  1.0), set:       13.0 µs (  1.4)
N == 100 (10 runs)
              aadict get:       43.0 µs (  4.3), set:      128.1 µs (  7.9)
     array (dynamic) get:       28.4 µs (  2.9), set:       51.0 µs (  3.1)
       array (fixed) get:       28.6 µs (  2.9), set:       50.9 µs (  3.1)
                dict get:       48.6 µs (  4.9), set:      121.6 µs (  7.5)
   ets (ordered_set) get:       42.4 µs (  4.3), set:       57.7 µs (  3.6)
           ets (set) get:       98.6 µs ( 10.0), set:       56.5 µs (  3.5)
ets x10 read (ordere get:       37.7 µs (  3.8)
  ets x10 read (set) get:       26.9 µs (  2.7)
            gb_trees get:       36.3 µs (  3.7), set:      263.0 µs ( 16.2)
              hashtl get:       35.3 µs (  3.6), set:       76.6 µs (  4.7)
  process dictionary get:       15.1 µs (  1.5), set:       16.2 µs (  1.0)
              rbdict get:       34.5 µs (  3.5), set:      106.2 µs (  6.6)
               tuple get:        9.9 µs (  1.0), set:       47.9 µs (  3.0)
N == 250 (10 runs)
              aadict get:      103.4 µs (  5.2), set:      430.5 µs (  8.9)
     array (dynamic) get:       99.0 µs (  5.0), set:      187.0 µs (  3.9)
       array (fixed) get:       96.7 µs (  4.9), set:      195.8 µs (  4.1)
                dict get:      113.4 µs (  5.7), set:      868.7 µs ( 18.0)
   ets (ordered_set) get:      101.7 µs (  5.1), set:      150.4 µs (  3.1)
           ets (set) get:       87.5 µs (  4.4), set:      149.0 µs (  3.1)
ets x10 read (ordere get:       85.0 µs (  4.3)
  ets x10 read (set) get:       64.6 µs (  3.2)
            gb_trees get:      102.2 µs (  5.1), set:      883.4 µs ( 18.3)
              hashtl get:      131.7 µs (  6.6), set:      177.9 µs (  3.7)
  process dictionary get:       36.0 µs (  1.8), set:       48.3 µs (  1.0)
              rbdict get:       97.0 µs (  4.9), set:      346.5 µs (  7.2)
               tuple get:       19.9 µs (  1.0), set:      331.4 µs (  6.9)
N == 500 (10 runs)
              aadict get:      215.0 µs (  5.1), set:     1129.6 µs ( 12.6)
     array (dynamic) get:      194.3 µs (  4.6), set:      444.6 µs (  5.0)
       array (fixed) get:      198.1 µs (  4.7), set:      456.6 µs (  5.1)
                dict get:      221.7 µs (  5.2), set:     1258.3 µs ( 14.0)
   ets (ordered_set) get:      220.4 µs (  5.2), set:      308.5 µs (  3.4)
           ets (set) get:      158.3 µs (  3.7), set:      273.8 µs (  3.1)
ets x10 read (ordere get:      172.4 µs (  4.1)
  ets x10 read (set) get:      113.1 µs (  2.7)
            gb_trees get:      223.1 µs (  5.3), set:     2430.2 µs ( 27.1)
              hashtl get:      266.2 µs (  6.3), set:      500.3 µs (  5.6)
  process dictionary get:       66.2 µs (  1.6), set:       89.7 µs (  1.0)
              rbdict get:      214.4 µs (  5.1), set:      985.1 µs ( 11.0)
               tuple get:       42.4 µs (  1.0), set:     2230.6 µs ( 24.9)
N == 1000 (10 runs)
              aadict get:      472.9 µs (  5.7), set:     1953.9 µs ( 12.8)
     array (dynamic) get:      429.1 µs (  5.1), set:      759.1 µs (  5.0)
       array (fixed) get:      424.9 µs (  5.1), set:     1118.4 µs (  7.3)
                dict get:      436.0 µs (  5.2), set:     2460.1 µs ( 16.1)
   ets (ordered_set) get:      436.5 µs (  5.2), set:      870.3 µs (  5.7)
           ets (set) get:      330.5 µs (  4.0), set:      522.7 µs (  3.4)
ets x10 read (ordere get:      335.9 µs (  4.0)
  ets x10 read (set) get:      207.3 µs (  2.5)
            gb_trees get:      494.3 µs (  5.9), set:     4548.2 µs ( 29.7)
              hashtl get:      485.6 µs (  5.8), set:     1291.5 µs (  8.4)
  process dictionary get:      129.7 µs (  1.6), set:      153.0 µs (  1.0)
              rbdict get:      474.4 µs (  5.7), set:     1635.0 µs ( 10.7)
               tuple get:       83.6 µs (  1.0), set:     5586.7 µs ( 36.5)
N == 2000 (10 runs)
              aadict get:     1083.5 µs (  6.8), set:     5727.8 µs (  9.5)
     array (dynamic) get:     1113.2 µs (  7.0), set:     2274.1 µs (  3.8)
       array (fixed) get:      985.6 µs (  6.2), set:     2235.5 µs (  3.7)
                dict get:      904.9 µs (  5.7), set:     6388.5 µs ( 10.6)
   ets (ordered_set) get:      907.6 µs (  5.7), set:     1283.6 µs (  2.1)
           ets (set) get:      702.4 µs (  4.4), set:     1124.5 µs (  1.9)
ets x10 read (ordere get:      614.8 µs (  3.9)
  ets x10 read (set) get:      384.6 µs (  2.4)
            gb_trees get:     1109.6 µs (  7.0), set:    12070.2 µs ( 20.0)
              hashtl get:     1083.8 µs (  6.8), set:     2065.2 µs (  3.4)
  process dictionary get:      258.2 µs (  1.6), set:      603.9 µs (  1.0)
              rbdict get:     1080.8 µs (  6.8), set:     4763.0 µs (  7.9)
               tuple get:      159.3 µs (  1.0), set:    39640.4 µs ( 65.6)
N == 4000 (10 runs)
              aadict get:     2634.7 µs (  8.3), set:    10477.9 µs ( 15.9)
     array (dynamic) get:     1987.0 µs (  6.3), set:     3746.9 µs (  5.7)
       array (fixed) get:     2069.9 µs (  6.5), set:     4359.8 µs (  6.6)
                dict get:     2059.9 µs (  6.5), set:     9022.6 µs ( 13.7)
   ets (ordered_set) get:     1874.6 µs (  5.9), set:     2586.1 µs (  3.9)
           ets (set) get:     1433.8 µs (  4.5), set:     2526.4 µs (  3.8)
ets x10 read (ordere get:     1222.8 µs (  3.9)
  ets x10 read (set) get:      761.1 µs (  2.4)
            gb_trees get:     2287.2 µs (  7.2), set:    27537.3 µs ( 41.7)
              hashtl get:     2361.7 µs (  7.5), set:     3410.8 µs (  5.2)
  process dictionary get:      510.0 µs (  1.6), set:      660.0 µs (  1.0)
              rbdict get:     2750.5 µs (  8.7), set:     9114.6 µs ( 13.8)
               tuple get:      316.2 µs (  1.0), set:   151840.8 µs (230.1)
N == 8000 (10 runs)
              aadict get:     5749.0 µs (  9.3), set:    20115.9 µs ( 16.5)
     array (dynamic) get:     3995.8 µs (  6.4), set:     9530.1 µs (  7.8)
       array (fixed) get:     4147.7 µs (  6.7), set:     9336.6 µs (  7.6)
                dict get:     4393.0 µs (  7.1), set:    20980.3 µs ( 17.2)
   ets (ordered_set) get:     3966.9 µs (  6.4), set:     5457.4 µs (  4.5)
           ets (set) get:     2890.5 µs (  4.7), set:     5415.8 µs (  4.4)
ets x10 read (ordere get:     2441.7 µs (  3.9)
  ets x10 read (set) get:     1529.0 µs (  2.5)
            gb_trees get:     5126.7 µs (  8.3), set:    57499.8 µs ( 47.0)
              hashtl get:     5180.1 µs (  8.4), set:     8602.0 µs (  7.0)
  process dictionary get:     1035.3 µs (  1.7), set:     1222.7 µs (  1.0)
              rbdict get:     5657.5 µs (  9.1), set:    17929.1 µs ( 14.7)
               tuple get:      619.7 µs (  1.0), set:   617050.7 µs (504.7)
N == 16000 (10 runs)
              aadict get:    11571.6 µs (  9.5), set:    45005.3 µs ( 15.6)
     array (dynamic) get:     9951.0 µs (  8.2), set:    24316.3 µs (  8.4)
       array (fixed) get:     9905.7 µs (  8.1), set:    23415.4 µs (  8.1)
                dict get:     8804.0 µs (  7.2), set:    67758.2 µs ( 23.4)
   ets (ordered_set) get:     8359.1 µs (  6.9), set:    12555.9 µs (  4.3)
           ets (set) get:     6100.3 µs (  5.0), set:    16166.7 µs (  5.6)
ets x10 read (ordere get:     5045.8 µs (  4.1)
  ets x10 read (set) get:     3196.1 µs (  2.6)
            gb_trees get:    10767.4 µs (  8.8), set:   120424.8 µs ( 41.7)
              hashtl get:    11302.7 µs (  9.3), set:    25191.9 µs (  8.7)
  process dictionary get:     2081.1 µs (  1.7), set:     2890.8 µs (  1.0)
              rbdict get:    11418.0 µs (  9.4), set:    40875.7 µs ( 14.1)
               tuple get:     1219.9 µs (  1.0), set:  2500319.5 µs (864.9)
N == 32000 (10 runs)
              aadict get:    24278.9 µs (  9.9), set:    96279.2 µs ( 12.1)
     array (dynamic) get:    19642.4 µs (  8.0), set:    47917.9 µs (  6.0)
       array (fixed) get:    19781.2 µs (  8.1), set:    39503.3 µs (  5.0)
                dict get:    19155.6 µs (  7.8), set:   237609.2 µs ( 29.9)
   ets (ordered_set) get:    18021.3 µs (  7.4), set:    25457.2 µs (  3.2)
           ets (set) get:    14420.6 µs (  5.9), set:    33051.4 µs (  4.2)
ets x10 read (ordere get:    10584.6 µs (  4.3)
  ets x10 read (set) get:     7193.0 µs (  2.9)
            gb_trees get:    23442.1 µs (  9.6), set:   244698.6 µs ( 30.8)
              hashtl get:    25704.7 µs ( 10.5), set:    47058.7 µs (  5.9)
  process dictionary get:     4151.9 µs (  1.7), set:     7941.7 µs (  1.0)
              rbdict get:    23069.3 µs (  9.4), set:    85377.7 µs ( 10.8)
               tuple get:     2447.1 µs (  1.0), set: 10227316.7 µs (*****)
N == 64000 (10 runs)
              aadict get:    47719.6 µs (  9.8), set:   225636.9 µs ( 14.4)
     array (dynamic) get:    38896.1 µs (  8.0), set:    88107.9 µs (  5.6)
       array (fixed) get:    38844.9 µs (  8.0), set:    91707.1 µs (  5.8)
                dict get:    38987.5 µs (  8.0), set:   722098.9 µs ( 46.0)
   ets (ordered_set) get:    38267.4 µs (  7.8), set:    54992.6 µs (  3.5)
           ets (set) get:    35880.6 µs (  7.3), set:    71159.9 µs (  4.5)
ets x10 read (ordere get:    22375.3 µs (  4.6)
  ets x10 read (set) get:    17203.7 µs (  3.5)
            gb_trees get:    49406.1 µs ( 10.1), set:   522021.5 µs ( 33.3)
              hashtl get:    40078.7 µs (  8.2), set:   107626.0 µs (  6.9)
  process dictionary get:     8210.2 µs (  1.7), set:    15697.8 µs (  1.0)
              rbdict get:    48208.3 µs (  9.9), set:   216548.6 µs ( 13.8)
               tuple get:     4883.4 µs (  1.0), set: 50572301.0 µs (*****)

Erlang Pseudo Randomness

Here are some interesting ways to get pseudo random numbers. Using the reductions count does not generate a uniform distribution in the test, but perhaps in a busy system it (erlang:statistics(reductions)) could be uniform. The benchmark was ran (with R14B01, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

erl -noshell -pz ebin -s run test -s init stop
TEST pseudo_randomness
N == 10000 (10 runs)
crypto:rand_uniform/ get:    58996.8 µs ( 26.1)
        erlang:now/0 get:    10001.3 µs (  4.4)
erlang:process_info( get:     2256.5 µs (  1.0)
erlang:statistics(r) get:     2436.1 µs (  1.1)
    random:uniform/1 get:     6723.0 µs (  3.0)

Erlang JSON Encoding/Decoding

Benchmark results for JSON files of various sizes for ejson, mochijson2, rfc4627, and jsx. The benchmark was ran (with R14B01, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

erl -noshell -pz ebin -s run test -s init stop
TEST json_encode
N == 100 (10 runs)
  file1 encode ejson get:    28615.4 µs (  2.1)
    file1 encode jsx get:    44310.3 µs (  3.2)
file1 encode mochijs get:    13951.6 µs (  1.0)
file1 encode rfc4627 get:    31408.8 µs (  2.3)
  file2 encode ejson get:    24410.5 µs (  1.7)
    file2 encode jsx get:    43363.5 µs (  3.1)
file2 encode mochijs get:    14239.8 µs (  1.0)
file2 encode rfc4627 get:    31685.9 µs (  2.3)
  file3 encode ejson get:    89951.9 µs (  6.4)
    file3 encode jsx get:   141590.1 µs ( 10.1)
file3 encode mochijs get:    51394.0 µs (  3.7)
file3 encode rfc4627 get:   105642.4 µs (  7.6)
  file4 encode ejson get:    87437.1 µs (  6.3)
    file4 encode jsx get:   149272.8 µs ( 10.7)
file4 encode mochijs get:    52154.8 µs (  3.7)
file4 encode rfc4627 get:   104215.8 µs (  7.5)
  file5 encode ejson get:  1471440.2 µs (105.5)
    file5 encode jsx get:  2058126.0 µs (147.5)
file5 encode mochijs get:  1054604.5 µs ( 75.6)
file5 encode rfc4627 get:  1623437.5 µs (116.4)
  file6 encode ejson get:  1476520.5 µs (105.8)
    file6 encode jsx get:  2030774.4 µs (145.6)
file6 encode mochijs get:  1049391.4 µs ( 75.2)
file6 encode rfc4627 get:  1667034.2 µs (119.5)
  file7 encode ejson get:    54972.6 µs (  3.9)
    file7 encode jsx get:    83343.0 µs (  6.0)
file7 encode mochijs get:    30296.7 µs (  2.2)
file7 encode rfc4627 get:    58950.8 µs (  4.2)
  file8 encode ejson get:    54527.2 µs (  3.9)
    file8 encode jsx get:    84560.0 µs (  6.1)
file8 encode mochijs get:    30187.6 µs (  2.2)
file8 encode rfc4627 get:    56012.6 µs (  4.0)
TEST json_decode
N == 100 (10 runs)
  file1 decode ejson get:    21136.5 µs (  1.0)
    file1 decode jsx get:    23450.2 µs (  1.1)
file1 decode mochijs get:    31773.6 µs (  1.5)
file1 decode rfc4627 get:    28486.0 µs (  1.3)
  file2 decode ejson get:    23377.2 µs (  1.1)
    file2 decode jsx get:    26499.0 µs (  1.3)
file2 decode mochijs get:    46210.6 µs (  2.2)
file2 decode rfc4627 get:    33724.5 µs (  1.6)
  file3 decode ejson get:    86309.2 µs (  4.1)
    file3 decode jsx get:    96833.1 µs (  4.6)
file3 decode mochijs get:   113926.7 µs (  5.4)
file3 decode rfc4627 get:    96695.6 µs (  4.6)
  file4 decode ejson get:   100045.1 µs (  4.7)
    file4 decode jsx get:   111998.8 µs (  5.3)
file4 decode mochijs get:   180957.7 µs (  8.6)
file4 decode rfc4627 get:   153633.4 µs (  7.3)
  file5 decode ejson get:  1116020.8 µs ( 52.8)
    file5 decode jsx get:   936208.7 µs ( 44.3)
file5 decode mochijs get:  1236134.1 µs ( 58.5)
file5 decode rfc4627 get:  1316855.2 µs ( 62.3)
  file6 decode ejson get:  1362946.8 µs ( 64.5)
    file6 decode jsx get:  1056306.2 µs ( 50.0)
file6 decode mochijs get:  1714188.8 µs ( 81.1)
file6 decode rfc4627 get:  1058858.4 µs ( 50.1)
  file7 decode ejson get:    42934.6 µs (  2.0)
    file7 decode jsx get:    55812.2 µs (  2.6)
file7 decode mochijs get:    57909.8 µs (  2.7)
file7 decode rfc4627 get:    50009.7 µs (  2.4)
  file8 decode ejson get:    52428.6 µs (  2.5)
    file8 decode jsx get:    67999.6 µs (  3.2)
file8 decode mochijs get:    90002.6 µs (  4.3)
file8 decode rfc4627 get:    56971.5 µs (  2.7)

Erlang Hash Table Layered Implementation

After seeing the slow storage speed on my simple Erlang hash table implementation (hasht), I attempted to create a faster data structure. With smaller nested tuples that are smaller segments of a hash table, my new hashtl (Hash Table Layered) Erlang data structure does provide faster storage than hasht. However, hashtl has slightly slower fetch times that are similar to dict fetch times (dict uses a 2 layered hash table that is dynamically sized). The hashtl performance is roughly opposite the trie data structure, since the trie has quick "get" times on string keys, while the hashtl has quick "set" times on string keys (i.e., the ratios in the parentheses for the opposite operations are roughly equivalent between the two data structures, when only considering string keys). The benchmark results are from Linux version 2.6.32-21, Ubuntu 10.04 with an Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz and non-HiPE Erlang R14B01:
TEST string_key
N == 50 (10 runs)
              aadict get:       19.0 µs (  1.1), set:       58.5 µs (  1.2)
                dict get:       32.5 µs (  1.9), set:      118.4 µs (  2.4)
           ets (set) get:       32.4 µs (  1.9), set:       54.7 µs (  1.1)
       ets x10 (set) get:       45.2 µs (  2.6)
            gb_trees get:       21.0 µs (  1.2), set:       92.4 µs (  1.9)
               hasht get:       21.6 µs (  1.3), set:      177.4 µs (  3.6)
              hashtl get:       31.5 µs (  1.8), set:       54.0 µs (  1.1)
  process dictionary get:       18.8 µs (  1.1), set:       53.1 µs (  1.1)
              rbdict get:       19.9 µs (  1.2), set:       49.4 µs (  1.0)
                trie get:       17.1 µs (  1.0), set:      105.6 µs (  2.1)
N == 100 (10 runs)
              aadict get:       44.3 µs (  1.3), set:      129.2 µs (  1.3)
                dict get:       75.2 µs (  2.1), set:      235.2 µs (  2.4)
           ets (set) get:       62.1 µs (  1.8), set:      103.8 µs (  1.1)
       ets x10 (set) get:      158.0 µs (  4.5)
            gb_trees get:       49.8 µs (  1.4), set:      188.3 µs (  1.9)
               hasht get:       42.7 µs (  1.2), set:      269.7 µs (  2.8)
              hashtl get:       56.2 µs (  1.6), set:       97.9 µs (  1.0)
  process dictionary get:       38.1 µs (  1.1), set:      112.4 µs (  1.1)
              rbdict get:       45.1 µs (  1.3), set:      107.4 µs (  1.1)
                trie get:       35.2 µs (  1.0), set:      198.5 µs (  2.0)
N == 250 (10 runs)
              aadict get:      138.8 µs (  1.5), set:      400.4 µs (  2.0)
                dict get:      175.6 µs (  1.9), set:      682.3 µs (  3.4)
           ets (set) get:      149.5 µs (  1.6), set:      255.2 µs (  1.3)
       ets x10 (set) get:      259.8 µs (  2.8)
            gb_trees get:      154.2 µs (  1.7), set:      530.6 µs (  2.6)
               hasht get:      131.7 µs (  1.4), set:      793.0 µs (  3.9)
              hashtl get:      167.7 µs (  1.8), set:      201.1 µs (  1.0)
  process dictionary get:       93.3 µs (  1.0), set:      340.0 µs (  1.7)
              rbdict get:      139.0 µs (  1.5), set:      338.8 µs (  1.7)
                trie get:       96.4 µs (  1.0), set:      510.3 µs (  2.5)
N == 500 (10 runs)
              aadict get:      340.8 µs (  1.8), set:      999.1 µs (  2.0)
                dict get:      372.0 µs (  2.0), set:     1555.1 µs (  3.1)
           ets (set) get:      307.4 µs (  1.6), set:      614.2 µs (  1.2)
       ets x10 (set) get:      461.6 µs (  2.5)
            gb_trees get:      359.9 µs (  1.9), set:     1358.1 µs (  2.7)
               hasht get:      472.9 µs (  2.5), set:     1968.7 µs (  3.9)
              hashtl get:      465.1 µs (  2.5), set:      506.7 µs (  1.0)
  process dictionary get:      187.6 µs (  1.0), set:      578.7 µs (  1.1)
              rbdict get:      327.2 µs (  1.7), set:      810.9 µs (  1.6)
                trie get:      214.7 µs (  1.1), set:     1148.8 µs (  2.3)
N == 1000 (10 runs)
              aadict get:      806.1 µs (  1.3), set:     2299.6 µs (  1.6)
                dict get:      931.0 µs (  1.5), set:     3238.2 µs (  2.2)
           ets (set) get:      804.5 µs (  1.3), set:     1534.6 µs (  1.0)
       ets x10 (set) get:      968.7 µs (  1.6)
            gb_trees get:      876.5 µs (  1.4), set:     2706.6 µs (  1.9)
               hasht get:     1324.3 µs (  2.2), set:    10091.8 µs (  6.9)
              hashtl get:     1045.4 µs (  1.7), set:     1462.6 µs (  1.0)
  process dictionary get:      604.8 µs (  1.0), set:     1465.2 µs (  1.0)
              rbdict get:      794.7 µs (  1.3), set:     1782.3 µs (  1.2)
                trie get:      636.7 µs (  1.1), set:     2463.2 µs (  1.7)
N == 2000 (10 runs)
              aadict get:     2024.5 µs (  1.1), set:     5195.9 µs (  1.6)
                dict get:     2688.9 µs (  1.5), set:     6884.1 µs (  2.1)
           ets (set) get:     2591.4 µs (  1.4), set:     5177.3 µs (  1.6)
       ets x10 (set) get:     2733.6 µs (  1.5)
            gb_trees get:     2226.3 µs (  1.2), set:     6122.8 µs (  1.9)
               hasht get:     2589.5 µs (  1.4), set:    30504.6 µs (  9.3)
              hashtl get:     2991.4 µs (  1.6), set:     3288.4 µs (  1.0)
  process dictionary get:     2114.5 µs (  1.2), set:     4030.5 µs (  1.2)
              rbdict get:     2025.2 µs (  1.1), set:     4224.6 µs (  1.3)
                trie get:     1837.8 µs (  1.0), set:     5675.4 µs (  1.7)
N == 4000 (10 runs)
              aadict get:     5926.7 µs (  1.4), set:    12774.6 µs (  1.8)
                dict get:     6796.7 µs (  1.5), set:    18243.7 µs (  2.6)
           ets (set) get:     6281.3 µs (  1.4), set:     7824.5 µs (  1.1)
       ets x10 (set) get:     8046.3 µs (  1.8)
            gb_trees get:     6301.9 µs (  1.4), set:    14376.7 µs (  2.1)
               hasht get:     5460.8 µs (  1.2), set:   103768.9 µs ( 14.9)
              hashtl get:     7137.7 µs (  1.6), set:     6957.0 µs (  1.0)
  process dictionary get:     5115.8 µs (  1.2), set:    10582.6 µs (  1.5)
              rbdict get:     5963.5 µs (  1.4), set:    10554.8 µs (  1.5)
                trie get:     4386.7 µs (  1.0), set:    13996.3 µs (  2.0)
N == 8000 (10 runs)
              aadict get:    16002.9 µs (  1.9), set:    31812.0 µs (  2.0)
                dict get:    14557.8 µs (  1.7), set:    45314.4 µs (  2.8)
           ets (set) get:     1.34e4 µs (  1.6), set:    17021.6 µs (  1.1)
       ets x10 (set) get:    17172.7 µs (  2.0)
            gb_trees get:    16952.9 µs (  2.0), set:    35821.8 µs (  2.2)
               hasht get:    11080.5 µs (  1.3), set:   408960.8 µs ( 25.5)
              hashtl get:    14790.3 µs (  1.7), set:    16040.3 µs (  1.0)
  process dictionary get:    10445.2 µs (  1.2), set:    21810.0 µs (  1.4)
              rbdict get:    15936.7 µs (  1.9), set:    28578.3 µs (  1.8)
                trie get:     8543.9 µs (  1.0), set:    27808.7 µs (  1.7)
N == 16000 (10 runs)
              aadict get:    40660.4 µs (  2.0), set:    76056.4 µs (  2.0)
                dict get:    30382.6 µs (  1.5), set:   115643.4 µs (  3.1)
           ets (set) get:    28500.2 µs (  1.4), set:    37204.1 µs (  1.0)
       ets x10 (set) get:    35423.6 µs (  1.7)
            gb_trees get:    43823.2 µs (  2.1), set:    88600.6 µs (  2.4)
               hasht get:    22523.7 µs (  1.1), set:  2302517.3 µs ( 61.9)
              hashtl get:    29762.8 µs (  1.4), set:    41664.0 µs (  1.1)
  process dictionary get:    20756.7 µs (  1.0), set:    45505.0 µs (  1.2)
              rbdict get:    40808.4 µs (  2.0), set:    67970.2 µs (  1.8)
                trie get:    21091.4 µs (  1.0), set:    83384.2 µs (  2.2)
N == 32000 (10 runs)
              aadict get:   100165.9 µs (  2.5), set:   182365.3 µs (  2.3)
                dict get:    65931.7 µs (  1.6), set:   281306.1 µs (  3.6)
           ets (set) get:    58365.9 µs (  1.4), set:    86377.8 µs (  1.1)
       ets x10 (set) get:    70832.4 µs (  1.7)
            gb_trees get:   108996.9 µs (  2.7), set:   201768.6 µs (  2.6)
               hasht get:    46683.5 µs (  1.1), set:  7559194.8 µs ( 96.3)
              hashtl get:    68377.4 µs (  1.7), set:    78516.3 µs (  1.0)
  process dictionary get:    40873.3 µs (  1.0), set:    90079.1 µs (  1.1)
              rbdict get:   101229.4 µs (  2.5), set:   160677.5 µs (  2.0)
                trie get:    52120.7 µs (  1.3), set:   205179.4 µs (  2.6)
N == 64000 (10 runs)
              aadict get:   236930.0 µs (  5.3), set:   434665.3 µs (  2.5)
                dict get:   140327.2 µs (  3.1), set:   955742.9 µs (  5.4)
           ets (set) get:   116628.2 µs (  2.6), set:   177744.3 µs (  1.0)
       ets x10 (set) get:   136503.0 µs (  3.0)
            gb_trees get:   258394.8 µs (  5.8), set:   475881.6 µs (  2.7)
               hasht get:    86721.3 µs (  1.9), set: 40883420.3 µs (232.6)
              hashtl get:   132184.1 µs (  2.9), set:   189342.2 µs (  1.1)
  process dictionary get:    80194.3 µs (  1.8), set:   175787.6 µs (  1.0)
              rbdict get:   236683.9 µs (  5.3), set:   393917.9 µs (  2.2)
                trie get:    44896.4 µs (  1.0), set:   406019.4 µs (  2.3)
TEST integer_key
N == 50 (10 runs)
              aadict get:       17.0 µs (  3.2), set:       70.9 µs (  7.2)
     array (dynamic) get:       15.5 µs (  2.9), set:       25.1 µs (  2.5)
       array (fixed) get:       15.5 µs (  2.9), set:       27.3 µs (  2.8)
                dict get:       22.9 µs (  4.3), set:       47.2 µs (  4.8)
           ets (set) get:       16.3 µs (  3.1), set:       28.9 µs (  2.9)
       ets x10 (set) get:       20.4 µs (  3.8)
            gb_trees get:       16.6 µs (  3.1), set:       99.9 µs ( 10.1)
               hasht get:       11.5 µs (  2.2), set:      117.8 µs ( 11.9)
              hashtl get:       16.0 µs (  3.0), set:       38.4 µs (  3.9)
  process dictionary get:        7.9 µs (  1.5), set:        9.9 µs (  1.0)
              rbdict get:       16.0 µs (  3.0), set:       43.2 µs (  4.4)
               tuple get:        5.3 µs (  1.0), set:       12.1 µs (  1.2)
N == 100 (10 runs)
              aadict get:       42.4 µs (  3.8), set:      127.8 µs (  6.7)
     array (dynamic) get:       35.7 µs (  3.2), set:       59.6 µs (  3.1)
       array (fixed) get:       36.5 µs (  3.3), set:       60.6 µs (  3.2)
                dict get:       58.0 µs (  5.2), set:      137.0 µs (  7.1)
           ets (set) get:       39.4 µs (  3.5), set:       70.6 µs (  3.7)
       ets x10 (set) get:       37.9 µs (  3.4)
            gb_trees get:       43.3 µs (  3.9), set:      544.8 µs ( 28.4)
               hasht get:       26.7 µs (  2.4), set:      257.0 µs ( 13.4)
              hashtl get:       40.5 µs (  3.6), set:      102.5 µs (  5.3)
  process dictionary get:       18.7 µs (  1.7), set:       19.2 µs (  1.0)
              rbdict get:       42.6 µs (  3.8), set:      120.3 µs (  6.3)
               tuple get:       11.1 µs (  1.0), set:       52.8 µs (  2.8)
N == 250 (10 runs)
              aadict get:      123.9 µs (  3.8), set:      392.2 µs (  8.6)
     array (dynamic) get:      121.6 µs (  3.7), set:      215.9 µs (  4.8)
       array (fixed) get:      121.9 µs (  3.7), set:      378.6 µs (  8.3)
                dict get:      143.7 µs (  4.4), set:      476.0 µs ( 10.5)
           ets (set) get:       98.0 µs (  3.0), set:      187.3 µs (  4.1)
       ets x10 (set) get:       73.5 µs (  2.3)
            gb_trees get:      124.0 µs (  3.8), set:      977.7 µs ( 21.5)
               hasht get:       68.0 µs (  2.1), set:      646.4 µs ( 14.2)
              hashtl get:      123.8 µs (  3.8), set:      329.9 µs (  7.3)
  process dictionary get:       40.8 µs (  1.3), set:       45.4 µs (  1.0)
              rbdict get:      124.2 µs (  3.8), set:      365.7 µs (  8.1)
               tuple get:       32.6 µs (  1.0), set:      273.5 µs (  6.0)
N == 500 (10 runs)
              aadict get:      272.6 µs (  5.8), set:      918.0 µs ( 10.3)
     array (dynamic) get:      208.8 µs (  4.5), set:      380.0 µs (  4.3)
       array (fixed) get:      209.1 µs (  4.5), set:      391.9 µs (  4.4)
                dict get:      267.8 µs (  5.7), set:     1089.2 µs ( 12.3)
           ets (set) get:      189.0 µs (  4.0), set:      339.7 µs (  3.8)
       ets x10 (set) get:      123.6 µs (  2.6)
            gb_trees get:      256.7 µs (  5.5), set:     2419.5 µs ( 27.2)
               hasht get:      152.6 µs (  3.3), set:     1541.5 µs ( 17.3)
              hashtl get:      253.2 µs (  5.4), set:      408.1 µs (  4.6)
  process dictionary get:       84.9 µs (  1.8), set:       88.9 µs (  1.0)
              rbdict get:      256.2 µs (  5.5), set:      840.5 µs (  9.5)
               tuple get:       46.9 µs (  1.0), set:     2125.3 µs ( 23.9)
N == 1000 (10 runs)
              aadict get:      478.9 µs (  5.4), set:     1683.9 µs ( 11.0)
     array (dynamic) get:      415.5 µs (  4.7), set:      857.5 µs (  5.6)
       array (fixed) get:      453.3 µs (  5.1), set:      888.8 µs (  5.8)
                dict get:      448.0 µs (  5.0), set:     2231.5 µs ( 14.5)
           ets (set) get:      334.7 µs (  3.8), set:     1006.4 µs (  6.6)
       ets x10 (set) get:      203.2 µs (  2.3)
            gb_trees get:      477.3 µs (  5.4), set:     4498.8 µs ( 29.3)
               hasht get:      321.7 µs (  3.6), set:     7766.1 µs ( 50.6)
              hashtl get:      561.4 µs (  6.3), set:      830.2 µs (  5.4)
  process dictionary get:      138.8 µs (  1.6), set:      153.6 µs (  1.0)
              rbdict get:      488.3 µs (  5.5), set:     1596.8 µs ( 10.4)
               tuple get:       89.0 µs (  1.0), set:     7330.6 µs ( 47.7)
N == 2000 (10 runs)
              aadict get:     1096.2 µs (  6.4), set:     3939.2 µs ( 13.9)
     array (dynamic) get:      996.5 µs (  5.8), set:     2026.3 µs (  7.2)
       array (fixed) get:      994.3 µs (  5.8), set:     1856.3 µs (  6.6)
                dict get:      934.0 µs (  5.5), set:     3885.4 µs ( 13.7)
           ets (set) get:      714.5 µs (  4.2), set:     2270.4 µs (  8.0)
       ets x10 (set) get:      390.3 µs (  2.3)
            gb_trees get:     1061.6 µs (  6.2), set:     9302.3 µs ( 32.9)
               hasht get:      559.6 µs (  3.3), set:    26775.2 µs ( 94.6)
              hashtl get:     1239.9 µs (  7.3), set:     1985.6 µs (  7.0)
  process dictionary get:      268.6 µs (  1.6), set:      283.1 µs (  1.0)
              rbdict get:     1083.6 µs (  6.3), set:     3509.0 µs ( 12.4)
               tuple get:      170.9 µs (  1.0), set:    22928.5 µs ( 81.0)
N == 4000 (10 runs)
              aadict get:     2598.7 µs (  7.5), set:     9143.9 µs ( 15.4)
     array (dynamic) get:     1991.0 µs (  5.7), set:     3780.1 µs (  6.3)
       array (fixed) get:     1994.8 µs (  5.8), set:     3619.8 µs (  6.1)
                dict get:     1959.2 µs (  5.7), set:    10221.3 µs ( 17.2)
           ets (set) get:     1450.3 µs (  4.2), set:     4560.7 µs (  7.7)
       ets x10 (set) get:      766.1 µs (  2.2)
            gb_trees get:     2220.8 µs (  6.4), set:    22137.9 µs ( 37.2)
               hasht get:     1032.0 µs (  3.0), set:   107334.7 µs (180.2)
              hashtl get:     2263.7 µs (  6.5), set:     3107.2 µs (  5.2)
  process dictionary get:      526.6 µs (  1.5), set:      595.6 µs (  1.0)
              rbdict get:     2581.0 µs (  7.4), set:     8322.6 µs ( 14.0)
               tuple get:      346.5 µs (  1.0), set:    91854.1 µs (154.2)
N == 8000 (10 runs)
              aadict get:     5625.3 µs (  8.6), set:    21650.5 µs ( 19.5)
     array (dynamic) get:     3998.4 µs (  6.1), set:     7457.9 µs (  6.7)
       array (fixed) get:     4014.6 µs (  6.2), set:     7380.2 µs (  6.7)
                dict get:     4200.5 µs (  6.5), set:    21589.1 µs ( 19.5)
           ets (set) get:     2958.9 µs (  4.5), set:     8535.7 µs (  7.7)
       ets x10 (set) get:     1509.6 µs (  2.3)
            gb_trees get:     4922.7 µs (  7.6), set:    55256.5 µs ( 49.9)
               hasht get:     1990.7 µs (  3.1), set:   394366.0 µs (356.0)
              hashtl get:     5715.1 µs (  8.8), set:     6898.9 µs (  6.2)
  process dictionary get:     1057.2 µs (  1.6), set:     1107.8 µs (  1.0)
              rbdict get:     5455.4 µs (  8.4), set:    24543.2 µs ( 22.2)
               tuple get:      650.9 µs (  1.0), set:   365080.7 µs (329.6)
N == 16000 (10 runs)
              aadict get:    11302.9 µs (  8.7), set:    38483.2 µs ( 15.1)
     array (dynamic) get:     9810.6 µs (  7.6), set:    21888.0 µs (  8.6)
       array (fixed) get:    10283.3 µs (  7.9), set:    21910.1 µs (  8.6)
                dict get:     8249.0 µs (  6.4), set:    60084.0 µs ( 23.6)
           ets (set) get:     5965.8 µs (  4.6), set:    14583.2 µs (  5.7)
       ets x10 (set) get:     3052.2 µs (  2.4)
            gb_trees get:    10263.4 µs (  7.9), set:   126358.0 µs ( 49.7)
               hasht get:     3966.8 µs (  3.1), set:  1570470.8 µs (617.4)
              hashtl get:    11766.2 µs (  9.1), set:    17786.2 µs (  7.0)
  process dictionary get:     2094.7 µs (  1.6), set:     2543.8 µs (  1.0)
              rbdict get:    11263.8 µs (  8.7), set:    36493.2 µs ( 14.3)
               tuple get:     1294.3 µs (  1.0), set:  1486397.5 µs (584.3)
N == 32000 (10 runs)
              aadict get:    22952.3 µs (  8.9), set:    85579.9 µs ( 14.6)
     array (dynamic) get:    19565.7 µs (  7.6), set:    40450.3 µs (  6.9)
       array (fixed) get:    19597.6 µs (  7.6), set:    39994.8 µs (  6.8)
                dict get:    17543.6 µs (  6.8), set:   185507.8 µs ( 31.7)
           ets (set) get:    13456.0 µs (  5.2), set:    29891.5 µs (  5.1)
       ets x10 (set) get:     6824.8 µs (  2.6)
            gb_trees get:    22242.6 µs (  8.6), set:   244581.5 µs ( 41.8)
               hasht get:     8159.9 µs (  3.2), set:  6891457.9 µs (*****)
              hashtl get:    24608.9 µs (  9.5), set:    43185.2 µs (  7.4)
  process dictionary get:     4210.2 µs (  1.6), set:     5853.7 µs (  1.0)
              rbdict get:    22806.8 µs (  8.8), set:    84932.4 µs ( 14.5)
               tuple get:     2581.2 µs (  1.0), set:  6422056.3 µs (*****)
N == 64000 (10 runs)
              aadict get:    47862.1 µs (  9.4), set:   191716.5 µs ( 15.6)
     array (dynamic) get:    39008.0 µs (  7.7), set:    77311.6 µs (  6.3)
       array (fixed) get:    38962.4 µs (  7.6), set:    78340.9 µs (  6.4)
                dict get:    38602.2 µs (  7.6), set:   625891.7 µs ( 50.9)
           ets (set) get:    35109.1 µs (  6.9), set:    69833.4 µs (  5.7)
       ets x10 (set) get:    16434.1 µs (  3.2)
            gb_trees get:    46772.8 µs (  9.2), set:   513581.4 µs ( 41.8)
               hasht get:    18970.6 µs (  3.7), set: 33729619.8 µs (*****)
              hashtl get:    59980.1 µs ( 11.8), set:   108446.2 µs (  8.8)
  process dictionary get:     8468.0 µs (  1.7), set:    12288.2 µs (  1.0)
              rbdict get:    48106.1 µs (  9.4), set:   178132.4 µs ( 14.5)
               tuple get:     5097.6 µs (  1.0), set: 35494023.9 µs (*****)

Erlang Hash Table

I created hasht, a simple hash table implementation that performs quick lookups. Setting the structure is quite slow because it always modifies a single large tuple. However, with a more tiered approach, the storage speed might increase. The hasht lookup time for strings is similar to the trie data structure, but is slower for smaller N. The hasht lookup time for integers is similar to ETS (unless the ETS table is accessed with greater than ~10 processes concurrently with read concurrency optimizations set on the table), but the process dictionary and tuple are both quicker for integer lookups. The benchmark results are from Linux version 2.6.32-21, Ubuntu 10.04 with an Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz and non-HiPE Erlang R14B01:
TEST string_key
N == 1000 (10 runs)
              aadict get:      828.6 µs (  1.5), set:     2226.0 µs (  1.4)
                dict get:      940.9 µs (  1.7), set:     3184.6 µs (  2.0)
           ets (set) get:      828.5 µs (  1.5), set:     1586.9 µs (  1.0)
       ets x10 (set) get:     1030.8 µs (  1.8)
            gb_trees get:      897.6 µs (  1.6), set:     2786.5 µs (  1.8)
               hasht get:     1392.5 µs (  2.5), set:     9815.1 µs (  6.2)
  process dictionary get:      566.3 µs (  1.0), set:     1618.2 µs (  1.0)
              rbdict get:      817.3 µs (  1.4), set:     1806.3 µs (  1.1)
                trie get:      650.9 µs (  1.1), set:     2466.6 µs (  1.6)
N == 10000 (10 runs)
              aadict get:    21320.8 µs (  2.6), set:    45882.9 µs (  2.1)
                dict get:    17345.4 µs (  2.1), set:    59625.4 µs (  2.7)
           ets (set) get:    16985.1 µs (  2.0), set:    22305.2 µs (  1.0)
       ets x10 (set) get:    21002.8 µs (  2.5)
            gb_trees get:    23835.2 µs (  2.9), set:    47572.9 µs (  2.1)
               hasht get:    13345.9 µs (  1.6), set:  1030469.0 µs ( 46.2)
  process dictionary get:    12837.2 µs (  1.5), set:    25609.6 µs (  1.1)
              rbdict get:    21680.3 µs (  2.6), set:    42978.7 µs (  1.9)
                trie get:     8351.5 µs (  1.0), set:    35003.4 µs (  1.6)
N == 50000 (10 runs)
              aadict get:   179311.1 µs (  2.9), set:   326961.5 µs (  2.4)
                dict get:   111803.5 µs (  1.8), set:   615110.2 µs (  4.6)
           ets (set) get:    91941.2 µs (  1.5), set:   135026.9 µs (  1.0)
       ets x10 (set) get:   107469.8 µs (  1.7)
            gb_trees get:   195219.1 µs (  3.1), set:   358408.6 µs (  2.7)
               hasht get:    66421.6 µs (  1.1), set: 27200459.7 µs (201.4)
  process dictionary get:    62683.4 µs (  1.0), set:   141821.2 µs (  1.1)
              rbdict get:   182102.3 µs (  2.9), set:   294744.7 µs (  2.2)
                trie get:    71213.7 µs (  1.1), set:   357100.4 µs (  2.6)
TEST integer_key
N == 1000 (10 runs)
              aadict get:      482.5 µs (  5.6), set:     1917.5 µs ( 13.1)
     array (dynamic) get:      393.3 µs (  4.5), set:      721.6 µs (  4.9)
       array (fixed) get:      393.7 µs (  4.6), set:     1080.2 µs (  7.4)
                dict get:      447.6 µs (  5.2), set:     2010.5 µs ( 13.7)
           ets (set) get:      333.3 µs (  3.9), set:      748.7 µs (  5.1)
       ets x10 (set) get:      203.2 µs (  2.3)
            gb_trees get:      476.2 µs (  5.5), set:     4339.8 µs ( 29.6)
               hasht get:      358.1 µs (  4.1), set:     7134.3 µs ( 48.7)
  process dictionary get:      133.6 µs (  1.5), set:      146.5 µs (  1.0)
              rbdict get:      478.0 µs (  5.5), set:     2027.6 µs ( 13.8)
               tuple get:       86.5 µs (  1.0), set:     6903.4 µs ( 47.1)
N == 10000 (10 runs)
              aadict get:     6670.9 µs (  8.3), set:    24593.4 µs ( 14.3)
     array (dynamic) get:     4968.5 µs (  6.1), set:     9641.3 µs (  5.6)
       array (fixed) get:     5057.3 µs (  6.3), set:     8929.1 µs (  5.2)
                dict get:     4857.0 µs (  6.0), set:    29415.4 µs ( 17.1)
           ets (set) get:     3695.3 µs (  4.6), set:     8625.1 µs (  5.0)
       ets x10 (set) get:     1896.0 µs (  2.3)
            gb_trees get:     6234.5 µs (  7.7), set:    72117.2 µs ( 42.0)
               hasht get:     2386.1 µs (  3.0), set:  1042563.7 µs (607.4)
  process dictionary get:     1317.5 µs (  1.6), set:     1716.3 µs (  1.0)
              rbdict get:     6772.1 µs (  8.4), set:    25672.9 µs ( 15.0)
               tuple get:      808.1 µs (  1.0), set:   613975.0 µs (357.7)
N == 50000 (10 runs)
              aadict get:    36984.9 µs (  9.2), set:   166405.2 µs ( 16.8)
     array (dynamic) get:    30181.9 µs (  7.5), set:    58707.0 µs (  5.9)
       array (fixed) get:    30171.7 µs (  7.5), set:    60833.8 µs (  6.2)
                dict get:    27988.1 µs (  7.0), set:   408414.5 µs ( 41.3)
           ets (set) get:    25076.6 µs (  6.3), set:    50396.9 µs (  5.1)
       ets x10 (set) get:    12081.4 µs (  3.0)
            gb_trees get:    35874.3 µs (  9.0), set:   405197.7 µs ( 41.0)
               hasht get:    13516.8 µs (  3.4), set: 26560861.7 µs (*****)
  process dictionary get:     6687.2 µs (  1.7), set:     9888.3 µs (  1.0)
              rbdict get:    36388.4 µs (  9.1), set:   152543.6 µs ( 15.4)
               tuple get:     4002.1 µs (  1.0), set: 18081650.6 µs (*****)