You are viewing [info]okeuday's journal

Michael Truog
The other priority queue implementations were meant to avoid the O(N) task of finding a priority within the implementation used by both Riak and RabbitMQ. The previous results showed that "priority_queue" is best for using 1 priority, "pqueue" is best for using 2 to 41 priorities, and "pqueue3" is best for using 64 or more priorities. Both the "pqueue" data structure and the "pqueue3" data structure benefited from O(1) lookup time within an Erlang tuple. However, the "pqueue" data structure has a static limitation of 41 priorities, whereas the "pqueue3" data structure supports an arbitrary number of priorities (at creation time). To merge the best of both data structures, I created the "pqueue4" data structure, which has a static limitation of 257 priorities but is implemented in a way that is similar to "pqueue".

The "pqueue4" data structure supports the priorities -128 (high) to +128 (low) and compares favorably with both "pqueue" and "pqueue3". The "pqueue4" data structure is faster than the "pqueue3" data structure, but is slightly slower than the "pqueue" data structure (due to its size). The "pqueue4" provides the best performance if 257 priorities need to be supported, and the latency will never grow based on the size of the priority queue (like it does with the "priority_queue" and "pqueue2" data structures). The benchmark was ran (with R14B04, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

TEST pqueue_priority0
N == 1000000 (10 runs)
              pqueue get:   449060.6 µs (  1.7), set:   363759.6 µs (  2.3)
             pqueue2 get:   354847.0 µs (  1.4), set:   413158.5 µs (  2.6)
             pqueue3 get:   867085.4 µs (  3.4), set:   815489.8 µs (  5.2)
             pqueue4 get:   540231.5 µs (  2.1), set:   466625.8 µs (  3.0)
      priority_queue get:   257257.6 µs (  1.0), set:   156641.0 µs (  1.0)
TEST pqueue_priorities2
N == 1000000 (10 runs)
           2x pqueue get:   457418.0 µs (  1.3), set:   456008.8 µs (  1.0)
          2x pqueue2 get:   357993.1 µs (  1.0), set:   534914.7 µs (  1.2)
          2x pqueue3 get:   879511.6 µs (  2.5), set:   768901.8 µs (  1.7)
          2x pqueue4 get:   590378.9 µs (  1.6), set:   530402.6 µs (  1.2)
   2x priority_queue get:   428915.3 µs (  1.2), set:   519009.0 µs (  1.1)
TEST pqueue_priorities41
N == 1000000 (10 runs)
           41 pqueue get:   482050.6 µs (  1.5), set:   485457.7 µs (  1.0)
          41 pqueue2 get:   329309.2 µs (  1.0), set:  2489879.6 µs (  5.1)
          41 pqueue3 get:   928454.4 µs (  2.8), set:   834174.2 µs (  1.7)
          41 pqueue4 get:   682014.1 µs (  2.1), set:   646907.9 µs (  1.3)
   41 priority_queue get:   398687.1 µs (  1.2), set:  1600208.7 µs (  3.3)
TEST pqueue_priorities64
N == 1000000 (10 runs)
          64 pqueue2 get:   326512.2 µs (  1.0), set:  3582432.8 µs (  5.5)
          64 pqueue3 get:   875909.0 µs (  2.7), set:   827656.1 µs (  1.3)
          64 pqueue4 get:   668160.2 µs (  2.0), set:   647776.4 µs (  1.0)
   64 priority_queue get:   395522.9 µs (  1.2), set:  2238489.8 µs (  3.5)

After looking at various priority queue implementation possibilities and creating 3 separate implementations, it became necessary to test the data structures for correctness. Jesper Louis Andersen added PropEr integration, which provided randomized test cases to determine correctness, based on a priority queue model. Then, it became clear that the efficiency testing needed to focus on separate cases for different priority usage patterns. The main cases are usage of: a single priority, 2 priorities, 41 priorities, and 64 priorities, where all the priorities are evenly distributed with random, concurrent usage. The results indicate that "priority_queue" is best for using 1 priority, "pqueue" is best for using 2 to 41 priorities, and "pqueue3" is best for using 64 or more priorities. The benchmark was ran (with R14B04, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

TEST pqueue_priority0
N == 1000000 (10 runs)
              pqueue get:   444087.9 µs (  1.8), set:   378655.3 µs (  2.3)
             pqueue2 get:   346286.3 µs (  1.4), set:   407807.4 µs (  2.4)
             pqueue3 get:   877157.6 µs (  3.6), set:   810117.1 µs (  4.8)
      priority_queue get:   245788.8 µs (  1.0), set:   168242.6 µs (  1.0)
TEST pqueue_priorities2
N == 1000000 (10 runs)
           2x pqueue get:   437337.3 µs (  1.2), set:   430425.9 µs (  1.0)
          2x pqueue2 get:   363276.3 µs (  1.0), set:   514642.0 µs (  1.2)
          2x pqueue3 get:   871478.7 µs (  2.4), set:   759261.5 µs (  1.8)
   2x priority_queue get:   419643.7 µs (  1.2), set:   501706.6 µs (  1.2)
TEST pqueue_priorities41
N == 1000000 (10 runs)
           41 pqueue get:   538690.6 µs (  1.6), set:   509626.7 µs (  1.0)
          41 pqueue2 get:   336863.6 µs (  1.0), set:  2518901.4 µs (  4.9)
          41 pqueue3 get:   906567.4 µs (  2.7), set:   816174.1 µs (  1.6)
   41 priority_queue get:   400770.3 µs (  1.2), set:  1676239.8 µs (  3.3)
TEST pqueue_priorities64
N == 1000000 (10 runs)
          64 pqueue2 get:   337312.1 µs (  1.0), set:  3721587.8 µs (  4.3)
          64 pqueue3 get:   923038.7 µs (  2.7), set:   863387.5 µs (  1.0)
   64 priority_queue get:   399396.4 µs (  1.2), set:  2178070.5 µs (  2.5)
Functional languages don't have mutable data (in the traditional sense) so certain data structures become more difficult to implement. A priority queue is normally implemented with a heap, however, after looking at heap implementations within Chris Okasaki's book ("Purely Functional Data Structures"), it didn't seem correct to use an Erlang heap data structure to implement a priority queue.

The previous Erlang priority queue data structure is used within both Riak and RabbitMQ (see priority_queue.erl). However, the "in" (i.e., "append") operation seemed quite slow because of its usage of a list of tuples to store the separate queues for each priority.

So, I created a priority queue implementation called "pqueue" that relies on tuple storage for individual priorities. However, because I am using tuple storage, I needed to use a static range of priorities. I chose the common UNIX operating system priority range for nice (-20 (high) to 20 (low)). I was able to obtain fast "in" operations at the slight expense of the "out" operations, as shown below. The benchmark was ran (with R14B02, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

TEST run_priority_queue
N == 1000000 (10 runs)
              pqueue get:   475747.2 µs (  1.3), set:   504762.3 µs (  1.0)
      priority_queue get:   372439.6 µs (  1.0), set:  1466042.1 µs (  2.9)
I have updated results for the erlbench Erlang data structures tests for string and integer associative data storage. These results include both the trie data structure and the statically-sized hash table data structure (hashtl) I have created for more efficient string associative data access. My main justification for creating efficient Erlang data structures is to make Erlang source code more scalable by reducing its dependency on global state (ETS).

The process dictionary is globally accessible with trickery (erlang:process_info/1) and is not used in testable source code, so the process dictionary is considered to be as bad as global state. The reason the process dictionary is bad is because its scope is undefined, since it is attached to the lifetime of the Erlang process (not to mention that it complicates the fault-tolerance of any stored data). The process dictionary is a single dictionary shared with all source code the Erlang process executes, so external source code can make its contents undefined also!

ETS is a more typical way to store global state as key/value pairs. With concurrency (operations that are not atomic) global state depends on locking to maintain consistency and it is the data locking that limits scalability. Erlang's main strength is its avoidance of global data locks, so it is only natural to pursue data structures that avoid global data.

(from string_key test below)
N == 64000 (10 runs)
              aadict get:   233343.8 µs (  5.4), set:   440130.2 µs (  2.6)
                dict get:   137424.3 µs (  3.2), set:   871894.4 µs (  5.2)
   ets (ordered_set) get:   177797.4 µs (  4.1), set:   211416.5 µs (  1.3)
           ets (set) get:   114235.0 µs (  2.6), set:   169276.2 µs (  1.0)
ets x10 read (ordere get:   169068.3 µs (  3.9)
  ets x10 read (set) get:   134240.8 µs (  3.1)
            gb_trees get:   255160.6 µs (  5.9), set:   488567.1 µs (  2.9)
              hashtl get:   129433.1 µs (  3.0), set:   210779.3 µs (  1.3)
  process dictionary get:    77023.9 µs (  1.8), set:   166122.5 µs (  1.0)
              rbdict get:   230949.9 µs (  5.3), set:   372831.9 µs (  2.2)
                trie get:    43435.1 µs (  1.0), set:   379677.6 µs (  2.3)


Just looking at string lookup results for a relatively large number of strings (64000) to hold in RAM shows that the trie string "get" is the fastest and the hashtl "set" is tied for second fastest (with both ETS and the process dictionary being the fastest alternative). You could also say that for accessing 64000 common strings, the trie and the hashtl source code are currently the fastest local string associative data storage in Erlang, if you accept that the scope of the process dictionary is non-local.

The ETS test shows the extra latency that occurs with a string "get" operation on both set and ordered_set with read_concurrency set to true. The latency changes by a factor of 0.951 for the ordered_set and a factor of 1.175 for the set when the number of processes doing concurrent read operations increases from 1 to 10. So the scalability benefit of using the trie (ignoring its efficiency) rather than ETS is obvious for the set but the ordered_set requires testing with more concurrent reads to show scalability benefits.

The benchmark results are from Linux version 2.6.32-21, Ubuntu 10.04 with an Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz and non-HiPE Erlang R14B02:

TEST string_key
N == 50 (10 runs)
              aadict get:       18.5 µs (  1.1), set:       61.5 µs (  1.2)
                dict get:       32.1 µs (  1.9), set:      119.4 µs (  2.4)
   ets (ordered_set) get:       30.6 µs (  1.8), set:       50.0 µs (  1.0)
           ets (set) get:       31.1 µs (  1.8), set:       53.9 µs (  1.1)
ets x10 read (ordere get:       39.8 µs (  2.4)
  ets x10 read (set) get:       41.5 µs (  2.5)
            gb_trees get:       22.9 µs (  1.4), set:       92.3 µs (  1.8)
              hashtl get:       27.4 µs (  1.6), set:       58.7 µs (  1.2)
  process dictionary get:       19.5 µs (  1.2), set:       74.1 µs (  1.5)
              rbdict get:       18.2 µs (  1.1), set:       52.6 µs (  1.1)
                trie get:       16.9 µs (  1.0), set:      108.0 µs (  2.2)
N == 100 (10 runs)
              aadict get:       43.4 µs (  1.3), set:      147.1 µs (  1.5)
                dict get:       68.8 µs (  2.0), set:      251.4 µs (  2.5)
   ets (ordered_set) get:       65.4 µs (  1.9), set:       99.6 µs (  1.0)
           ets (set) get:       59.2 µs (  1.7), set:      100.8 µs (  1.0)
ets x10 read (ordere get:       99.6 µs (  2.9)
  ets x10 read (set) get:       77.7 µs (  2.3)
            gb_trees get:       51.3 µs (  1.5), set:      200.4 µs (  2.0)
              hashtl get:       54.5 µs (  1.6), set:      116.7 µs (  1.2)
  process dictionary get:       38.5 µs (  1.1), set:      224.8 µs (  2.3)
              rbdict get:       43.4 µs (  1.3), set:      111.2 µs (  1.1)
                trie get:       33.9 µs (  1.0), set:      210.2 µs (  2.1)
N == 250 (10 runs)
              aadict get:      135.5 µs (  1.5), set:      430.9 µs (  1.9)
                dict get:      178.5 µs (  1.9), set:      709.3 µs (  3.1)
   ets (ordered_set) get:      167.5 µs (  1.8), set:      311.4 µs (  1.4)
           ets (set) get:      148.8 µs (  1.6), set:      250.1 µs (  1.1)
ets x10 read (ordere get:      266.5 µs (  2.9)
  ets x10 read (set) get:      270.7 µs (  2.9)
            gb_trees get:      155.3 µs (  1.7), set:      685.9 µs (  3.0)
              hashtl get:      161.6 µs (  1.7), set:      229.0 µs (  1.0)
  process dictionary get:       92.6 µs (  1.0), set:      296.4 µs (  1.3)
              rbdict get:      143.2 µs (  1.5), set:      334.2 µs (  1.5)
                trie get:       95.3 µs (  1.0), set:      507.2 µs (  2.2)
N == 500 (10 runs)
              aadict get:      332.5 µs (  1.7), set:     1090.7 µs (  1.9)
                dict get:      364.6 µs (  1.9), set:     1440.2 µs (  2.5)
   ets (ordered_set) get:      410.8 µs (  2.1), set:     1466.0 µs (  2.5)
           ets (set) get:      302.6 µs (  1.6), set:      586.5 µs (  1.0)
ets x10 read (ordere get:      573.1 µs (  3.0)
  ets x10 read (set) get:      494.3 µs (  2.6)
            gb_trees get:      365.2 µs (  1.9), set:     1334.5 µs (  2.3)
              hashtl get:      388.7 µs (  2.0), set:      800.4 µs (  1.4)
  process dictionary get:      191.6 µs (  1.0), set:      718.8 µs (  1.2)
              rbdict get:      330.7 µs (  1.7), set:      755.5 µs (  1.3)
                trie get:      220.4 µs (  1.2), set:      981.8 µs (  1.7)
N == 1000 (10 runs)
              aadict get:      798.7 µs (  1.1), set:     2322.7 µs (  1.5)
                dict get:     1093.8 µs (  1.5), set:     3336.9 µs (  2.1)
   ets (ordered_set) get:     1163.9 µs (  1.6), set:     2343.4 µs (  1.5)
           ets (set) get:      976.2 µs (  1.4), set:     1561.4 µs (  1.0)
ets x10 read (ordere get:     1071.3 µs (  1.5)
  ets x10 read (set) get:     1118.5 µs (  1.6)
            gb_trees get:      927.4 µs (  1.3), set:     3017.3 µs (  1.9)
              hashtl get:     1346.6 µs (  1.9), set:     1754.8 µs (  1.1)
  process dictionary get:      747.5 µs (  1.1), set:     1771.5 µs (  1.1)
              rbdict get:      786.5 µs (  1.1), set:     1806.9 µs (  1.2)
                trie get:      709.1 µs (  1.0), set:     2468.1 µs (  1.6)
N == 2000 (10 runs)
              aadict get:     2196.6 µs (  1.2), set:     5800.6 µs (  1.6)
                dict get:     2980.3 µs (  1.6), set:     7976.8 µs (  2.2)
   ets (ordered_set) get:     3098.8 µs (  1.6), set:     4165.5 µs (  1.1)
           ets (set) get:     2966.1 µs (  1.6), set:     4373.5 µs (  1.2)
ets x10 read (ordere get:     2640.6 µs (  1.4)
  ets x10 read (set) get:     2850.6 µs (  1.5)
            gb_trees get:     2320.3 µs (  1.2), set:     6872.1 µs (  1.9)
              hashtl get:     3581.9 µs (  1.9), set:     3662.4 µs (  1.0)
  process dictionary get:     2389.5 µs (  1.3), set:     4164.2 µs (  1.1)
              rbdict get:     2076.3 µs (  1.1), set:     4576.0 µs (  1.2)
                trie get:     1907.3 µs (  1.0), set:     5780.9 µs (  1.6)
N == 4000 (10 runs)
              aadict get:     5877.5 µs (  1.3), set:    13809.0 µs (  1.7)
                dict get:     6491.3 µs (  1.5), set:    18633.8 µs (  2.4)
   ets (ordered_set) get:     7127.5 µs (  1.6), set:     9145.6 µs (  1.2)
           ets (set) get:     6153.2 µs (  1.4), set:     7919.1 µs (  1.0)
ets x10 read (ordere get:     8061.6 µs (  1.8)
  ets x10 read (set) get:     7786.1 µs (  1.8)
            gb_trees get:     6424.2 µs (  1.5), set:    15693.2 µs (  2.0)
              hashtl get:     7054.5 µs (  1.6), set:     8605.5 µs (  1.1)
  process dictionary get:     4938.2 µs (  1.1), set:     9578.4 µs (  1.2)
              rbdict get:     5788.3 µs (  1.3), set:    11456.6 µs (  1.4)
                trie get:     4376.9 µs (  1.0), set:    13757.2 µs (  1.7)
N == 8000 (10 runs)
              aadict get:    15730.6 µs (  1.8), set:    34177.0 µs (  2.1)
                dict get:    14520.8 µs (  1.6), set:    44817.0 µs (  2.7)
   ets (ordered_set) get:    16009.9 µs (  1.8), set:    19875.4 µs (  1.2)
           ets (set) get:    13122.3 µs (  1.5), set:    16436.0 µs (  1.0)
ets x10 read (ordere get:    19029.0 µs (  2.1)
  ets x10 read (set) get:    16993.7 µs (  1.9)
            gb_trees get:    17208.8 µs (  1.9), set:    36673.0 µs (  2.2)
              hashtl get:    15175.1 µs (  1.7), set:    16599.3 µs (  1.0)
  process dictionary get:    10157.7 µs (  1.1), set:    21110.6 µs (  1.3)
              rbdict get:    15758.7 µs (  1.8), set:    28566.6 µs (  1.7)
                trie get:     8940.0 µs (  1.0), set:    28293.8 µs (  1.7)
N == 16000 (10 runs)
              aadict get:    41375.3 µs (  2.0), set:    82925.0 µs (  2.2)
                dict get:    29842.0 µs (  1.4), set:   108119.5 µs (  2.9)
   ets (ordered_set) get:    36909.9 µs (  1.8), set:    43162.0 µs (  1.2)
           ets (set) get:    28240.8 µs (  1.4), set:    37495.3 µs (  1.0)
ets x10 read (ordere get:    40634.9 µs (  2.0)
  ets x10 read (set) get:    35608.5 µs (  1.7)
            gb_trees get:    44022.7 µs (  2.1), set:    93643.0 µs (  2.5)
              hashtl get:    37419.0 µs (  1.8), set:    48790.5 µs (  1.3)
  process dictionary get:    20684.6 µs (  1.0), set:    43798.3 µs (  1.2)
              rbdict get:    40916.6 µs (  2.0), set:    71371.2 µs (  1.9)
                trie get:    21225.2 µs (  1.0), set:    69109.4 µs (  1.8)
N == 32000 (10 runs)
              aadict get:    97193.5 µs (  2.3), set:   188024.4 µs (  2.3)
                dict get:    64191.5 µs (  1.5), set:   300668.6 µs (  3.6)
   ets (ordered_set) get:    83813.1 µs (  2.0), set:   100045.7 µs (  1.2)
           ets (set) get:    60364.4 µs (  1.4), set:    83030.4 µs (  1.0)
ets x10 read (ordere get:    86087.9 µs (  2.0)
  ets x10 read (set) get:    74093.9 µs (  1.8)
            gb_trees get:   107121.1 µs (  2.5), set:   214457.9 µs (  2.6)
              hashtl get:    68200.3 µs (  1.6), set:   112145.8 µs (  1.4)
  process dictionary get:    42152.3 µs (  1.0), set:    93514.7 µs (  1.1)
              rbdict get:    96665.2 µs (  2.3), set:   160379.7 µs (  1.9)
                trie get:    54760.0 µs (  1.3), set:   213854.1 µs (  2.6)
N == 64000 (10 runs)
              aadict get:   233343.8 µs (  5.4), set:   440130.2 µs (  2.6)
                dict get:   137424.3 µs (  3.2), set:   871894.4 µs (  5.2)
   ets (ordered_set) get:   177797.4 µs (  4.1), set:   211416.5 µs (  1.3)
           ets (set) get:   114235.0 µs (  2.6), set:   169276.2 µs (  1.0)
ets x10 read (ordere get:   169068.3 µs (  3.9)
  ets x10 read (set) get:   134240.8 µs (  3.1)
            gb_trees get:   255160.6 µs (  5.9), set:   488567.1 µs (  2.9)
              hashtl get:   129433.1 µs (  3.0), set:   210779.3 µs (  1.3)
  process dictionary get:    77023.9 µs (  1.8), set:   166122.5 µs (  1.0)
              rbdict get:   230949.9 µs (  5.3), set:   372831.9 µs (  2.2)
                trie get:    43435.1 µs (  1.0), set:   379677.6 µs (  2.3)
TEST integer_key
N == 50 (10 runs)
              aadict get:       16.2 µs (  3.2), set:       54.9 µs (  5.8)
     array (dynamic) get:       15.0 µs (  2.9), set:       25.5 µs (  2.7)
       array (fixed) get:       15.3 µs (  3.0), set:       26.5 µs (  2.8)
                dict get:       22.2 µs (  4.4), set:       51.0 µs (  5.4)
   ets (ordered_set) get:       21.9 µs (  4.3), set:       29.1 µs (  3.1)
           ets (set) get:       16.3 µs (  3.2), set:       25.8 µs (  2.7)
ets x10 read (ordere get:       20.2 µs (  4.0)
  ets x10 read (set) get:       16.8 µs (  3.3)
            gb_trees get:       17.3 µs (  3.4), set:      104.8 µs ( 11.1)
              hashtl get:       16.6 µs (  3.3), set:       38.6 µs (  4.1)
  process dictionary get:        8.2 µs (  1.6), set:        9.4 µs (  1.0)
              rbdict get:       15.8 µs (  3.1), set:       43.8 µs (  4.7)
               tuple get:        5.1 µs (  1.0), set:       13.0 µs (  1.4)
N == 100 (10 runs)
              aadict get:       43.0 µs (  4.3), set:      128.1 µs (  7.9)
     array (dynamic) get:       28.4 µs (  2.9), set:       51.0 µs (  3.1)
       array (fixed) get:       28.6 µs (  2.9), set:       50.9 µs (  3.1)
                dict get:       48.6 µs (  4.9), set:      121.6 µs (  7.5)
   ets (ordered_set) get:       42.4 µs (  4.3), set:       57.7 µs (  3.6)
           ets (set) get:       98.6 µs ( 10.0), set:       56.5 µs (  3.5)
ets x10 read (ordere get:       37.7 µs (  3.8)
  ets x10 read (set) get:       26.9 µs (  2.7)
            gb_trees get:       36.3 µs (  3.7), set:      263.0 µs ( 16.2)
              hashtl get:       35.3 µs (  3.6), set:       76.6 µs (  4.7)
  process dictionary get:       15.1 µs (  1.5), set:       16.2 µs (  1.0)
              rbdict get:       34.5 µs (  3.5), set:      106.2 µs (  6.6)
               tuple get:        9.9 µs (  1.0), set:       47.9 µs (  3.0)
N == 250 (10 runs)
              aadict get:      103.4 µs (  5.2), set:      430.5 µs (  8.9)
     array (dynamic) get:       99.0 µs (  5.0), set:      187.0 µs (  3.9)
       array (fixed) get:       96.7 µs (  4.9), set:      195.8 µs (  4.1)
                dict get:      113.4 µs (  5.7), set:      868.7 µs ( 18.0)
   ets (ordered_set) get:      101.7 µs (  5.1), set:      150.4 µs (  3.1)
           ets (set) get:       87.5 µs (  4.4), set:      149.0 µs (  3.1)
ets x10 read (ordere get:       85.0 µs (  4.3)
  ets x10 read (set) get:       64.6 µs (  3.2)
            gb_trees get:      102.2 µs (  5.1), set:      883.4 µs ( 18.3)
              hashtl get:      131.7 µs (  6.6), set:      177.9 µs (  3.7)
  process dictionary get:       36.0 µs (  1.8), set:       48.3 µs (  1.0)
              rbdict get:       97.0 µs (  4.9), set:      346.5 µs (  7.2)
               tuple get:       19.9 µs (  1.0), set:      331.4 µs (  6.9)
N == 500 (10 runs)
              aadict get:      215.0 µs (  5.1), set:     1129.6 µs ( 12.6)
     array (dynamic) get:      194.3 µs (  4.6), set:      444.6 µs (  5.0)
       array (fixed) get:      198.1 µs (  4.7), set:      456.6 µs (  5.1)
                dict get:      221.7 µs (  5.2), set:     1258.3 µs ( 14.0)
   ets (ordered_set) get:      220.4 µs (  5.2), set:      308.5 µs (  3.4)
           ets (set) get:      158.3 µs (  3.7), set:      273.8 µs (  3.1)
ets x10 read (ordere get:      172.4 µs (  4.1)
  ets x10 read (set) get:      113.1 µs (  2.7)
            gb_trees get:      223.1 µs (  5.3), set:     2430.2 µs ( 27.1)
              hashtl get:      266.2 µs (  6.3), set:      500.3 µs (  5.6)
  process dictionary get:       66.2 µs (  1.6), set:       89.7 µs (  1.0)
              rbdict get:      214.4 µs (  5.1), set:      985.1 µs ( 11.0)
               tuple get:       42.4 µs (  1.0), set:     2230.6 µs ( 24.9)
N == 1000 (10 runs)
              aadict get:      472.9 µs (  5.7), set:     1953.9 µs ( 12.8)
     array (dynamic) get:      429.1 µs (  5.1), set:      759.1 µs (  5.0)
       array (fixed) get:      424.9 µs (  5.1), set:     1118.4 µs (  7.3)
                dict get:      436.0 µs (  5.2), set:     2460.1 µs ( 16.1)
   ets (ordered_set) get:      436.5 µs (  5.2), set:      870.3 µs (  5.7)
           ets (set) get:      330.5 µs (  4.0), set:      522.7 µs (  3.4)
ets x10 read (ordere get:      335.9 µs (  4.0)
  ets x10 read (set) get:      207.3 µs (  2.5)
            gb_trees get:      494.3 µs (  5.9), set:     4548.2 µs ( 29.7)
              hashtl get:      485.6 µs (  5.8), set:     1291.5 µs (  8.4)
  process dictionary get:      129.7 µs (  1.6), set:      153.0 µs (  1.0)
              rbdict get:      474.4 µs (  5.7), set:     1635.0 µs ( 10.7)
               tuple get:       83.6 µs (  1.0), set:     5586.7 µs ( 36.5)
N == 2000 (10 runs)
              aadict get:     1083.5 µs (  6.8), set:     5727.8 µs (  9.5)
     array (dynamic) get:     1113.2 µs (  7.0), set:     2274.1 µs (  3.8)
       array (fixed) get:      985.6 µs (  6.2), set:     2235.5 µs (  3.7)
                dict get:      904.9 µs (  5.7), set:     6388.5 µs ( 10.6)
   ets (ordered_set) get:      907.6 µs (  5.7), set:     1283.6 µs (  2.1)
           ets (set) get:      702.4 µs (  4.4), set:     1124.5 µs (  1.9)
ets x10 read (ordere get:      614.8 µs (  3.9)
  ets x10 read (set) get:      384.6 µs (  2.4)
            gb_trees get:     1109.6 µs (  7.0), set:    12070.2 µs ( 20.0)
              hashtl get:     1083.8 µs (  6.8), set:     2065.2 µs (  3.4)
  process dictionary get:      258.2 µs (  1.6), set:      603.9 µs (  1.0)
              rbdict get:     1080.8 µs (  6.8), set:     4763.0 µs (  7.9)
               tuple get:      159.3 µs (  1.0), set:    39640.4 µs ( 65.6)
N == 4000 (10 runs)
              aadict get:     2634.7 µs (  8.3), set:    10477.9 µs ( 15.9)
     array (dynamic) get:     1987.0 µs (  6.3), set:     3746.9 µs (  5.7)
       array (fixed) get:     2069.9 µs (  6.5), set:     4359.8 µs (  6.6)
                dict get:     2059.9 µs (  6.5), set:     9022.6 µs ( 13.7)
   ets (ordered_set) get:     1874.6 µs (  5.9), set:     2586.1 µs (  3.9)
           ets (set) get:     1433.8 µs (  4.5), set:     2526.4 µs (  3.8)
ets x10 read (ordere get:     1222.8 µs (  3.9)
  ets x10 read (set) get:      761.1 µs (  2.4)
            gb_trees get:     2287.2 µs (  7.2), set:    27537.3 µs ( 41.7)
              hashtl get:     2361.7 µs (  7.5), set:     3410.8 µs (  5.2)
  process dictionary get:      510.0 µs (  1.6), set:      660.0 µs (  1.0)
              rbdict get:     2750.5 µs (  8.7), set:     9114.6 µs ( 13.8)
               tuple get:      316.2 µs (  1.0), set:   151840.8 µs (230.1)
N == 8000 (10 runs)
              aadict get:     5749.0 µs (  9.3), set:    20115.9 µs ( 16.5)
     array (dynamic) get:     3995.8 µs (  6.4), set:     9530.1 µs (  7.8)
       array (fixed) get:     4147.7 µs (  6.7), set:     9336.6 µs (  7.6)
                dict get:     4393.0 µs (  7.1), set:    20980.3 µs ( 17.2)
   ets (ordered_set) get:     3966.9 µs (  6.4), set:     5457.4 µs (  4.5)
           ets (set) get:     2890.5 µs (  4.7), set:     5415.8 µs (  4.4)
ets x10 read (ordere get:     2441.7 µs (  3.9)
  ets x10 read (set) get:     1529.0 µs (  2.5)
            gb_trees get:     5126.7 µs (  8.3), set:    57499.8 µs ( 47.0)
              hashtl get:     5180.1 µs (  8.4), set:     8602.0 µs (  7.0)
  process dictionary get:     1035.3 µs (  1.7), set:     1222.7 µs (  1.0)
              rbdict get:     5657.5 µs (  9.1), set:    17929.1 µs ( 14.7)
               tuple get:      619.7 µs (  1.0), set:   617050.7 µs (504.7)
N == 16000 (10 runs)
              aadict get:    11571.6 µs (  9.5), set:    45005.3 µs ( 15.6)
     array (dynamic) get:     9951.0 µs (  8.2), set:    24316.3 µs (  8.4)
       array (fixed) get:     9905.7 µs (  8.1), set:    23415.4 µs (  8.1)
                dict get:     8804.0 µs (  7.2), set:    67758.2 µs ( 23.4)
   ets (ordered_set) get:     8359.1 µs (  6.9), set:    12555.9 µs (  4.3)
           ets (set) get:     6100.3 µs (  5.0), set:    16166.7 µs (  5.6)
ets x10 read (ordere get:     5045.8 µs (  4.1)
  ets x10 read (set) get:     3196.1 µs (  2.6)
            gb_trees get:    10767.4 µs (  8.8), set:   120424.8 µs ( 41.7)
              hashtl get:    11302.7 µs (  9.3), set:    25191.9 µs (  8.7)
  process dictionary get:     2081.1 µs (  1.7), set:     2890.8 µs (  1.0)
              rbdict get:    11418.0 µs (  9.4), set:    40875.7 µs ( 14.1)
               tuple get:     1219.9 µs (  1.0), set:  2500319.5 µs (864.9)
N == 32000 (10 runs)
              aadict get:    24278.9 µs (  9.9), set:    96279.2 µs ( 12.1)
     array (dynamic) get:    19642.4 µs (  8.0), set:    47917.9 µs (  6.0)
       array (fixed) get:    19781.2 µs (  8.1), set:    39503.3 µs (  5.0)
                dict get:    19155.6 µs (  7.8), set:   237609.2 µs ( 29.9)
   ets (ordered_set) get:    18021.3 µs (  7.4), set:    25457.2 µs (  3.2)
           ets (set) get:    14420.6 µs (  5.9), set:    33051.4 µs (  4.2)
ets x10 read (ordere get:    10584.6 µs (  4.3)
  ets x10 read (set) get:     7193.0 µs (  2.9)
            gb_trees get:    23442.1 µs (  9.6), set:   244698.6 µs ( 30.8)
              hashtl get:    25704.7 µs ( 10.5), set:    47058.7 µs (  5.9)
  process dictionary get:     4151.9 µs (  1.7), set:     7941.7 µs (  1.0)
              rbdict get:    23069.3 µs (  9.4), set:    85377.7 µs ( 10.8)
               tuple get:     2447.1 µs (  1.0), set: 10227316.7 µs (*****)
N == 64000 (10 runs)
              aadict get:    47719.6 µs (  9.8), set:   225636.9 µs ( 14.4)
     array (dynamic) get:    38896.1 µs (  8.0), set:    88107.9 µs (  5.6)
       array (fixed) get:    38844.9 µs (  8.0), set:    91707.1 µs (  5.8)
                dict get:    38987.5 µs (  8.0), set:   722098.9 µs ( 46.0)
   ets (ordered_set) get:    38267.4 µs (  7.8), set:    54992.6 µs (  3.5)
           ets (set) get:    35880.6 µs (  7.3), set:    71159.9 µs (  4.5)
ets x10 read (ordere get:    22375.3 µs (  4.6)
  ets x10 read (set) get:    17203.7 µs (  3.5)
            gb_trees get:    49406.1 µs ( 10.1), set:   522021.5 µs ( 33.3)
              hashtl get:    40078.7 µs (  8.2), set:   107626.0 µs (  6.9)
  process dictionary get:     8210.2 µs (  1.7), set:    15697.8 µs (  1.0)
              rbdict get:    48208.3 µs (  9.9), set:   216548.6 µs ( 13.8)
               tuple get:     4883.4 µs (  1.0), set: 50572301.0 µs (*****)
Here are some interesting ways to get pseudo random numbers. Using the reductions count does not generate a uniform distribution in the test, but perhaps in a busy system it (erlang:statistics(reductions)) could be uniform. The benchmark was ran (with R14B01, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

erl -noshell -pz ebin -s run test -s init stop
TEST pseudo_randomness
N == 10000 (10 runs)
crypto:rand_uniform/ get:    58996.8 µs ( 26.1)
        erlang:now/0 get:    10001.3 µs (  4.4)
erlang:process_info( get:     2256.5 µs (  1.0)
erlang:statistics(r) get:     2436.1 µs (  1.1)
    random:uniform/1 get:     6723.0 µs (  3.0)
Benchmark results for JSON files of various sizes for ejson, mochijson2, rfc4627, and jsx. The benchmark was ran (with R14B01, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

erl -noshell -pz ebin -s run test -s init stop
TEST json_encode
N == 100 (10 runs)
  file1 encode ejson get:    28615.4 µs (  2.1)
    file1 encode jsx get:    44310.3 µs (  3.2)
file1 encode mochijs get:    13951.6 µs (  1.0)
file1 encode rfc4627 get:    31408.8 µs (  2.3)
  file2 encode ejson get:    24410.5 µs (  1.7)
    file2 encode jsx get:    43363.5 µs (  3.1)
file2 encode mochijs get:    14239.8 µs (  1.0)
file2 encode rfc4627 get:    31685.9 µs (  2.3)
  file3 encode ejson get:    89951.9 µs (  6.4)
    file3 encode jsx get:   141590.1 µs ( 10.1)
file3 encode mochijs get:    51394.0 µs (  3.7)
file3 encode rfc4627 get:   105642.4 µs (  7.6)
  file4 encode ejson get:    87437.1 µs (  6.3)
    file4 encode jsx get:   149272.8 µs ( 10.7)
file4 encode mochijs get:    52154.8 µs (  3.7)
file4 encode rfc4627 get:   104215.8 µs (  7.5)
  file5 encode ejson get:  1471440.2 µs (105.5)
    file5 encode jsx get:  2058126.0 µs (147.5)
file5 encode mochijs get:  1054604.5 µs ( 75.6)
file5 encode rfc4627 get:  1623437.5 µs (116.4)
  file6 encode ejson get:  1476520.5 µs (105.8)
    file6 encode jsx get:  2030774.4 µs (145.6)
file6 encode mochijs get:  1049391.4 µs ( 75.2)
file6 encode rfc4627 get:  1667034.2 µs (119.5)
  file7 encode ejson get:    54972.6 µs (  3.9)
    file7 encode jsx get:    83343.0 µs (  6.0)
file7 encode mochijs get:    30296.7 µs (  2.2)
file7 encode rfc4627 get:    58950.8 µs (  4.2)
  file8 encode ejson get:    54527.2 µs (  3.9)
    file8 encode jsx get:    84560.0 µs (  6.1)
file8 encode mochijs get:    30187.6 µs (  2.2)
file8 encode rfc4627 get:    56012.6 µs (  4.0)
TEST json_decode
N == 100 (10 runs)
  file1 decode ejson get:    21136.5 µs (  1.0)
    file1 decode jsx get:    23450.2 µs (  1.1)
file1 decode mochijs get:    31773.6 µs (  1.5)
file1 decode rfc4627 get:    28486.0 µs (  1.3)
  file2 decode ejson get:    23377.2 µs (  1.1)
    file2 decode jsx get:    26499.0 µs (  1.3)
file2 decode mochijs get:    46210.6 µs (  2.2)
file2 decode rfc4627 get:    33724.5 µs (  1.6)
  file3 decode ejson get:    86309.2 µs (  4.1)
    file3 decode jsx get:    96833.1 µs (  4.6)
file3 decode mochijs get:   113926.7 µs (  5.4)
file3 decode rfc4627 get:    96695.6 µs (  4.6)
  file4 decode ejson get:   100045.1 µs (  4.7)
    file4 decode jsx get:   111998.8 µs (  5.3)
file4 decode mochijs get:   180957.7 µs (  8.6)
file4 decode rfc4627 get:   153633.4 µs (  7.3)
  file5 decode ejson get:  1116020.8 µs ( 52.8)
    file5 decode jsx get:   936208.7 µs ( 44.3)
file5 decode mochijs get:  1236134.1 µs ( 58.5)
file5 decode rfc4627 get:  1316855.2 µs ( 62.3)
  file6 decode ejson get:  1362946.8 µs ( 64.5)
    file6 decode jsx get:  1056306.2 µs ( 50.0)
file6 decode mochijs get:  1714188.8 µs ( 81.1)
file6 decode rfc4627 get:  1058858.4 µs ( 50.1)
  file7 decode ejson get:    42934.6 µs (  2.0)
    file7 decode jsx get:    55812.2 µs (  2.6)
file7 decode mochijs get:    57909.8 µs (  2.7)
file7 decode rfc4627 get:    50009.7 µs (  2.4)
  file8 decode ejson get:    52428.6 µs (  2.5)
    file8 decode jsx get:    67999.6 µs (  3.2)
file8 decode mochijs get:    90002.6 µs (  4.3)
file8 decode rfc4627 get:    56971.5 µs (  2.7)
After seeing the slow storage speed on my simple Erlang hash table implementation (hasht), I attempted to create a faster data structure. With smaller nested tuples that are smaller segments of a hash table, my new hashtl (Hash Table Layered) Erlang data structure does provide faster storage than hasht. However, hashtl has slightly slower fetch times that are similar to dict fetch times (dict uses a 2 layered hash table that is dynamically sized). The hashtl performance is roughly opposite the trie data structure, since the trie has quick "get" times on string keys, while the hashtl has quick "set" times on string keys (i.e., the ratios in the parentheses for the opposite operations are roughly equivalent between the two data structures, when only considering string keys). The benchmark results are from Linux version 2.6.32-21, Ubuntu 10.04 with an Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz and non-HiPE Erlang R14B01:
TEST string_key
N == 50 (10 runs)
              aadict get:       19.0 µs (  1.1), set:       58.5 µs (  1.2)
                dict get:       32.5 µs (  1.9), set:      118.4 µs (  2.4)
           ets (set) get:       32.4 µs (  1.9), set:       54.7 µs (  1.1)
       ets x10 (set) get:       45.2 µs (  2.6)
            gb_trees get:       21.0 µs (  1.2), set:       92.4 µs (  1.9)
               hasht get:       21.6 µs (  1.3), set:      177.4 µs (  3.6)
              hashtl get:       31.5 µs (  1.8), set:       54.0 µs (  1.1)
  process dictionary get:       18.8 µs (  1.1), set:       53.1 µs (  1.1)
              rbdict get:       19.9 µs (  1.2), set:       49.4 µs (  1.0)
                trie get:       17.1 µs (  1.0), set:      105.6 µs (  2.1)
N == 100 (10 runs)
              aadict get:       44.3 µs (  1.3), set:      129.2 µs (  1.3)
                dict get:       75.2 µs (  2.1), set:      235.2 µs (  2.4)
           ets (set) get:       62.1 µs (  1.8), set:      103.8 µs (  1.1)
       ets x10 (set) get:      158.0 µs (  4.5)
            gb_trees get:       49.8 µs (  1.4), set:      188.3 µs (  1.9)
               hasht get:       42.7 µs (  1.2), set:      269.7 µs (  2.8)
              hashtl get:       56.2 µs (  1.6), set:       97.9 µs (  1.0)
  process dictionary get:       38.1 µs (  1.1), set:      112.4 µs (  1.1)
              rbdict get:       45.1 µs (  1.3), set:      107.4 µs (  1.1)
                trie get:       35.2 µs (  1.0), set:      198.5 µs (  2.0)
N == 250 (10 runs)
              aadict get:      138.8 µs (  1.5), set:      400.4 µs (  2.0)
                dict get:      175.6 µs (  1.9), set:      682.3 µs (  3.4)
           ets (set) get:      149.5 µs (  1.6), set:      255.2 µs (  1.3)
       ets x10 (set) get:      259.8 µs (  2.8)
            gb_trees get:      154.2 µs (  1.7), set:      530.6 µs (  2.6)
               hasht get:      131.7 µs (  1.4), set:      793.0 µs (  3.9)
              hashtl get:      167.7 µs (  1.8), set:      201.1 µs (  1.0)
  process dictionary get:       93.3 µs (  1.0), set:      340.0 µs (  1.7)
              rbdict get:      139.0 µs (  1.5), set:      338.8 µs (  1.7)
                trie get:       96.4 µs (  1.0), set:      510.3 µs (  2.5)
N == 500 (10 runs)
              aadict get:      340.8 µs (  1.8), set:      999.1 µs (  2.0)
                dict get:      372.0 µs (  2.0), set:     1555.1 µs (  3.1)
           ets (set) get:      307.4 µs (  1.6), set:      614.2 µs (  1.2)
       ets x10 (set) get:      461.6 µs (  2.5)
            gb_trees get:      359.9 µs (  1.9), set:     1358.1 µs (  2.7)
               hasht get:      472.9 µs (  2.5), set:     1968.7 µs (  3.9)
              hashtl get:      465.1 µs (  2.5), set:      506.7 µs (  1.0)
  process dictionary get:      187.6 µs (  1.0), set:      578.7 µs (  1.1)
              rbdict get:      327.2 µs (  1.7), set:      810.9 µs (  1.6)
                trie get:      214.7 µs (  1.1), set:     1148.8 µs (  2.3)
N == 1000 (10 runs)
              aadict get:      806.1 µs (  1.3), set:     2299.6 µs (  1.6)
                dict get:      931.0 µs (  1.5), set:     3238.2 µs (  2.2)
           ets (set) get:      804.5 µs (  1.3), set:     1534.6 µs (  1.0)
       ets x10 (set) get:      968.7 µs (  1.6)
            gb_trees get:      876.5 µs (  1.4), set:     2706.6 µs (  1.9)
               hasht get:     1324.3 µs (  2.2), set:    10091.8 µs (  6.9)
              hashtl get:     1045.4 µs (  1.7), set:     1462.6 µs (  1.0)
  process dictionary get:      604.8 µs (  1.0), set:     1465.2 µs (  1.0)
              rbdict get:      794.7 µs (  1.3), set:     1782.3 µs (  1.2)
                trie get:      636.7 µs (  1.1), set:     2463.2 µs (  1.7)
N == 2000 (10 runs)
              aadict get:     2024.5 µs (  1.1), set:     5195.9 µs (  1.6)
                dict get:     2688.9 µs (  1.5), set:     6884.1 µs (  2.1)
           ets (set) get:     2591.4 µs (  1.4), set:     5177.3 µs (  1.6)
       ets x10 (set) get:     2733.6 µs (  1.5)
            gb_trees get:     2226.3 µs (  1.2), set:     6122.8 µs (  1.9)
               hasht get:     2589.5 µs (  1.4), set:    30504.6 µs (  9.3)
              hashtl get:     2991.4 µs (  1.6), set:     3288.4 µs (  1.0)
  process dictionary get:     2114.5 µs (  1.2), set:     4030.5 µs (  1.2)
              rbdict get:     2025.2 µs (  1.1), set:     4224.6 µs (  1.3)
                trie get:     1837.8 µs (  1.0), set:     5675.4 µs (  1.7)
N == 4000 (10 runs)
              aadict get:     5926.7 µs (  1.4), set:    12774.6 µs (  1.8)
                dict get:     6796.7 µs (  1.5), set:    18243.7 µs (  2.6)
           ets (set) get:     6281.3 µs (  1.4), set:     7824.5 µs (  1.1)
       ets x10 (set) get:     8046.3 µs (  1.8)
            gb_trees get:     6301.9 µs (  1.4), set:    14376.7 µs (  2.1)
               hasht get:     5460.8 µs (  1.2), set:   103768.9 µs ( 14.9)
              hashtl get:     7137.7 µs (  1.6), set:     6957.0 µs (  1.0)
  process dictionary get:     5115.8 µs (  1.2), set:    10582.6 µs (  1.5)
              rbdict get:     5963.5 µs (  1.4), set:    10554.8 µs (  1.5)
                trie get:     4386.7 µs (  1.0), set:    13996.3 µs (  2.0)
N == 8000 (10 runs)
              aadict get:    16002.9 µs (  1.9), set:    31812.0 µs (  2.0)
                dict get:    14557.8 µs (  1.7), set:    45314.4 µs (  2.8)
           ets (set) get:     1.34e4 µs (  1.6), set:    17021.6 µs (  1.1)
       ets x10 (set) get:    17172.7 µs (  2.0)
            gb_trees get:    16952.9 µs (  2.0), set:    35821.8 µs (  2.2)
               hasht get:    11080.5 µs (  1.3), set:   408960.8 µs ( 25.5)
              hashtl get:    14790.3 µs (  1.7), set:    16040.3 µs (  1.0)
  process dictionary get:    10445.2 µs (  1.2), set:    21810.0 µs (  1.4)
              rbdict get:    15936.7 µs (  1.9), set:    28578.3 µs (  1.8)
                trie get:     8543.9 µs (  1.0), set:    27808.7 µs (  1.7)
N == 16000 (10 runs)
              aadict get:    40660.4 µs (  2.0), set:    76056.4 µs (  2.0)
                dict get:    30382.6 µs (  1.5), set:   115643.4 µs (  3.1)
           ets (set) get:    28500.2 µs (  1.4), set:    37204.1 µs (  1.0)
       ets x10 (set) get:    35423.6 µs (  1.7)
            gb_trees get:    43823.2 µs (  2.1), set:    88600.6 µs (  2.4)
               hasht get:    22523.7 µs (  1.1), set:  2302517.3 µs ( 61.9)
              hashtl get:    29762.8 µs (  1.4), set:    41664.0 µs (  1.1)
  process dictionary get:    20756.7 µs (  1.0), set:    45505.0 µs (  1.2)
              rbdict get:    40808.4 µs (  2.0), set:    67970.2 µs (  1.8)
                trie get:    21091.4 µs (  1.0), set:    83384.2 µs (  2.2)
N == 32000 (10 runs)
              aadict get:   100165.9 µs (  2.5), set:   182365.3 µs (  2.3)
                dict get:    65931.7 µs (  1.6), set:   281306.1 µs (  3.6)
           ets (set) get:    58365.9 µs (  1.4), set:    86377.8 µs (  1.1)
       ets x10 (set) get:    70832.4 µs (  1.7)
            gb_trees get:   108996.9 µs (  2.7), set:   201768.6 µs (  2.6)
               hasht get:    46683.5 µs (  1.1), set:  7559194.8 µs ( 96.3)
              hashtl get:    68377.4 µs (  1.7), set:    78516.3 µs (  1.0)
  process dictionary get:    40873.3 µs (  1.0), set:    90079.1 µs (  1.1)
              rbdict get:   101229.4 µs (  2.5), set:   160677.5 µs (  2.0)
                trie get:    52120.7 µs (  1.3), set:   205179.4 µs (  2.6)
N == 64000 (10 runs)
              aadict get:   236930.0 µs (  5.3), set:   434665.3 µs (  2.5)
                dict get:   140327.2 µs (  3.1), set:   955742.9 µs (  5.4)
           ets (set) get:   116628.2 µs (  2.6), set:   177744.3 µs (  1.0)
       ets x10 (set) get:   136503.0 µs (  3.0)
            gb_trees get:   258394.8 µs (  5.8), set:   475881.6 µs (  2.7)
               hasht get:    86721.3 µs (  1.9), set: 40883420.3 µs (232.6)
              hashtl get:   132184.1 µs (  2.9), set:   189342.2 µs (  1.1)
  process dictionary get:    80194.3 µs (  1.8), set:   175787.6 µs (  1.0)
              rbdict get:   236683.9 µs (  5.3), set:   393917.9 µs (  2.2)
                trie get:    44896.4 µs (  1.0), set:   406019.4 µs (  2.3)
TEST integer_key
N == 50 (10 runs)
              aadict get:       17.0 µs (  3.2), set:       70.9 µs (  7.2)
     array (dynamic) get:       15.5 µs (  2.9), set:       25.1 µs (  2.5)
       array (fixed) get:       15.5 µs (  2.9), set:       27.3 µs (  2.8)
                dict get:       22.9 µs (  4.3), set:       47.2 µs (  4.8)
           ets (set) get:       16.3 µs (  3.1), set:       28.9 µs (  2.9)
       ets x10 (set) get:       20.4 µs (  3.8)
            gb_trees get:       16.6 µs (  3.1), set:       99.9 µs ( 10.1)
               hasht get:       11.5 µs (  2.2), set:      117.8 µs ( 11.9)
              hashtl get:       16.0 µs (  3.0), set:       38.4 µs (  3.9)
  process dictionary get:        7.9 µs (  1.5), set:        9.9 µs (  1.0)
              rbdict get:       16.0 µs (  3.0), set:       43.2 µs (  4.4)
               tuple get:        5.3 µs (  1.0), set:       12.1 µs (  1.2)
N == 100 (10 runs)
              aadict get:       42.4 µs (  3.8), set:      127.8 µs (  6.7)
     array (dynamic) get:       35.7 µs (  3.2), set:       59.6 µs (  3.1)
       array (fixed) get:       36.5 µs (  3.3), set:       60.6 µs (  3.2)
                dict get:       58.0 µs (  5.2), set:      137.0 µs (  7.1)
           ets (set) get:       39.4 µs (  3.5), set:       70.6 µs (  3.7)
       ets x10 (set) get:       37.9 µs (  3.4)
            gb_trees get:       43.3 µs (  3.9), set:      544.8 µs ( 28.4)
               hasht get:       26.7 µs (  2.4), set:      257.0 µs ( 13.4)
              hashtl get:       40.5 µs (  3.6), set:      102.5 µs (  5.3)
  process dictionary get:       18.7 µs (  1.7), set:       19.2 µs (  1.0)
              rbdict get:       42.6 µs (  3.8), set:      120.3 µs (  6.3)
               tuple get:       11.1 µs (  1.0), set:       52.8 µs (  2.8)
N == 250 (10 runs)
              aadict get:      123.9 µs (  3.8), set:      392.2 µs (  8.6)
     array (dynamic) get:      121.6 µs (  3.7), set:      215.9 µs (  4.8)
       array (fixed) get:      121.9 µs (  3.7), set:      378.6 µs (  8.3)
                dict get:      143.7 µs (  4.4), set:      476.0 µs ( 10.5)
           ets (set) get:       98.0 µs (  3.0), set:      187.3 µs (  4.1)
       ets x10 (set) get:       73.5 µs (  2.3)
            gb_trees get:      124.0 µs (  3.8), set:      977.7 µs ( 21.5)
               hasht get:       68.0 µs (  2.1), set:      646.4 µs ( 14.2)
              hashtl get:      123.8 µs (  3.8), set:      329.9 µs (  7.3)
  process dictionary get:       40.8 µs (  1.3), set:       45.4 µs (  1.0)
              rbdict get:      124.2 µs (  3.8), set:      365.7 µs (  8.1)
               tuple get:       32.6 µs (  1.0), set:      273.5 µs (  6.0)
N == 500 (10 runs)
              aadict get:      272.6 µs (  5.8), set:      918.0 µs ( 10.3)
     array (dynamic) get:      208.8 µs (  4.5), set:      380.0 µs (  4.3)
       array (fixed) get:      209.1 µs (  4.5), set:      391.9 µs (  4.4)
                dict get:      267.8 µs (  5.7), set:     1089.2 µs ( 12.3)
           ets (set) get:      189.0 µs (  4.0), set:      339.7 µs (  3.8)
       ets x10 (set) get:      123.6 µs (  2.6)
            gb_trees get:      256.7 µs (  5.5), set:     2419.5 µs ( 27.2)
               hasht get:      152.6 µs (  3.3), set:     1541.5 µs ( 17.3)
              hashtl get:      253.2 µs (  5.4), set:      408.1 µs (  4.6)
  process dictionary get:       84.9 µs (  1.8), set:       88.9 µs (  1.0)
              rbdict get:      256.2 µs (  5.5), set:      840.5 µs (  9.5)
               tuple get:       46.9 µs (  1.0), set:     2125.3 µs ( 23.9)
N == 1000 (10 runs)
              aadict get:      478.9 µs (  5.4), set:     1683.9 µs ( 11.0)
     array (dynamic) get:      415.5 µs (  4.7), set:      857.5 µs (  5.6)
       array (fixed) get:      453.3 µs (  5.1), set:      888.8 µs (  5.8)
                dict get:      448.0 µs (  5.0), set:     2231.5 µs ( 14.5)
           ets (set) get:      334.7 µs (  3.8), set:     1006.4 µs (  6.6)
       ets x10 (set) get:      203.2 µs (  2.3)
            gb_trees get:      477.3 µs (  5.4), set:     4498.8 µs ( 29.3)
               hasht get:      321.7 µs (  3.6), set:     7766.1 µs ( 50.6)
              hashtl get:      561.4 µs (  6.3), set:      830.2 µs (  5.4)
  process dictionary get:      138.8 µs (  1.6), set:      153.6 µs (  1.0)
              rbdict get:      488.3 µs (  5.5), set:     1596.8 µs ( 10.4)
               tuple get:       89.0 µs (  1.0), set:     7330.6 µs ( 47.7)
N == 2000 (10 runs)
              aadict get:     1096.2 µs (  6.4), set:     3939.2 µs ( 13.9)
     array (dynamic) get:      996.5 µs (  5.8), set:     2026.3 µs (  7.2)
       array (fixed) get:      994.3 µs (  5.8), set:     1856.3 µs (  6.6)
                dict get:      934.0 µs (  5.5), set:     3885.4 µs ( 13.7)
           ets (set) get:      714.5 µs (  4.2), set:     2270.4 µs (  8.0)
       ets x10 (set) get:      390.3 µs (  2.3)
            gb_trees get:     1061.6 µs (  6.2), set:     9302.3 µs ( 32.9)
               hasht get:      559.6 µs (  3.3), set:    26775.2 µs ( 94.6)
              hashtl get:     1239.9 µs (  7.3), set:     1985.6 µs (  7.0)
  process dictionary get:      268.6 µs (  1.6), set:      283.1 µs (  1.0)
              rbdict get:     1083.6 µs (  6.3), set:     3509.0 µs ( 12.4)
               tuple get:      170.9 µs (  1.0), set:    22928.5 µs ( 81.0)
N == 4000 (10 runs)
              aadict get:     2598.7 µs (  7.5), set:     9143.9 µs ( 15.4)
     array (dynamic) get:     1991.0 µs (  5.7), set:     3780.1 µs (  6.3)
       array (fixed) get:     1994.8 µs (  5.8), set:     3619.8 µs (  6.1)
                dict get:     1959.2 µs (  5.7), set:    10221.3 µs ( 17.2)
           ets (set) get:     1450.3 µs (  4.2), set:     4560.7 µs (  7.7)
       ets x10 (set) get:      766.1 µs (  2.2)
            gb_trees get:     2220.8 µs (  6.4), set:    22137.9 µs ( 37.2)
               hasht get:     1032.0 µs (  3.0), set:   107334.7 µs (180.2)
              hashtl get:     2263.7 µs (  6.5), set:     3107.2 µs (  5.2)
  process dictionary get:      526.6 µs (  1.5), set:      595.6 µs (  1.0)
              rbdict get:     2581.0 µs (  7.4), set:     8322.6 µs ( 14.0)
               tuple get:      346.5 µs (  1.0), set:    91854.1 µs (154.2)
N == 8000 (10 runs)
              aadict get:     5625.3 µs (  8.6), set:    21650.5 µs ( 19.5)
     array (dynamic) get:     3998.4 µs (  6.1), set:     7457.9 µs (  6.7)
       array (fixed) get:     4014.6 µs (  6.2), set:     7380.2 µs (  6.7)
                dict get:     4200.5 µs (  6.5), set:    21589.1 µs ( 19.5)
           ets (set) get:     2958.9 µs (  4.5), set:     8535.7 µs (  7.7)
       ets x10 (set) get:     1509.6 µs (  2.3)
            gb_trees get:     4922.7 µs (  7.6), set:    55256.5 µs ( 49.9)
               hasht get:     1990.7 µs (  3.1), set:   394366.0 µs (356.0)
              hashtl get:     5715.1 µs (  8.8), set:     6898.9 µs (  6.2)
  process dictionary get:     1057.2 µs (  1.6), set:     1107.8 µs (  1.0)
              rbdict get:     5455.4 µs (  8.4), set:    24543.2 µs ( 22.2)
               tuple get:      650.9 µs (  1.0), set:   365080.7 µs (329.6)
N == 16000 (10 runs)
              aadict get:    11302.9 µs (  8.7), set:    38483.2 µs ( 15.1)
     array (dynamic) get:     9810.6 µs (  7.6), set:    21888.0 µs (  8.6)
       array (fixed) get:    10283.3 µs (  7.9), set:    21910.1 µs (  8.6)
                dict get:     8249.0 µs (  6.4), set:    60084.0 µs ( 23.6)
           ets (set) get:     5965.8 µs (  4.6), set:    14583.2 µs (  5.7)
       ets x10 (set) get:     3052.2 µs (  2.4)
            gb_trees get:    10263.4 µs (  7.9), set:   126358.0 µs ( 49.7)
               hasht get:     3966.8 µs (  3.1), set:  1570470.8 µs (617.4)
              hashtl get:    11766.2 µs (  9.1), set:    17786.2 µs (  7.0)
  process dictionary get:     2094.7 µs (  1.6), set:     2543.8 µs (  1.0)
              rbdict get:    11263.8 µs (  8.7), set:    36493.2 µs ( 14.3)
               tuple get:     1294.3 µs (  1.0), set:  1486397.5 µs (584.3)
N == 32000 (10 runs)
              aadict get:    22952.3 µs (  8.9), set:    85579.9 µs ( 14.6)
     array (dynamic) get:    19565.7 µs (  7.6), set:    40450.3 µs (  6.9)
       array (fixed) get:    19597.6 µs (  7.6), set:    39994.8 µs (  6.8)
                dict get:    17543.6 µs (  6.8), set:   185507.8 µs ( 31.7)
           ets (set) get:    13456.0 µs (  5.2), set:    29891.5 µs (  5.1)
       ets x10 (set) get:     6824.8 µs (  2.6)
            gb_trees get:    22242.6 µs (  8.6), set:   244581.5 µs ( 41.8)
               hasht get:     8159.9 µs (  3.2), set:  6891457.9 µs (*****)
              hashtl get:    24608.9 µs (  9.5), set:    43185.2 µs (  7.4)
  process dictionary get:     4210.2 µs (  1.6), set:     5853.7 µs (  1.0)
              rbdict get:    22806.8 µs (  8.8), set:    84932.4 µs ( 14.5)
               tuple get:     2581.2 µs (  1.0), set:  6422056.3 µs (*****)
N == 64000 (10 runs)
              aadict get:    47862.1 µs (  9.4), set:   191716.5 µs ( 15.6)
     array (dynamic) get:    39008.0 µs (  7.7), set:    77311.6 µs (  6.3)
       array (fixed) get:    38962.4 µs (  7.6), set:    78340.9 µs (  6.4)
                dict get:    38602.2 µs (  7.6), set:   625891.7 µs ( 50.9)
           ets (set) get:    35109.1 µs (  6.9), set:    69833.4 µs (  5.7)
       ets x10 (set) get:    16434.1 µs (  3.2)
            gb_trees get:    46772.8 µs (  9.2), set:   513581.4 µs ( 41.8)
               hasht get:    18970.6 µs (  3.7), set: 33729619.8 µs (*****)
              hashtl get:    59980.1 µs ( 11.8), set:   108446.2 µs (  8.8)
  process dictionary get:     8468.0 µs (  1.7), set:    12288.2 µs (  1.0)
              rbdict get:    48106.1 µs (  9.4), set:   178132.4 µs ( 14.5)
               tuple get:     5097.6 µs (  1.0), set: 35494023.9 µs (*****)
I created hasht, a simple hash table implementation that performs quick lookups. Setting the structure is quite slow because it always modifies a single large tuple. However, with a more tiered approach, the storage speed might increase. The hasht lookup time for strings is similar to the trie data structure, but is slower for smaller N. The hasht lookup time for integers is similar to ETS (unless the ETS table is accessed with greater than ~10 processes concurrently with read concurrency optimizations set on the table), but the process dictionary and tuple are both quicker for integer lookups. The benchmark results are from Linux version 2.6.32-21, Ubuntu 10.04 with an Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz and non-HiPE Erlang R14B01:
TEST string_key
N == 1000 (10 runs)
              aadict get:      828.6 µs (  1.5), set:     2226.0 µs (  1.4)
                dict get:      940.9 µs (  1.7), set:     3184.6 µs (  2.0)
           ets (set) get:      828.5 µs (  1.5), set:     1586.9 µs (  1.0)
       ets x10 (set) get:     1030.8 µs (  1.8)
            gb_trees get:      897.6 µs (  1.6), set:     2786.5 µs (  1.8)
               hasht get:     1392.5 µs (  2.5), set:     9815.1 µs (  6.2)
  process dictionary get:      566.3 µs (  1.0), set:     1618.2 µs (  1.0)
              rbdict get:      817.3 µs (  1.4), set:     1806.3 µs (  1.1)
                trie get:      650.9 µs (  1.1), set:     2466.6 µs (  1.6)
N == 10000 (10 runs)
              aadict get:    21320.8 µs (  2.6), set:    45882.9 µs (  2.1)
                dict get:    17345.4 µs (  2.1), set:    59625.4 µs (  2.7)
           ets (set) get:    16985.1 µs (  2.0), set:    22305.2 µs (  1.0)
       ets x10 (set) get:    21002.8 µs (  2.5)
            gb_trees get:    23835.2 µs (  2.9), set:    47572.9 µs (  2.1)
               hasht get:    13345.9 µs (  1.6), set:  1030469.0 µs ( 46.2)
  process dictionary get:    12837.2 µs (  1.5), set:    25609.6 µs (  1.1)
              rbdict get:    21680.3 µs (  2.6), set:    42978.7 µs (  1.9)
                trie get:     8351.5 µs (  1.0), set:    35003.4 µs (  1.6)
N == 50000 (10 runs)
              aadict get:   179311.1 µs (  2.9), set:   326961.5 µs (  2.4)
                dict get:   111803.5 µs (  1.8), set:   615110.2 µs (  4.6)
           ets (set) get:    91941.2 µs (  1.5), set:   135026.9 µs (  1.0)
       ets x10 (set) get:   107469.8 µs (  1.7)
            gb_trees get:   195219.1 µs (  3.1), set:   358408.6 µs (  2.7)
               hasht get:    66421.6 µs (  1.1), set: 27200459.7 µs (201.4)
  process dictionary get:    62683.4 µs (  1.0), set:   141821.2 µs (  1.1)
              rbdict get:   182102.3 µs (  2.9), set:   294744.7 µs (  2.2)
                trie get:    71213.7 µs (  1.1), set:   357100.4 µs (  2.6)
TEST integer_key
N == 1000 (10 runs)
              aadict get:      482.5 µs (  5.6), set:     1917.5 µs ( 13.1)
     array (dynamic) get:      393.3 µs (  4.5), set:      721.6 µs (  4.9)
       array (fixed) get:      393.7 µs (  4.6), set:     1080.2 µs (  7.4)
                dict get:      447.6 µs (  5.2), set:     2010.5 µs ( 13.7)
           ets (set) get:      333.3 µs (  3.9), set:      748.7 µs (  5.1)
       ets x10 (set) get:      203.2 µs (  2.3)
            gb_trees get:      476.2 µs (  5.5), set:     4339.8 µs ( 29.6)
               hasht get:      358.1 µs (  4.1), set:     7134.3 µs ( 48.7)
  process dictionary get:      133.6 µs (  1.5), set:      146.5 µs (  1.0)
              rbdict get:      478.0 µs (  5.5), set:     2027.6 µs ( 13.8)
               tuple get:       86.5 µs (  1.0), set:     6903.4 µs ( 47.1)
N == 10000 (10 runs)
              aadict get:     6670.9 µs (  8.3), set:    24593.4 µs ( 14.3)
     array (dynamic) get:     4968.5 µs (  6.1), set:     9641.3 µs (  5.6)
       array (fixed) get:     5057.3 µs (  6.3), set:     8929.1 µs (  5.2)
                dict get:     4857.0 µs (  6.0), set:    29415.4 µs ( 17.1)
           ets (set) get:     3695.3 µs (  4.6), set:     8625.1 µs (  5.0)
       ets x10 (set) get:     1896.0 µs (  2.3)
            gb_trees get:     6234.5 µs (  7.7), set:    72117.2 µs ( 42.0)
               hasht get:     2386.1 µs (  3.0), set:  1042563.7 µs (607.4)
  process dictionary get:     1317.5 µs (  1.6), set:     1716.3 µs (  1.0)
              rbdict get:     6772.1 µs (  8.4), set:    25672.9 µs ( 15.0)
               tuple get:      808.1 µs (  1.0), set:   613975.0 µs (357.7)
N == 50000 (10 runs)
              aadict get:    36984.9 µs (  9.2), set:   166405.2 µs ( 16.8)
     array (dynamic) get:    30181.9 µs (  7.5), set:    58707.0 µs (  5.9)
       array (fixed) get:    30171.7 µs (  7.5), set:    60833.8 µs (  6.2)
                dict get:    27988.1 µs (  7.0), set:   408414.5 µs ( 41.3)
           ets (set) get:    25076.6 µs (  6.3), set:    50396.9 µs (  5.1)
       ets x10 (set) get:    12081.4 µs (  3.0)
            gb_trees get:    35874.3 µs (  9.0), set:   405197.7 µs ( 41.0)
               hasht get:    13516.8 µs (  3.4), set: 26560861.7 µs (*****)
  process dictionary get:     6687.2 µs (  1.7), set:     9888.3 µs (  1.0)
              rbdict get:    36388.4 µs (  9.1), set:   152543.6 µs ( 15.4)
               tuple get:     4002.1 µs (  1.0), set: 18081650.6 µs (*****)
My benchmarks for key/value data stored with either integer or string keys lacked explanation of the results. I will explain my conclusions here based on the benchmark results. When looking at the benchmark results, the normalized time spent during the test is within the parentheses, next to the time it relates to. The normalized values shows how many times an operation for a data structure is slower than the fastest alternative.

The orddict data structure is the fastest data structure for storing data with integer keys. However, all other operations on an orddict are very slow compared to any of the other data structures. Storing data with an integer key is slowest in a tuple, but the integer key lookup of data within a tuple is the fastest among all the data structures. The process dictionary is the fastest for storing data if you ignore an orddict with an integer key (since a lookup in an orddict is painfully slow). Typically, usage of the process dictionary is discouraged, since it eliminates the source code's referential transparency making the code harder to test, maintain, and understand.

ETS results in the benchmark showed that concurrent lookups are faster with integer keys but slower with string keys, which shows that more complex comparison operations limit scalability quicker with ETS' global data storage (the assumption being that concurrent integer key lookups on ETS would also become slower, but it would take more concurrent lookups to see the performance degrade due to the global data locking that is used for consistency). A persistent puzzle has been the aadict and gb_trees which should implement the same algorithm, but differ for storing integer key data (the aadict is quicker). The aadict and gb_trees data structures should have faster lookup times than the rbdict data structure, but no significant difference can be seen (gb_trees actually has slower lookup times for string keys). The dict data structure is faster for lookups but slower for storing when compared with the red-black tree data structures (rbdict, aatree and gb_trees).

The array data structure is much quicker than a tuple for storing integer key data, if you can tolerate the slower lookup time (lookups become 7.5 times slower than a tuple). The trie data structure is a new data structure I created for efficient string lookups. The trie is able to be roughly equivalent to the process dictionary for lookups (1.1 times the speed for the largest test), but should always be slightly slower as an externally implemented data structure (i.e., not a core language feature like the process dictionary). The store of string key data in a trie is roughly equivalent to storing data in a red-black tree data structure (a trie is slower than rbdict, equivalent to aatree, and faster than gb_trees for the largest test).

The speed of the trie comes from exploiting the quick integer key lookups on a tuple. The speed of storing string key data in the red-black tree data structures is similar to storing data in a trie, because all of these tree structures use tuples for nodes. One interesting fact is that the trie can not store the atom value 'error' because of the ambiguity within the dict interface (the dict:fetch/2 function) and the fact that the atom 'error' is used for empty values within the trie to make the implementation simpler.
Writing an Erlang in-order list traversal to form a new list is very common, but it is not always clear what traversal method to use. There is a brief benchmark (added to erlbench) which shows the best method depends on the size of the list. The benchmark was ran (with R14B01, without HiPE) on an AMD Phenom 9950 Quad-Core (64 bit) running Linux 2.6.32-23-generic (Ubuntu).

TEST list_traversal
N == 1000 (10 runs)
       list -> queue get:      245.4 µs (  7.9)
  list comprehension get:       31.2 µs (  1.0)
       lists:foldr/3 get:       54.4 µs (  1.7)
         lists:map/2 get:       59.4 µs (  1.9)
      queue traverse get:      193.4 µs (  6.2)
    reverse/traverse get:       53.8 µs (  1.7)
     traverse/append get:     3070.1 µs ( 98.4)
N == 10000 (10 runs)
       list -> queue get:     3607.6 µs (  8.0)
  list comprehension get:      449.2 µs (  1.0)
       lists:foldr/3 get:      993.5 µs (  2.2)
         lists:map/2 get:      681.8 µs (  1.5)
      queue traverse get:     3062.9 µs (  6.8)
    reverse/traverse get:      929.5 µs (  2.1)
     traverse/append get:   333260.8 µs (741.9)
N == 50000 (10 runs)
       list -> queue get:    19467.0 µs (  7.8)
  list comprehension get:     2505.1 µs (  1.0)
       lists:foldr/3 get:     2813.3 µs (  1.1)
         lists:map/2 get:     2844.0 µs (  1.1)
      queue traverse get:    15101.6 µs (  6.0)
    reverse/traverse get:     5721.9 µs (  2.3)
     traverse/append get: 13025345.9 µs (*****)


The results show that any traversal that keeps the list size the same should use list comprehension. A traversal that must alter the size of the list could use lists:foldr/3 with the [H|T] syntax, but the usage is not tail-recursive. A tail-recursive solution must use the reverse/traverse method for slightly slower performance (the choice is described in the Erlang Efficiency Guide). Robert Virding mentioned that the reverse/traverse method should be preferred since it is tail-recursive and can keep memory consumption low when processing large lists. This test currently only tests the reverse/traverse method where the reverse is performed before the traversal, since performing the reversal after the traversal should have very similar characteristics.

The important point to remember from the benchmark is to not use the "traverse/append" method which does an in-order traversal that relies on ++[Element] syntax to append an element to the new list (newly created each append operation). Not using "traverse/append" is somewhat contrary to the Erlang efficiency guide, that states '++' is not slow. However, with the other options available, it is easy to ignore.

Thanks goes to Robert Virding and Andrew Thompson for helping with suggestions for improving the benchmark.
profile
User: [info]okeuday
Name: Michael Truog
Website: My Website
calendar
Back November 2011
12345
6789101112
13141516171819
20212223242526
27282930
links
License
tags