Michael Truog (okeuday) wrote,
Michael Truog
okeuday

trie retest

The older trie (speed) testing didn't experiment with Erlang compilation options and used Erlang R14B01 (newest, at the time). To examine how performance has changed, I wanted to rerun the tests with Erlang R15B03 on newer hardware. I have tested various HiPE optimization levels (Erlang configuration: --enable-threads --enable-smp-support --enable-kernel-poll --enable-hipe --enable-native-libs) and running without optimization (Erlang configuration: --enable-threads --enable-smp-support --enable-kernel-poll --disable-hipe). The hardware for these tests is: Core i7 2670QM 2.2GHz 4 cores, 8 hyper-threads, L2:4×256KB L3:6MB RAM:8GB:DDR3-1333MHz, Sandy Bridge-HE-4 (Socket G2), Linux 3.2.0-29-generic x86_64, Ubuntu 12.04.1 LTS. To understand the Erlang compilation options for each optimization level, look at the makefile.

The older results and the summary showed similar performance between the trie and the process dictionary with larger tries (50000 elements), for the trie:find/2 operation (shown as "get" in the results). The new testing only deals with N as 1000, 10000, and 100000 instead of 1000, 10000, and 50000, to look at orders of magnitude difference. When doing the new testing, the smaller tries (1000 elements) showed similar performance between the trie and the process dictionary (i.e., the ideal usage changed). The difference became more dramatic with HiPE optimization level 1, such that the trie outperformed the process dictionary for N as 1000 and 10000. Outperforming the Erlang process dictionary is very surprising and atypical since the process dictionary is normally the fastest storage available, but it is normally avoided since it eliminates the source code's referential transparency making the code harder to test, maintain, and understand.

Using the trie data structure instead of the process dictionary allows you to provide a scope for your data, since an Erlang process only has a single shared process dictionary (i.e., one process dictionary per Erlang process). The trie data structure also allows the state to be saved as Erlang terms, so it can easily be put in ets, a database, or sent between processes.

While it is unclear if HiPE is used within production Erlang systems (by people that need stability), it is good to see that the trie can benefit from the simplest optimizations available to Erlang (HiPE optimization level 1). The exact source of the trie performance change between R14B01 and R15B03 is also unclear because of how many changes influence the Erlang runtime system, but the trend from larger tries being faster to smaller tries being faster helps to make the trie data structure more applicable to common tasks, as the best solution for storing associative lookup data with string keys (where a string is a list of integers, not a binary). Feel free to try the string_key test within your own software development environment.

make OPTIMIZE=0
erl -noshell -pz ebin -s run test -s init stop
TEST string_key
N == 1000 (10 runs)
              aadict get:      431.8 us (  1.4), set:     1116.5 us (  1.5)
                dict get:      443.7 us (  1.5), set:     1791.7 us (  2.4)
   ets (ordered_set) get:      791.9 us (  2.6), set:     1497.6 us (  2.0)
           ets (set) get:      364.4 us (  1.2), set:      890.3 us (  1.2)
ets x10 read (ordere get:      655.3 us (  2.2)
  ets x10 read (set) get:      593.7 us (  2.0)
            gb_trees get:      459.4 us (  1.5), set:     1518.7 us (  2.0)
              hashtl get:      915.7 us (  3.0), set:     1270.2 us (  1.7)
  process dictionary get:      316.6 us (  1.0), set:      741.5 us (  1.0)
              rbdict get:      420.0 us (  1.4), set:      940.5 us (  1.3)
                trie get:      302.9 us (  1.0), set:      978.1 us (  1.3)
N == 10000 (10 runs)
              aadict get:     8903.2 us (  1.4), set:    20652.3 us (  2.0)
                dict get:     8144.9 us (  1.3), set:    26353.0 us (  2.6)
   ets (ordered_set) get:     9488.3 us (  1.5), set:    12515.1 us (  1.2)
           ets (set) get:     8149.0 us (  1.3), set:    10223.1 us (  1.0)
ets x10 read (ordere get:     7549.7 us (  1.2)
  ets x10 read (set) get:     6636.2 us (  1.1)
            gb_trees get:     9630.4 us (  1.6), set:    24495.7 us (  2.4)
              hashtl get:     9724.2 us (  1.6), set:    16015.6 us (  1.6)
  process dictionary get:     6178.1 us (  1.0), set:    14000.4 us (  1.4)
              rbdict get:     8522.1 us (  1.4), set:    18301.5 us (  1.8)
                trie get:     7215.9 us (  1.2), set:    18760.3 us (  1.8)
N == 100000 (10 runs)
              aadict get:   217818.2 us (  2.9), set:   363481.8 us (  3.1)
                dict get:   120559.7 us (  1.6), set:   702588.1 us (  6.0)
   ets (ordered_set) get:   140960.1 us (  1.9), set:   160380.2 us (  1.4)
           ets (set) get:    92217.0 us (  1.2), set:   117812.8 us (  1.0)
ets x10 read (ordere get:    94970.5 us (  1.3)
  ets x10 read (set) get:    77114.3 us (  1.0)
            gb_trees get:   240592.4 us (  3.2), set:   413380.8 us (  3.5)
              hashtl get:   123689.9 us (  1.7), set:   162366.1 us (  1.4)
  process dictionary get:    74274.1 us (  1.0), set:   145834.0 us (  1.2)
              rbdict get:   216569.4 us (  2.9), set:   329517.9 us (  2.8)
                trie get:   128545.4 us (  1.7), set:   469294.1 us (  4.0)

make OPTIMIZE=1
erl -noshell -pz ebin -s run test -s init stop
TEST string_key
N == 1000 (10 runs)
              aadict get:      364.6 us (  2.5), set:      682.8 us (  1.2)
                dict get:      338.6 us (  2.3), set:     1243.5 us (  2.3)
   ets (ordered_set) get:      575.0 us (  3.9), set:      795.2 us (  1.4)
           ets (set) get:      344.1 us (  2.3), set:      572.9 us (  1.0)
ets x10 read (ordere get:      569.3 us (  3.8)
  ets x10 read (set) get:      675.6 us (  4.6)
            gb_trees get:      404.0 us (  2.7), set:     1024.3 us (  1.9)
              hashtl get:      488.2 us (  3.3), set:      571.5 us (  1.0)
  process dictionary get:      228.3 us (  1.5), set:      783.6 us (  1.4)
              rbdict get:      350.9 us (  2.4), set:      551.9 us (  1.0)
                trie get:      148.2 us (  1.0), set:      637.4 us (  1.2)
N == 10000 (10 runs)
              aadict get:     8243.8 us (  1.5), set:    14899.0 us (  1.3)
                dict get:     7905.4 us (  1.5), set:    22639.6 us (  2.0)
   ets (ordered_set) get:     9647.8 us (  1.8), set:    12273.3 us (  1.1)
           ets (set) get:     8741.0 us (  1.6), set:    11131.5 us (  1.0)
ets x10 read (ordere get:     8152.5 us (  1.5)
  ets x10 read (set) get:     7511.2 us (  1.4)
            gb_trees get:     8854.4 us (  1.6), set:    17573.0 us (  1.6)
              hashtl get:     8872.8 us (  1.6), set:    14108.0 us (  1.3)
  process dictionary get:     6474.1 us (  1.2), set:    15250.0 us (  1.4)
              rbdict get:     7876.7 us (  1.5), set:    12443.8 us (  1.1)
                trie get:     5395.0 us (  1.0), set:    13855.8 us (  1.2)
N == 100000 (10 runs)
              aadict get:   203331.7 us (  3.2), set:   290739.7 us (  2.8)
                dict get:   115243.7 us (  1.8), set:   636958.5 us (  6.0)
   ets (ordered_set) get:   138594.7 us (  2.1), set:   156941.5 us (  1.5)
           ets (set) get:    83570.1 us (  1.3), set:   105417.7 us (  1.0)
ets x10 read (ordere get:    95338.7 us (  1.5)
  ets x10 read (set) get:    68988.9 us (  1.1)
            gb_trees get:   228740.4 us (  3.5), set:   334276.2 us (  3.2)
              hashtl get:   106109.2 us (  1.6), set:   119801.5 us (  1.1)
  process dictionary get:    64525.8 us (  1.0), set:   137005.9 us (  1.3)
              rbdict get:   199559.6 us (  3.1), set:   257816.9 us (  2.4)
                trie get:    95711.2 us (  1.5), set:   426041.9 us (  4.0)

make OPTIMIZE=2
erl -noshell -pz ebin -s run test -s init stop
TEST string_key
N == 1000 (10 runs)
              aadict get:      473.4 us (  3.3), set:      763.9 us (  1.4)
                dict get:      435.0 us (  3.0), set:     1547.7 us (  2.9)
   ets (ordered_set) get:      574.4 us (  4.0), set:      805.5 us (  1.5)
           ets (set) get:      339.6 us (  2.3), set:      589.4 us (  1.1)
ets x10 read (ordere get:      623.7 us (  4.3)
  ets x10 read (set) get:      585.3 us (  4.0)
            gb_trees get:      408.5 us (  2.8), set:     1014.8 us (  1.9)
              hashtl get:      492.9 us (  3.4), set:      597.2 us (  1.1)
  process dictionary get:      222.7 us (  1.5), set:      791.6 us (  1.5)
              rbdict get:      352.2 us (  2.4), set:      540.2 us (  1.0)
                trie get:      144.9 us (  1.0), set:      775.0 us (  1.4)
N == 10000 (10 runs)
              aadict get:     8240.3 us (  1.6), set:    14352.5 us (  1.4)
                dict get:     7820.1 us (  1.5), set:    22096.4 us (  2.1)
   ets (ordered_set) get:     9841.4 us (  1.9), set:    12463.4 us (  1.2)
           ets (set) get:     8539.8 us (  1.6), set:    10536.7 us (  1.0)
ets x10 read (ordere get:     8252.2 us (  1.6)
  ets x10 read (set) get:     7503.8 us (  1.4)
            gb_trees get:     8956.5 us (  1.7), set:    17869.8 us (  1.7)
              hashtl get:     8733.6 us (  1.7), set:    15034.7 us (  1.4)
  process dictionary get:     6167.1 us (  1.2), set:    14606.7 us (  1.4)
              rbdict get:     7936.7 us (  1.5), set:    12769.4 us (  1.2)
                trie get:     5220.3 us (  1.0), set:    14063.3 us (  1.3)
N == 100000 (10 runs)
              aadict get:   213701.7 us (  3.1), set:   299393.6 us (  2.7)
                dict get:   118862.2 us (  1.7), set:   637653.9 us (  5.7)
   ets (ordered_set) get:   131884.9 us (  1.9), set:   151626.0 us (  1.4)
           ets (set) get:    90851.1 us (  1.3), set:   112281.9 us (  1.0)
ets x10 read (ordere get:    90603.8 us (  1.3)
  ets x10 read (set) get:    74600.5 us (  1.1)
            gb_trees get:   234143.5 us (  3.4), set:   337062.8 us (  3.0)
              hashtl get:    99279.3 us (  1.4), set:   118642.3 us (  1.1)
  process dictionary get:    69277.4 us (  1.0), set:   141672.6 us (  1.3)
              rbdict get:   206904.5 us (  3.0), set:   265790.1 us (  2.4)
                trie get:   100507.5 us (  1.5), set:   437189.1 us (  3.9)

make OPTIMIZE=3
erl -noshell -pz ebin -s run test -s init stop
TEST string_key
N == 1000 (10 runs)
              aadict get:      465.3 us (  2.6), set:      790.3 us (  1.4)
                dict get:      424.2 us (  2.3), set:     1523.1 us (  2.7)
   ets (ordered_set) get:      569.2 us (  3.1), set:      798.2 us (  1.4)
           ets (set) get:      333.5 us (  1.8), set:      567.9 us (  1.0)
ets x10 read (ordere get:      598.3 us (  3.3)
  ets x10 read (set) get:      615.6 us (  3.4)
            gb_trees get:      509.1 us (  2.8), set:     1186.2 us (  2.1)
              hashtl get:      464.7 us (  2.6), set:      585.9 us (  1.0)
  process dictionary get:      224.2 us (  1.2), set:      817.1 us (  1.4)
              rbdict get:      449.9 us (  2.5), set:      639.7 us (  1.1)
                trie get:      181.9 us (  1.0), set:      839.4 us (  1.5)
N == 10000 (10 runs)
              aadict get:     8498.7 us (  1.6), set:    15190.0 us (  1.4)
                dict get:     7812.6 us (  1.5), set:    21764.6 us (  2.1)
   ets (ordered_set) get:     9503.9 us (  1.8), set:    12374.8 us (  1.2)
           ets (set) get:     8636.0 us (  1.7), set:    10541.9 us (  1.0)
ets x10 read (ordere get:     8148.1 us (  1.6)
  ets x10 read (set) get:     7765.6 us (  1.5)
            gb_trees get:     8986.5 us (  1.7), set:    17331.9 us (  1.6)
              hashtl get:     9372.1 us (  1.8), set:    15798.2 us (  1.5)
  process dictionary get:     6215.1 us (  1.2), set:    14557.3 us (  1.4)
              rbdict get:     7963.2 us (  1.5), set:    12940.2 us (  1.2)
                trie get:     5199.3 us (  1.0), set:    14114.5 us (  1.3)
N == 100000 (10 runs)
              aadict get:   203343.0 us (  3.0), set:   302129.4 us (  2.7)
                dict get:   109768.0 us (  1.6), set:   610699.3 us (  5.4)
   ets (ordered_set) get:   126994.1 us (  1.9), set:   144943.7 us (  1.3)
           ets (set) get:    87807.5 us (  1.3), set:   113148.6 us (  1.0)
ets x10 read (ordere get:    81821.7 us (  1.2)
  ets x10 read (set) get:    72250.2 us (  1.1)
            gb_trees get:   228222.0 us (  3.3), set:   336378.5 us (  3.0)
              hashtl get:    90128.9 us (  1.3), set:   128015.4 us (  1.1)
  process dictionary get:    68322.9 us (  1.0), set:   141096.2 us (  1.2)
              rbdict get:   196716.5 us (  2.9), set:   266625.2 us (  2.4)
                trie get:    99331.1 us (  1.5), set:   422425.9 us (  3.7)

make OPTIMIZE=4
erl -noshell -pz ebin -s run test -s init stop
TEST string_key
N == 1000 (10 runs)
              aadict get:      364.5 us (  2.4), set:      681.8 us (  1.1)
                dict get:      341.7 us (  2.3), set:     1184.9 us (  1.8)
   ets (ordered_set) get:      586.1 us (  3.9), set:      867.0 us (  1.3)
           ets (set) get:      336.7 us (  2.2), set:      648.8 us (  1.0)
ets x10 read (ordere get:      567.0 us (  3.8)
  ets x10 read (set) get:      559.8 us (  3.7)
            gb_trees get:      507.6 us (  3.4), set:     1226.4 us (  1.9)
              hashtl get:     1015.4 us (  6.7), set:     2740.5 us (  4.2)
  process dictionary get:      226.9 us (  1.5), set:      731.0 us (  1.1)
              rbdict get:      461.8 us (  3.1), set:      658.7 us (  1.0)
                trie get:      150.8 us (  1.0), set:      757.3 us (  1.2)
N == 10000 (10 runs)
              aadict get:     7712.4 us (  1.5), set:    15373.6 us (  1.5)
                dict get:     7464.2 us (  1.4), set:    21264.0 us (  2.1)
   ets (ordered_set) get:     8903.2 us (  1.7), set:    11940.7 us (  1.2)
           ets (set) get:     8213.1 us (  1.6), set:    10315.6 us (  1.0)
ets x10 read (ordere get:     7535.7 us (  1.4)
  ets x10 read (set) get:     6949.5 us (  1.3)
            gb_trees get:     8625.9 us (  1.6), set:    17152.0 us (  1.7)
              hashtl get:    15289.7 us (  2.9), set:    30531.0 us (  3.0)
  process dictionary get:     5988.8 us (  1.1), set:    14283.2 us (  1.4)
              rbdict get:     7701.0 us (  1.5), set:    14704.0 us (  1.4)
                trie get:     5275.3 us (  1.0), set:    14217.7 us (  1.4)
N == 100000 (10 runs)
              aadict get:   198179.8 us (  2.9), set:   292331.5 us (  2.7)
                dict get:   109865.6 us (  1.6), set:   552816.3 us (  5.0)
   ets (ordered_set) get:   135377.6 us (  2.0), set:   155204.8 us (  1.4)
           ets (set) get:    88240.8 us (  1.3), set:   110088.4 us (  1.0)
ets x10 read (ordere get:    97416.2 us (  1.4)
  ets x10 read (set) get:    74857.3 us (  1.1)
            gb_trees get:   221469.0 us (  3.2), set:   338019.2 us (  3.1)
              hashtl get:   101314.6 us (  1.5), set:   212521.6 us (  1.9)
  process dictionary get:    68286.0 us (  1.0), set:   139829.3 us (  1.3)
              rbdict get:   194049.9 us (  2.8), set:   263348.2 us (  2.4)
                trie get:    94791.4 us (  1.4), set:   399552.5 us (  3.6)
Subscribe

  • v4 UUIDs

    After updating the Erlang uuid implementation to fix various bugs, I added uuid:get_v4_urandom/0 function to utilize the 2006 Wichmann-Hill…

  • A Better Erlang Priority Queue

    The other priority queue implementations were meant to avoid the O(N) task of finding a priority within the implementation used by both Riak and…

  • An Erlang Priority Queue (continued)

    After looking at various priority queue implementation possibilities and creating 3 separate implementations, it became necessary to test the data…

  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
  • 0 comments