Monday, September 30, 2013

Are MATLAB containers.Map more efficient?

UPDATE 2015-02-10 I learned something interesting recently that sheds some light on the answer to this question. I believe that MATLAB's containers.Map is a wrapper around Java's java.util.HashMap. A hash map or hash table is a zeroth order look-up table that uses a hash of the look-up key or index for tables as the address in memory that contains the corresponding value. Therefore it doesn't matter how big the hash map or hash table is, MATLAB (or Java) can immediately return the contents given the key. Since the hash is guaranteed to be unique for any given index or key, the memory address of the value will also be unique. So the answer to the question, "Are MATLAB containers.Map more efficient?" is absolutely and without any doubt a loud and resounding, "Yes!" QED

MATLAB's containers.Map's are similar to a Java HashMap or a Python dictionary. They have been around at least since 2008b believe it or not. They were billed as
Fast Key Lookup Provided with New Map Data Structure
when they were introduced. See Programming Fundamentals in 2008b Release Notes. How much memory do they realy take up? MATLAB doesn't say. It seems like they're smaller, but you'll soon see that every containers.Map is always 112 Bytes. How much faster can they really be? My guess is they are really only more efficient for really large data, that are only nested with other containers.Map's of primitive types (IE integers, doubles, &c.).

They have some limitations in subassignment and indexing.

For example:
  • If the value of map('k1') is also a containers.Map,
  • map('k1')('k2')
    Error: ()-indexing must appear last in an index expression.
    then this is invalid because MATLAB only allows one pair of parentheses which must appear last. So you'll have to split this into two commands. This also allows assignment into the nested containers.Map ...
    val_is_map = map('k1') % make a temp handle to the top level containers.Map
    val_is_map('k2') = new_value % assignment using the handle also changes the containers.Map it points to
    test = map('k1') % make another handle to test the original containers.Map was updated
    test('k2') == new_value % true
    ... because containers.Map's are handles (IE pointers) so changing the value of a copied containers.Map changes the source.
  • If the value of map('k1') is a structure,
  • map('k1').field = 'foo'
    Error using containers.Map/subsasgn
    Only one level of indexing is supported by a containers.Map.
    then this is also not valid because only the top level of containers.Map can be assigned.
  • Finally if map('k1') is a cell array, then good luck trying to index it.
>> dict = containers.Map({'all','amend','author','msg'}, ...
       {{'-a','--all',true}, ...
       {[],'--amend',true}, ...
       {[],'--author',true}, ...
       {'-m','--message',false}})

dict = 

  Map with properties:

        Count: 4
      KeyType: char
    ValueType: any

>> s = struct('all',{'-a','--all',true}, ...
       'amend',{[],'--amend',true}, ...
       'author',{[],'--author',true}, ...
       'msg',{'-m','--message',false})

s = 

1x3 struct array with fields:

    all
    amend
    author
    msg

>> c1 = {{'all','-a','--all',true}, ...
       {'amend',[],'--amend',true}, ...
       {'author',[],'--author',false}, ...
       {'msg','-m','--message',false}}

c1 = 

    {1x4 cell}    {1x4 cell}    {1x4 cell}    {1x4 cell}

>> c2 = {'all','-a','--all',true; ...
       'amend',[],'--amend',true; ...
       'author',[],'--author',false; ...
       'msg','-m','--message',false}

c2 = 

    'all'       '-a'    '--all'        [1]
    'amend'       []    '--amend'      [1]
    'author'      []    '--author'     [0]
    'msg'       '-m'    '--message'    [0]

>> whos
  Name      Size            Bytes  Class

  dict      4x1               112  containers.Map
  s         1x3              1670  struct
  c1        1x4              2344  cell
  c2        4x4              1896  cell
Fork me on GitHub