Sunday, March 20, 2011

Constructing a python set from a numpy matrix

I'm trying to execute the following:

>>> import numpy as np
>>> x = np.array([[3, 2, 3], [4, 4, 4]])
>>> y = set(x)
TypeError: unhashable type: 'numpy.ndarray'

How can I easily and efficiently create a set from a numpy array?
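The error comes from how set() consumes its argument: iterating over a 2-D array yields its rows, and each row is itself an ndarray, which is mutable and therefore unhashable. A minimal sketch that shows this step by step (the names rows, error_message and row_set are only for illustration):

```python
import numpy as np

x = np.array([[3, 2, 3], [4, 4, 4]])

# set(x) iterates over x, and iterating over a 2-D array yields its rows.
rows = list(x)
print(type(rows[0]))  # <class 'numpy.ndarray'>

# Each row is a mutable ndarray, so it cannot be hashed, and set() needs hashable items.
error_message = None
try:
    hash(rows[0])
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unhashable type: 'numpy.ndarray'

# The same values packed into an immutable tuple hash fine.
row_set = {tuple(row) for row in rows}
print(len(row_set))  # 2
```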

From Stack Overflow:
  • Set elements must be hashable, and the immutable counterpart to an array is the tuple, so try converting each row of the array into a tuple:

    >>> import numpy as np
    >>> x = np.array([[3, 2, 3], [4, 4, 4]])
    
    >>> x_hashable = map(tuple, x)   # one tuple per row
    
    >>> y = set(x_hashable)          # y == {(3, 2, 3), (4, 4, 4)}
    
  • If you want a set of the elements:

    >>> y = set(e for r in x for e in r)   # y == {2, 3, 4}
    

    For a set of the rows:

    >>> y = set(tuple(r) for r in x)   # y == {(3, 2, 3), (4, 4, 4)}
    
  • If you want a set of the elements, here is another, probably faster way:

    y = set(x.flatten())
    

    PS: after comparing x.flat, x.flatten(), and x.ravel() on a 10x100 array, I found that they all run at about the same speed. For a 3x3 array, the fastest is the iterator version:

    y = set(x.flat)
    

    which I would recommend because it is the least memory-hungry option: it does not build an intermediate array, so it scales well with the size of the array.

    : Good suggestion! You could also use set(x.ravel()), which does the same thing but creates a copy only if needed. Or, better, use set(x.flat): x.flat is an iterator over the elements of the flattened array that does not waste time actually flattening the array.
    EOL : @musicinmybrain: very good points! Thank you!
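The PS reports timings for x.flat, x.flatten() and x.ravel() without showing how they were taken. A minimal sketch of how such a comparison could be rerun with the standard timeit module; the array contents, repeat counts and the names arrays, variants and results are assumptions for illustration, and the printed numbers will depend on the machine and NumPy version, so this is not a reproduction of the original measurements:

```python
import functools
import timeit

import numpy as np

# Hypothetical inputs whose shapes mirror the ones mentioned in the PS.
arrays = {
    "3x3": np.arange(9).reshape(3, 3),
    "10x100": np.arange(1000).reshape(10, 100),
}

# Three ways to feed every element of a 2-D array to set().
variants = {
    "flat": lambda a: set(a.flat),          # flatiter: walks the elements, no new array
    "flatten": lambda a: set(a.flatten()),  # always copies into a new 1-D array
    "ravel": lambda a: set(a.ravel()),      # returns a view when it can, a copy otherwise
}

results = {}
for shape_name, arr in arrays.items():
    for variant_name, build in variants.items():
        # Best of 5 runs of 200 calls each; absolute numbers depend on the machine.
        timer = functools.partial(build, arr)
        seconds = min(timeit.repeat(timer, number=200, repeat=5))
        results[(shape_name, variant_name)] = seconds
        print(f"{shape_name:>6} {variant_name:>8}: {seconds:.5f} s per 200 calls")
```

Taking the minimum of several repeats is the usual timeit convention, since the slower runs mostly measure interference from other processes.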
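NumPy also has a built-in that none of the answers above use: np.unique returns the sorted distinct values of an array and, with axis=0 (NumPy 1.13 or later), the distinct rows. It returns arrays rather than sets, so this is a swapped-in alternative, not what the answers recommend; a sketch, assuming set semantics are only needed at the very end:

```python
import numpy as np

x = np.array([[3, 2, 3], [4, 4, 4]])

# Distinct elements, returned as a sorted 1-D array rather than a set.
unique_elements = np.unique(x)  # array([2, 3, 4])

# Distinct rows (axis=0 needs NumPy 1.13 or later), sorted lexicographically.
unique_rows = np.unique(x, axis=0)  # array([[3, 2, 3], [4, 4, 4]])

# Convert to Python sets only if set semantics are actually needed;
# tolist() turns NumPy scalars into plain Python ints first.
element_set = set(unique_elements.tolist())              # {2, 3, 4}
row_set = {tuple(row) for row in unique_rows.tolist()}  # {(3, 2, 3), (4, 4, 4)}
```

Because the deduplication happens inside NumPy, this avoids hashing one Python object per element, which may matter for large arrays; the trade-off is that the result is sorted and stays an array until converted.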
