Efficiently Find Row Intersections Of Two 2-d Numpy Arrays
I am trying to figure out an efficient way of finding row intersections of two np.arrays. Two arrays have the same shapes, and duplicate values in each row cannot happen. For examp
Solution 1:
If you have no duplicates within a row you can try to replicate what np.intersect1d
does under the hood (see the source code here):
>>> c = np.hstack((a, b))
>>> c
array([[2, 5, 6, 2, 3, 4],
[8, 2, 3, 7, 4, 3],
[4, 1, 5, 5, 4, 1],
[1, 7, 9, 7, 6, 9]])
>>> c.sort(axis=1)
>>> c
array([[2, 2, 3, 4, 5, 6],
[2, 3, 3, 4, 7, 8],
[1, 1, 4, 4, 5, 5],
[1, 6, 7, 7, 9, 9]])
>>> c[:, 1:] == c[:, :-1]
array([[ True, False, False, False, False],
[False, True, False, False, False],
[ True, False, True, False, True],
[False, False, True, False, True]], dtype=bool)
>>> np.sum(c[:, 1:] == c[:, :-1], axis=1)
array([1, 1, 3, 2])
Solution 2:
This answer might not be viable, because if the input has shape (N, M), it generates an intermediate array with size (N, M, M), but it's always fun to see what you can do with broadcasting:
In [43]: a
Out[43]:
array([[2, 5, 6],
[8, 2, 3],
[4, 1, 5],
[1, 7, 9]])
In [44]: b
Out[44]:
array([[2, 3, 4],
[7, 4, 3],
[5, 4, 1],
[7, 6, 9]])
In [45]: (np.expand_dims(a, -1) == np.expand_dims(b, 1)).sum(axis=-1).sum(axis=-1)
Out[45]: array([1, 1, 3, 2])
For large arrays, the method could be made more memory-friendly by doing the operation in batches.
Solution 3:
I can't think of a clean pure-numpy solution, but the following suggestion should speed things up, potentially dramatically:
- use numba. It is as simple as decorating your
get_intersect1ds
function with@autojit
- pass
assume_unique = True
when you callintersect1d
Post a Comment for "Efficiently Find Row Intersections Of Two 2-d Numpy Arrays"