Finding Intersection Of Two Matrices In Python Within A Tolerance?
I'm looking for the most efficient way of finding the intersection of two different-sized matrices. Each matrix has three variables (columns) and a varying number of observations (
Solution 1:
If you don't mind working with NumPy arrays, you could exploit broadcasting for a vectorized solution that compares every row of a against every row of b at once. Here's the implementation -
import numpy as np

# Set tolerance values for each column
tol = [1, 2, 10]
# Get absolute differences between every row of a and every row of b,
# keeping their columns aligned: shape (len(a), len(b), 3)
diffs = np.abs(np.asarray(a)[:, None] - np.asarray(b))
# Compare each row with the triplet from `tol`.
# Get a mask of all matching rows and finally get the matching indices:
# x1 indexes rows of a, x2 the corresponding rows of b
x1, x2 = np.nonzero((diffs < tol).all(2))
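To see why inserting a new axis lines the columns up, it helps to look at the shapes. This is a minimal sketch with made-up inputs (4 rows in a, 6 rows in b, 3 columns): the new axis turns a into shape (4, 1, 3), which broadcasts against b's (6, 3) to give a (4, 6, 3) array where diffs[i, j] holds the column-wise differences between row i of a and row j of b.

```python
import numpy as np

# Hypothetical example inputs: 4 rows in a, 6 rows in b, 3 columns each
a = np.zeros((4, 3))
b = np.ones((6, 3))

expanded = a[:, None]          # insert an axis: shape (4, 1, 3)
diffs = np.abs(expanded - b)   # broadcasts against (6, 3) -> (4, 6, 3)

print(expanded.shape)  # (4, 1, 3)
print(diffs.shape)     # (4, 6, 3)

# diffs[i, j] is |a[i] - b[j]| column by column; reducing over the last
# axis with .all() keeps only pairs where every column is within tolerance
tol = np.array([1, 2, 10])
mask = (diffs < tol).all(axis=2)   # shape (4, 6)
print(mask.shape)      # (4, 6)
```

Because the comparison is strict (<), a column difference exactly equal to its tolerance does not count as a match; here every difference in the first column is 1, so no pair matches.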
Sample run -
In [46]: # Inputs
...: a=np.matrix('1 5 1003; 2 4 1002; 4 3 1008; 8 1 2005')
...: b=np.matrix('7 9 1006; 4 4 1007; 7 7 1050; 8 2 2003; 9 9 3000; 7 7 1000')
...:
In [47]: # Set tolerance values for each column
...: tol = [1, 2, 10]
...:
...: # Get absolute differences between a and b keeping their columns aligned
...: diffs = np.abs(np.asarray(a[:,None]) - np.asarray(b))
...:
...: # Compare each row with the triplet from `tol`.
...: # Get mask of all matching rows and finally get the matching indices
...: x1,x2 = np.nonzero((diffs < tol).all(2))
...:
In [48]: x1,x2
Out[48]: (array([2, 3]), array([1, 3]))
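The question asks for the intersection itself, while x1 and x2 are index pairs. One way to pull out the matching rows is sketched below. It rebuilds the sample inputs as plain ndarrays (an assumption, so that fancy indexing returns ordinary rows rather than np.matrix objects).

```python
import numpy as np

# Sample inputs from the run above, as plain ndarrays
a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])
tol = [1, 2, 10]

diffs = np.abs(a[:, None] - b)
x1, x2 = np.nonzero((diffs < tol).all(axis=2))

# Paired rows: matched_a[k] and matched_b[k] are within tolerance of each other
matched_a = a[x1]
matched_b = b[x2]
for ra, rb in zip(matched_a, matched_b):
    print(ra, "<->", rb)

# Rows of a with at least one partner in b, each listed once
# (a row of a that matches several rows of b appears several times in x1)
rows_in_both = a[np.unique(x1)]
```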
Large datasets case: If you are working with datasets large enough to cause memory issues, and since you already know that the number of columns is small (3), you might want to use a minimal loop of 3 iterations instead. This avoids building the full (na, nb, 3) array of differences and saves a huge memory footprint, like so -
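import numpy as np

A, B = np.asarray(a), np.asarray(b)
na = A.shape[0]
nb = B.shape[0]
# One (na, nb) boolean mask, narrowed down one column at a time
accum = np.ones((na, nb), dtype=bool)
for i in range(A.shape[1]):
    # A[:, i, None] has shape (na, 1) and B[:, i] has shape (nb,),
    # so the difference broadcasts to (na, nb)
    accum &= np.abs(A[:, i, None] - B[:, i]) < tol[i]
x1, x2 = np.nonzero(accum)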
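To check that the loop gives the same pairs as the broadcasting version, here is a small comparison sketch. The function names match_broadcast and match_loop are hypothetical, and the inputs are random integers chosen only for illustration. The broadcast version materializes an (na, nb, 3) array of differences, while the loop keeps one (na, nb) boolean mask plus a few (na, nb) temporaries for the current column.

```python
import numpy as np

def match_broadcast(a, b, tol):
    """All-at-once version: builds an (na, nb, ncols) array of differences."""
    A, B = np.asarray(a), np.asarray(b)
    diffs = np.abs(A[:, None] - B)
    return np.nonzero((diffs < tol).all(axis=2))

def match_loop(a, b, tol):
    """Column-by-column version: keeps only an (na, nb) boolean mask."""
    A, B = np.asarray(a), np.asarray(b)
    accum = np.ones((A.shape[0], B.shape[0]), dtype=bool)
    for i in range(A.shape[1]):
        accum &= np.abs(A[:, i, None] - B[:, i]) < tol[i]
    return np.nonzero(accum)

# Illustrative random data (an assumption, not from the question)
rng = np.random.default_rng(0)
a = rng.integers(0, 20, size=(500, 3))
b = rng.integers(0, 20, size=(800, 3))
tol = [1, 2, 10]

r1 = match_broadcast(a, b, tol)
r2 = match_loop(a, b, tol)
same = all(np.array_equal(p, q) for p, q in zip(r1, r2))
print("same result:", same)

# Rough sizes, assuming int64 inputs: the full difference array vs. the boolean mask
na, nb = len(a), len(b)
print("broadcast diffs:", na * nb * 3 * 8, "bytes")
print("loop mask:      ", na * nb, "bytes")
```

Both functions return indices in the same row-major order of the (na, nb) grid, so the index arrays can be compared directly.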