Filter Numpy Array If Elements In Subarrays Are Repeated Position-wise In The Other Subarrays
Solution 1:
Edit:
My initial solution doesn't consistently produce the result you're looking for (see the failing case at the bottom).
So here's an alternative solution, which iterates through the rows, as seems necessary:
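import numpy as np

ar = b.copy()  # b is the array from the question
new_rows = []
while ar.shape[0]:
    new_rows.append(ar[0])
    # keep only the rows that differ from the first row in every column
    ar = ar[(ar != ar[0]).all(axis=1)]
np.stack(new_rows)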
Out[463]:
array([[ 1, 2],
[ 2, 3],
[ 5, 6],
[ 7, 8],
[10, 1]])
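For a version that can be run on its own, the loop above can be wrapped in a function. This is only a sketch: the input `b` below is a hypothetical array, chosen so that the result matches the output shown above, and the name `filter_rows` is made up for illustration. The empty-input guard is an addition, since `np.stack` raises on an empty list.

```python
import numpy as np

# Hypothetical input (the question's actual array is not reproduced here).
b = np.array([[1, 2], [2, 3], [5, 6], [1, 8], [7, 8], [10, 1]])

def filter_rows(arr):
    """Keep the first row, drop every row sharing a value with it
    in the same column, then repeat on the rows that are left."""
    remaining = arr.copy()
    kept = []
    while remaining.shape[0]:
        first = remaining[0]
        kept.append(first)
        # Rows that differ from `first` in every column survive;
        # `first` itself is dropped because it equals itself.
        remaining = remaining[(remaining != first).all(axis=1)]
    # np.stack fails on an empty list, so return an empty slice instead.
    return np.stack(kept) if kept else arr[:0]

result = filter_rows(b)
print(result)
```

Here `[1, 8]` is dropped because it repeats the `1` of `[1, 2]` in the first column, so it never gets the chance to eliminate `[7, 8]`.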
Original Answer:
You can use np.unique with the argument return_index=True to identify the rows that are the first to contain a given value in a given column. You can then select these rows, in their original order, and do the same for the next column.
ar = b.copy()
num_cols = ar.shape[1]
for col in range(num_cols):
    ar = ar[np.sort(np.unique(ar[:, col], return_index=True)[1])]
ar
Out[30]:
array([[ 1, 2],
[ 2, 3],
[ 5, 6],
[ 7, 8],
[10, 1]])
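To see what the indexing expression does for a single column, here is a small sketch on a made-up column of values (the array `col` is hypothetical). `np.unique` returns the unique values in sorted order, and with `return_index=True` also the index of each value's first occurrence; sorting those indices restores the original row order.

```python
import numpy as np

col = np.array([5, 1, 5, 2])  # hypothetical column values

values, first_idx = np.unique(col, return_index=True)
# values    -> sorted unique values
# first_idx -> position of the first occurrence of each unique value

keep = np.sort(first_idx)  # back to original row order
print(values, first_idx, keep, col[keep])
```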
Case where original fails:
Consider ar = b[:, ::-1], with columns in reversed order.
Then,
num_cols = ar.shape[1]
for col in range(num_cols):
    ar = ar[np.sort(np.unique(ar[:, col], return_index=True)[1])]
Gives
ar
Out[426]:
array([[ 2, 1],
[ 3, 2],
[ 6, 5],
[ 1, 10]])
which is missing the desired [8, 7] row.
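The failure can be reproduced with a small sketch. The array `b` below is a hypothetical input, picked so that it gives the outputs shown in this answer, and `unique_per_column` is just an illustrative name for the column-by-column approach.

```python
import numpy as np

# Hypothetical input (the question's actual array is not reproduced here).
b = np.array([[1, 2], [2, 3], [5, 6], [1, 8], [7, 8], [10, 1]])

def unique_per_column(arr):
    """Column by column, keep only the first row for each distinct value."""
    out = arr.copy()
    for col in range(out.shape[1]):
        _, first_idx = np.unique(out[:, col], return_index=True)
        out = out[np.sort(first_idx)]
    return out

forward = unique_per_column(b)            # columns in the original order
reverse = unique_per_column(b[:, ::-1])   # columns reversed

print(forward)
print(reverse)
```

With the columns reversed, `[8, 1]` is still present during the first pass and eliminates `[8, 7]`; `[8, 1]` is then itself removed in the second pass because of its `1`, so neither row survives. The result depends on column order because a row that is later discarded can still knock out other rows.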
Solution 2:
Your question and example need some clarification (why is [10, 1] not part of the final answer? If a subarray gets eliminated, does that mean it doesn't contribute to eliminating any further subarrays?), but here's a first shot. It's not very NumPy-idiomatic (or Pythonic, for that matter), but all it requires is a single loop through the larger array, with a map (a dict) to keep track of the numbers you've seen, and a set for each number to keep track of the positions at which it has appeared.
final_arr = []
found_nums = {}  # number -> set of positions at which it has appeared
for subarray in array:
    found = False
    for i, num in enumerate(subarray):
        if num in found_nums:
            if i in found_nums[num]:
                # num was already seen at this position: reject the subarray
                found = True
                break
            else:
                found_nums[num].add(i)
        else:
            found_nums[num] = {i}
    if not found:
        final_arr.append(subarray)
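One thing to be aware of: the loop above records positions as it reads each subarray, so the elements of a rejected subarray that come before the repeated one still count as "seen". If eliminated subarrays should not contribute to eliminating later ones (one possible reading of the question, not a confirmed requirement), a variant that records values only for kept subarrays might look like the sketch below. The function name and sample data are made up for illustration.

```python
def filter_positionwise(rows):
    """Keep a row only if none of its values already appeared
    at the same position in a previously kept row."""
    seen = {}  # position -> set of values taken from kept rows
    kept = []
    for row in rows:
        # Check the whole row first, without recording anything.
        if any(value in seen.get(pos, ()) for pos, value in enumerate(row)):
            continue
        # Only rows that are kept mark their values as seen.
        for pos, value in enumerate(row):
            seen.setdefault(pos, set()).add(value)
        kept.append(row)
    return kept

sample = [[1, 2], [9, 2], [9, 4]]  # hypothetical data
kept_rows = filter_positionwise(sample)
print(kept_rows)
```

In this sample, `[9, 2]` is rejected because of the `2`, and `[9, 4]` is kept since the `9` of a rejected row is not recorded. The loop above would instead reject `[9, 4]` as well, because it had already stored `9` at position 0 before rejecting `[9, 2]`.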