Filter Numpy Array If Elements In Subarrays Are Repeated Position-wise In The Other Subarrays

December 31, 2022

Unfortunately, this is very similar to "Filter a numpy array if any list within it contains at least one value of a previous row", a question I asked a few minutes ago. In this …

Solution 1:

Edit:

My initial solution doesn't always produce the result you're looking for; see the example at the bottom.

So here's an alternative solution, which iterates through the rows one at a time, as seems necessary:

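import numpy as np

ar = b.copy()  # b is the input array from the question
new_rows = []
while ar.shape[0]:
    # keep the first remaining row...
    new_rows.append(ar[0])
    # ...and drop every row (itself included) that matches it at any position
    ar = ar[(ar != ar[0]).all(axis=1)]
np.stack(new_rows)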
Out[463]:
array([[ 1,  2],
       [ 2,  3],
       [ 5,  6],
       [ 7,  8],
       [10,  1]])
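The loop above can be wrapped in a reusable function. This is a minimal sketch, assuming a 2-D input; the name `filter_positionwise` and the sample array are hypothetical, not taken from the question. The key step is the broadcast comparison `rest != first`, which compares one row against every remaining row, position by position.

```python
import numpy as np


def filter_positionwise(arr: np.ndarray) -> np.ndarray:
    """Keep a row only if it shares no value, at the same position,
    with any row kept before it (greedy, in original row order)."""
    remaining = arr
    kept = []
    while remaining.shape[0]:
        first, rest = remaining[0], remaining[1:]
        kept.append(first)
        # `rest != first` broadcasts the 1-D row against every remaining row,
        # giving a boolean matrix; .all(axis=1) is True only for rows that
        # differ from `first` at every position.
        remaining = rest[(rest != first).all(axis=1)]
    if not kept:
        return arr[:0]  # empty input: return an empty array of the same width
    return np.stack(kept)


# Hypothetical sample input
data = np.array([[1, 2], [1, 3], [2, 3], [4, 2], [5, 6]])
print(filter_positionwise(data).tolist())  # [[1, 2], [2, 3], [5, 6]]
```

Slicing off the first row before filtering, instead of relying on it comparing equal to itself, is a small design choice: every iteration is guaranteed to shrink `remaining`, even if a row contains NaN (which never compares equal).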

Original Answer:

You can use np.unique with the argument return_index=True to identify rows which are the first to contain a value in a given column. You can then select these rows, in order, and do the same for the next column.

import numpy as np

ar = b.copy()
num_cols = ar.shape[1]
for col in range(num_cols):
    # keep only the first row holding each value in this column, in original order
    ar = ar[np.sort(np.unique(ar[:, col], return_index=True)[1])]
ar
Out[30]: 
array([[ 1,  2],
       [ 2,  3],
       [ 5,  6],
       [ 7,  8],
       [10,  1]])
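To see what `np.unique(..., return_index=True)` contributes, here is a small sketch on a hypothetical column, followed by the same column-by-column loop wrapped in a function (the name `filter_columnwise_unique` is illustrative only). `np.unique` returns the unique values in sorted order, and the second returned array holds the index of each value's first occurrence; sorting those indices restores the original row order.

```python
import numpy as np

col = np.array([3, 1, 3, 2, 1])
values, first_idx = np.unique(col, return_index=True)
print(values.tolist())              # [1, 2, 3]
print(first_idx.tolist())           # [1, 3, 0]  (first positions of 1, 2, 3)
print(np.sort(first_idx).tolist())  # [0, 1, 3]  (same rows, original order)


def filter_columnwise_unique(arr: np.ndarray) -> np.ndarray:
    """For each column in turn, keep only the first row holding each value."""
    out = arr
    for c in range(out.shape[1]):
        _, idx = np.unique(out[:, c], return_index=True)
        out = out[np.sort(idx)]
    return out


data = np.array([[1, 2], [1, 3], [2, 3], [4, 2], [5, 6]])
print(filter_columnwise_unique(data).tolist())  # [[1, 2], [2, 3], [5, 6]]
```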

Case where original fails:

Consider ar = b[:, ::-1], with columns in reversed order.

Then,

num_cols = ar.shape[1]
for col in range(num_cols):
    ar = ar[np.sort(np.unique(ar[:, col], return_index=True)[1])]

Gives

ar
Out[426]: 
array([[ 2,  1],
       [ 3,  2],
       [ 6,  5],
       [ 1, 10]])

This result is missing the desired [8, 7] row.
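The failure has a general cause: the column-by-column pass can drop a row because an earlier row shares its value in one column, even when that earlier row is itself removed later for clashing in another column. In the iterative version, only rows that are actually kept can eliminate others. A hypothetical three-row input is enough to show the difference (both approaches are restated as illustrative functions so the snippet runs on its own):

```python
import numpy as np


def filter_columnwise_unique(arr):
    # original answer: first occurrence per column, column by column
    out = arr
    for c in range(out.shape[1]):
        _, idx = np.unique(out[:, c], return_index=True)
        out = out[np.sort(idx)]
    return out


def filter_positionwise(arr):
    # edited answer: greedy row-by-row filtering (assumes non-empty input)
    remaining, kept = arr, []
    while remaining.shape[0]:
        first, rest = remaining[0], remaining[1:]
        kept.append(first)
        remaining = rest[(rest != first).all(axis=1)]
    return np.stack(kept)


# [3, 2] clashes with [1, 2] in column 1, so it should not block [3, 4].
data = np.array([[1, 2], [3, 2], [3, 4]])
print(filter_columnwise_unique(data).tolist())  # [[1, 2]]  ([3, 4] is lost)
print(filter_positionwise(data).tolist())       # [[1, 2], [3, 4]]
```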

Solution 2:

Your question and example need some clarification (why is [10, 1] not part of the final answer? If a subarray gets eliminated, does it still contribute to eliminating later subarrays?), but here's a first attempt. It isn't very NumPy-idiomatic (or Pythonic, for that matter), but it needs only a single pass over the larger array, with a dict to track the numbers you've seen and, for each number, a set of the positions at which it has appeared.

final_arr = []
found_nums = {}  # value -> set of positions where it has appeared
for subarray in array:  # `array` is the input from the question
    found = False
    for i, num in enumerate(subarray):
        if num in found_nums:
            if i in found_nums[num]:
                # same value already seen at this position: drop the subarray
                found = True
                break
            found_nums[num].add(i)
        else:
            found_nums[num] = {i}
    if not found:
        final_arr.append(subarray)
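On the second clarifying question: as written, the loop above records positions for the values it checks before hitting a clash, even if that subarray is then dropped, so eliminated subarrays can still eliminate later ones. If they should not, one option is to check the whole subarray first and record its values only when it is kept. This sketch assumes that interpretation; the name `filter_kept_only` is hypothetical, and it stores `(position, value)` pairs in a single set instead of a dict of sets, which answers the same membership question.

```python
def filter_kept_only(rows):
    """Keep a row only if none of its (position, value) pairs
    already occurred in a previously *kept* row."""
    seen = set()  # (position, value) pairs from kept rows only
    kept = []
    for row in rows:
        pairs = list(enumerate(row))
        if any(p in seen for p in pairs):
            continue  # dropped rows record nothing
        kept.append(row)
        seen.update(pairs)
    return kept


rows = [[1, 2], [3, 2], [3, 4]]
print(filter_kept_only(rows))  # [[1, 2], [3, 4]]
```

On this input the loop as written keeps only `[1, 2]`, because the dropped `[3, 2]` has already registered `3` at position 0 before its clash at position 1 is found; the variant keeps `[3, 4]`, matching the iterative NumPy solution in Solution 1.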
