Skip to content Skip to sidebar Skip to footer

Fastest Method Of Getting K Smallest Numbers In Unsorted List Of Size N In Python?

What is the fastest method to get the k smallest numbers in an unsorted list of size N using python? Is it faster to sort the big list of numbers, and then get the k smallest numbe

Solution 1:

You could use a heap queue; it can give you the K largest or smallest numbers out of a list of size N in O(NlogK) time.

The Python standard library includes the heapq module, complete with a heapq.nsmallest() function ready implemented:

importheapqk_smallest= heapq.nsmallest(k, input_list)

Internally, this creates a heap of size K with the first K elements of the input list, then iterating over the remaining N-K elements, pushing each to the heap, then popping off the largest one. Such a push and pop takes log K time, making the overall operation O(NlogK).

The function also optimises the following edge cases:

  • If K is 1, the min() function is used instead, giving you a O(N) result.
  • If K >= N, the function uses sorting instead, since O(NlogN) would beat O(NlogK) in that case.

A better option is to use the introselect algorithm, which offers an O(n) option. The only implementation I am aware of is using the numpy.partition() function:

import numpy

# assuming you have a python list, you need to convert to a numpy array firstarray = numpy.array(input_list)
# partition, slice back to the k smallest elements, convert back to a Python list
k_smallest = numpy.partition(array, k)[:k].tolist()

Apart from requiring installation of numpy, this also takes N memory (versus K for heapq), as a copy of the list is created for the partition.

If you only wanted indices, you can use, for either variant:

heapq.nsmallest(k, range(len(input_list)), key=input_list.__getitem__)  # O(NlogK)
numpy.argpartition(numpy.array(input_list), k)[:k].tolist()  # O(N)

Solution 2:

If the list of the kth smallest numbers doesn't need to be sorted, this can be done in O(n) time with a selection algorithm like introselect. The standard library doesn't come with one, but NumPy has numpy.partition for the job:

partitioned = numpy.partition(l, k)
# The subarray partitioned[:k] now contains the k smallest elements.

Solution 3:

You might want to take a look at heapq:

In [109]: L = [random.randint(1,1000) for _ in range(100)]

In [110]: heapq.nsmallest(10, L)
Out[110]: [1, 17, 17, 19, 24, 37, 37, 45, 63, 73]

Solution 4:

EDIT: this assumes that the list is immutable. If the list is an array and can be modified there are linear methods available.

You can get the complexity down to O(n * log k) by using a heap of size k + 1.

  1. Initially get the first k elements into a min-heap.
  2. For every subsequent element, add the element as a leaf and heapify.
  3. Replace the last element with the next element.

Heapify can be done in logarithmic time and hence the time complexity is as above.

Solution 5:

You can do it in O(kn) with a selection algorithm. Once kn >= n log n, switch to sorting. That said, the constant on the selection algorithm tends to be a lot higher than the one on quicksort, so you really need to compare i (kn) and j (n log n). In practice, it's usually more desirable to just sort unless you're dealing with large n or very small k.

Edit: see comments. It's actually a lot better.

Post a Comment for "Fastest Method Of Getting K Smallest Numbers In Unsorted List Of Size N In Python?"