Sort And Get Uniq Lines Of File In Python
Solution 1:
You don't need to do a sort in Python, since a set takes care of uniqueness even without sorting.
f = open("filename.txt", "r")
lines = set(f.readlines())  # readlines() loads the whole file; the set drops duplicates
f.close()
The shell sort command would also load the lines into memory, so using that would not get you any memory savings. If you have really large files or you are adamant about not using additional memory, you can try some crazy tricks like the one shown here: http://neopythonic.blogspot.in/2008/10/sorting-million-32-bit-integers-in-2mb.html
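If you also want the result sorted and written back to disk, a minimal sketch (the output filename here is an assumption, not part of the original answer) could look like:
with open("filename.txt") as f:
    unique_lines = set(f)  # drop duplicate lines
with open("output.txt", "w") as out:
    out.writelines(sorted(unique_lines))  # write them back in sorted order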
Solution 2:
There is a built-in that does what sort does: sorted. Let's make a generator that mimics uniq, by only yielding lines that aren't equal to the previous line:
def uniq(iterator):
    previous = float("NaN")  # NaN is not equal to anything, including itself
    for value in iterator:
        if previous != value:
            yield value
            previous = value
Now you can do the same thing with:
with open('/path/to/filename') as f:
    for line in uniq(sorted(f)):
        print(line, end='')  # lines already end with a newline
But sorted (and the shell's sort) has to store everything anyway (what if the last line in the file should be output first?), so it's worse than just using set(f) instead of uniq(sorted(f)).
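For comparison, that alternative looks like this (a minimal sketch, not part of the original answer):
with open('/path/to/filename') as f:
    for line in sorted(set(f)):  # deduplicate first, then sort
        print(line, end='')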
Solution 3:
Use shell commands from Python:
import os
os.system("sort filename.txt | uniq | sponge filename.txt")
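Note that sponge comes from the moreutils package. If it isn't available, a sketch using subprocess and a temporary file instead (the .tmp filename is an assumption, not from the original answer) could be:
import subprocess

# Sort, deduplicate, write to a temp file, then replace the original file.
subprocess.run(
    "sort filename.txt | uniq > filename.txt.tmp && mv filename.txt.tmp filename.txt",
    shell=True,
    check=True,  # raise CalledProcessError if the command fails
)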
Solution 4:
Here is a shorter example:
with open("filename.txt", 'r') as f:
    lines = set(f)
Also, one thing that should be noticed is that in this case only one line at a time is read from the file (the set itself, of course, still holds every unique line). The reason is that the above code is equivalent to:
lines = set()
f = open("filename.txt", 'r')
for line in f:  # f works as a generator of lines, reading only one line at a time
    lines.add(line)
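Because the file is read lazily, you can also normalize lines while building the set. A minimal sketch (not part of the original answer) that strips trailing newlines, so a final line without a newline dedupes the same as one with it:
with open("filename.txt") as f:
    lines = {line.rstrip("\n") for line in f}  # set comprehension, still one line at a time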