
Python, Comparing Two Files

Suppose I have two (huge) files. One contains a list of words. Another contains a list of words followed by some numbers; i.e., the format is like this: file 1: word1 word2 ..

Solution 1:

This will only work if the files are in the same order and every word in file 1 also appears in file 2:

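def gen_overlap(file1, file2):
    # file1 and file2 are open file objects listing the words in the same order.
    for word in file1:
        word = word.strip()
        # readline() returns one line per call; read() would consume the whole file.
        line = file2.readline()
        while line and line.split()[:1] != [word]:
            line = file2.readline()
        if not line:
            return  # file2 ended before the word was found
        yield line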
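
To see how the ordered scan behaves, here is a minimal, self-contained sketch that runs it on in-memory stand-ins for the two files. The name `ordered_overlap` and the sample data are made up for illustration, and it assumes the word is the first whitespace-separated token on each line of file 2.

```python
import io

def ordered_overlap(words, records):
    """Yield lines of `records` whose first token matches each word in `words`, in order."""
    records = iter(records)  # one shared iterator, so the scan never rewinds
    for raw in words:
        word = raw.strip()
        if not word:
            continue  # ignore blank lines in the word list
        for line in records:
            fields = line.split()
            if fields and fields[0] == word:
                yield line
                break  # move on to the next word, resuming from this position

# In-memory stand-ins for the two files (hypothetical sample data).
file1 = io.StringIO("apple\ncherry\n")
file2 = io.StringIO("apple 1 2\nbanana 3 4\ncherry 5 6\n")
matches = list(ordered_overlap(file1, file2))
print(matches)
```

Each file is read once from start to finish, so memory use stays constant no matter how large the files are. The cost is the ordering requirement: a word that appears out of order in file 1 makes the scan run to the end of file 2 without finding it.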

If the files fail to meet either of those conditions, the best method is to build a set of all the words in file 1 and check the first word of each line in file 2 against it:

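def gen_overlap(file1, file2):
    # strip() keeps each word as a hashable string; split() would give unhashable lists.
    word_set = set(line.strip() for line in file1)
    for line in file2:
        fields = line.split()
        if fields and fields[0] in word_set:
            yield line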
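
As a quick check of the set-based approach, the sketch below feeds it files that are deliberately out of order and include a word that has no match. The name `set_overlap` and the sample data are hypothetical; the only assumption is that the word is the first token on each line of file 2.

```python
import io

def set_overlap(words, records):
    """Yield lines of `records` whose first token appears anywhere in `words`."""
    # Only the word list is held in memory; the records are streamed.
    word_set = {line.strip() for line in words if line.strip()}
    for line in records:
        fields = line.split()
        if fields and fields[0] in word_set:
            yield line

# File 1 is unordered and contains a word ("zebra") that file 2 lacks.
file1 = io.StringIO("cherry\nzebra\napple\n")
file2 = io.StringIO("apple 1 2\nbanana 3 4\n\ncherry 5 6\n")
result = list(set_overlap(file1, file2))
print(result)
```

Matches come back in file 2's order, and set membership tests take constant time on average, so the total work is roughly one pass over each file.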

Solution 2:

Use this:

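def file_comp(a_file, b_file):
    with open(a_file, 'r') as file1, open(b_file, 'r') as file2:
        # Build the lookup once as a set instead of re-splitting file 1 for every line.
        words = set(file1.read().split())
        # Skip blank lines, then compare the first word of each line against the set.
        return [i for i in file2.read().split('\n') if i.split() and i.split()[0] in words]

print(file_comp('file_1.txt', 'file_2.txt'))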
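
Since the question mentions huge files, reading both files fully and returning one big list may use more memory than necessary. A possible variation, sketched below, streams file 2 line by line and writes matches straight to an output file. The function name `write_matches`, the output file, and the sample contents are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

def write_matches(words_path, records_path, out_path):
    """Copy lines of records_path whose first token is listed in words_path; return the count."""
    with open(words_path) as words:
        wanted = {line.strip() for line in words if line.strip()}
    count = 0
    with open(records_path) as records, open(out_path, "w") as out:
        for line in records:  # one line in memory at a time
            fields = line.split()
            if fields and fields[0] in wanted:
                out.write(line)
                count += 1
    return count

# Demonstration on small temporary files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "file_1.txt").write_text("word1\nword3\n")
    (tmp / "file_2.txt").write_text("word1 10 20\nword2 30 40\nword3 50 60\n")
    written = write_matches(tmp / "file_1.txt", tmp / "file_2.txt", tmp / "out.txt")
    output_text = (tmp / "out.txt").read_text()
print(written)
print(output_text)
```

Only the word list from file 1 has to fit in memory here, which is usually the smaller of the two files.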
