Best And Efficient Way To Concat Or Append Huge Multiple Xlsx Files In Pandas
I'm new to pandas and making some progress with self-learning, so I want the best and most efficient way to handle this: I have 3 (sometimes more than 3) Excel files ('.xlsx'), each about 100 MB.
Solution 1:
You can use multiprocessing to speed up the loading, then use pd.concat to merge all the DataFrames:
import pandas as pd
import multiprocessing
import glob
import time

def read_excel(filename):
    return pd.read_excel(filename)

if __name__ == "__main__":
    files = glob.glob("./data/*.xlsx")

    # Sequential: read the files one after another
    print("Sequential")
    print(f"Loading excel files: {time.strftime('%H:%M:%S', time.localtime())}")
    start = time.time()
    data = [read_excel(filename) for filename in files]
    end = time.time()
    print(f"Loaded excel files in {time.strftime('%H:%M:%S', time.gmtime(end - start))}")
    df_sq = pd.concat(data).reset_index(drop=True)

    # Multiprocessing: read the files in parallel, one worker per CPU core
    print("Multiprocessing")
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        print(f"Loading excel files: {time.strftime('%H:%M:%S', time.localtime())}")
        start = time.time()
        data = pool.map(read_excel, files)
        end = time.time()
        print(f"Loaded excel files in {time.strftime('%H:%M:%S', time.gmtime(end - start))}")
    df_mp = pd.concat(data).reset_index(drop=True)
Example: 50 files of 25 MB each (about a 2x speedup):
Sequential
Loading excel files: 09:12:17
Loaded excel files in 00:00:14
Multiprocessing
Loading excel files: 09:12:33
Loaded excel files in 00:00:07
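If you prefer the standard-library concurrent.futures API, the same parallel-load-then-concat pattern can be written with ProcessPoolExecutor. This is a minimal sketch, not part of the original answer, and it assumes the same ./data/*.xlsx layout as above:

import glob
from concurrent.futures import ProcessPoolExecutor

import pandas as pd

def read_excel(filename):
    return pd.read_excel(filename)

if __name__ == "__main__":
    files = glob.glob("./data/*.xlsx")
    # executor.map distributes the file reads across worker processes,
    # much like multiprocessing.Pool.map above; with no arguments the
    # executor defaults to one worker per CPU core
    with ProcessPoolExecutor() as executor:
        data = list(executor.map(read_excel, files))
    # ignore_index=True is equivalent to concat + reset_index(drop=True)
    df = pd.concat(data, ignore_index=True)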