Loading Huge XLS Data Into Oracle Using Python
Solution 1:
If it is possible to export your Excel file as a CSV, then all you need is sqlldr (SQL*Loader) to load the file into the database.
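A minimal sketch of that route (the file, table and connect-string values are placeholders, and it assumes a control file describing the CSV already exists):

import subprocess
import pandas as pd

# Export the workbook to CSV; sqlldr then does the actual bulk load.
pd.read_excel("in_file.xlsx").to_csv("in_table.csv", index=False)
subprocess.call("sqlldr userid=in_user/in_password@in_host/in_service "
                "control=in_table.ctrl", shell=True)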
Solution 2:
Here are all of the steps: load the XLSX with pandas, produce a tab-delimited CSV and a control file, then load with sqlldr.
# %%
import sys
import pandas as pd
import subprocess
# %%
user = 'in_user_name'
password = 'in_password'
host = 'in_host'
database = 'in_service_name'
in_file = r"in_file.xlsx"
in_sheet_name = 'in_sheet'
tablename = 'in_table'
# %%
df = pd.read_excel(in_file, sheet_name=in_sheet_name)
print(f"Loaded {df.shape[0]} records from {in_file}")
# %%
csv_file = f'{tablename}.csv'
controlfile = f'{tablename}.ctrl'
# %%
# Write the DataFrame as a tab-delimited CSV for SQL*Loader.
df.to_csv(csv_file, index=False, sep='\t')
# %%
columns = df.columns.tolist()
# Build a SQL*Loader control file: truncate the target table, skip the CSV
# header row, and list every column (tab-delimited, trailing null columns).
with open(controlfile, 'w') as file:
    header = f"""OPTIONS (SKIP=1, DIRECT=TRUE)
LOAD DATA
INFILE '{csv_file}'
BADFILE '{tablename}.bad'
DISCARDFILE '{tablename}.dsc'
TRUNCATE
INTO TABLE {tablename}
FIELDS TERMINATED BY X'9'
TRAILING NULLCOLS
( """
    file.write(header)
    for c in columns[:-1]:
        file.write(f'{c},\n')
    file.write(f'{columns[-1]})')
# %%
sqlldr_command = f"""sqlldr USERID='{user}/{password}@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME={database})))' control={controlfile}"""
print(f"Running sqlldr. Log file: {tablename}.log")
subprocess.call(sqlldr_command, shell=True)
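As an optional, hedged follow-up (it assumes the python-oracledb driver is installed and reuses the placeholder connection values from above), the load can be sanity-checked by counting the rows that landed in the table:

# %%
import oracledb  # python-oracledb driver, assumed installed

with oracledb.connect(user=user, password=password,
                      dsn=f"{host}/{database}") as conn:
    with conn.cursor() as cur:
        cur.execute(f"SELECT COUNT(*) FROM {tablename}")
        print(f"{tablename} now holds {cur.fetchone()[0]} rows")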
Solution 3:
Excel also comes with ODBC support so you could pump straight from Excel to Oracle assuming you have the drivers. That said, anything that involves transforming a large amount of data in memory (from whatever Excel is using internally) and then passing it to the DB is likely to be less performant than a specialised bulk operation which can be optimised to use less memory. Going through Python just adds another layer to the task (Excel to Python to Oracle), though it might be possible to set this up to use streams.
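The answer gives no code, but one way to approximate the streaming idea in Python is to iterate the workbook in openpyxl's read-only mode and insert in batches with python-oracledb's executemany. This is only a sketch; the driver, batch size, connection values, table name and bind-variable count are all assumptions, not part of the original answer.

import oracledb                     # python-oracledb driver, assumed installed
from openpyxl import load_workbook

BATCH = 10_000

# Read-only mode streams rows instead of materialising the whole sheet.
wb = load_workbook("in_file.xlsx", read_only=True)
ws = wb["in_sheet"]

with oracledb.connect(user="in_user", password="in_password",
                      dsn="in_host/in_service") as conn:
    cur = conn.cursor()
    # Placeholder INSERT: adjust the column list / bind count to your table.
    insert_sql = "INSERT INTO in_table VALUES (:1, :2, :3)"
    batch = []
    for row in ws.iter_rows(min_row=2, values_only=True):  # skip header row
        batch.append(row)
        if len(batch) >= BATCH:
            cur.executemany(insert_sql, batch)
            batch.clear()
    if batch:
        cur.executemany(insert_sql, batch)
    conn.commit()
wb.close()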
Solution 4:
Basically, for high-volume data any language is going to be stressed on I/O, except C. The best way is to use the native tools/utilities provided by the DB vendor; for Oracle the correct fit is SQL*Loader.
Refer to this link for a quick tutorial: http://www.thegeekstuff.com/2012/06/oracle-sqlldr/
Here you go… sample code that runs SQL*Loader and gets you back the return code, output and error:
import subprocess

# Placeholders: replace uid/passwd and the file paths with real values.
sql_ld_command = ['sqlldr', 'uid/passwd',
                  'CONTROL=your_ctrl_file_path', 'DATA=your_data_file_path']
sql_ldr_proc = subprocess.Popen(sql_ld_command, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = sql_ldr_proc.communicate()
retn_code = sql_ldr_proc.returncode
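As a hedged follow-up (the exit-code meanings below follow the Linux sqlldr convention), the result can then be inspected:

# sqlldr on Linux conventionally exits with 0 = success, 1 = failure,
# 2 = warning (e.g. some rows rejected), 3 = fatal.
if retn_code != 0:
    print(f"sqlldr exit code: {retn_code}")
    print(out.decode(errors='replace'))
    print(err.decode(errors='replace'))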
Solution 5:
Automate the export of XLSX to CSV as mentioned in a previous answer, but instead of then calling sqlldr yourself, create an external table whose access parameters carry your SQL*Loader control-file logic (the ORACLE_LOADER access driver). The CSV is then read afresh every time the table is selected from.
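A hedged sketch of that approach, assuming a comma-delimited export, an existing Oracle directory object named csv_dir pointing at the folder holding the file, placeholder table/column names, and the python-oracledb driver for issuing the DDL:

import oracledb  # python-oracledb driver, assumed installed

# Illustrative DDL only: columns, directory object and file name are placeholders.
ddl = """
CREATE TABLE in_table_ext (
    col_a VARCHAR2(4000),
    col_b VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY csv_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        SKIP 1
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
    )
    LOCATION ('in_table.csv')
)
REJECT LIMIT UNLIMITED
"""

with oracledb.connect(user="in_user", password="in_password",
                      dsn="in_host/in_service") as conn:
    conn.cursor().execute(ddl)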