Can't Properly Read Sql Table In Python: Varchar Columns Imported As Comma-separated Characters / Tuples
Solution 1:
This seems to be a problem when using `jaydebeapi` with `jpype`. I can reproduce it when connecting to an Oracle DB in the same way that you do (in my case Oracle 11gR2, but since you are using `ojdbc8.jar`, I guess it also happens with other versions).
There are different ways you can solve this:
Change your connection
Since the error only seems to occur in a specific combination of packages, the most sensible thing to do is to try and avoid these and thus the error altogether.
Alternative 1: Use `jaydebeapi` without `jpype`:

As noted, I only observe this when using `jaydebeapi` with `jpype`. However, in my case, `jpype` is not needed at all. I have the `.jar` file locally and my connection works fine without it:

```python
import os

import jaydebeapi as jdba
import pandas as pd

db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'

jar = os.getcwd() + '/ojdbc6.jar'

conn = jdba.connect(
    'oracle.jdbc.driver.OracleDriver',
    'jdbc:oracle:thin:@' + db_host + ':' + str(db_port) + ':' + db_sid,
    {'user': 'USERNAME', 'password': 'PASSWORD'},
    jar,
)

df_jay = pd.read_sql('SELECT * FROM YOURSID.table1', conn)
conn.close()
```

In my case, this works fine and creates the dataframes normally.
Alternative 2: Use `cx_Oracle` instead:

The issue also does not occur if I use `cx_Oracle` to connect to the Oracle DB:

```python
import cx_Oracle
import pandas as pd

db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'

dsn_tns = cx_Oracle.makedsn(db_host, db_port, db_sid)
cx_conn = cx_Oracle.connect('USERNAME', 'PASSWORD', dsn_tns)

df_cxo = pd.read_sql('SELECT * FROM YOURSID.table1', con=cx_conn)
cx_conn.close()
```

Note: For `cx_Oracle` to work you have to have the Oracle Instant Client installed and properly set up (see e.g. the cx_Oracle documentation for Ubuntu).
Fix the dataframe after the fact:
If, for some reason, you cannot use the above connection alternatives, you can also transform the dataframe after reading it.
Alternative 3: join tuple entries:

You can use `''.join()` to convert the tuples to strings. You need to do this for both the entries and the column names:

```python
# for all entries that are not None, join the tuples
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].apply(lambda x: ''.join(x) if x is not None else x)

# also rename the column headings in the same way
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
```

Alternative 4: change dtype of columns:

By changing the `dtype` of an affected column from `object` to `string`, all entries will also be converted. Note that this may have unwanted side effects, e.g. changing `None` values to the missing-value marker `<NA>`. Also, you will have to rename the column headings separately, as above:

```python
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].astype('string')

# again, rename headings
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
```
All of these should yield more or less the same df in the end (apart from the dtypes and possible replacement of None values):
+---+---------+---------+---------+
| | COLUMN1 | COLUMN2 | COLUMN3 |
+---+---------+---------+---------+
| 1 | test | test2 | 1 |
+---+---------+---------+---------+
| 2 | foo | bar | 100 |
+---+---------+---------+---------+
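To illustrate, the join fix from Alternative 3 can be checked end to end on a small hand-built DataFrame that mimics the broken import (the tuple values and column names here are made up for the demonstration):

```python
import pandas as pd

# mimic the broken import: varchar values and column names come back
# as tuples of single characters
df = pd.DataFrame(
    {
        'a': [('t', 'e', 's', 't'), ('f', 'o', 'o')],
        'b': [('t', 'e', 's', 't', '2'), None],
    }
)
# tupleize_cols=False keeps a flat Index of tuple labels
# (instead of a MultiIndex)
df.columns = pd.Index(
    [('C', 'O', 'L', 'U', 'M', 'N', '1'), ('C', 'O', 'L', 'U', 'M', 'N', '2')],
    tupleize_cols=False,
)

# first fix the column headings ...
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)

# ... then join the tuple entries back into strings, skipping None
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].apply(lambda x: ''.join(x) if x is not None else x)

print(df.columns.tolist())     # ['COLUMN1', 'COLUMN2']
print(df['COLUMN1'].tolist())  # ['test', 'foo']
```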