Combine Pandas String Columns With Missing Values
I need to concatenate the strings in two or more columns of a pandas DataFrame. I found this answer, which works fine if you don't have any missing values. Unfortunately, I do, and this
Solution 1:
You can use apply with if-else:
df = df.apply(lambda x: None if x.isnull().all() else ';'.join(x.dropna()), axis=1)
print (df)
0 val_A;val_B
1 val_B
2 val_A
3 None
dtype: object
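The snippet above assumes an existing df. As a minimal, self-contained sketch, the frame below is a hypothetical reconstruction (the column names col_A and col_B and the values are assumptions chosen to match the printed output), and join_row is an illustrative helper name, not part of pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column names and values are assumptions.
df = pd.DataFrame({
    "col_A": ["val_A", np.nan, "val_A", np.nan],
    "col_B": ["val_B", "val_B", np.nan, np.nan],
})

def join_row(row, sep=";"):
    """Join the non-missing strings of one row; None if the row is all missing."""
    present = row.dropna()
    return sep.join(present) if len(present) else None

# axis=1 passes each row to join_row as a Series.
combined = df.apply(join_row, axis=1)
print(combined)
```

Storing the result in a separate variable (combined) rather than reassigning df keeps the original columns available for the later snippets.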
For a faster solution, you can use:
#add separator and replace NaN with empty strings
#convert to lists
arr = df.add('; ').fillna('').values.tolist()
#list comprehension, replace empty strings with NaN
s = pd.Series([''.join(x).strip('; ') for x in arr]).replace('^$', np.nan, regex=True)
#replace NaN with None
s = s.where(s.notnull(), None)
print (s)
0 val_A; val_B
1 val_B
2 val_A
3 None
dtype: object
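The list-based steps can also be wrapped in a small function. This is only a sketch: combine_columns is a hypothetical helper name, the sample frame is assumed, and the empty-string check replaces the regex replace plus where pair with a plain conditional that should give the same result.

```python
import numpy as np
import pandas as pd

def combine_columns(frame, sep="; "):
    """Join non-missing strings per row; rows with no values become None."""
    # Appending the separator leaves missing values missing,
    # so fillna can turn them into empty strings afterwards.
    rows = frame.add(sep).fillna("").to_numpy().tolist()
    # str.strip treats sep as a set of characters, so it removes any
    # leading or trailing ';' or ' ' (including ones that belong to a value).
    joined = ["".join(row).strip(sep) for row in rows]
    # An empty string means every column was missing in that row.
    return pd.Series([value if value else None for value in joined],
                     index=frame.index, dtype=object)

# Hypothetical sample data: column names and values are assumptions.
sample = pd.DataFrame({
    "col_A": ["val_A", np.nan, "val_A", np.nan],
    "col_B": ["val_B", "val_B", np.nan, np.nan],
})
result = combine_columns(sample)
print(result)
```

Unlike the snippet above, this version keeps the input frame's index on the result, which may be convenient when assigning it back as a new column.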
#40000 rows
df = pd.concat([df]*10000).reset_index(drop=True)
In [70]: %%timeit
...: arr = df.add('; ').fillna('').values.tolist()
...: s = pd.Series([''.join(x).strip('; ') for x in arr]).replace('^$', np.nan, regex=True)
...: s.where(s.notnull(), None)
...:
10 loops, best of 3: 74 ms per loop
In [71]: %%timeit
...: df.apply(lambda x: None if x.isnull().all() else ';'.join(x.dropna()), axis=1)
...:
1 loop, best of 3: 12.7 s per loop
#another solution, but a bit slower
In [72]: %%timeit
...: arr = df.add('; ').fillna('').values
...: s = [''.join(x).strip('; ') for x in arr]
...: pd.Series([y if y != '' else None for y in s])
...:
...:
10 loops, best of 3: 119 ms per loop
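The timings above come from IPython's %%timeit. Outside IPython, a rough comparison could be sketched with the standard timeit module. The frame here is hypothetical and deliberately smaller (4000 rows), both functions use ';' as the separator so their results are comparable, and the absolute numbers will depend on the machine and the pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sample data: column names and values are assumptions.
base = pd.DataFrame({
    "col_A": ["val_A", np.nan, "val_A", np.nan],
    "col_B": ["val_B", "val_B", np.nan, np.nan],
})
big = pd.concat([base] * 1000).reset_index(drop=True)  # 4000 rows

def with_apply(frame):
    # Row-wise apply: one Python call and one Series per row.
    return frame.apply(
        lambda x: None if x.isnull().all() else ";".join(x.dropna()), axis=1
    )

def with_lists(frame):
    # Column-wise add/fillna, then a plain list comprehension.
    arr = frame.add(";").fillna("").values.tolist()
    return pd.Series(["".join(x).strip(";") or None for x in arr], dtype=object)

# Total time for three runs of each approach.
t_apply = timeit.timeit(lambda: with_apply(big), number=3)
t_lists = timeit.timeit(lambda: with_lists(big), number=3)
print(f"apply: {t_apply:.3f} s, lists: {t_lists:.3f} s")
```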