
Removing Na Values From A Dataframe In Python 3.4

I am using this code to create a data frame which has no NA values:

import pandas as pd
import statistics

df = print(pd.read_csv('001.csv', keep_default_na=False, na_values=['']))
print(df)

Solution 1:

I think you should import the .csv file as it is and then manipulate the data frame. Then, you can use either of the methods below.

foo[foo.notnull()]

or

foo.dropna()
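For example, a minimal sketch assuming the same '001.csv' from the question, read without the extra print call so that df is an actual DataFrame:

import pandas as pd

# Read the file as-is; empty fields become NaN by default
df = pd.read_csv('001.csv')

# Keep only the rows without any missing values
df_clean = df.dropna()
print(df_clean)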

Solution 2:

Method 1:

import numpy as np

df[['A', 'C']].apply(lambda x: my_func(x) if np.all(pd.notnull(x[1])) else x, axis=1)

This uses pandas notnull: my_func (your own function) is applied to a row only when the checked value is not missing; otherwise the row is returned unchanged.
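A simpler variant of the same idea, assuming your frame has a column named 'A', is to keep only the rows where that column is non-null:

df = df[df['A'].notnull()]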

Method 2:

df = df[np.isfinite(df['EPS'])]

Method 3: Using dropna

In [24]: df = pd.DataFrame(np.random.randn(10,3))

In [25]: df.iloc[::2, 0] = np.nan; df.iloc[::4, 1] = np.nan; df.iloc[::3, 2] = np.nan;

In [26]: df
Out[26]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [27]: df.dropna()  # drop all rows that have any NaN values
Out[27]:
          0         1         2
1  2.677677 -1.466923 -0.750366
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295

Solution 3:

I got the same error until I added axis=0 and how='any'.

df = df.dropna(axis=0, how='any')
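For reference, how='any' drops a row if any of its values are missing, while how='all' drops only rows where every value is missing; a small sketch with made-up data:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, np.nan, np.nan], 'b': [2, 3, np.nan]})

print(df.dropna(axis=0, how='any'))  # keeps only the first row
print(df.dropna(axis=0, how='all'))  # drops only the last, all-NaN row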

Solution 4:

# Count how many times the placeholder value '?' appears in each column
columnsMissing = []
for col in df.columns:
    c = df.loc[df[col] == '?', col].count()
    columnsMissing.append((col, c))

# Collect the columns whose count of '?' values exceeds the threshold
dropColumnsMissing = []
for col, c in columnsMissing:
    if c > 20000:
        dropColumnsMissing.append(col)

newDF = df.drop(columns=dropColumnsMissing)

In place of '?' you can put any value you want to count as missing, and in the check c > 20000 you can use whatever threshold you want, for example 50% of the rows.
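A more compact way to express the same idea, assuming the placeholder is '?' and the threshold is 50% of the rows, is a sketch like this:

# Fraction of '?' values per column
missing_fraction = (df == '?').mean()

# Drop the columns where more than half of the values are '?'
newDF = df.drop(columns=missing_fraction[missing_fraction > 0.5].index)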

In case you want to remove columns with too many NaN values instead:

# Count the NaN values per column and drop the columns above the threshold
dropColumnsMissing = []
for col in newDF.columns:
    num_nans = len(newDF) - newDF[col].count()
    if num_nans > 20000:
        dropColumnsMissing.append(col)

newDF = newDF.drop(columns=dropColumnsMissing)
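The NaN case can also be written with dropna's thresh argument, which keeps a column only if it has at least that many non-missing values; this is roughly equivalent to the loop above:

# Keep the columns that have at least len(newDF) - 20000 non-NaN values
newDF = newDF.dropna(axis=1, thresh=len(newDF) - 20000)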
