
How To Populate Array With Multiple Rows From Csv File Using Python Pandas

I am importing a CSV file using pandas. The CSV column headers are Year, Model, Trim, Result. The values coming in from the CSV file are as follows: Year | Model | Trim | Result 20

Solution 1:

To elaborate on my comment, suppose you have some DataFrame consisting of non-integer values:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[np.random.choice(list('abcdefghijklmnop')) for _ in range(3)] for _ in range(10)])
>>> df
   0  1  2
0  j  p  j
1  d  g  b
2  n  m  f
3  o  b  j
4  h  c  a
5  p  m  n
6  c  c  l
7  o  d  e
8  b  g  h
9  h  o  k

Suppose there is also an output column:

>>> df['output'] = np.random.randint(0, 2, 10)
>>> df
   0  1  2  output
0  j  p  j       0
1  d  g  b       0
2  n  m  f       1
3  o  b  j       1
4  h  c  a       1
5  p  m  n       0
6  c  c  l       1
7  o  d  e       0
8  b  g  h       1
9  h  o  k       0
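In the question's setting the frame would come from a CSV file rather than from random data. The following is a minimal sketch of that loading step, assuming the four headers named in the question; the rows in `csv_text` and the name `df_csv` are invented for illustration and are not the asker's data.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file; the rows are made up.
csv_text = """Year,Model,Trim,Result
2012,Camry,SE,1
2015,Accord,EX,0
2012,Accord,SE,1
"""

# pd.read_csv accepts a file path or any file-like object.
# dtype=str keeps every column (including Year) as text for now.
df_csv = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Convert the label column back to integers.
df_csv["Result"] = df_csv["Result"].astype(int)

print(df_csv.shape)
print(list(df_csv.columns))
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file's path.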

To convert all the string values to integers, use np.unique with return_inverse=True. The inverse is the array you need; just keep in mind that you need to reshape it, because np.unique flattens its input when no axis is given:

>>> unique, inverse = np.unique(df.iloc[:,:3].values, return_inverse=True)
>>> unique
array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n',
       'o', 'p'], dtype=object)
>>> inverse
array([ 8, 14,  8,  3,  6,  1, 12, 11,  5, 13,  1,  8,  7,  2,  0, 14, 11,
       12,  2,  2, 10, 13,  3,  4,  1,  6,  7,  7, 13,  9])
>>> input = inverse.reshape(df.shape[0], df.shape[1] - 1)
>>> input
array([[ 8, 14,  8],
       [ 3,  6,  1],
       [12, 11,  5],
       [13,  1,  8],
       [ 7,  2,  0],
       [14, 11, 12],
       [ 2,  2, 10],
       [13,  3,  4],
       [ 1,  6,  7],
       [ 7, 13,  9]])
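The encoding step above could be wrapped in a small function. This is only a sketch: `encode_columns` is a hypothetical helper name, and the two-row frame is a cut-down version of the example data.

```python
import numpy as np
import pandas as pd


def encode_columns(frame):
    """Map every distinct value in `frame` to an integer code.

    Hypothetical helper: returns (unique, codes), where `codes` has the
    same 2-D shape as `frame`.
    """
    values = frame.to_numpy()
    unique, inverse = np.unique(values, return_inverse=True)
    # Depending on the NumPy version, `inverse` may come back 1-D;
    # reshaping to the input shape works either way.
    codes = inverse.reshape(values.shape)
    return unique, codes


df = pd.DataFrame([["j", "p", "j"], ["d", "g", "b"]])
unique, codes = encode_columns(df)
print(unique)  # sorted distinct values: b, d, g, j, p
print(codes)   # [[3 4 3], [1 2 0]]
```

Note that all columns share one code table here, so the same letter gets the same integer regardless of the column it appears in.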

And you can always go back:

>>> unique[input]
array([['j', 'p', 'j'],
       ['d', 'g', 'b'],
       ['n', 'm', 'f'],
       ['o', 'b', 'j'],
       ['h', 'c', 'a'],
       ['p', 'm', 'n'],
       ['c', 'c', 'l'],
       ['o', 'd', 'e'],
       ['b', 'g', 'h'],
       ['h', 'o', 'k']], dtype=object)
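Going back works because indexing an array with an integer array looks up each code individually. A tiny sketch with made-up values, small enough to check by hand:

```python
import numpy as np

# Hypothetical code table and encoded matrix.
unique = np.array(["a", "b", "c"])
codes = np.array([[2, 0], [1, 1]])

# Integer-array indexing: the result has the shape of `codes`,
# with each code replaced by unique[code].
decoded = unique[codes]
print(decoded)  # [['c' 'a'], ['b' 'b']]
```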

To get an array for the output, again simply take the .values of the appropriate column of the df, since these are already numpy arrays!

>>> output = df['output'].values
>>> output
array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])
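In current pandas, `Series.to_numpy()` is the documented way to get the same array and can be used in place of `.values`. A short sketch, with a made-up five-row frame and hypothetical variable names:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with only the label column.
df = pd.DataFrame({"output": [0, 0, 1, 1, 1]})

labels_values = df["output"].values      # attribute used in the answer
labels_numpy = df["output"].to_numpy()   # method recommended by the pandas docs

print(type(labels_numpy), labels_numpy)
```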

You might want to reshape it, depending on which libraries you are going to use for analysis (sklearn, scipy, etc.):

>>> output.reshape(output.size, 1)
array([[0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0]])
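An equivalent spelling lets NumPy infer the row count with `-1`, and `ravel()` undoes the reshape if a library wants a 1-D target instead. A sketch using the output values shown above; `column` and `flat` are illustrative names:

```python
import numpy as np

output = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])

column = output.reshape(-1, 1)  # -1 lets NumPy infer the number of rows
flat = column.ravel()           # back to a 1-D array

print(column.shape)  # (10, 1)
print(flat.shape)    # (10,)
```

Which shape is right depends on the library: scikit-learn estimators, for example, usually take a 2-D feature matrix and a 1-D target, so it is worth checking the documentation of the function you call.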
