How To Populate Array With Multiple Rows From Csv File Using Python Pandas
I am importing a CSV file using pandas. The CSV column headers are Year, Model, Trim, Result. The values coming in from the CSV file are as follows: Year | Model | Trim | Result 20
Solution 1:
To elaborate on my comment, suppose you have some DataFrame consisting of non-integer values:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[np.random.choice(list('abcdefghijklmnop')) for _ in range(3)] for _ in range(10)])
>>> df
   0  1  2
0  j  p  j
1  d  g  b
2  n  m  f
3  o  b  j
4  h  c  a
5  p  m  n
6  c  c  l
7  o  d  e
8  b  g  h
9  h  o  k
And there is also an output:
>>> df['output'] = np.random.randint(0, 2, 10)
>>> df
   0  1  2  output
0  j  p  j       0
1  d  g  b       0
2  n  m  f       1
3  o  b  j       1
4  h  c  a       1
5  p  m  n       0
6  c  c  l       1
7  o  d  e       0
8  b  g  h       1
9  h  o  k       0
To convert all the string values to integers, use np.unique with return_inverse=True. The inverse is the array you need; just keep in mind that you need to reshape it (because np.unique will have flattened it):
>>> unique, inverse = np.unique(df.iloc[:,:3].values, return_inverse=True)
>>> unique
array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n',
       'o', 'p'], dtype=object)
>>> inverse
array([ 8, 14,  8,  3,  6,  1, 12, 11,  5, 13,  1,  8,  7,  2,  0, 14, 11,
       12,  2,  2, 10, 13,  3,  4,  1,  6,  7,  7, 13,  9])
>>> input = inverse.reshape(df.shape[0], df.shape[1] - 1)
>>> input
array([[ 8, 14,  8],
       [ 3,  6,  1],
       [12, 11,  5],
       [13,  1,  8],
       [ 7,  2,  0],
       [14, 11, 12],
       [ 2,  2, 10],
       [13,  3,  4],
       [ 1,  6,  7],
       [ 7, 13,  9]])
And you can always go back:
>>> unique[input]
array([['j', 'p', 'j'],
       ['d', 'g', 'b'],
       ['n', 'm', 'f'],
       ['o', 'b', 'j'],
       ['h', 'c', 'a'],
       ['p', 'm', 'n'],
       ['c', 'c', 'l'],
       ['o', 'd', 'e'],
       ['b', 'g', 'h'],
       ['h', 'o', 'k']], dtype=object)
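Applied to the CSV from the question, the same encode/decode round trip might look like the sketch below. Only the column names (Year, Model, Trim, Result) come from the question; the row values are made up for illustration, and the CSV is read from an in-memory string rather than a real file. The variable names (features, encoded, decoded, targets) are also just illustrative.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents; only the column names come from the question.
csv_text = """Year,Model,Trim,Result
2012,Civic,LX,1
2012,Civic,EX,0
2013,Accord,LX,1
2013,Civic,EX,0
"""

# Read Year as a string so all three feature columns share one type.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Year": str})

feature_cols = ["Year", "Model", "Trim"]
features = df[feature_cols].to_numpy(dtype=object)

# Map every distinct string to an integer code (one code table for all columns).
unique, inverse = np.unique(features, return_inverse=True)

# Reshape explicitly so the result is 2-D whether or not np.unique flattened it.
encoded = inverse.reshape(features.shape)

# Indexing the sorted unique values with the codes recovers the strings.
decoded = unique[encoded]

# The target column is already numeric, so it converts directly.
targets = df["Result"].to_numpy()
```

With a real file, pd.read_csv would take the file path in place of the io.StringIO object. Note that this approach shares a single code table across all columns, so the same string in two different columns gets the same integer.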
To get an array for the output, again, simply use the .values of the df, taking the appropriate column, since these are already numpy arrays!
>>> output = df['output'].values
>>> output
array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])
You might want to reshape it, depending on what libraries you are going to use for analysis (sklearn, scipy, etc):
>>> output.reshape(output.size, 1)
array([[0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0]])
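As a small aside, reshape(-1, 1) is an equivalent spelling that lets NumPy infer the row count, and ravel() goes back to one dimension. The sketch below uses the same ten output values shown above, typed in by hand; which shape you need depends on the library (scikit-learn estimators, for example, generally expect a 2-D feature matrix but a 1-D target).

```python
import numpy as np

# Same values as the output column in the example above.
output = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])

# -1 tells NumPy to infer that dimension from the array's size.
column = output.reshape(-1, 1)

# ravel() flattens the column vector back to one dimension.
flat = column.ravel()
```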