Create Pandas Dataframe From List Of Generators
Solution 1:
Just turn your data_list into a generator expression as well. For example:
import pandas as pd
from collections import namedtuple
MyData = namedtuple("MyData", ["a"])
data = (d.a for d in (MyData(i) for i in range(100)))
df = pd.DataFrame(data)
will work just fine. So what you should do is have:
data = ((record.Timestamp, record.Value, record.Name, record.desc) for record in records)
df = pd.DataFrame(data, columns=["Timestamp", "Value", "Name", "Desc"])
The reason your approach does not work is that your data_list contains a single entry: a generator over (I suppose) 142538 records. Pandas tries to cram that single entry into a single row, so all 142538 records, each a list of four elements, become the cells of one row. It then fails because only 4 columns were passed.
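To make the difference concrete, here is a minimal sketch. The Record namedtuple, the ten made-up records, and the rows() helper are assumptions standing in for your real data. It first wraps the generator in a list, as data_list does, and then passes the generator directly:

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the real records.
Record = namedtuple("Record", ["Timestamp", "Value", "Name", "desc"])
records = [Record(i, i * 2, f"name{i}", f"desc{i}") for i in range(10)]
columns = ["Timestamp", "Value", "Name", "Desc"]


def rows():
    """Return a fresh generator of 4-tuples, one per record."""
    return ((r.Timestamp, r.Value, r.Name, r.desc) for r in records)


# A list holding ONE generator: pandas treats the generator as one row
# with 10 cells, which does not match the 4 column names.
data_list = [rows()]
try:
    pd.DataFrame(data_list, columns=columns)
    wrapped_failed = False
except (ValueError, AssertionError) as exc:
    wrapped_failed = True
    print("wrapped generator failed:", exc)

# The generator itself: pandas reads one row per yielded tuple.
df = pd.DataFrame(rows(), columns=columns)
print(df.shape)
```

The exact error message depends on the pandas version, but it should complain that the number of columns passed does not match the data.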
Edit: you can of course make the generator expression more complex. Here's an example along the lines of your additional loop over events:
from collections import namedtuple
MyData = namedtuple("MyData", ["a", "b"])
data = ((d.a, d.b) for j in range(100) for d in (MyData(j, j + i) for i in range(100)))
pd.DataFrame(data, columns=["a", "b"])
Edit: here's also an example using data structures like the ones you are using:
Record = namedtuple("Record", ["Timestamp", "Value", "Name", "desc"])
event_list = [[Record(Timestamp=1, Value=1, Name=1, desc=1),
               Record(Timestamp=2, Value=2, Name=2, desc=2)],
              [Record(Timestamp=3, Value=3, Name=3, desc=3)]]
data = ((r.Timestamp, r.Value, r.Name, r.desc) for events in event_list for r in events)
pd.DataFrame(data, columns=["timestamp", "value", "name", "desc"])
Output:
   timestamp  value  name  desc
0          1      1     1     1
1          2      2     2     2
2          3      3     3     3
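If you really do have a list of generators, another option (not part of the answer above, just a sketch) is to flatten them with itertools.chain.from_iterable before handing them to pandas. The Record type, the make_rows helper, and the sample values below are hypothetical:

```python
from collections import namedtuple
from itertools import chain

import pandas as pd

Record = namedtuple("Record", ["Timestamp", "Value", "Name", "desc"])


def make_rows(events):
    """Yield one (Timestamp, Value, Name, desc) tuple per record."""
    for r in events:
        yield (r.Timestamp, r.Value, r.Name, r.desc)


event_list = [
    [Record(1, 1, 1, 1), Record(2, 2, 2, 2)],
    [Record(3, 3, 3, 3)],
]

# One generator per event: this is the "list of generators".
generators = [make_rows(events) for events in event_list]

# chain.from_iterable walks each generator in turn, so pandas receives
# a single flat stream of row tuples instead of a list of generators.
flat_rows = chain.from_iterable(generators)
df = pd.DataFrame(flat_rows, columns=["timestamp", "value", "name", "desc"])
print(df)
```

This keeps each generator lazy until pandas consumes the flattened stream.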
Solution 2:
pd.concat(some_generator_yielding_dfs)
works too (this is one of the tricks for easing the load when handling big tables).
For example, you can do:
pd.concat((pd.read_csv(x) for x in files))
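A self-contained sketch of the same idea, assuming two small in-memory CSV strings (hypothetical data) in place of real files so that it can be run as is:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for files on disk.
csv_texts = ["a,b\n1,2\n3,4\n", "a,b\n5,6\n"]

# A generator of DataFrames: each CSV is parsed only when pd.concat
# iterates over the generator.
frames = (pd.read_csv(io.StringIO(text)) for text in csv_texts)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating
# each piece's own index.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Without ignore_index=True the result keeps each piece's original index, so row labels would repeat across files.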
Solution 3:
Solution
- Make a dict with the columns you need, as shown below.
- Feed the dict to pandas.DataFrame.
Note: Using list(generator) produces all the data as a list.
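import pandas as pd
# Assumes each attribute of `record` (record.Timestamp, ...) is an iterable,
# e.g. a generator, over that column's values.
# Use one method or the other: a generator can only be consumed once.
# Method-1: create a dict by direct declaration
d = {
    'timestamp': list(record.Timestamp),
    'value': list(record.Value),
    'name': list(record.Name),
    'desc': list(record.desc),
}
# Method-2: create a dict using a dict comprehension
keys = ['Timestamp', 'Value', 'Name', 'desc']
# getattr looks the attribute up by name; ast.literal_eval cannot evaluate calls such as list(...)
d = {key.lower(): list(getattr(record, key)) for key in keys}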
# Finally create the dataframe using the dictionary
dataframe = pd.DataFrame(d)  # the dict keys are already the columns, so no transpose is needed
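For a runnable variant of the dict approach, here is a sketch that starts from a single generator of records. The Record namedtuple and the records() generator are hypothetical stand-ins for the real data source:

```python
from collections import namedtuple

import pandas as pd

Record = namedtuple("Record", ["Timestamp", "Value", "Name", "desc"])


def records():
    """Hypothetical generator standing in for the real record source."""
    for i in range(3):
        yield Record(i, i * 10, f"name{i}", f"desc{i}")


keys = ["Timestamp", "Value", "Name", "desc"]

# zip(*...) transposes the stream of records into one tuple per field.
# This consumes the whole generator, so all data is held in memory.
field_values = zip(*records())
d = {key.lower(): list(values) for key, values in zip(keys, field_values)}

# The dict keys become the column names.
dataframe = pd.DataFrame(d)
print(dataframe)
```

Because everything is materialized into lists first, this trades the laziness of the generator for a simple column-oriented layout.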