Python Pandas Dataframe Words In Context: Get 3 Words Before And After
I am working in a Jupyter notebook and have a pandas DataFrame 'data':

Question_ID | Customer_ID | Answer
1 | 234 | Data is very important to use because ...
Solution 1:
This may work:
import pandas as pd
import re

df = pd.read_csv('data.csv')
for value in df.Answer.values:
    non_data = re.split('Data|data', value)  # split the text on "data", dropping the keyword
    terms_list = [term for term in non_data if len(term) > 0]  # skip empty pieces
    substrs = [term.split()[0:3] for term in terms_list]  # take the first three words of each piece
    result = [' '.join(term) for term in substrs]  # join the words back into substrings
    print(result)
Output:
['is very important']
['We value', 'since we need']
Solution 2:
A solution using a generator expression, re.findall, and itertools.chain.from_iterable:
import pandas as pd, re, itertools

data = pd.read_csv('test.csv')  # replace with your own file path

# Capture the text just before "data" (lookahead) or just after it (lookbehind), ignoring case
pattern = r'(\w*?\s*\w*?\s*\w*?\s+)(?=\bdata\b)|(?<=\bdata\b)(\s+\w*\s*\w*\s*\w*)'

data_adjacents = (
    (i for sublist in (list(filter(None, t)) for t in re.findall(pattern, l, re.I))
     for i in sublist)
    for l in data.Answer.tolist()
)
print(list(itertools.chain.from_iterable(data_adjacents)))
The output:
[' is very important', 'We value ', ' since we need']
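Neither answer above keeps the "before" and "after" words paired per occurrence. One possible alternative, offered as a minimal sketch and not as either answerer's method, is to tokenize each answer and slice a fixed window around every match. The helper name words_in_context, its keyword and window parameters, and the two sample sentences are hypothetical and only stand in for the real 'Answer' column.

```python
import re

import pandas as pd


def words_in_context(text, keyword="data", window=3):
    """Return a (before, after) pair of strings for each occurrence of keyword."""
    tokens = re.findall(r"\w+", text)  # crude word tokenizer; punctuation is dropped
    contexts = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            before = tokens[max(0, i - window):i]   # up to `window` words before the match
            after = tokens[i + 1:i + 1 + window]    # up to `window` words after the match
            contexts.append((" ".join(before), " ".join(after)))
    return contexts


# Hypothetical sample rows standing in for the real 'Answer' column
data = pd.DataFrame({
    "Answer": [
        "Data is very important to use",
        "We value data since we need it",
    ]
})
data["context"] = data["Answer"].apply(words_in_context)
print(data["context"].tolist())
```

Each row gets a list of (before, after) pairs, so an answer that mentions the keyword several times yields one pair per mention, and a keyword at the start of the text simply gets an empty "before" string.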