
Choosing The Minimum Distance Part 2

This question is already here, but now I have added an extra part to the previous question. I have the following dataframe: data = {'id': [0, 0, 0, 0, 0, 0], 'time_order': ['2019-0

Solution 1:

import pandas as pd
import datetime

data = {'id': [0, 0, 0, 0, 0, 0],
'time_order': ['2019-01-01 0:00:00', '2019-01-01 00:11:00', '2019-01-02 00:04:00', '2019-01-02 00:15:00', '2019-01-03 00:07:00', '2019-01-03 00:10:00']}

df_data = pd.DataFrame(data)

df_data['time_order'] = pd.to_datetime(df_data['time_order'])
df_data['day_order'] = df_data['time_order'].dt.strftime('%Y-%m-%d')
df_data['time'] = df_data['time_order'].dt.strftime('%H:%M:%S')

x = '00:00:00'
y = '00:15:00'
s = '00:00:00'

tw = 900
begin = pd.Timestamp(s).to_pydatetime()



for k in range(10): # shift the window 10 times, one minute per shift
    begin1 = begin + datetime.timedelta(seconds=int(k*60))
    last = begin1 + datetime.timedelta(seconds=int(tw))
    x = begin1.strftime('%H:%M:%S')
    y = last.strftime('%H:%M:%S')
    print('\n========\n',x,y)
    
    diff = (pd.Timedelta(y)-pd.Timedelta(x))/2
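    # Compare times of day as Timedeltas so the filter does not depend on
    # how a date-less 'HH:MM:SS' string is parsed into a datetime.
    t = pd.to_timedelta(df_data['time'])
    df_data2 = df_data[(t <= pd.Timedelta(y)) & (t > pd.Timedelta(x))].copy()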
    #print(df_data2)
    
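    # 'time' holds strings, so convert to Timedelta before subtracting the window midpoint
    df_data2['diff'] = abs(pd.to_timedelta(df_data2['time']) - (diff + pd.Timedelta(x)))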
    
    mins = df_data2.groupby('day_order').apply(lambda z: z[z['diff']==min(z['diff'])])

    mins.reset_index(drop=True, inplace=True)

    print(mins)

Output after first 10 shifts:

========
 00:00:00 00:15:00
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:03:30
1   0 2019-01-02 00:04:00  2019-01-02  00:04:00 0 days 00:03:30
2   0 2019-01-03 00:07:00  2019-01-03  00:07:00 0 days 00:00:30

========
 00:01:00 00:16:00
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:02:30
1   0 2019-01-02 00:04:00  2019-01-02  00:04:00 0 days 00:04:30
2   0 2019-01-03 00:07:00  2019-01-03  00:07:00 0 days 00:01:30
3   0 2019-01-03 00:10:00  2019-01-03  00:10:00 0 days 00:01:30

========
 00:02:00 00:17:00
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:01:30
1   0 2019-01-02 00:04:00  2019-01-02  00:04:00 0 days 00:05:30
2   0 2019-01-02 00:15:00  2019-01-02  00:15:00 0 days 00:05:30
3   0 2019-01-03 00:10:00  2019-01-03  00:10:00 0 days 00:00:30

========
 00:03:00 00:18:00
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:00:30
1   0 2019-01-02 00:15:00  2019-01-02  00:15:00 0 days 00:04:30
2   0 2019-01-03 00:10:00  2019-01-03  00:10:00 0 days 00:00:30

========
 00:04:00 00:19:00
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:00:30
1   0 2019-01-02 00:15:00  2019-01-02  00:15:00 0 days 00:03:30
2   0 2019-01-03 00:10:00  2019-01-03  00:10:00 0 days 00:01:30

========
 00:05:00 00:20:00
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:01:30
1   0 2019-01-02 00:15:00  2019-01-02  00:15:00 0 days 00:02:30
2   0 2019-01-03 00:10:00  2019-01-03  00:10:00 0 days 00:02:30

========
 00:06:00 00:21:00
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:02:30
1   0 2019-01-02 00:15:00  2019-01-02  00:15:00 0 days 00:01:30
2   0 2019-01-03 00:10:00  2019-01-03  00:10:00 0 days 00:03:30

========
 00:07:00 00:22:00
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:03:30
1   0 2019-01-02 00:15:00  2019-01-02  00:15:00 0 days 00:00:30
2   0 2019-01-03 00:10:00  2019-01-03  00:10:00 0 days 00:04:30

========
 00:08:00 00:23:00
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:04:30
1   0 2019-01-02 00:15:00  2019-01-02  00:15:00 0 days 00:00:30
2   0 2019-01-03 00:10:00  2019-01-03  00:10:00 0 days 00:05:30

========
 00:09:00 00:24:00
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:05:30
1   0 2019-01-02 00:15:00  2019-01-02  00:15:00 0 days 00:01:30
2   0 2019-01-03 00:10:00  2019-01-03  00:10:00 0 days 00:06:30

Notice that some iterations (the second and the third) produce four rows instead of three. In those iterations the diff column contains a pair of rows for the same day with the same time difference. This happens because the difference is taken as an absolute value, so an order a given distance before the window midpoint and an order the same distance after it count as equally close.
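To see the effect of the absolute value in isolation, here is a minimal sketch using made-up offsets (the `signed` series is purely illustrative and not taken from the data above). Two offsets of equal size on opposite sides of the midpoint become indistinguishable once the sign is dropped, so a comparison against the minimum keeps both.

```python
import pandas as pd

# Hypothetical signed offsets from a window midpoint, in seconds (illustrative only).
signed = pd.Series(pd.to_timedelta([-90, 90, 240], unit='s'))

# Taking the absolute value discards the sign ...
distance = signed.abs()

# ... so both 90-second offsets equal the minimum and both rows are selected.
closest = signed[distance == distance.min()]
print(closest)
```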

For example, in the second iteration above (00:01:00 to 00:16:00) there are two entries for 2019-01-03:

2   0 2019-01-03 00:07:00  2019-01-03  00:07:00 0 days 00:01:30
3   0 2019-01-03 00:10:00  2019-01-03  00:10:00 0 days 00:01:30

This is because both are 00:01:30 away from the midpoint of the window. The window is 15 minutes wide, so its midpoint is 00:01:00 + 00:07:30 = 00:08:30:

00:07:00 <----(- 01:30)---- 00:08:30 ----(+ 01:30)----> 00:10:00

That is why both orders are displayed.
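The arithmetic for this window can be checked with a short sketch. The variable names below are illustrative and not part of the solution code. The last line rests on an assumption that is not stated in the answer: if only one order per day were wanted, `idxmin()` would keep the first of the tied rows.

```python
import pandas as pd

start = pd.Timedelta('00:01:00')       # window start in the second iteration
width = pd.Timedelta(seconds=900)      # tw = 900 seconds, as in the solution
mid = start + width / 2                # midpoint of the window

# The two orders on 2019-01-03, expressed as times of day.
times = pd.Series(pd.to_timedelta(['00:07:00', '00:10:00']))
diff = (times - mid).abs()             # absolute distance to the midpoint

print(mid)
print(diff)

# Assumption: break the tie by keeping the first row with the minimal distance.
first_only = times.loc[diff.idxmin()]
print(first_only)
```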
