pandas使用教程：pandas resample函数处理时间序列数据-灵析社区

时间序列(TimeSeries)

#创建时间序列数据
rng = pd.date_range('1/1/2012', periods=300, freq='S')

ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts

2012-01-01 00:00:00     44
2012-01-01 00:00:01     54
2012-01-01 00:00:02    132
2012-01-01 00:00:03     70
2012-01-01 00:00:04    476
                      ... 
2012-01-01 00:04:55    178
2012-01-01 00:04:56     83
2012-01-01 00:04:57    184
2012-01-01 00:04:58    223
2012-01-01 00:04:59    179
Freq: S, Length: 300, dtype: int32

时间频率转换的参数如下

resample重采样

ts.resample('1T').sum()  #按一分钟重采样之后求和
ts.resample('1T').mean() #按一分钟重采样之后求平均
ts.resample('1T').median() #按一分钟重采样之后求中位数

2023-01-01 00:00:00    275.5
2023-01-01 00:01:00    245.0
2023-01-01 00:02:00    233.5
2023-01-01 00:03:00    284.0
2023-01-01 00:04:00    245.5
Freq: T, dtype: float64

执行多个聚合

使用agg函数执行多个聚合。

ts.resample('1T').agg(['min','max', 'sum'])

	min	max	sum
2023-01-01 00:00:00	0	492	15536
2023-01-01 00:01:00	3	489	15840
2023-01-01 00:02:00	3	466	14282
2023-01-01 00:03:00	2	498	15652
2023-01-01 00:04:00	6	489	15119

上采样和填充值

上采样是下采样的相反操作。它将时间序列数据重新采样到一个更小的时间框架。例如，从小时到分钟，从年到天。结果将增加行数，并且附加的行值默认为NaN。内置的方法ffill()和bfill()通常用于执行前向填充或后向填充来替代NaN。

rng = pd.date_range('1/1/2023', periods=200, freq='H')

ts = pd.Series(np.random.randint(0, 200, len(rng)), index=rng)
ts

2023-01-01 00:00:00     16
2023-01-01 01:00:00     19
2023-01-01 02:00:00    170
2023-01-01 03:00:00     66
2023-01-01 04:00:00     33
                      ... 
2023-01-09 03:00:00     31
2023-01-09 04:00:00     61
2023-01-09 05:00:00     28
2023-01-09 06:00:00     67
2023-01-09 07:00:00    137
Freq: H, Length: 200, dtype: int32

#下采样到分钟
ts.resample('30T').asfreq()

2023-01-01 00:00:00     16.0
2023-01-01 00:30:00      NaN
2023-01-01 01:00:00     19.0
2023-01-01 01:30:00      NaN
2023-01-01 02:00:00    170.0
                       ...  
2023-01-09 05:00:00     28.0
2023-01-09 05:30:00      NaN
2023-01-09 06:00:00     67.0
2023-01-09 06:30:00      NaN
2023-01-09 07:00:00    137.0
Freq: 30T, Length: 399, dtype: float64

通过apply传递自定义功能

import numpy as np

def res(series):
    return np.prod(series)

ts.resample('30T').apply(res)

2023-01-01 00:00:00     16
2023-01-01 00:30:00      1
2023-01-01 01:00:00     19
2023-01-01 01:30:00      1
2023-01-01 02:00:00    170
                      ... 
2023-01-09 05:00:00     28
2023-01-09 05:30:00      1
2023-01-09 06:00:00     67
2023-01-09 06:30:00      1
2023-01-09 07:00:00    137
Freq: 30T, Length: 399, dtype: int32

DataFrame对象

对于DataFrame对象，关键字on可用于指定列而不是重新取样的索引

df = pd.DataFrame(data=9*[range(4)], columns=['a', 'b', 'c', 'd'])
df['time'] = pd.date_range('1/1/2000', periods=9, freq='T')
df.resample('3T', on='time').sum()

Out[81]: 
                     a  b  c  d
time                           
2000-01-01 00:00:00  0  3  6  9
2000-01-01 00:03:00  0  3  6  9
2000-01-01 00:06:00  0  3  6  9