Breeding: Python Pandas Timeseries: How to find the longest sequence where the values are higher than a specific value

Tuesday, 15 May 2012

How do I find the longest sequence in a time series? For illustration, I have a DataFrame like this:

index      value
1-1-2012   10
1-2-2012   14
1-3-2012   15
1-4-2012   8
1-5-2012   7
1-6-2012   16
1-7-2012   17
1-8-2012   18

Now I want the longest sequence: here it is the sequence from 1-6-2012 until 1-8-2012, with 3 entries.

Thanks, Anja

This is a bit clunky, but it does the job. You didn't specify the 'specific value' mentioned in the title, so I'll take 12.

    import pandas as pd

    time_indecies = pd.date_range(start='2012-01-01', end='2012-08-01', freq='MS')
    data = [10, 14, 15, 8, 7, 16, 17, 18]
    df = pd.DataFrame({'vals': data, 't_indices': time_indecies})

    threshold = 12
    df['tag'] = df.vals > threshold

    # first row of each consecutive region: True preceded by False in tag
    starts = df.index[df['tag'] & ~df['tag'].shift(1, fill_value=False)]

    # last row of each consecutive region: True followed by False in tag
    ends = df.index[df['tag'] & ~df['tag'].shift(-1, fill_value=False)]

    # DataFrame holding the info for each region above the threshold
    regs_above_thresh = pd.DataFrame({'start_idx': starts, 'end_idx': ends})

    # how long each region is
    regs_above_thresh['spans'] = \
        regs_above_thresh['end_idx'] - regs_above_thresh['start_idx'] + 1

    # index of the region with the longest span
    max_idx = regs_above_thresh['spans'].idxmax()

    # start and end rows of the longest region in the original DataFrame
    df.loc[regs_above_thresh.loc[max_idx, ['start_idx', 'end_idx']].values]

The cleverness for finding the consecutive regions comes from behzad.nouri's solution here.
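For comparison, here is a more compact sketch of the same idea that labels each run with a cumulative sum and returns the whole longest run rather than only its end points. The function name `longest_run_above` and the choice to return an empty Series when nothing exceeds the threshold are my own assumptions, not part of the original answer.

```python
import pandas as pd


def longest_run_above(series, threshold):
    """Return the longest consecutive run of values strictly above threshold.

    Hypothetical helper for illustration; ties go to the earliest run.
    """
    above = series > threshold
    if not above.any():
        # nothing exceeds the threshold: return an empty slice
        return series.iloc[0:0]
    # a new run starts wherever the above/below flag changes
    run_id = (above != above.shift()).cumsum()
    # summing the boolean flag per run gives its length (0 for "below" runs)
    run_lengths = above.groupby(run_id).sum()
    return series[run_id == run_lengths.idxmax()]


idx = pd.date_range(start='2012-01-01', end='2012-08-01', freq='MS')
vals = pd.Series([10, 14, 15, 8, 7, 16, 17, 18], index=idx)
longest = longest_run_above(vals, 12)
print(longest)
```

With the sample data and a threshold of 12, this should select the last three entries (16, 17, 18), matching the sequence asked for in the question.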

python pandas
