Python Pandas Timeseries: How to find the largest sequence where the values are higher than a specific value
How can I find the largest sequence in a time series where the values are higher than a specific value? As an illustration, I have a DataFrame like this:

    index       value
    1-1-2012    10
    1-2-2012    14
    1-3-2012    15
    1-4-2012    8
    1-5-2012    7
    1-6-2012    16
    1-7-2012    17
    1-8-2012    18

Now I want to get the longest sequence. Here that would be the sequence from 1-6-2012 until 1-8-2012, with 3 entries.

Thanks, Anja
This is a bit clunky, but it does the job. You didn't specify the 'specific value' mentioned in the title, so I'll take 12.
    import pandas as pd

    time_indecies = pd.date_range(start='2012-01-01', end='2012-08-01', freq='MS')
    data = [10, 14, 15, 8, 7, 16, 17, 18]
    df = pd.DataFrame({'vals': data, 't_indices': time_indecies})

    threshold = 12
    df['tag'] = df.vals > threshold

    # first row of a consecutive region is a True preceded by a False in tags
    starts = df.index[df['tag'] & ~df['tag'].shift(1, fill_value=False)]

    # last row of a consecutive region is a True followed by a False
    ends = df.index[df['tag'] & ~df['tag'].shift(-1, fill_value=False)]

    # create a df to hold info on each region
    regs_above_thresh = pd.DataFrame({'start_idx': starts, 'end_idx': ends})

    # how long is each region
    regs_above_thresh['spans'] = (
        regs_above_thresh['end_idx'] - regs_above_thresh['start_idx'] + 1
    )

    # index of the region with the longest span
    max_idx = regs_above_thresh['spans'].idxmax()

    # we can get the start and end points of the longest region from the original dataframe
    df.loc[regs_above_thresh.loc[max_idx, ['start_idx', 'end_idx']].values]

The cleverness for finding the consecutive regions comes from behzad.nouri's solution.
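As an alternative to keeping a separate table of regions, the same idea can be sketched by giving every run of consecutive rows its own id and counting the rows per run. This is not the approach from the answer above, only a minimal sketch of a common cumulative-sum trick; the function name longest_run_above and the choice to index the Series by date are assumptions made for illustration.

```python
import pandas as pd


def longest_run_above(values: pd.Series, threshold: float):
    """Return (start_label, end_label, length) of the longest run of
    consecutive values above threshold, or None if no value qualifies.

    Hypothetical helper, not part of pandas.
    """
    above = values > threshold
    if not above.any():
        return None
    # A run boundary is any row whose flag differs from the previous row,
    # so the cumulative sum of boundaries gives every run its own id.
    run_id = (above != above.shift(fill_value=False)).cumsum()
    # Count rows per run, looking only at rows above the threshold.
    run_lengths = above[above].groupby(run_id[above]).size()
    best = run_lengths.idxmax()  # the first run wins on ties
    labels = values.index[(run_id == best) & above]
    return labels[0], labels[-1], int(run_lengths.max())


# Same data as the question, indexed by month start as in the answer.
idx = pd.date_range(start='2012-01-01', periods=8, freq='MS')
s = pd.Series([10, 14, 15, 8, 7, 16, 17, 18], index=idx)

start, end, length = longest_run_above(s, 12)
print(start.date(), end.date(), length)  # 2012-06-01 2012-08-01 3
```

Returning index labels rather than integer positions means the result can be passed straight to s.loc[start:end] to get the whole run, not just its end points.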
python pandas