Wednesday, 15 September 2010

python - Constructing 3D Pandas DataFrame




I'm having difficulty constructing a 3D DataFrame in pandas. I want something like this:

    A             B             C
    start  end    start  end    start  end    ...
    7      20     42     52     90     101
    11     21                   213    34
    56     74                   9      45
    45     12

where A, B, etc. are the top-level descriptors and start and end are the subdescriptors. The numbers that follow come in pairs, and there isn't the same number of pairs for A, B, etc. Observe that A has four such pairs, B has only one, and C has three.

I'm not sure how to proceed in constructing this DataFrame. Modifying this example didn't give me the desired output:

    import numpy as np
    import pandas as pd

    A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
    B = np.array(['start', 'end']*3)
    C = [np.random.randint(10, 99, 6)]*6
    df = pd.DataFrame(list(zip(A, B, C)), columns=['A', 'B', 'C'])
    df.set_index(['A', 'B'], inplace=True)
    df

This yielded:

                                  C
    A     B
    one   start  [22, 19, 16, 20, 63, 54]
          end    [22, 19, 16, 20, 63, 54]
    two   start  [22, 19, 16, 20, 63, 54]
          end    [22, 19, 16, 20, 63, 54]
    three start  [22, 19, 16, 20, 63, 54]
          end    [22, 19, 16, 20, 63, 54]

Is there a way of breaking the lists in C into their own columns?
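For the literal question of splitting a list-valued column into separate columns, a common pandas idiom is to rebuild a frame from the column's values with `Series.tolist()` while reusing the existing index. The sketch below is a minimal illustration assuming Python 3 and a recent pandas; it reuses the question's A, B and C. It also shows why every row in the output above holds identical numbers: `[arr]*6` repeats one and the same array object six times rather than drawing six separate random arrays.

```python
import numpy as np
import pandas as pd

A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
B = np.array(['start', 'end'] * 3)
# [arr] * 6 repeats a single array object six times, so every row
# ends up holding the very same values.
C = [np.random.randint(10, 99, 6)] * 6

df = pd.DataFrame(list(zip(A, B, C)), columns=['A', 'B', 'C'])
df = df.set_index(['A', 'B'])

# Split the list-valued column: each element position becomes its own
# column, and the original (A, B) MultiIndex is kept as the row index.
wide = pd.DataFrame(df['C'].tolist(), index=df.index)
print(wide)
```

This only spreads the lists sideways; it does not produce the start/end layout the question asks for, which is why the construction of C (see the edit that follows) matters.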

Edit: the construction of C is important. It looks like the following:

    C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

and the desired output is the one at the top. It represents the starting and ending points of subsequences within a sequence (A, B and C are different sequences). Depending on the sequence itself, there is a differing number of subsequences that satisfy the condition I'm looking for. As a result, there are differing numbers of start:end pairs for A, B, etc.
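To make the layout of C concrete: assuming, as the edit implies, that the sublists alternate a list of starts and a list of ends for each sequence in turn, the start:end pairs can be recovered with `zip`. The names `one`/`two`/`three` follow the question's A array; the dictionary is only an illustrative view of the data, not a pandas structure.

```python
# C from the edit: sublists alternate start positions and end positions,
# one (starts, ends) couple per sequence.
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]
names = ['one', 'two', 'three']

# C[0::2] are the start lists, C[1::2] the matching end lists.
pairs = {
    name: list(zip(starts, ends))
    for name, starts, ends in zip(names, C[0::2], C[1::2])
}

for name, spans in pairs.items():
    print(name, len(spans), spans)
```

Each printed list corresponds to one top-level column group of the desired table, with one (start, end) tuple per row.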

First, I think you need to fill C so that it represents the missing values:

    In [341]: max_len = max(len(sublist) for sublist in C)

    In [344]: for sublist in C:
       ...:     sublist.extend([np.nan] * (max_len - len(sublist)))

    In [345]: C
    Out[345]:
    [[7, 11, 56, 45],
     [20, 21, 74, 12],
     [42, nan, nan, nan],
     [52, nan, nan, nan],
     [90, 213, 9, nan],
     [101, 34, 45, nan]]
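If mutating C in place is undesirable, one alternative (a swapped-in technique, not the answerer's code) is `itertools.zip_longest` with a NaN fill value. Walking the sublists in parallel pads and transposes in a single step, so each yielded tuple is already one row of the final frame. A and B are assumed to be the arrays from the question.

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
B = np.array(['start', 'end'] * 3)
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# zip_longest(*C) yields one tuple per position across all six sublists,
# filling shorter sublists with NaN; C itself is left untouched.
rows = list(zip_longest(*C, fillvalue=np.nan))

# Two-level column labels: sequence name on top, start/end underneath.
columns = pd.MultiIndex.from_arrays([A, B])
df = pd.DataFrame(rows, columns=columns)
print(df)
```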

Then, convert it to a NumPy array, transpose it, and pass it to the DataFrame constructor along with the columns:

    In [288]: C = np.array(C)

    In [289]: df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))

    In [349]: df
    Out[349]:
       one         two         three
       start  end  start  end  start  end
    0      7   20     42   52     90  101
    1     11   21    NaN  NaN    213   34
    2     56   74    NaN  NaN      9   45
    3     45   12    NaN  NaN    NaN  NaN
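Another option, sketched here assuming a reasonably recent pandas, is to skip the manual padding and transposing entirely: build one `Series` per (sequence, start/end) column and let the DataFrame constructor align them on the row index. Shorter Series are filled with NaN where their labels are missing, and tuple dictionary keys become a two-level column MultiIndex. This is an alternative to the answer above, not part of it.

```python
import numpy as np
import pandas as pd

A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
B = np.array(['start', 'end'] * 3)
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# One Series per column, keyed by an (A, B) tuple. The constructor aligns
# all Series on the union of their row labels (0..3), inserting NaN where
# a shorter Series has no value, and turns the tuple keys into MultiIndex columns.
df = pd.DataFrame({(a, b): pd.Series(values) for a, b, values in zip(A, B, C)})
print(df)
```

Compared with padding by hand, this leaves C untouched and needs no knowledge of the longest sublist; as with the NumPy route, any column that receives NaN is stored as float64.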

