python 2.7 - Convert 2D numpy.ndarray to pandas.DataFrame
I have a pretty big numpy.ndarray, basically an array of arrays, that I want to convert to a pandas.DataFrame. What I want to do is in the code below:
    from pandas import DataFrame

    cache1 = DataFrame([{'id1': 'abc1234'}, {'id1': 'ncmn7838'}])
    cache2 = DataFrame([{'id2': 3276827}, {'id2': 98567498}, {'id2': 38472837}])
    ndarr = [[4.3, 5.6, 6.7], [3.2, 4.5, 2.1]]

    arr = []
    for idx, i in enumerate(ndarr):
        # the outer index selects the row of cache1
        id1 = cache1.ix[idx].id1
        for idx2, val in enumerate(i):
            # the inner index selects the row of cache2
            id2 = cache2.ix[idx2].id2
            if val > 0:
                arr.append(dict(id1=id1, id2=id2, value=val))

    df = DataFrame(arr)
    print(df.head())
I am mapping the index of the outer array and the index of the inner array to the ids of two DataFrames. cache1 and cache2 are pandas.DataFrame objects. Each has ~100k rows.
This takes really long, a few hours to complete. Is there a way I can speed it up?
I suspect that ndarr, if expressed as a 2D np.array, has a shape of (n, m), where n is the length of cache1.id1 and m is the length of cache2.id2. Also, the last entry in cache2 should be {'id2': 38472837} instead of {'id': 38472837}. If so, the following simple solution may be all that is needed:
    In [30]: df = pd.DataFrame(np.array(ndarr).ravel(),
                               index=pd.MultiIndex.from_product(
                                   [cache1.id1.values, cache2.id2.values],
                                   names=['idx1', 'idx2']),
                               columns=['val'])

    In [33]: print df.reset_index()
           idx1      idx2  val
    0   abc1234   3276827  4.3
    1   abc1234  98567498  5.6
    2   abc1234  38472837  6.7
    3  ncmn7838   3276827  3.2
    4  ncmn7838  98567498  4.5
    5  ncmn7838  38472837  2.1

    [6 rows x 3 columns]
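The session above uses the Python 2 print statement and does not apply the `val > 0` check from the question's loop. As a rough sketch of the same idea with that filter added, written for Python 3: the column names follow the question's output (`id1`, `id2`, `value`), and one array entry is set to 0 here purely as an assumption so the filter has a row to drop.

```python
import numpy as np
import pandas as pd

cache1 = pd.DataFrame({'id1': ['abc1234', 'ncmn7838']})
cache2 = pd.DataFrame({'id2': [3276827, 98567498, 38472837]})

# Assumption for illustration: the 4.5 from the question is replaced
# by 0.0 so that the filter below actually removes something.
ndarr = np.array([[4.3, 5.6, 6.7], [3.2, 0.0, 2.1]])

# The approach only works if there is one row per id1 and one column per id2.
assert ndarr.shape == (len(cache1), len(cache2))

# from_product varies the last level fastest, which matches the
# row-major order in which ravel() flattens the array.
index = pd.MultiIndex.from_product(
    [cache1['id1'].values, cache2['id2'].values], names=['id1', 'id2'])
df = pd.DataFrame({'value': ndarr.ravel()}, index=index)

# Same condition as the `if val > 0` check in the loop version.
df = df[df['value'] > 0].reset_index()
print(df)
```

Because no Python-level loop runs over the n*m cells, this should scale much better than appending dicts one at a time, although a dense 100k x 100k array still has to fit in memory.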
Actually, I think keeping the MultiIndex may be a better idea.
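To illustrate why keeping the MultiIndex can be convenient, here is a small sketch that skips `reset_index()` and looks values up by id directly. It uses the example ids from the question; holding the data in a Series named `val` is my own choice, not something from the original answer.

```python
import numpy as np
import pandas as pd

id1 = ['abc1234', 'ncmn7838']
id2 = [3276827, 98567498, 38472837]
ndarr = np.array([[4.3, 5.6, 6.7], [3.2, 4.5, 2.1]])

index = pd.MultiIndex.from_product([id1, id2], names=['idx1', 'idx2'])
s = pd.Series(ndarr.ravel(), index=index, name='val')

# Single value for one (idx1, idx2) pair.
one = s.loc[('ncmn7838', 98567498)]

# All values for one idx1, indexed by idx2.
row = s.loc['abc1234']

# All values for one idx2 across every idx1.
col = s.xs(38472837, level='idx2')

# Pivot back to a 2D table with the ids as row and column labels.
wide = s.unstack('idx2')

print(one)
print(row)
print(col)
print(wide)
```

With the index kept, lookups by id do not need a separate merge against cache1 or cache2.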
python-2.7 pandas multidimensional-array