python - Is there a non-copying constructor for a pandas DataFrame
Reposted from https://groups.google.com/forum/#!topic/pydata/5mhuatnal5g
It seems that when creating a DataFrame from a structured array, the data is copied. I get similar results if the data is instead a dictionary of numpy arrays.
Is there any way to create a DataFrame from a structured array or similar without any copying or checking?
In [44]: sarray = randn(1e7,10).view([(name, float) for name in 'abcdefghij']).squeeze()

In [45]: for N in [10,100,1000,10000,100000,1000000,10000000]:
    ...:     s = sarray[:N]
    ...:     %timeit z = pd.DataFrame(s)
    ...:
1000 loops, best of 3: 830 µs per loop
1000 loops, best of 3: 834 µs per loop
1000 loops, best of 3: 872 µs per loop
1000 loops, best of 3: 1.33 ms per loop
100 loops, best of 3: 15.4 ms per loop
10 loops, best of 3: 161 ms per loop
1 loops, best of 3: 1.45 s per loop

Thanks, Dave
The approach below will, by definition, coerce the dtypes to a single dtype (e.g. float64 in this case). There is no way around that. The result is a view on the original array. Note that this only helps with construction; most operations will tend to create and return copies.
In [44]: s = sarray[:1000000]

Original method:

In [45]: %timeit DataFrame(s)
10 loops, best of 3: 107 ms per loop
Coerce to an ndarray and pass in copy=False (this doesn't affect a structured array, only a plain single-dtyped ndarray):
In [47]: %timeit DataFrame(s.view(np.float64).reshape(-1,len(s.dtype.names)),columns=s.dtype.names,copy=False)
100 loops, best of 3: 3.3 ms per loop

In [48]: result = DataFrame(s.view(np.float64).reshape(-1,len(s.dtype.names)),columns=s.dtype.names,copy=False)

In [49]: result2 = DataFrame(s)

In [50]: result.equals(result2)
Out[50]: True
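The view-based construction can also be wrapped in a small helper. The following is only a sketch: the function name `frame_from_homogeneous_records` is hypothetical (it is not a pandas API), the sample array is tiny and generated with `np.random.default_rng` rather than the pylab `randn` used in the session, and whether the resulting frame really shares memory with the source may depend on the pandas version and its copy-on-write settings, so the script prints that check rather than assuming an answer.

```python
import numpy as np
import pandas as pd


def frame_from_homogeneous_records(s):
    """Wrap a structured array whose fields all share one dtype in a DataFrame.

    Hypothetical helper, not part of pandas. It refuses mixed or padded
    record layouts, because the flat view below is only meaningful when the
    records are tightly packed values of a single dtype.
    """
    names = s.dtype.names
    field_dtypes = {s.dtype[name] for name in names}
    if len(field_dtypes) != 1:
        raise TypeError("all fields must share a single dtype")
    (dtype,) = field_dtypes
    if s.dtype.itemsize != dtype.itemsize * len(names):
        raise TypeError("record layout contains padding")
    # Reinterpret the packed records as one 2-D block: rows x fields.
    flat = s.view(dtype).reshape(-1, len(names))
    return pd.DataFrame(flat, columns=list(names), copy=False)


# Small stand-in for the large array in the question.
rng = np.random.default_rng(0)
record_dtype = [(name, np.float64) for name in "abc"]
s = rng.standard_normal((5, 3)).view(record_dtype).squeeze()

df = frame_from_homogeneous_records(s)

# Same contents as the default constructor?
print(df.equals(pd.DataFrame(s)))
# Does the frame still sit on the original buffer? (version dependent)
print(np.shares_memory(df.to_numpy(), s))
```

The explicit dtype and padding checks are a design choice for the sketch: without them, viewing a mixed-dtype record as float64 would silently reinterpret bytes instead of failing.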
Note that both DataFrame.from_dict and DataFrame.from_records will copy this. pandas keeps like-dtyped ndarrays as a single ndarray, and it is expensive to do the np.concatenate needed to aggregate them, which is what is done under the hood. Using a view avoids this issue.
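The difference between aggregating columns and taking a view can be illustrated with NumPy alone. This is a minimal sketch that does not touch pandas internals: the variable names are made up for illustration, and the `np.concatenate` call merely stands in for whatever stacking pandas performs when it consolidates like-dtyped columns.

```python
import numpy as np

record_dtype = [(name, np.float64) for name in "abc"]
records = np.zeros(4, dtype=record_dtype)

# One 1-D array per field, which is what per-column input looks like.
columns = [records[name] for name in records.dtype.names]

# Stacking allocates a fresh 2-D array and copies every column into it.
stacked = np.concatenate([col[np.newaxis, :] for col in columns])

# The flat view reuses the buffer that already holds the records.
viewed = records.view(np.float64).reshape(-1, len(records.dtype.names))

stacked_shares = np.shares_memory(stacked, records)
viewed_shares = np.shares_memory(viewed, records)
print(stacked_shares, viewed_shares)

# A write through the view reaches the records; the stacked copy is unaffected.
viewed[0, 0] = 1.0
print(records["a"][0], stacked[0, 0])
```

The copy made by stacking grows with the size of the data, while creating the view only builds a new array header.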
I suppose this could be the default for a structured array if the passed dtypes are all the same. But then you have to ask why you are using a structured array in the first place (obviously to get name access, but is there another reason?).
python pandas