Monday, 15 February 2010

python - Is there a non-copying constructor for a pandas DataFrame?




Reposted from https://groups.google.com/forum/#!topic/pydata/5mhuatnal5g

It seems that when creating a DataFrame from a structured array, the data is copied. I get similar results if the data is instead a dictionary of NumPy arrays.

Is there any way to create a DataFrame from a structured array (or something similar) without copying or checking?

In [44]: sarray = randn(1e7, 10).view([(name, float) for name in 'abcdefghij']).squeeze()

In [45]: for n in [10, 100, 1000, 10000, 100000, 1000000, 10000000]:
    ...:     s = sarray[:n]
    ...:     %timeit z = pd.DataFrame(s)
    ...:
1000 loops, best of 3: 830 µs per loop
1000 loops, best of 3: 834 µs per loop
1000 loops, best of 3: 872 µs per loop
1000 loops, best of 3: 1.33 ms per loop
100 loops, best of 3: 15.4 ms per loop
10 loops, best of 3: 161 ms per loop
1 loops, best of 3: 1.45 s per loop
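To see the copying directly rather than through timings, here is a minimal runnable sketch (with a much smaller array than the 1e7 rows above; the size and field names are arbitrary). It shows that the plain constructor copies the structured array's data, since a later mutation of the source is not reflected in the frame:

```python
import numpy as np
import pandas as pd

# Build a small structured array with ten float64 fields, as in the question.
sarray = np.random.randn(1000, 10).view(
    [(name, float) for name in 'abcdefghij']
).squeeze()

# The plain constructor copies the data field by field...
df = pd.DataFrame(sarray)

# ...so mutating the source array afterwards does not affect the frame.
original = df['a'].iloc[0]
sarray['a'][0] = 12345.0
print(df['a'].iloc[0] == original)  # True: the frame still holds the old value
```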

Thanks, Dave

This will definitely coerce the dtypes to a single dtype (e.g. float64 in this case); there is no way around that. The result is a view on the original array. Note that this only helps with construction: operations will tend to create and return copies.

In [44]: s = sarray[:1000000]

The original method:

In [45]: %timeit DataFrame(s)
10 loops, best of 3: 107 ms per loop

Instead, coerce to an ndarray yourself and pass in copy=False. (This doesn't help with a structured array, only with a plain single-dtyped ndarray.)

In [47]: %timeit DataFrame(s.view(np.float64).reshape(-1, len(s.dtype.names)), columns=s.dtype.names, copy=False)
100 loops, best of 3: 3.3 ms per loop

In [48]: result = DataFrame(s.view(np.float64).reshape(-1, len(s.dtype.names)), columns=s.dtype.names, copy=False)

In [49]: result2 = DataFrame(s)

In [50]: result.equals(result2)
Out[50]: True
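A self-contained version of the trick above (with a smaller array, so it runs quickly anywhere) confirms that the view-based construction produces the same frame as the plain constructor:

```python
import numpy as np
import pandas as pd

# Small structured array standing in for the 1e6-row slice used above.
s = np.random.randn(1000, 10).view(
    [(name, float) for name in 'abcdefghij']
).squeeze()

# Reinterpret the structured array as a plain (n, 10) float64 ndarray
# and hand it to DataFrame with copy=False so pandas can take it as-is.
fast = pd.DataFrame(
    s.view(np.float64).reshape(-1, len(s.dtype.names)),
    columns=s.dtype.names,
    copy=False,
)

# The slow, copying construction for comparison.
slow = pd.DataFrame(s)

print(fast.equals(slow))  # True: same data, built without the per-field copy
```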

Note that both DataFrame.from_dict and DataFrame.from_records will copy as well. Pandas keeps like-dtyped ndarrays as a single ndarray, and it is an expensive np.concatenate that does the aggregation under the hood. Using a view avoids this issue.
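The per-dtype consolidation can be observed through the block manager. Note this peeks at internal API: `_mgr` (named `_data` before pandas 1.0) and its `nblocks` attribute are not public and may change between versions, so treat this as illustrative only:

```python
import numpy as np
import pandas as pd

a = np.random.randn(1000)
b = np.random.randn(1000)
c = np.arange(1000)  # int64 column

# All-float frame: pandas consolidates the like-dtyped columns
# into a single 2-D float64 block (one np.concatenate under the hood).
floats = pd.DataFrame({'x': a, 'y': b})

# Mixed dtypes: pandas keeps one block per dtype instead.
mixed = pd.DataFrame({'x': a, 'y': b, 'z': c})

# _mgr is internal API, so these counts may differ in other versions.
print(floats._mgr.nblocks)  # 1
print(mixed._mgr.nblocks)   # 2
```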

I suppose pandas could default to this for a structured array if the passed dtypes are all the same. I have to ask why you are using a structured array in the first place. (Obviously for name-based access... but is there another reason?)

python pandas
