Saturday, 15 June 2013

Combining ranges for pandas (NumPy? core Python?) indexing




I am loading data whose size is comparable to my memory limits, so I am conscious of indexing efficiently and not making copies. I need to work on columns 3:8 and 9: (they are also labeled), but combining the two ranges does not seem to work. Rearranging the columns in the underlying data would be needlessly costly (an I/O operation). Referencing two DataFrames and combining them sounds like it would create copies. What is an efficient way to do this?

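import numpy as np
import pandas as pd

data = pd.read_stata('s:/data/controls/lasso.dta')
# axis=1 joins the two column blocks side by side (the default, axis=0, stacks rows)
x = pd.concat([data.iloc[:, 3:8], data.iloc[:, 9:888]], axis=1)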
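Two ranges can also go into a single .iloc call by building one integer index with NumPy's np.r_, which concatenates slice expressions into an array. The sketch below uses a small hypothetical 12-column frame in place of the Stata file. Note that np.r_ needs an explicit stop, so an open-ended range like 9: is written as 9:data.shape[1]. This avoids the concat step, but it is still fancy indexing, so the result is a copy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Stata data: 3 rows, 12 columns c0..c11
data = pd.DataFrame(np.arange(36, dtype=float).reshape(3, 12),
                    columns=[f"c{i}" for i in range(12)])

# np.r_ turns several slices into one integer index array;
# the open-ended range "9:" needs an explicit stop (the column count).
idx = np.r_[3:8, 9:data.shape[1]]
x = data.iloc[:, idx]  # positional selection of both ranges at once (a copy)

print(list(x.columns))
```

With the real file the same line would read data.iloc[:, np.r_[3:8, 9:888]].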

By the way, it would help if I could read in only half of the data (a random half, even), but once again I cannot open the original data and save another, smaller copy of it.
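One way to read only a random half without ever holding the full file is to stream it in chunks and keep a random subset of each chunk. This is a minimal sketch. It assumes a pandas version whose read_stata accepts the columns and chunksize arguments (newer than the pandas this post was written against). The function name read_stata_sample and the demo file are hypothetical. Only one chunk plus the rows kept so far are in memory at any time.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def read_stata_sample(path, columns=None, frac=0.5, chunksize=100_000, seed=0):
    """Hypothetical helper: read a random fraction of rows from a .dta file.

    Assumes read_stata supports `columns` and `chunksize`; each row is kept
    independently with probability `frac`.
    """
    rng = np.random.default_rng(seed)
    kept = []
    with pd.read_stata(path, columns=columns, chunksize=chunksize) as reader:
        for chunk in reader:
            mask = rng.random(len(chunk)) < frac  # random keep/drop per row
            kept.append(chunk[mask])
    return pd.concat(kept, ignore_index=True)


# Demo on a small hypothetical file: 1,000 rows, 12 columns c0..c11
demo = pd.DataFrame(np.random.default_rng(1).normal(size=(1000, 12)),
                    columns=[f"c{i}" for i in range(12)])
cols = [f"c{i}" for i in [*range(3, 8), *range(9, 12)]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.dta")
    demo.to_stata(path, write_index=False)
    sample = read_stata_sample(path, columns=cols, chunksize=250)

print(sample.shape)  # roughly 500 rows, 8 columns
```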

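import numpy as np
import pandas as pd

data = pd.read_stata('s:/data/controls/lasso.dta')

# Boolean mask marking the wanted columns: 3:8 and 9:888
cols = np.zeros(len(data.columns), dtype=bool)
cols[3:8] = True
cols[9:888] = True
x = data.iloc[:, cols]
del data  # drop the reference to the full frame so its memory can be freed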

This still makes a copy (but only one...). It does not seem possible to return a view instead of a copy for this kind of selection (source).
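The reason is visible at the NumPy level. A view must be describable as the original buffer plus a single offset and stride per axis. One contiguous range fits that description, but two ranges with a gap between them do not, so NumPy uses advanced (fancy) indexing, which always returns a copy. A small sketch on a plain array:

```python
import numpy as np

a = np.arange(24.0).reshape(2, 12)

# Basic slice: one offset/stride pattern, so this is a view (no data copied)
one_range = a[:, 3:8]

# Two ranges: not expressible as one stride pattern, so NumPy falls back to
# advanced indexing, which always produces a new array
two_ranges = a[:, np.r_[3:8, 9:12]]

print(np.shares_memory(a, one_range))   # True
print(np.shares_memory(a, two_ranges))  # False
```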

Another suggestion is converting the .dta file to a .csv file (howto). pandas' read_csv is much more flexible: you can specify which columns you are interested in (usecols) and how many rows to read (nrows). Unfortunately, it requires making a copy of the file.
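As a sketch of those read_csv options, the example below uses a small hypothetical CSV held in memory in place of the converted file. usecols keeps only the wanted column positions while parsing, and nrows caps how many rows are read. read_csv's skiprows also accepts a callable on the row number, which offers one way to read a random half: keep row 0 (the header) and skip each other row with probability 0.5.

```python
import io
import random

import pandas as pd

# Hypothetical CSV standing in for the converted .dta file:
# a header plus 10 data rows, 12 columns c0..c11
header = ",".join(f"c{i}" for i in range(12))
rows = [",".join(str(r * 100 + c) for c in range(12)) for r in range(10)]
csv_text = "\n".join([header, *rows]) + "\n"

# Column positions 3:8 and 9: (to the end)
wanted = [*range(3, 8), *range(9, 12)]

# Only the wanted columns are parsed, and only the first 4 data rows
first_rows = pd.read_csv(io.StringIO(csv_text), usecols=wanted, nrows=4)

# Random half of the rows: the callable returns True for rows to skip
rnd = random.Random(0)
half = pd.read_csv(io.StringIO(csv_text), usecols=wanted,
                   skiprows=lambda i: i > 0 and rnd.random() < 0.5)
```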

python numpy pandas indexing
