Combining ranges for pandas (NumPy? core Python?) indexing
I am loading data whose size is comparable to my memory limits, so I am conscious about indexing efficiently and not making copies. I need to work on columns 3:8 and 9: (they are also labeled), but combining ranges does not seem to work. Rearranging the columns in the underlying data is needlessly costly (an IO operation). Referencing two DataFrames and combining them sounds like something that would also create copies. What is an efficient way to do this?
import numpy as np
import pandas as pd
data = pd.read_stata('s:/data/controls/lasso.dta')
# axis=1 joins the two column blocks side by side (the default stacks rows)
x = pd.concat([data.iloc[:, 3:8], data.iloc[:, 9:888]], axis=1)
By the way, if I could read in only half of the data (a random half, even), that would help, but then again I would rather not open the original data and save another, smaller copy just for this.
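On reading only part of the file: a rough sketch, assuming a pandas version where read_stata accepts the columns and chunksize arguments. The file demo.dta and the column names c0 to c11 are made up for illustration. Note that the chunk taken here is the first block of rows, not a random half, and I am not certain how much the reader holds in memory internally while selecting columns, so treat this as something to try rather than a guaranteed saving.

```python
import os
import tempfile

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical small .dta file standing in for the real one.
    path = os.path.join(tmp, "demo.dta")
    demo = pd.DataFrame(
        np.arange(1200, dtype=float).reshape(100, 12),
        columns=[f"c{i}" for i in range(12)],
    )
    demo.to_stata(path, write_index=False)

    # Names of columns 3:8 and 9: in the hypothetical file.
    wanted = [f"c{i}" for i in list(range(3, 8)) + list(range(9, 12))]

    # Only the requested columns end up in the returned DataFrame.
    x = pd.read_stata(path, columns=wanted)

    # With chunksize, read_stata returns a reader that yields
    # DataFrames of at most that many rows.
    with pd.read_stata(path, columns=wanted, chunksize=50) as reader:
        first_half = next(iter(reader))
```

Processing the file chunk by chunk keeps each piece small, at the cost of having to structure the later work around chunks.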
import numpy as np
import pandas as pd
data = pd.read_stata('s:/data/controls/lasso.dta')
# Boolean mask over the columns: True marks a column to keep
cols = np.zeros(len(data.columns), dtype=bool)
cols[3:8] = True
cols[9:888] = True
x = data.iloc[:, cols]
del data
This still makes a copy (but only one...). It does not seem possible to return a view instead of a copy for this sort of shape (source).
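To illustrate the copy-versus-view point, here is a small sketch on made-up data (4 rows, 12 columns; the real file is not used). It also shows np.r_, which builds one integer index out of several ranges and so lets the ranges be combined in a single iloc call. That selection is still a copy, as far as I can tell; the NumPy check at the end shows why: a single contiguous slice can share memory with the original array, while a selection with a gap in it cannot.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 4 rows, 12 float columns.
data = pd.DataFrame(np.arange(48, dtype=float).reshape(4, 12))

# np.r_ concatenates the slice ranges into one integer index array.
idx = np.r_[3:8, 9:12]
x = data.iloc[:, idx]  # one selection instead of a concat

# The same distinction on a plain NumPy array:
arr = np.arange(48, dtype=float).reshape(4, 12)
view = arr[:, 3:8]     # basic slicing: a view onto arr
picked = arr[:, idx]   # integer-array indexing: a new array

shares_view = np.shares_memory(arr, view)
shares_picked = np.shares_memory(arr, picked)
```

Here shares_view is True and shares_picked is False, which matches the statement above that a two-range selection cannot come back as a view.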
Another suggestion is converting the .dta file to a .csv file (howto). pandas read_csv is much more flexible: you can specify the columns you are interested in (usecols) and how many rows to read (nrows). Unfortunately, this requires a file copy.
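A minimal sketch of the read_csv route, using an in-memory CSV with made-up column names (c0 to c11) in place of the converted file:

```python
import io

import pandas as pd

# Hypothetical CSV standing in for the converted .dta file:
# 12 columns named c0..c11 and 6 rows of integers.
header = ",".join(f"c{i}" for i in range(12))
rows = "\n".join(
    ",".join(str(r * 12 + i) for i in range(12)) for r in range(6)
)
csv_text = header + "\n" + rows + "\n"

# Column positions 3:8 and 9: combined into one list.
wanted = list(range(3, 8)) + list(range(9, 12))

# usecols limits which columns are parsed; nrows limits how many rows are read.
x = pd.read_csv(io.StringIO(csv_text), usecols=wanted, nrows=3)
```

With a real file you would pass its path instead of the StringIO object. usecols also accepts column labels, which may be handier since the columns are labeled.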
python numpy pandas indexing