python - Pandas DataFrame selection using text from another DataFrame
A quick disclosure: I come from an R background and am switching to pandas (running on Python 3.3.3).
I want to select rows from a DataFrame using text from another DataFrame entry. It's an elementary operation, but I can't get my head around the syntax.
For example, take this DataFrame (sorry for the line split, I want to make the illustration clearer):
import pandas
films = pandas.DataFrame({
    '$title': ["the godfather", "pulp fiction", "the godfather: part ii", "fight club"],
    '$director': ["coppola, francis ford", "tarantino, quentin", "coppola, francis ford", "fincher, david"]})
If I want to select the films created by the first director, "coppola, francis ford", the command I'm using is:
In [1]: director = films.iloc[[1]]["director"]
In [2]: director
1    coppola, francis ford
Name: director, dtype: object
In [3]: films[ films["director"] == director ]
ValueError: Series lengths must match to compare
If I do this:
In [4]: films[ films["director"] == str(director) ]
I get an empty DataFrame. What's going on here? It seems I'm missing something.
OK, first of all, I see you made a couple of style/semantics mistakes that are common among R-to-Python converts:
- You don't need $ signs in column names, and leaving them out makes column selection nicer: you can write films.director if the column is named 'director' (the name has to be a valid Python identifier for this syntactic sugar to work).
- Indexing in Python starts at 0, not 1, so to select the 1st director use films.director[0].
Assuming you removed the $ signs from the DataFrame definition, you can select the movies as:
In [16]: films[films['director'] == films['director'][0]]
Out[16]:
                director                   title
0  coppola, francis ford           the godfather
2  coppola, francis ford  the godfather: part ii
Or, more cleanly: films[films.director == films.director[0]].
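The selection above works in two steps: comparing a column with a single value gives a boolean mask, and indexing the DataFrame with that mask keeps the rows marked True. Here is a minimal sketch of those steps, assuming the $ signs have been dropped from the column names. The names first_director, mask and coppola_films are only illustrative, and .iloc[0] is used here instead of [0] so that the lookup is by position regardless of the index labels.

```python
import pandas as pd

# Same data as in the question, with the $ signs dropped from the column names.
films = pd.DataFrame({
    "title": ["the godfather", "pulp fiction",
              "the godfather: part ii", "fight club"],
    "director": ["coppola, francis ford", "tarantino, quentin",
                 "coppola, francis ford", "fincher, david"],
})

# A single string: the director in the first row (position 0).
first_director = films["director"].iloc[0]

# Comparing a column with a scalar gives a boolean Series, one entry per row.
mask = films["director"] == first_director

# Indexing the DataFrame with the mask keeps only the rows where it is True.
coppola_films = films[mask]
print(coppola_films)
```

Keeping the mask in its own variable is optional, but it makes it easy to inspect which rows matched before filtering.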
Using your original DataFrame, you can perform the query with:
director = films.iloc[[1]]['$director'][1]
films[films['$director'] == director]
One error was that you first defined the table with '$director' and then queried the 'director' column name.
The [1] at the end is necessary because you indexed the DataFrame with the list [1] instead of the value 1, so you got a Series, as CT Zhu noticed. List indexing is meant more for selecting several arbitrary elements, such as films.iloc[[1, 3]]. In your case it is clearer to write
director = films.iloc[1]['$director']
Also, note that this still gets Tarantino, not Coppola.
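To make the difference between list and integer indexing concrete, here is a small sketch using the original '$'-prefixed column names. It also suggests why the str(director) attempt in the question gave an empty result: str() of a Series is its whole printed representation, not the value inside it. The variable names (as_series, as_scalar, empty, coppola) are only illustrative.

```python
import pandas as pd

films = pd.DataFrame({
    "$title": ["the godfather", "pulp fiction",
               "the godfather: part ii", "fight club"],
    "$director": ["coppola, francis ford", "tarantino, quentin",
                  "coppola, francis ford", "fincher, david"],
})

# Indexing with a list returns a one-row DataFrame, so the column is a Series.
as_series = films.iloc[[1]]["$director"]
# Indexing with an integer returns that row, so the column is a plain string.
as_scalar = films.iloc[1]["$director"]

print(type(as_series).__name__)
print(type(as_scalar).__name__)

# str() of a Series is its printed form (index, value, name, dtype),
# which never equals a bare director name, so no row matches.
empty = films[films["$director"] == str(as_series)]
print(empty.empty)

# Position 0, not 1, is the first row, which holds Coppola.
coppola = films[films["$director"] == films.iloc[0]["$director"]]
print(coppola)
```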
python pandas