python - Pandas DataFrame selection using text from another DataFrame
A quick disclosure: I come from an R background and am switching to pandas (running on Python 3.3.3).
I want to select rows from a DataFrame using text from another DataFrame entry. It's an elementary operation, but I can't get my head around the syntax.
For example, take this DataFrame (sorry for the line splits; I wanted to make the illustration clearer):

    films = pandas.DataFrame({
        '$title': ["The Godfather", "Pulp Fiction",
                   "The Godfather: Part II", "Fight Club"],
        '$director': ["Coppola, Francis Ford", "Tarantino, Quentin",
                      "Coppola, Francis Ford", "Fincher, David"]})

If I want to select the films created by the first director, "Coppola, Francis Ford", the command I'm using is:

    In [1]: director = films.iloc[[1]]["director"]
    In [2]: director
    1    Coppola, Francis Ford
    Name: director, dtype: object
    In [3]: films[ films["director"] == director ]
    ValueError: Series lengths must match to compare

If I do this:

    In [4]: films[ films["director"] == str(director) ]

I get an empty DataFrame. What's going on here? It seems I'm missing something.
OK, first of all, I see you made a couple of style/semantics mistakes that are common among R-to-Python converts:

- You don't need $ signs in column names, and dropping them makes column selection nicer: you can write films.director if the name is 'director' (it has to be a valid Python identifier for this syntactic sugar to work).
- Indexing in Python starts at 0, not 1, so to select the 1st director you write films.director[0].

Assuming you removed the $ signs from the DataFrame definition, you can select the movies as:

    In [16]: films[films['director'] == films['director'][0]]
    Out[16]:
                    director                   title
    0  Coppola, Francis Ford           The Godfather
    2  Coppola, Francis Ford  The Godfather: Part II

or, cleaner, films[films.director == films.director[0]].
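To make the selection above easier to follow, here is a minimal, self-contained sketch that splits it into steps. The names first_director, mask and coppola_films are only illustrative, and I use .iloc[0] for the first row to make the positional lookup explicit; with the default index it should give the same value as films.director[0].

```python
import pandas as pd

# Same data as in the question, with the $ signs dropped from the column names.
films = pd.DataFrame({
    'title': ["The Godfather", "Pulp Fiction",
              "The Godfather: Part II", "Fight Club"],
    'director': ["Coppola, Francis Ford", "Tarantino, Quentin",
                 "Coppola, Francis Ford", "Fincher, David"],
})

# Position 0 holds the first director; this is a plain string, not a Series.
first_director = films['director'].iloc[0]

# Comparing the column to a scalar gives a boolean mask, one entry per row.
mask = films['director'] == first_director

# Indexing with the mask keeps only the rows where it is True.
coppola_films = films[mask]
print(coppola_films)
```

The key point is that the right-hand side of == is a single value, so pandas compares it against every row instead of trying to align two Series of different lengths.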
Using your original DataFrame, you can perform the query with:

    director = films.iloc[[1]]['$director'][1]
    films[films['$director'] == director]

One error was that you first defined the table with '$director' and then queried it with 'director' as the column name.
The [1] at the end is necessary because you indexed the DataFrame with the list [1] instead of the value 1, so you got a Series, as CT Zhu noticed. List indexing is meant more for selecting several arbitrary elements, such as films.iloc[[1, 3]]. In your case it is clearer to write

    director = films.iloc[1]['$director']

Also, note that this still gets Tarantino, not Coppola.
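If it helps, the difference between the two indexing forms, and why the str(director) attempt came back empty, can be checked with a short sketch like the one below. It uses the original '$'-prefixed column names; as_list, as_scalar and matches are names I made up for the illustration.

```python
import pandas as pd

# The DataFrame as defined in the question, with the original '$' column names.
films = pd.DataFrame({
    '$title': ["The Godfather", "Pulp Fiction",
               "The Godfather: Part II", "Fight Club"],
    '$director': ["Coppola, Francis Ford", "Tarantino, Quentin",
                  "Coppola, Francis Ford", "Fincher, David"],
})

# A list inside iloc keeps the tabular shape, so the column lookup gives a Series.
as_list = films.iloc[[1]]['$director']
# A bare integer selects a single row, so the column lookup gives a scalar.
as_scalar = films.iloc[1]['$director']

print(type(as_list).__name__)    # Series
print(type(as_scalar).__name__)  # str

# str() of a Series is its whole printed form (index, value, dtype line),
# so it does not equal the text of any single cell.
print(str(as_list) == as_scalar)  # False

# Comparing the column against the scalar works; position 1 is Tarantino.
matches = films[films['$director'] == as_scalar]
print(matches)
```

So the empty result most likely came from comparing every cell with the multi-line printed form of the Series, which never matches a director's name.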
python pandas