Sunday, 15 September 2013

python - Pandas DataFrame selection using text from another DataFrame -




A quick disclosure: I come from an R background and am switching to pandas (running on Python 3.3.3).

I want to select rows of a DataFrame using text from another DataFrame entry. It's an elementary operation, but I can't get my head around the syntax.

For example, take this DataFrame (sorry for the line splits, I want to make the illustration clearer):

import pandas

films = pandas.DataFrame({
    '$title': ["the godfather", "pulp fiction",
               "the godfather: part ii", "fight club"],
    '$director': ["coppola, francis ford", "tarantino, quentin",
                  "coppola, francis ford", "fincher, david"]})

If I want to select the films made by the first director, "coppola, francis ford", the command I'm using is:

In [1]: director = films.iloc[[1]]["director"]
In [2]: director
1    coppola, francis ford
Name: director, dtype: object
In [3]: films[ films["director"] == director ]
ValueError: Series lengths must match to compare

If I do this instead:

In [4]: films[ films["director"] == str(director) ]

I get an empty DataFrame. What's going on here? It seems I'm missing something.

OK, first of all, I see you made a couple of style/semantics mistakes that are common among R-to-Python converts:

- You don't need $ signs in column names, and dropping them makes column selection nicer: you can write films.director if the column is named 'director' (the name has to be a valid Python identifier for this syntactic sugar to work).
- Indexing in Python starts at 0, not 1, so to select the first director use films.director[0].
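To make the two points above concrete, here is a minimal sketch. It assumes the question's data with the $ prefixes dropped; the variable names same_column and first_director are only for illustration.

```python
import pandas as pd

# Same data as in the question, but with plain column names (no $ prefix).
films = pd.DataFrame({
    "title": ["the godfather", "pulp fiction",
              "the godfather: part ii", "fight club"],
    "director": ["coppola, francis ford", "tarantino, quentin",
                 "coppola, francis ford", "fincher, david"],
})

# Attribute access works because "director" is a valid Python identifier;
# it returns the same column as bracket access.
same_column = films.director.equals(films["director"])

# Labels of the default index start at 0, so 0 is the first row.
first_director = films.director[0]

print(same_column)
print(first_director)
```

With a name like '$director', only the bracket form films['$director'] is available, since films.$director is not valid Python.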

Assuming you removed the $ signs from the DataFrame definition, you can select the movies as:

In [16]: films[films['director'] == films['director'][0]]
Out[16]:
                director                   title
0  coppola, francis ford           the godfather
2  coppola, francis ford  the godfather: part ii

Or, more cleanly, films[films.director == films.director[0]].

Using your original DataFrame, you can perform the query with:
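If it helps to see the selection in two steps, the sketch below splits it into the boolean mask and the filtering. It assumes the $-free column names; mask and coppola_films are names I made up for the example.

```python
import pandas as pd

films = pd.DataFrame({
    "title": ["the godfather", "pulp fiction",
              "the godfather: part ii", "fight club"],
    "director": ["coppola, francis ford", "tarantino, quentin",
                 "coppola, francis ford", "fincher, david"],
})

# Comparing a column with a single string gives one True/False per row.
mask = films["director"] == films["director"][0]

# Indexing the DataFrame with that boolean Series keeps only the True rows.
coppola_films = films[mask]

print(mask.tolist())
print(coppola_films)
```

The key point is that the right-hand side of == is a plain string, so pandas compares it against every row instead of trying to align two Series.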

director = films.iloc[[1]]['$director'][1]
films[films['$director'] == director]

One error was that you first defined the table with '$director' and then queried it with 'director' as the column name.

The [1] at the end is necessary because you indexed the DataFrame with the list [1] instead of the value 1, so you got a Series, as CT Zhu noticed. List indexing is meant more for selecting several arbitrary elements, such as films.iloc[[1, 3]]. In your case it is clearer to write

director = films.iloc[1]['$director']
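The difference between the list indexer and the scalar indexer is easiest to see by checking what each expression returns. This is a small sketch on the original DataFrame with the $ column names; the variable names are only illustrative.

```python
import pandas as pd

films = pd.DataFrame({
    "$title": ["the godfather", "pulp fiction",
               "the godfather: part ii", "fight club"],
    "$director": ["coppola, francis ford", "tarantino, quentin",
                  "coppola, francis ford", "fincher, david"],
})

# A list indexer keeps the table shape: a DataFrame with one row.
one_row_frame = films.iloc[[1]]

# Picking a column from it still gives a Series (of length 1), not a string.
one_item_series = films.iloc[[1]]["$director"]

# A scalar indexer returns the row itself, so the column lookup is a string.
director = films.iloc[1]["$director"]

print(type(one_row_frame).__name__)
print(type(one_item_series).__name__, len(one_item_series))
print(type(director).__name__, director)
```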

Also, note that this still gets Tarantino and not Coppola, because position 1 is the second row.
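Putting it together, a sketch of the query for the first director on the original DataFrame could look like this. It also illustrates my understanding of why the str(director) attempt in the question returned nothing: str() on a Series produces its printed representation (index, value, name and dtype), not the stored value. The names as_text and no_match are made up for the example.

```python
import pandas as pd

films = pd.DataFrame({
    "$title": ["the godfather", "pulp fiction",
               "the godfather: part ii", "fight club"],
    "$director": ["coppola, francis ford", "tarantino, quentin",
                  "coppola, francis ford", "fincher, david"],
})

# Position 0 is the first row, so this is the first director as a string.
director = films.iloc[0]["$director"]
coppola_films = films[films["$director"] == director]

# str() of a one-element Series is its multi-line printed form,
# so it does not equal any value stored in the column.
as_text = str(films.iloc[[0]]["$director"])
no_match = films[films["$director"] == as_text]

print(coppola_films)
print(repr(as_text))
print(no_match.empty)
```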

python pandas
