Monday, 15 August 2011

Python dataframes -

I have a dataframe (df) and am trying to append information to a specific row:

    index  fruit   rank
    0      banana  1
    1      apple   2
    2      mango   3
    3      melon   4

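The goal is to compare the fruit at rank 1 with each of the other ranks and append the resulting value to that row. I'm using difflib.SequenceMatcher to make the comparison. Right now I'm able to append a column to the end of df, but it appends the same value to every row. I'm struggling with how to loop and append per row. Any pointers would be much appreciated.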
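For reference, `difflib.SequenceMatcher(None, a, b).ratio()` returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings. A minimal check with two of the fruits from the table (the `similarity` helper is a hypothetical name used only for illustration):

```python
import difflib


def similarity(a, b):
    """Hypothetical helper: case-insensitive SequenceMatcher ratio of two strings."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


# "banana" and "apple" share only a single "a" as a matching block,
# so M = 1 and T = 6 + 5 = 11, giving 2 * 1 / 11.
score = similarity("banana", "apple")
print(round(score, 4))  # 0.1818
```

A ratio of 1.0 means the strings are identical (after lowercasing here), and 0.0 means they share no characters.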

Here is some of my code:

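    import difflib

    # Select by column with df['rank']; df.rank is the DataFrame.rank method
    new_entry = df[df['rank'] == 1]
    new_fruit = new_entry['fruit'].iloc[0]    # plain string, not a Series
    prev_entry = df[df['rank'] == 2]
    prev_fruit = prev_entry['fruit'].iloc[0]
    similarity_score = difflib.SequenceMatcher(None, new_fruit.lower(), prev_fruit.lower()).ratio()
    df['similarity_score'] = similarity_score   # one scalar for the whole column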
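The snippet above computes a single score (rank 1 against rank 2), and assigning one scalar to a column fills every row with it. If the intent is instead to score each fruit against the fruit one rank above it (an assumption, suggested by the `new_entry`/`prev_entry` names), one sketch is to sort by rank and pair each row with the previous row via `shift()`. The first row has no predecessor and gets NaN:

```python
import difflib

import numpy as np
import pandas as pd

df = pd.DataFrame({'fruit': ['banana', 'apple', 'mango', 'melon'],
                   'rank': [1, 2, 3, 4]})

# Order rows by rank so that "previous row" means "one rank higher".
df = df.sort_values('rank').reset_index(drop=True)

# shift(1) moves each fruit down one row; row 0 gets NaN (no previous rank).
prev_fruit = df['fruit'].shift(1)

# One score per row: previous-rank fruit vs. current fruit, NaN for the first row.
df['similarity_score'] = [
    np.nan if pd.isna(prev) else
    difflib.SequenceMatcher(None, prev.lower(), cur.lower()).ratio()
    for cur, prev in zip(df['fruit'], prev_fruit)
]
print(df)
```

Building the list in one pass and assigning it once avoids appending row by row, which is slow and easy to get wrong in pandas.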

The result is this:

    index  fruit   rank  similarity_score
    0      banana  1     0.3
    1      apple   2     0.3
    2      mango   3     0.3
    3      melon   4     0.3

The desired result is:

    index  fruit   rank  similarity_score
    0      banana  1     n/a
    1      apple   2     0.4
    2      mango   3     0.5
    3      melon   4     0.6
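If the scores are meant to compare every fruit with the rank-1 fruit, as described above, with the rank-1 row itself left empty (the `n/a`), one sketch computes a ratio per row with `apply` and then blanks the rank-1 row with `Series.where`. The actual numbers are whatever `SequenceMatcher` returns for these strings:

```python
import difflib

import pandas as pd

df = pd.DataFrame({'fruit': ['banana', 'apple', 'mango', 'melon'],
                   'rank': [1, 2, 3, 4]})

# Scalar fruit name at rank 1 (.iloc[0] takes the first match by position).
top = df.loc[df['rank'] == 1, 'fruit'].iloc[0]

# One ratio per row, each compared against the rank-1 fruit.
scores = df['fruit'].apply(
    lambda fruit: difflib.SequenceMatcher(None, top.lower(), fruit.lower()).ratio())

# Keep scores where rank != 1; the rank-1 row becomes NaN.
df['similarity_score'] = scores.where(df['rank'] != 1)
print(df)
```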

Thanks.

This doesn't give the similarity scores in the order you want, but it calculates the SequenceMatcher ratio between the rank 1 value ('banana') and each row, and adds the result as a column.

    import difflib
    import pandas as pd

    df = pd.DataFrame({'fruit': ['banana', 'apple', 'mango', 'melon'],
                       'rank': [1, 2, 3, 4]})

    # The fruit at rank 1 ('banana') as a plain string
    top = df.loc[df['rank'] == 1, 'fruit'].iloc[0]

    # Ratio between the rank-1 fruit and each row's fruit
    df['similarity_score'] = df['fruit'].apply(
        lambda x: difflib.SequenceMatcher(None, top, x).ratio())
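One caution when selecting rows by rank: `rank` is also the name of a `DataFrame` method, so attribute access `df.rank` returns the bound method `DataFrame.rank` rather than the `rank` column. Comparing a method to 1 gives a single `False` instead of a boolean mask, so an expression like `df[(df.rank == 1)]` does not filter rows as intended. Bracket access avoids the clash:

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['banana', 'apple', 'mango', 'melon'],
                   'rank': [1, 2, 3, 4]})

# Attribute lookup finds the DataFrame.rank method before any column named "rank".
print(callable(df.rank))  # True
print(df.rank == 1)       # False (a single bool, not a mask)

# Bracket access always returns the column, giving a proper boolean mask.
mask = df['rank'] == 1
print(df.loc[mask, 'fruit'].tolist())  # ['banana']
```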

python python-2.7 pandas difflib