Tuesday, 15 February 2011

python - Get a list of words from a string that are not in another list

I have a long string named tekst (a 600 MB file read into memory) and a list of 11,000 words called nlwoorden. I want everything that is in tekst but not in nlwoorden.

belangrijk = [woord for woord in tekst.split() if woord not in nlwoorden]

This would produce what I want. Obviously, it takes very long to compute. Is there a more efficient way?

Thanks!

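Using a set-based solution makes the whole thing roughly linear. Your version tests `woord not in nlwoorden` against a list, which scans up to all 11,000 entries for every word, so it costs about O(len(tekst.split()) × len(nlwoorden)). Building the two sets takes O(len(nlwoorden)) + O(len(tekst)), and the set difference is linear on average as well.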
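If the original word order and repeated words should be kept, a minimal sketch is to convert only nlwoorden to a set and keep the list comprehension. This assumes tekst is already a str in memory and nlwoorden is a list of words; the helper name `woorden_niet_in` and the sample data are hypothetical, for illustration only.

```python
def woorden_niet_in(tekst, nlwoorden):
    """Hypothetical helper: words of tekst not in nlwoorden, order and duplicates kept."""
    # Build the lookup set once: O(len(nlwoorden)).
    bekend = set(nlwoorden)
    # One pass over the words; each `in` test on a set is O(1) on average.
    return [woord for woord in tekst.split() if woord not in bekend]


if __name__ == "__main__":
    # Made-up sample data.
    print(woorden_niet_in("the cat sat on de mat the end", ["de", "mat"]))
    # ['the', 'cat', 'sat', 'on', 'the', 'end']
```

Only the lookup structure changes here, so the result is identical to the original comprehension, just without the per-word list scan.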

So the snippet you're looking for is the one listed in the comment:

belangrijk=list(set(tekst.split()) - set(nlwoorden))

(assuming you want a list once again at the end; note that a set does not keep the original word order or duplicate words)
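A small sketch of the set-difference version on made-up sample data. Since a set has no defined order, the result is passed through `sorted` here only so the output is repeatable; duplicates in tekst are collapsed.

```python
tekst = "the cat sat on de mat the end"  # made-up sample text
nlwoorden = ["de", "mat"]                # made-up word list

# Unique words of tekst minus the known words; sort because set order is arbitrary.
belangrijk = sorted(set(tekst.split()) - set(nlwoorden))
print(belangrijk)  # ['cat', 'end', 'on', 'sat', 'the']
```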

python python-2.7