Python - Get a list of words from a string that are not in another list
I have a long string named tekst (a 600 MB file read into memory) and a list of 11,000 words called nlwoorden. I want all the words that are in tekst but not in nlwoorden.
belangrijk = [woord for woord in tekst.split() if woord not in nlwoorden] would produce what I want. Obviously, it takes very long to compute. Is there a more efficient way?
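A minimal sketch of the same comprehension with a single change: nlwoorden is converted to a set once, before the loop, so each `not in` check is an average O(1) hash lookup instead of a scan through 11,000 list items. Unlike a set difference, this keeps the original word order and any duplicates. The helper name filter_words and the sample data are hypothetical stand-ins for the real file and word list.

```python
def filter_words(tekst, nlwoorden):
    """Return the words of tekst not in nlwoorden, in order, duplicates kept."""
    # Build the lookup set once; `in` on a set is O(1) on average,
    # whereas `in` on a list compares against every element.
    nl_set = set(nlwoorden)
    return [woord for woord in tekst.split() if woord not in nl_set]

# Hypothetical sample data standing in for the 600 MB text and 11,000-word list.
tekst = "de kat zat op de mat with the cat"
nlwoorden = ["de", "kat", "zat", "op", "mat"]
belangrijk = filter_words(tekst, nlwoorden)
print(belangrijk)  # ['with', 'the', 'cat']
```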
Thanks!
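A set-based solution makes each membership test O(1) on average, so the difference itself costs roughly linear time in the number of distinct words, instead of O(len(words) × len(nlwoorden)) for the list comprehension. Building the two sets takes O(len(nlwoorden)) + O(len(tekst)).
So the snippet you're looking for is the one listed in the comments: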
belangrijk = list(set(tekst.split()) - set(nlwoorden))
(assuming you want the result as a list again at the end)
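A runnable sketch of the one-liner above, wrapped in a hypothetical helper unknown_words with made-up sample data. A set difference returns distinct words in arbitrary order, so the output is sorted here only to make it predictable. The second helper, unknown_words_ordered (also hypothetical), is an optional variant that keeps the distinct words in first-seen order; it uses collections.OrderedDict so it behaves the same on Python 2.7 and Python 3.

```python
import collections

def unknown_words(tekst, nlwoorden):
    # Set difference: distinct words of tekst that are not in nlwoorden.
    # Order is arbitrary and duplicates are collapsed.
    return list(set(tekst.split()) - set(nlwoorden))

def unknown_words_ordered(tekst, nlwoorden):
    # Variant: distinct words in first-seen order.
    # OrderedDict.fromkeys drops duplicates while preserving insertion order.
    nl_set = set(nlwoorden)
    return [w for w in collections.OrderedDict.fromkeys(tekst.split())
            if w not in nl_set]

# Hypothetical sample data.
tekst = "de kat zat op de mat with the cat the end"
nlwoorden = ["de", "kat", "zat", "op", "mat"]
print(sorted(unknown_words(tekst, nlwoorden)))  # ['cat', 'end', 'the', 'with']
print(unknown_words_ordered(tekst, nlwoorden))  # ['with', 'the', 'cat', 'end']
```

Use the plain set difference when order and duplicates do not matter; it is the simplest and fastest option.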
Tags: python, python-2.7