python - How to tokenize a line of text from a file
Suppose the file shakespeare.txt contained the single line famously spoken by Juliet in Romeo and Juliet: "O Romeo, Romeo! wherefore art thou Romeo?"
Then running the command $ shakesort should produce the following output:
art
o
romeo
thou
wherefore

My code so far:
def main():
    s = scanner("shakespeare.txt")
    tokens = ("O Romeo, Romeo! wherefore art thou Romeo?")
    str1 = s.readtoken()
    str2 = s.readtoken()
    str3 = s.readtoken()
    str4 = s.readtoken()
    str5 = s.readtoken()
    str6 = s.readtoken()
    str7 = s.readtoken()
    print(str1)
    print(str2)
    print(str3)
    print(str4)
    print(str5)
    print(str6)
    print(str7)
    s.close()
    return 0

main()

My problem is that it returns the first 7 strings of the entire file rather than the tokens specified. How do I get the specified 7 words out of the whole of shakespeare.txt (which contains millions of words) without making a new file and listing the words in it?
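Rather than calling s.readtoken() a fixed number of times, the whole file can be walked token by token. Iterating over a Python file object yields one line at a time, so even a file with millions of words is processed without loading it all into memory or writing a new file. A minimal sketch using a generator, assuming tokens are separated by whitespace; the function name read_tokens is hypothetical:

```python
def read_tokens(path):
    """Yield each whitespace-separated token in the file at `path`, one at a time."""
    with open(path) as f:
        # The file object yields one line per iteration, so memory use stays small.
        for line in f:
            for token in line.split():
                yield token
```

Usage: for token in read_tokens("shakespeare.txt"): print(token). Punctuation stays attached to the tokens ("Romeo," and "Romeo!" come out as written), so it still has to be stripped before comparing words.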
Something like this:

uniqwords = set()
with open('shakespeare.txt') as f:
    for ln in f:
        words = ln.split()
        for word in words:
            # strip the punctuation found in the sample line, then lowercase
            word = word.replace('?', '').replace('!', '').replace(',', '').lower()
            uniqwords.add(word)
for word in sorted(uniqwords):
    print(word)
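The chained replace calls only remove ?, ! and , while the full text of the plays also contains periods, semicolons, colons and other marks. A variant that pulls words out with a regular expression and collects them in a set might look like the sketch below. The pattern [a-z']+ (lowercase letters plus apostrophes, so a word like "o'er" stays whole) and the function name unique_words are assumptions for illustration; a word wrapped in single quotes would keep those quote marks under this pattern.

```python
import re

# Assumed word pattern: runs of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def unique_words(lines):
    """Return the distinct lowercase words found in `lines`, sorted alphabetically."""
    words = set()
    for line in lines:
        # findall returns every non-overlapping match in the lowercased line
        words.update(WORD_RE.findall(line.lower()))
    return sorted(words)
```

Because it accepts any iterable of lines, an open file can be passed directly: with open("shakespeare.txt") as f: for word in unique_words(f): print(word). For the single sample line, unique_words(["O Romeo, Romeo! wherefore art thou Romeo?"]) returns ['art', 'o', 'romeo', 'thou', 'wherefore'], matching the expected output.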