Parsing a tweet inside a CSV column in Python
I am trying to extract the hashtags in a tweet. All of the tweets are in one column of a CSV file. Although there are resources on parsing strings and putting the extracted hashtags into a list, I haven't come across a solution for parsing tweets that are already stored in a list or dictionary. Here is my code:
    with open('hash.csv', 'rb') as f:
        reader = csv.reader(f, delimiter=',')
        for line in reader:
            tweet = line[1:2]  # this column contains the tweets
            for x in tweet:
                match = re.findall(r"#(\w+)", x)
                if match: print x
I predictably get 'TypeError: expected string or buffer', because it's true: 'tweet' in this case is not a string, it's a list.
Here is where my research has taken me so far:
Parsing a tweet to extract hashtags into an array in Python
http://www.tutorialspoint.com/python/python_reg_expressions.htm
So now I'm iterating through the match list, and I'm still getting the whole tweet and not the hashtagged item. I am able to strip the hashtag away, but what I want is to strip everything except the hashtag.
    with open('hash.csv', 'rb') as f:
        reader = csv.reader(f, delimiter=',')
        for line in reader:
            tweet = line[1:2]
            print tweet
            for x in tweet:
                match = re.split(r"#(\w+)", x)
                hashtags = [i for i in tweet if match]
Actually, your problem is probably just a syntax problem. You are calling tweet = line[1:2]. In Python, this says 'take a slice from 1 to 2', which is logically what you want. Unfortunately, it returns the result as a list, so you end up with [tweet] instead of tweet!
Try changing that line to tweet = line[1], and see if that fixes your problem.
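To make the difference between slicing and indexing concrete, here is a small sketch. The row contents are made up for illustration and are not data from the question, and the snippet is written for Python 3, whereas the question's code is Python 2.

```python
import re

# Hypothetical row, shaped like what csv.reader yields: a list of strings
row = ["42", "loving the #python #csv combo", "2014-01-01"]

sliced = row[1:2]   # slice: a one-element list containing the tweet
indexed = row[1]    # index: the tweet string itself

print(type(sliced).__name__, sliced)
print(type(indexed).__name__, indexed)

# re.findall expects a string, so passing the list raises TypeError
try:
    re.findall(r"#(\w+)", sliced)
except TypeError as exc:
    print("TypeError:", exc)

# With the indexed string, the capture group returns just the tag names
hashtags = re.findall(r"#(\w+)", indexed)
print(hashtags)
```

Because the pattern has a capture group, re.findall returns only the text after each '#', not the surrounding tweet.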
On a separate note, this may just be a typo on your part, but I think you might want to check your indentation. I think it should look like
    for line in reader:
        tweet = line[1:2]  # this column contains the tweets
        for x in tweet:
            match = re.findall(r"#(\w+)", x)
            if match: print x
unless I'm misunderstanding your logic.
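Putting both points together, a minimal sketch of the whole loop might look like the following. It assumes Python 3 and that the tweets are in the second column, as in the question. The function name extract_hashtags and the in-memory sample data are hypothetical and used only for illustration. Note that it collects the matches themselves rather than printing x, which is why the question's version showed the whole tweet.

```python
import csv
import io
import re

HASHTAG = re.compile(r"#(\w+)")

def extract_hashtags(csv_file, column=1):
    """Return one list of hashtags per row, read from the given column."""
    reader = csv.reader(csv_file, delimiter=",")
    results = []
    for line in reader:
        if len(line) <= column:
            results.append([])   # short or empty row: no tweet column
            continue
        tweet = line[column]     # index, not slice, so this is a string
        results.append(HASHTAG.findall(tweet))
    return results

# In-memory stand-in for hash.csv (made-up data, not from the question)
sample = io.StringIO(
    '1,"learning #python and #regex today"\n'
    '2,no tags here\n'
)
for tags in extract_hashtags(sample):
    print(tags)
```

With a real file in Python 3, the same function could be called as extract_hashtags(f) inside with open('hash.csv', newline='') as f:, since the csv module expects text mode there rather than 'rb'.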
python csv twitter