Saturday, 15 January 2011

Parsing a tweet inside a CSV column in Python

I'm trying to extract the hashtags in a tweet. All of the tweets are in one column of a CSV file. There are resources on parsing strings and putting the extracted hashtags into a list, but I haven't come across a solution for parsing tweets stored in a list or dictionary. Here is my code:

```python
import csv
import re

with open('hash.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        tweet = line[1:2]  # this column contains tweets
        for x in tweet:
            match = re.findall(r"#(\w+)", x)
            if match:
                print x
```

Predictably, I get 'TypeError: expected string or buffer', which makes sense: 'tweet' in this case is not a string but a list.
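A minimal sketch of why this error appears: the `re` functions expect a string, so passing a list raises `TypeError`, while passing the list's element works. The sample tweet text is made up for illustration, and the exact error message differs between Python versions ("expected string or buffer" in Python 2, "expected string or bytes-like object" in Python 3).

```python
import re

# Hypothetical sample: a one-item list holding the tweet, like 'tweet' above.
tweet = ["Loving the #python and #csv modules"]

caught = None
try:
    re.findall(r"#(\w+)", tweet)  # a list is not a string, so re rejects it
except TypeError as err:
    caught = err  # keep a reference; 'err' is cleared after the except block
    print("TypeError:", err)

# Passing the string inside the list works.
print(re.findall(r"#(\w+)", tweet[0]))  # ['python', 'csv']
```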

Here is how far my research has taken me:

Parsing a tweet to extract hashtags into an array in Python

http://www.tutorialspoint.com/python/python_reg_expressions.htm

So I'm iterating through the match list, and I'm still getting the whole tweet, not the hashtagged item. I was able to strip the hashtag away, but I want to strip everything except the hashtag.

```python
import csv
import re

with open('hash.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        tweet = line[1:2]
        print tweet
        for x in tweet:
            match = re.split(r"#(\w+)", x)
            hashtags = [i for i in tweet if match]
```
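A short sketch of why this second attempt still returns the whole tweet: `re.split` keeps the text around each match, and the condition `if match` does not depend on `i`, so the comprehension keeps every item of `tweet`. `re.findall` with a capturing group returns only the captured part of each match. The sample text is hypothetical.

```python
import re

text = "Loving the #python and #csv modules"  # hypothetical sample tweet

# re.split cuts the string at each match; because the pattern has a
# capturing group, the captured tag is kept, but interleaved with the
# surrounding text.
pieces = re.split(r"#(\w+)", text)
print(pieces)  # ['Loving the ', 'python', ' and ', 'csv', ' modules']

# re.findall returns only the captured group of each match: just the tags.
hashtags = re.findall(r"#(\w+)", text)
print(hashtags)  # ['python', 'csv']
```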

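Actually, your problem is with the slice syntax. You are calling tweet = line[1:2]. In Python, that says 'take the slice from index 1 up to (but not including) index 2', which is logically what you want. Unfortunately, a slice always returns a list, so you end up with [tweet] instead of tweet!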
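To illustrate the difference on a row parsed by `csv.reader`, here is a small sketch using an in-memory, hypothetical two-column CSV (an id, then the tweet text) in place of 'hash.csv':

```python
import csv
import io

# Hypothetical CSV content: id in column 0, tweet text in column 1.
data = io.StringIO('1,"Loving the #python and #csv modules"\n')
line = next(csv.reader(data, delimiter=','))

print(line[1:2])  # a slice: a one-item list containing the tweet
print(line[1])    # an index: the tweet string itself
```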

Try changing that line to tweet = line[1] and see if it fixes the problem.
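Once `tweet` is a string, the inner `for x in tweet:` loop is no longer needed (it would iterate over single characters), and printing `match` instead of `x` gives the tags rather than the whole tweet. A minimal Python 3 sketch of the loop written that way, assuming the tweet sits in column 1 as in the question; the function name `hashtags_from_csv` and the sample rows are hypothetical.

```python
import csv
import io
import re

HASHTAG = re.compile(r"#(\w+)")

def hashtags_from_csv(f, column=1):
    """Yield the list of hashtags found in `column` of each CSV row.

    `f` is any open text file object; column 1 is assumed to hold the tweet.
    """
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        if len(line) <= column:
            continue  # skip blank or short rows
        tweet = line[column]  # index, not slice: a string, not a list
        match = HASHTAG.findall(tweet)
        if match:
            yield match  # only the tags, not the whole tweet

# Usage with in-memory sample data (hypothetical rows):
sample = io.StringIO('1,"Loving the #python and #csv modules"\n2,no tags here\n')
for tags in hashtags_from_csv(sample):
    print(tags)  # ['python', 'csv']
```

With a real file in Python 3, open it in text mode with `newline=''` (as the `csv` module documentation recommends) rather than `'rb'`, e.g. `with open('hash.csv', newline='') as f:` and then loop over `hashtags_from_csv(f)`.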

On a separate note, and this may just be a typo on your part, I think you might want to check your indentation. I think it should look like:

```python
for line in reader:
    tweet = line[1:2]  # this column contains tweets
    for x in tweet:
        match = re.findall(r"#(\w+)", x)
        if match:
            print x
```

...unless I'm misunderstanding your logic.

