Friday, 15 March 2013

python - Putting website titles into an Excel spreadsheet



I am trying to use BeautifulSoup to list website titles and then put them into an Excel spreadsheet.

The text file “c:\websites.txt” contains the contents below:

www.dailynews.com
www.dailynews.lk
www.dailynews.co.zw
www.gulf-daily-news.com
www.dailynews.gov.bw

My attempt so far:

from bs4 import BeautifulSoup
import urllib2
import xlwt

# Read the list of host names, one per line
list_open = open('c:\\websites.txt')
read_list = list_open.read()
line_in_list = read_list.split('\n')

# Fetch each page and print its <title> tags
for websites in line_in_list:
    url = "http://" + websites
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    site_title = soup.find_all("title")
    print site_title

It works fine and prints the site titles. But when I add the code below:

book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('sheet1', cell_overwrite_ok = True)

for cor, lmn in enumerate(line_in_list):
    sheet.write (cor, 0, site_title)

book.save("c:\\site_titles.xls")

I am trying to have the titles entered neatly into one column of the Excel spreadsheet, one per row, but it doesn’t work.

The error occurs because you are trying to save a BeautifulSoup object:

Exception: Unexpected data type <class 'bs4.element.Tag'>

Try writing the text value of the object instead, and the file is written fine:

for cor, lmn in enumerate(line_in_list):
    sheet.write (cor, 0, site_title[0].text)
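To see the difference between the Tag and its text without touching the network or a workbook, the same calls can be run on an inline HTML string. This is only a sketch, written for Python 3 rather than the Python 2 of the scripts here: the sample HTML and the `title_text` helper are made up for illustration, the parser is passed explicitly as "html.parser", and the guard for a page with no title is an addition that is not part of the original answer.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Sample page, made up for illustration (assumption, not from the question).
html = "<html><head><title>Daily News</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

titles = soup.find_all("title")

# find_all returns Tag objects, which xlwt refuses to write.
is_tag = isinstance(titles[0], Tag)

# .text gives the plain string that sheet.write accepts.
first_title = titles[0].text


def title_text(page_soup):
    """Hypothetical helper: title text, or "" when the page has no <title>."""
    found = page_soup.find("title")
    return found.get_text(strip=True) if found is not None else ""


empty_page = BeautifulSoup("<html><body></body></html>", "html.parser")
print(is_tag, repr(first_title), repr(title_text(empty_page)))
```

Indexing with `site_title[0]` raises an IndexError when a page has no title at all, which is why the helper checks for a missing tag first.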

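Your write loop is also wrong: it runs separately from the fetch loop, so it writes the same site_title to every row. Fetch and write in the same loop instead. Final script: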

from bs4 import BeautifulSoup
import urllib2
import xlwt

line_in_list = ['www.dailynews.com','www.elpais.com'] # get urls from file

book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('sheet1', cell_overwrite_ok = True)

# Fetch each page and write its title to row `cor`, column 0
for cor,websites in enumerate(line_in_list):
    url = "http://" + websites
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    site_title = soup.find_all("title")
    print site_title
    sheet.write (cor, 0, site_title[0].text)

# Save once, after all rows are written
book.save("site_titles.xls")
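The hard-coded list in the script stands in for the file read. A minimal sketch of loading it from the text file instead, assuming the host names are separated by whitespace (one per line or otherwise) and written for Python 3. The `read_sites` helper is hypothetical, and dropping empty entries is an addition: `split('\n')` leaves an empty string after a trailing newline, which would become the URL "http://".

```python
import os
import tempfile


def read_sites(path):
    """Hypothetical helper: return the host names listed in a text file."""
    with open(path) as handle:
        # split() with no argument splits on any whitespace and drops empty
        # entries, so blank lines and a trailing newline are ignored.
        return handle.read().split()


# Demonstration with a temporary file standing in for the real list.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("www.dailynews.com\nwww.dailynews.lk\n\n")
    tmp_path = tmp.name

line_in_list = read_sites(tmp_path)
os.remove(tmp_path)
print(line_in_list)
```

The returned list could then replace the hard-coded `line_in_list` in the script.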

python beautifulsoup export-to-excel xlwt
