python - Putting website titles into an Excel spreadsheet
I am trying to use BeautifulSoup to list website titles and then put them into an Excel spreadsheet.
The text file “c:\websites.txt” contains the contents below:
    www.dailynews.com
    www.dailynews.lk
    www.dailynews.co.zw
    www.gulf-daily-news.com
    www.dailynews.gov.bw

The code:
    from bs4 import BeautifulSoup
    import urllib2
    import xlwt

    list_open = open('c:\\websites.txt')
    read_list = list_open.read()
    line_in_list = read_list.split('\n')

    for websites in line_in_list:
        url = "http://" + websites
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page.read())
        site_title = soup.find_all("title")
        print site_title

It works fine and generates the site titles. When I add the code below:
    book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
    sheet = book.add_sheet('Sheet1', cell_overwrite_ok = True)

    for cor, lmn in enumerate(line_in_list):
        sheet.write (cor, 0, site_title)

    book.save("c:\\site_titles.xls")

I am trying to have them entered neatly into one column of the Excel spreadsheet, one by one, but it doesn’t work.
The error occurs because you try to save a BeautifulSoup object:

    Exception: Unexpected data type <class 'bs4.element.Tag'>

Try writing the text value of the object instead, and the file is written fine:
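    for cor, lmn in enumerate(line_in_list):
        sheet.write (cor, 0, site_title[0].text)

Your write loop is also wrong: it runs after the scraping loop has finished, so site_title holds only the last site's title and every row gets the same value. Try writing each title inside the scraping loop instead.

Final script: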
    from bs4 import BeautifulSoup
    import urllib2
    import xlwt

    line_in_list = ['www.dailynews.com','www.elpais.com'] # get URLs from file

    book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
    sheet = book.add_sheet('Sheet1', cell_overwrite_ok = True)

    for cor, websites in enumerate(line_in_list):
        url = "http://" + websites
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page.read())
        site_title = soup.find_all("title")
        print site_title
        # write the title text (not the Tag object) into row `cor`, column 0
        sheet.write (cor, 0, site_title[0].text)

    book.save("site_titles.xls")
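The final script hard-codes the URL list and leaves "get URLs from file" as a comment. A minimal sketch of how that step and the row-per-site loop could fit together is below. It is not the script above: `parse_site_list`, `build_rows`, and `fake_fetch_title` are hypothetical names, the titles are made-up placeholders, and the fetch function is a stand-in for the urllib2 + BeautifulSoup calls so that the example runs without network access. One assumption worth checking in your own file: `read().split('\n')` leaves an empty string when the file ends with a newline, which would turn into the URL "http://", so the sketch drops blank lines.

```python
def parse_site_list(text):
    """Return the non-empty, stripped host names, one per input line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_rows(sites, fetch_title):
    """Pair each row index with the title fetched for that site.

    Each tuple has the same shape as the arguments of sheet.write:
    (row, column, value).
    """
    rows = []
    for row, site in enumerate(sites):
        # Fetch and record inside the same loop, so each row gets its own title.
        title = fetch_title("http://" + site)
        rows.append((row, 0, title))
    return rows


# Hypothetical stand-in for the real fetch; the titles are placeholders.
FAKE_TITLES = {
    "http://www.dailynews.com": "placeholder title A",
    "http://www.dailynews.lk": "placeholder title B",
}


def fake_fetch_title(url):
    return FAKE_TITLES[url]


# Simulated file contents, including a trailing blank line.
sites = parse_site_list("www.dailynews.com\nwww.dailynews.lk\n\n")
rows = build_rows(sites, fake_fetch_title)
for row in rows:
    print(row)
```

In the real script, each tuple would be passed to `sheet.write(*row)` and the stand-in would be replaced by a function that returns `site_title[0].text`.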