Breeding: python - Parsing JS with Beautiful soup -

Friday, 15 July 2011

I have a page parsed with Beautiful Soup. It contains this JS code:

<script type="text/javascript">
    var utag_data = {
        customer_id   : "_phl2883198554",
        customer_type : "new",
        loyalty_id : "n",
        declined_loyalty_interstitial : "false",
        site_version  : "desktop site",
        site_currency: "de_de_euro",
        site_region: "uk",
        site_language: "en-gb",
        customer_address_zip : "",
        customer_email_hash :  "",
        referral_source :  "",
        page_type : "product",
        product_category_name : ["lingerie"],
        product_category_id :[jquery("meta[name=defaultparent]").attr("content")],
        product_id : ["5741462261401"],
        product_image_url : ["http://images.urbanoutfitters.com/is/image/urbanoutfitters/5741462261401_001_b?$detailmain$"],
        product_brand : ["pretty polly"],
        product_selling_price : ["20.0"],
        promo_id : "6",
        product_referral : ["womens-shapewear-lingerie-solutions-eu"],
        product_name : ["pretty polly shape tummy shaping camisole"],
        is_online_only : true,
        is_back_in_stock : false
    }
</script>

How can I get these values out? Should I work with the script as text, for example? I mean: write it to a variable, split it, and take the data?

Thanks.

Once you have the text of the script, for example via

js_text = soup.find('script', type="text/javascript").text

you can use a regex to find the data. I'm sure there is an easier way, but a regex shouldn't be hard either.

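import re

# Match one "name : value" pair per line; a trailing comma is left out of the value.
regex = re.compile(r'^[ \t]*(\w+)[ \t]*:[ \t]*(.*?),?[ \t]*$', re.MULTILINE)
pairs = regex.findall(js_text)  # list of (name, value) tuples
# Flatten to name, value, name2, value2, ... and strip extra whitespace.
js_text = [part.strip() for pair in pairs for part in pair]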

This returns a list of names and values in name|value|name2|value2... order, which you can mess around with or convert to a dictionary later on.
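As a rough sketch of that last step, the snippet below runs a line-based regex over a shortened sample of the script and turns the flat list into a dictionary. The sample text, the names `pair_re`, `flat` and `utag_data`, and the assumption that each pair sits on its own line are illustrative only; on the real page the text would come from the parsed script tag.

```python
import re

# Shortened sample of the script text from the question; on the real page
# this string would come from the parsed <script> tag.
js_text = '''
var utag_data = {
    customer_type : "new",
    site_region: "uk",
    product_id : ["5741462261401"],
    is_online_only : true,
    is_back_in_stock : false
}
'''

# Assumes one "name : value" pair per line; a trailing comma is left out of the value.
pair_re = re.compile(r'^[ \t]*(\w+)[ \t]*:[ \t]*(.*?),?[ \t]*$', re.MULTILINE)

# Flat list in name, value, name2, value2, ... order.
flat = [part.strip() for pair in pair_re.findall(js_text) for part in pair]

# Pair each even-indexed name with the odd-indexed value that follows it.
utag_data = dict(zip(flat[0::2], flat[1::2]))

print(utag_data['site_region'])  # the raw JS source text, quotes included
print(utag_data['product_id'])
```

The values stay as raw JavaScript source text (quotes and brackets included), so they would still need cleaning up before use. Entries that call JavaScript, such as the `jquery(...)` expression in `product_category_id`, cannot be evaluated this way at all.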

python web-scraping html-parsing beautifulsoup
