python - Parsing JS with Beautiful Soup
I have a page parsed with Beautiful Soup. It contains this JS code:
    <script type="text/javascript">
    var utag_data = {
        customer_id : "_phl2883198554",
        customer_type : "new",
        loyalty_id : "n",
        declined_loyalty_interstitial : "false",
        site_version : "desktop site",
        site_currency: "de_de_euro",
        site_region: "uk",
        site_language: "en-gb",
        customer_address_zip : "",
        customer_email_hash : "",
        referral_source : "",
        page_type : "product",
        product_category_name : ["lingerie"],
        product_category_id :[jquery("meta[name=defaultparent]").attr("content")],
        product_id : ["5741462261401"],
        product_image_url : ["http://images.urbanoutfitters.com/is/image/urbanoutfitters/5741462261401_001_b?$detailmain$"],
        product_brand : ["pretty polly"],
        product_selling_price : ["20.0"],
        promo_id : "6",
        product_referral : ["womens-shapewear-lingerie-solutions-eu"],
        product_name : ["pretty polly shape tummy shaping camisole"],
        is_online_only : true,
        is_back_in_stock : false
    }
    </script>

How can I get these values out? Should I work with it as plain text, i.e. write it to a variable, split it, and take the data?

Thanks!
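Once you have the text of the script, for example via

    js_text = soup.find('script', type="text/javascript").text

you can use a regex to find the data. (If the page has several such scripts, `find` returns only the first one, so make sure it is the one containing `utag_data`.) I'm sure there is an easier way, but the regex shouldn't be hard either: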
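The `soup.find` call returns the whole script body. If that script also holds other statements, a regex can first cut out just the object literal before any further parsing. This is a minimal sketch, assuming the literal is assigned as `var utag_data = {...}` and contains no nested braces (true for the sample in the question); the helper name `extract_object_literal` is hypothetical.

```python
import re


def extract_object_literal(js_text: str, var_name: str = "utag_data") -> str:
    """Return the '{ ... }' text assigned to `var_name` (hypothetical helper).

    Assumes the object literal contains no nested '{' or '}', so a lazy
    match up to the first closing brace captures the whole literal.
    """
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\{.*?\})"
    match = re.search(pattern, js_text, re.DOTALL)  # DOTALL lets '.' span newlines
    if match is None:
        raise ValueError(f"no object literal assigned to {var_name!r}")
    return match.group(1)


if __name__ == "__main__":
    script = 'var x = 1;\nvar utag_data = { customer_type : "new", is_online_only : true };'
    print(extract_object_literal(script))
```

Working on the extracted literal instead of the full script keeps later name/value regexes from accidentally matching other code on the page.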
    import re

    # name = a word followed by ':'; value = everything up to the next ',' or '}'
    regex = re.compile(r'(\w+)\s*:\s*([^,}]*)')  # compile regex
    pairs = regex.findall(js_text)  # list of (name, value) tuples
    js_text = [part.strip() for pair in pairs for part in pair]  # flatten and strip away the white space

This will return a list of names and values in name|value|name2|value2... order that you can mess around with or convert to a dictionary later on. It assumes no value contains a ',' or '}' of its own, which holds for the data above.
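To go one step further and get a dictionary, each captured value can be handed to `json.loads`, since most values here (quoted strings, `true`/`false`, lists of strings) happen to be valid JSON. This is a minimal sketch: `parse_utag_data` is a hypothetical helper, values that are JavaScript expressions rather than literals (such as `product_category_id`) are kept as raw strings, and it assumes no value contains its own `,` or `}`, so a list with several items would be cut short.

```python
import json
import re

# name = a word followed by ':'; value = everything up to the next ',' or '}'
PAIR_RE = re.compile(r"(\w+)\s*:\s*([^,}]*)")


def parse_utag_data(js_text: str) -> dict:
    """Turn 'name : value' pairs into a dict (hypothetical helper)."""
    result = {}
    for name, raw in PAIR_RE.findall(js_text):
        raw = raw.strip()
        try:
            # "new" -> 'new', true -> True, ["20.0"] -> ['20.0']
            result[name] = json.loads(raw)
        except json.JSONDecodeError:
            # Not a JSON literal (e.g. a jquery(...) call): keep the raw text
            result[name] = raw
    return result


if __name__ == "__main__":
    sample = 'var utag_data = { customer_type : "new", is_online_only : true, product_brand : ["pretty polly"] }'
    print(parse_utag_data(sample))
    # {'customer_type': 'new', 'is_online_only': True, 'product_brand': ['pretty polly']}
```

Falling back to the raw string rather than raising means one non-literal value does not prevent the rest of the data from being read.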
python web-scraping html-parsing beautifulsoup