Friday, 15 July 2011

python - re.compile not obeying case of text when using with BeautifulSoup




I'm using BeautifulSoup and looping through a series of li objects. The two that are causing me issues are the next two:

<li><span class="prefix">teams</span>6</li>
<li><span class="prefix">new teams</span>4</li>

I'm matching based on .find, as seen below:

if newdetail.find(text=re.compile("teams")):

However, for some reason re.compile is registering each of the li objects under the if statement. I want to create a case-sensitive find that only matches the following:

<li><span class="prefix">teams</span> 6</li>

Has anyone got any ideas on how to solve this issue?

The problem is that the HTML I'm parsing doesn't have the same HTML parts.

I'm not sure if you mean that the info you need is not in lists and spans, or what, but here's how I parsed the info and extracted the totals you wanted.

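from bs4 import BeautifulSoup

page_filename = "tester.html"
with open(page_filename, 'r') as f:
    html_file = f.read()

soup = BeautifulSoup(html_file, "html.parser")
lists = soup.find_all('li')
for item in lists:
    span = item.find('span')
    # Skip items that have no span or whose span holds no plain string
    if span is not None and span.string and "teams" in span.string:
        span.replace_with('')  # drop the label so only the total remains
        print(item.text)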

If the teams and totals in a line are not in lists or spans, or are not consistently associated in the same way within the line, you'll have problems getting what you want. Ideally, determine the patterns that the team and total can be found with, use bs4's built-in methods to find the matches, and use regex the rest of the way if needed.
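
As a rough sketch of combining bs4's find methods with a regex, the snippet below assumes the two li items from the question. A pattern such as re.compile("teams") matches anywhere inside the string, so it also matches "new teams"; anchoring it with ^ and $ restricts the match to the whole label. The inline html string and the helper totals_for are hypothetical names used only for illustration, and string= is the newer spelling of the text= argument used in the question.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, copied from the two items in the question
html = (
    '<ul>'
    '<li><span class="prefix">teams</span>6</li>'
    '<li><span class="prefix">new teams</span>4</li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")

loose = re.compile("teams")     # matches anywhere in the label
exact = re.compile(r"^teams$")  # matches only when the label is exactly "teams"


def totals_for(pattern):
    """Return {label: total} for every li whose span label matches pattern."""
    found = {}
    for item in soup.find_all("li"):
        # Find a span inside this li whose string matches the regex
        span = item.find("span", string=pattern)
        if span is None:
            continue
        label = span.string
        # The total is the li text with the label removed
        total = item.get_text().replace(label, "", 1).strip()
        found[label] = int(total)
    return found


loose_totals = totals_for(loose)
exact_totals = totals_for(exact)
print(loose_totals)
print(exact_totals)
```

Regular expressions in Python are case sensitive unless re.IGNORECASE is passed, so the anchored pattern will not match a label such as "Teams" either.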

python regex beautifulsoup
