Thursday, 15 March 2012

regex - Retrieving text from Pattern1 to Pattern2 - Python




I have an input file like the one below:

    pattern1
    ptr1 blah blah blah
    needthis blah blah blah
    thisoneaswell blah blah blah
    pattern2
    pattern1
    ptr2 blah blah blah
    needthis blah blah blah
    thisoneaswell blah blah blah
    pattern2
    ............................
    ............................
    pattern1
    ptrn blah blah
    needthis blah blah blah
    thisoneaswell blah blah blah
    pattern2

I need a function that returns the first-column entries from pattern1 to pattern2, as below:

    ptr1
    needthis
    thisoneaswell
    ptr2
    needthis
    thisoneaswell
    ......................
    ......................
    ptrn
    needthis
    thisoneaswell

ptr1, ptr2, ..., ptrn are each different texts. pattern1 and pattern2 are different from each other and are consistently present in the file.

How can I accomplish this in Python?

I am still a beginner in Python. I tried to accomplish this using re.findall(), but I am not getting the desired output:

    def retreive():
        file = open("filename", "r")
        string = re.findall(r"pattern1", file.read())
        print(string)

You can nest two regexes:

    txt = '''\
    pattern1
    ptr1 blah blah blah
    needthis1 blah blah blah
    thisoneaswell1 blah blah blah
    pattern2
    pattern1
    ptr2 blah blah blah
    needthis2 blah blah blah
    thisoneaswell2 blah blah blah
    pattern2
    ............................
    ............................
    pattern1
    ptrn blah blah
    needthisn blah blah blah
    thisoneaswelln blah blah blah
    pattern2'''

    import re

    # Outer regex: capture everything from a pattern1 line up to the next pattern2 line.
    for m in re.finditer(r'^pattern1\s*(.*?)(?=^pattern2)', txt, re.M | re.S):
        # Inner regex: take the first word of each captured line.
        print(re.findall(r'(^\w+)', m.group(1), re.M))

Prints:

    ['ptr1', 'needthis1', 'thisoneaswell1']
    ['ptr2', 'needthis2', 'thisoneaswell2']
    ['ptrn', 'needthisn', 'thisoneaswelln']

Edit 1

If you are using a file that fits in memory:

    import re

    # fn is the path to the input file.
    with open(fn) as f:
        txt = f.read()

    for m in re.finditer(r'^pattern1\s*(.*?)(?=^pattern2)', txt, re.M | re.S):
        print(re.findall(r'(^\w+)', m.group(1), re.M))

Use mmap for larger files that won't fit in memory.
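A minimal sketch of the mmap variant, assuming Python 3 and a non-empty file (mmap raises ValueError when asked to map an empty one). The function name first_columns_mmap is hypothetical. Because a memory-mapped file is exposed as bytes, both regexes are written as bytes patterns and the matched words are decoded afterwards:

```python
import mmap
import re

# Bytes patterns, because mmap exposes the file contents as bytes.
BLOCK_RE = re.compile(rb'^pattern1\s*(.*?)(?=^pattern2)', re.M | re.S)
FIRST_WORD_RE = re.compile(rb'^\w+', re.M)


def first_columns_mmap(path):
    """Return one list of first-column words per pattern1..pattern2 block.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    results = []
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; the OS pages data in on demand.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            for m in BLOCK_RE.finditer(mm):
                words = FIRST_WORD_RE.findall(m.group(1))
                # \w in a bytes pattern matches ASCII only, so this decode is safe.
                results.append([w.decode('ascii') for w in words])
        finally:
            mm.close()
    return results
```

Each captured block is still copied into memory by m.group(1), so this helps when the file is large but the individual pattern1..pattern2 blocks are small.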

Edit 2

Just join each block's matches into a string and append it to a results list:

    import re

    with open(fn) as f:
        results = []
        txt = f.read()

    for m in re.finditer(r'^pattern1\s*(.*?)(?=^pattern2)', txt, re.M | re.S):
        # One newline-joined string per pattern1..pattern2 block.
        results.append('\n'.join(re.findall(r'(^\w+)', m.group(1), re.M)))

    print('\n===\n'.join(results))
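Since the question asks for a function that returns the entries, the same nested-regex idea can be wrapped up as one. This is only a sketch: the name first_columns and the choice to return a list of lists are assumptions, not part of the original answer.

```python
import re

# Outer pattern: text from a pattern1 line up to (not including) the next pattern2 line.
BLOCK_RE = re.compile(r'^pattern1\s*(.*?)(?=^pattern2)', re.M | re.S)
# Inner pattern: the first word on each line of a captured block.
FIRST_WORD_RE = re.compile(r'^\w+', re.M)


def first_columns(text):
    """Return one list of first-column words per pattern1..pattern2 block."""
    return [FIRST_WORD_RE.findall(m.group(1)) for m in BLOCK_RE.finditer(text)]


sample = (
    "pattern1\n"
    "ptr1 blah blah blah\n"
    "needthis1 blah blah blah\n"
    "thisoneaswell1 blah blah blah\n"
    "pattern2\n"
    "pattern1\n"
    "ptr2 blah blah blah\n"
    "needthis2 blah blah blah\n"
    "thisoneaswell2 blah blah blah\n"
    "pattern2\n"
)

for block in first_columns(sample):
    print('\n'.join(block))
```

Returning the nested list leaves the formatting to the caller, who can join each block with newlines or any other separator.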

python regex python-2.7
