Saturday, 15 September 2012

python - Memory-efficient way to iterate over part of a large file -




I avoid reading files like this:

with open(file) as f:
    list_of_lines = f.readlines()

and use this type of code instead:

f = open(file)
for line in f:
    # do something with each line
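
Equivalently, the loop can be wrapped in a with block so the file is closed automatically (do_something here is just a placeholder for whatever I do with each line):

with open(file) as f:
    for line in f:
        # only the current line is held in memory
        do_something(line)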

unless I have to iterate over only a few lines in the file (and I know which lines they are), in which case I think it is easier to take slices of list_of_lines. This has now come back to bite me. I have a huge file (reading it into memory is not possible) and I don't need to iterate over all of its lines, only a few of them. I have code completed that finds the first line I need and determines how many lines after it need editing. I just don't have any idea how to write that loop.

n = ...      # grep the number of lines
start = ...  # pattern match the start line
f = open('big_file')
# some loop over f from start to start + n
#     edit lines

Edit: the title may have led to debate rather than an answer.

If I understand the question correctly, the problem you're encountering is that storing all the lines of text in a list and then taking a slice uses too much memory. What you want is to read the file line by line, while ignoring all but a certain set of lines (say, lines [17, 34) for example).

Try using enumerate to keep track of the line number you're on as you iterate through the file. Here is a generator-based approach that uses yield to output the interesting lines one at a time:

def read_only_lines(f, start, finish):
    for ii, line in enumerate(f):
        if ii >= start and ii < finish:
            yield line
        elif ii >= finish:
            return

f = open("big text file.txt", "r")
for line in read_only_lines(f, 17, 34):
    print line

This read_only_lines function essentially reimplements itertools.islice from the standard library, which you could use to make the implementation more compact:

from itertools import islice
for line in islice(f, 17, 34):
    print line
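
Note that islice does not seek within the file; it simply reads and discards the lines before the start index, so memory use stays constant even for a very large file. A small usage sketch, reusing the same file name as above:

from itertools import islice

with open("big text file.txt", "r") as f:
    # lines 0-16 are read and thrown away lazily; lines 17-33 are yielded
    for line in islice(f, 17, 34):
        print(line)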

If you want to capture the lines of interest in a list rather than a generator, just cast them to a list:

from itertools import islice
lines_of_interest = list(islice(f, 17, 34))
do_something_awesome(lines_of_interest)
do_something_else(lines_of_interest)
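
Tying this back to the original question, one rough sketch of finding the start line by a pattern match and then processing the n lines that follow it, in a single pass, could look like the following; the pattern, file name, and process_line helper are assumptions for illustration only:

from itertools import islice

def lines_after_match(path, pattern, n):
    # stream the file once: find the first line containing the pattern,
    # then yield it along with the next n lines
    with open(path) as f:
        for line in f:
            if pattern in line:
                yield line
                for following in islice(f, n):
                    yield following
                return

for line in lines_after_match("big_file", "some pattern", 5):
    process_line(line)  # placeholder for whatever edit is needed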

python iteration large-files
