Saturday, 15 February 2014

python - Merge overlapping numeric ranges into continuous ranges -



python - Merge overlapping numeric ranges into continuous ranges -

i trying merge range of genomic coordinates continuous ranges, additional alternative merging across gaps.

for example, if had genomic ranges [[0, 1000], [5, 1100]] want result [0, 1100]. if offset alternative set 100, , input [[0, 1000], [1090, 1000]] 1 time 1 time again want result [0, 1100].

i have implemented way of doing steps through alignments sequentially , tries merge on previous ending position , next starting position, fails because actual results have varying lengths. illustration have results [[138, 821],[177, 1158], [224, 905], [401, 1169]] in list sorted start positions. reply should [138, 1169] instead [[138, 1158], [177, 905], [224, 1169]]. need take more business relationship previous ending , next start, haven't found solution (preferably 1 isn't huge nest of if statements). have suggestions?

def overlap_alignments(align, gene, overlap): #make sure alignments sorted first chromosome start pos on chrom align = sorted(align, key = lambda x: (x[0], x[1])) merged = list() in xrange(1, len(align)): prv, nxt = align[i-1], align[i] if prv[0] == nxt[0] , prv[2] + overlap >= nxt[1]: start, end = prv[1], nxt[2] chrom = prv[0] merged.append([chrom, start, end, gene]) homecoming merged

well, how keeping track of every start , end , number of ranges each position belongs to?

def overlap_alignments(align, overlap): # create list of starts , ends stends = [ (a[0], 1) in align ] stends += [ (a[1] + overlap, -1) in align ] stends.sort(key=lambda x: x[0]) # should have list of starts , ends ordered position, # e.g. if ranges 5..10, 8..15, , 12..13, have # (5,1), (8,1), (10,-1), (12,1), (13,-1), (15,-1) # next, form cumulative sum of s = 0 cs = [] se in stends: s += se[1] cs.append((se[0], s)) # is, numbers above, (5,1), (8,2), (10,1), (12,2), (13,1), (15,0) # so, 5..8 belongs 1 range, 8..10 belongs 2 overlapping range, # 10..12 belongs 1 range, etc # we'll find contiguous ranges # when traverse through list of depths (number of overlapping ranges), new # range starts when before number of overlapping ranges has been 0 # range ends when new number of overlapping ranges 0 prevdepth = 0 start = 0 combined = [] pos, depth in cs: if prevdepth == 0: start = pos elif depth == 0 combined.append((start, pos-overlap)) prevdepth = depth homecoming combined

this easier draw explain. (and yes, cumulative sum written in shorter space, find clearer way.)

to explain graphically, lets take input ([5,10],[8,15],[12,13],[16,20]) , overlap=1.

.....xxxxxo.............. (5-10) ........xxxxxxxo......... (8-15) ............xo........... (12-13) ................xxxxo.... (16-20) .....1112221221111111.... number of ranges @ each position .....----------------.... number of ranges > 0 .....---------------..... overlap corrected (5-20)

python algorithm bioinformatics

No comments:

Post a Comment