Friday, 15 May 2015

haskell - Lazy ByteString: memory exploding in certain cases




Below are two seemingly functionally equivalent programs. With the first, memory use remains constant, whereas with the second it explodes (using GHC 7.8.2 and bytestring-0.10.4.0 on Ubuntu 14.04 64-bit):

Non-exploding:

    -- noexplode.hs
    -- ghc -O3 noexplode.hs
    module Main where

    -- Qualified imports avoid clashes with Prelude names such as last and take.
    import qualified Data.ByteString.Lazy as BL
    import qualified Data.ByteString.Lazy.Char8 as BLC

    num = 1000000000
    bytenull = BLC.pack ""

    -- Count the bytes by dropping one byte at a time until the string is empty.
    countdatapoint arg sum
      | arg == bytenull = sum
      | otherwise       = countdatapoint (BL.tail arg) (sum+1)

    test1 = BL.last $ BL.take num $ BLC.cycle $ BLC.pack "abc"
    test2 = countdatapoint (BL.take num $ BLC.cycle $ BLC.pack "abc") 0

    main = do
      print test1
      print test2

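Exploding:

    -- explode.hs
    -- ghc -O3 explode.hs
    module Main where

    -- Qualified imports avoid clashes with Prelude names such as last and take.
    import qualified Data.ByteString.Lazy as BL
    import qualified Data.ByteString.Lazy.Char8 as BLC

    num = 1000000000
    bytenull = BLC.pack ""

    -- Count the bytes by dropping one byte at a time until the string is empty.
    countdatapoint arg sum
      | arg == bytenull = sum
      | otherwise       = countdatapoint (BL.tail arg) (sum+1)

    -- The only change: the long bytestring now has its own top-level name.
    longbytestr = BL.take num $ BLC.cycle $ BLC.pack "abc"

    test1 = BL.last $ longbytestr
    test2 = countdatapoint (BL.take num $ BLC.cycle $ BLC.pack "abc") 0

    main = do
      print test1
      print test2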

Additional details:

The difference is that in explode.hs I have taken BL.take num $ BLC.cycle $ BLC.pack "abc" out of the definition of test1 and assigned it to its own value, longbytestr.

Strangely, if I comment out either print test1 or print test2 in explode.hs (but not both), the program does not explode.

Is there a reason why memory explodes in explode.hs and not in noexplode.hs, and why the exploding program (explode.hs) requires both print test1 and print test2 in order to explode?

Why does GHC perform common subexpression elimination in one case but not in the other? Who knows. Perhaps inlining destroys the common subexpressions. It depends on the internal implementation.

Regarding -ddump-simpl, see this question: Reading GHC Core

I reproduced it with GHC 7.8.2. It performs common subexpression elimination; you can check this in the output of -ddump-simpl. So explode.hs ends up creating only one lazy bytestring.
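One way to test whether common subexpression elimination is responsible is to switch it off for the module with GHC's -fno-cse flag, then compare both the memory use and the -ddump-simpl output with and without it. The following is a minimal sketch of that experiment, not a verified fix. It is explode.hs with type signatures added, an emptiness check via BL.null, and num shrunk so that it finishes quickly. Whether this restores constant memory at the original size of 1000000000 on GHC 7.8.2 is an assumption to be checked.

```haskell
{-# OPTIONS_GHC -fno-cse #-}
-- Experiment (assumption, not a confirmed fix): explode.hs compiled with
-- common subexpression elimination disabled for this module.
module Main where

import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Int (Int64)
import Data.Word (Word8)

-- Much smaller than the original 1000000000 so the sketch runs quickly.
num :: Int64
num = 1000000

-- Count bytes by dropping one byte at a time; the accumulator is forced
-- on every step so no chain of (+1) thunks builds up.
countdatapoint :: BL.ByteString -> Int64 -> Int64
countdatapoint arg acc
  | BL.null arg = acc
  | otherwise   = acc `seq` countdatapoint (BL.tail arg) (acc + 1)

longbytestr :: BL.ByteString
longbytestr = BL.take num $ BLC.cycle $ BLC.pack "abc"

test1 :: Word8
test1 = BL.last longbytestr

-- Deliberately repeats the expression instead of reusing longbytestr.
test2 :: Int64
test2 = countdatapoint (BL.take num $ BLC.cycle $ BLC.pack "abc") 0

main :: IO ()
main = do
  print test1
  print test2
```

Compiling this once with and once without the pragma, each time with -O2 -ddump-simpl, should show whether test2 refers to longbytestr in the generated Core.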

In the first version you create two lazy bytestrings. print test1 forces the first one, but it is garbage collected on the fly because nobody else uses it. The same goes for print test2: it forces the second bytestring, which is also GC'ed on the fly.

In the second version you create one lazy bytestring. print test1 forces it, but it can't be GC'ed because it is still needed by print test2. As a result, after the first print you have the entire bytestring loaded into memory.

If you remove one of the prints, the bytestring is GC'ed on the fly again, because it is not used anywhere else.

PS. "GC'ed on the fly" means: print takes the first chunk and outputs it to stdout. That chunk then becomes available for GC. Then print takes the second chunk, and so on.
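When two results are needed from the same lazy bytestring, one way to avoid depending on what the optimiser decides to share is to compute both in a single traversal, so that no second consumer keeps the already-visited chunks alive. The sketch below is an illustration rather than the original poster's code. The names Acc and lastAndCount are hypothetical, and the input is far smaller than the original.

```haskell
module Main where

import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Int (Int64)
import Data.Word (Word8)

-- Strict accumulator (hypothetical name): number of bytes seen so far
-- and the most recently seen byte.
data Acc = Acc !Int64 !Word8

-- One left-to-right pass that yields both the last byte and the length.
-- Returns Nothing for an empty input, where there is no last byte.
lastAndCount :: BL.ByteString -> Maybe (Word8, Int64)
lastAndCount bs =
  case BL.foldl' step (Acc 0 0) bs of
    Acc 0 _ -> Nothing
    Acc n w -> Just (w, n)
  where
    step (Acc n _) w = Acc (n + 1) w

main :: IO ()
main = print (lastAndCount (BL.take 1000000 (BLC.cycle (BLC.pack "abc"))))
```

Because the bytestring is consumed exactly once, it does not matter whether the compiler shares it or not.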
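The chunk-by-chunk behaviour can be made explicit by walking the chunk list directly. This is a small sketch with a hypothetical helper name, countBytes, assuming nothing else holds a reference to the bytestring. Each strict chunk is looked at once and can then be reclaimed while the fold moves on to the next one.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.List (foldl')

-- Sum the chunk lengths with a strict left fold. BL.toChunks produces the
-- chunks lazily, so only the chunk currently being inspected needs to be
-- live, provided no other reference to the whole bytestring exists.
countBytes :: BL.ByteString -> Int
countBytes = foldl' (\acc chunk -> acc + B.length chunk) 0 . BL.toChunks

main :: IO ()
main = print (countBytes (BL.take 1000000 (BLC.cycle (BLC.pack "abc"))))
```

If the same bytestring were also bound to a name that is used again later, the chunks would stay reachable through that name and could not be collected, which is the situation described for explode.hs.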

haskell lazy-evaluation
