Wednesday, 15 June 2011

C# - Working with large data sets and memory limitations

I am working on code that compares big collections of objects and stores the matches.

Unsurprisingly, I have encountered a System.OutOfMemoryException.

How can I go about solving this?

You should not be keeping everything in memory during the comparison; you will have to write the results somewhere else, to disk or an RDBMS, i.e. create a buffer.
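
For illustration, here is a minimal sketch of such a buffer: each match is written to a file the moment it is found, so matches never accumulate in memory. FindMatches and the match format are hypothetical placeholders for your own comparison logic.

using System;
using System.Collections.Generic;
using System.IO;

class MatchBuffer
{
    static void Main()
    {
        // Stream each match straight to disk instead of collecting them in a list
        using (var writer = new StreamWriter("matches.txt"))
        {
            foreach (string match in FindMatches())
            {
                writer.WriteLine(match); // the match can be garbage-collected after this
            }
        }
    }

    // Hypothetical placeholder: yield matches lazily, one at a time
    static IEnumerable<string> FindMatches()
    {
        yield return "objectA|objectB";
    }
}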

In fact a lot depends on your environment, particularly on whether the operating system and process are x86 or x64. Check more details here: memory in depth.
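
You can check the process bitness at runtime (assuming .NET 4.0 or later); a 32-bit process is limited to roughly 2-4 GB of addressable memory, whereas a 64-bit process can address far more:

// Reports whether the current process and OS are 64-bit (requires .NET 4.0+)
Console.WriteLine(Environment.Is64BitProcess ? "x64 process" : "x86 process");
Console.WriteLine(Environment.Is64BitOperatingSystem ? "x64 OS" : "x86 OS");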

1. You have an advanced scenario where streaming is needed. The exact solution depends on where you are pulling the data from. In the case of pulling data from a SQL database you can use a streaming SqlDataReader, which in this case is tightly coupled with async. Sample code:

using (SqlDataReader reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
{
    if (await reader.ReadAsync())
    {
        if (!(await reader.IsDBNullAsync(0)))
        {
            // Stream the column value instead of materialising it all at once
            using (var dataStream = reader.GetStream(0))
            {
                // process data
            }
        }
    }
}

This link reveals a bit more detail: retrieving large data sets. However, keep in mind that such an approach forces you to enable async in the connection string and to deal with async code, which is additional complexity, especially when you want to cover it with specs/tests.
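
As a hedged sketch of the connection-string part (server and database names below are placeholders): before .NET 4.5 the Asynchronous Processing=true keyword had to be set to use the asynchronous ADO.NET APIs; from .NET 4.5 onwards it is ignored.

// Placeholder server/database names; Asynchronous Processing=true is only
// required before .NET 4.5 and is ignored by later versions
var connectionString =
    "Data Source=myServer;Initial Catalog=myDatabase;Integrated Security=true;" +
    "Asynchronous Processing=true";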

2. Yet another approach is batching, i.e. buffering data up to an acceptable limit and exposing the batch to the consuming code, after which you go on fetching a new batch of data until everything is loaded. Sample code:

while (true)
{
    int count = 0;
    bool canRead = reader.Read();
    while (canRead)
    {
        // consume the current row into the batch here
        canRead = reader.Read();
        count++;
        if (count >= batchSize)
            break; // the batch is full; expose it to the consuming code
    }
    if (!canRead)
        break; // no rows left to fetch
}

The size of the batch can be calculated by estimating the size of one row of data (based on the table schema; see the MSDN article) or made configurable so you can play around to find a suitable value. The main advantage of this approach is that it needs minimal changes in your code and the code remains synchronous. The disadvantage is that you have to either keep an active connection or open a new one every time, and on top of that you must keep track of which records you have already read and which you still need to fetch.
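
For example, a rough sketch of the sizing arithmetic (the memory budget and row size below are assumed values for illustration, not measured ones):

// Assumed numbers purely for illustration
const long memoryBudgetBytes = 64L * 1024 * 1024; // allow ~64 MB per batch
const int estimatedRowBytes = 512;                // estimated from the table schema
int batchSize = (int)(memoryBudgetBytes / estimatedRowBytes); // 131,072 rows per batch
Console.WriteLine(batchSize);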

Finally, both options force you to take care of more advanced questions, such as what to do if only part of the data has been fetched and the connection is then lost (you need a fail-over mechanism), the ability to cancel a long-running retrieval operation after a timeout, etc.
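
For the timeout part, a minimal sketch (assuming .NET 4.5+) is to pass a CancellationToken that expires after a fixed period; note that, depending on the provider version, cancellation can surface as either an OperationCanceledException or a SqlException:

// Cancel the retrieval if it runs longer than five minutes (assumed timeout)
using (var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5)))
{
    try
    {
        using (SqlDataReader reader = await command.ExecuteReaderAsync(cts.Token))
        {
            while (await reader.ReadAsync(cts.Token))
            {
                // process rows
            }
        }
    }
    catch (OperationCanceledException)
    {
        // the operation timed out; some provider versions throw SqlException instead
    }
}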

To conclude, if you do not want to handle the additional complexity that big data introduces, delegate the task to whatever is available on the market, i.e. a database or a 3rd-party framework. If you feel your team has the skills for it, go ahead and implement it yourself: keep the results of the comparison in a disk file, use an in-memory cache, or push the data into a database.

c# out-of-memory large-data
