Thursday, 15 April 2010

Hadoop: HDFS File Writes & Reads

I have a basic question regarding file writes and reads in HDFS.

For example, when writing a file with the default configuration, Hadoop internally has to write each block to 3 data nodes. My understanding is that, for each block, the client first writes the block to the first data node in the pipeline, which then forwards it to the second, and so on. Once the third data node receives the block, it sends an acknowledgement back to data node 2, and on to the client through data node 1. Only after receiving the acknowledgement for the block is the write considered successful, and the client proceeds to write the next block.
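For reference, here is a minimal sketch of what such a write looks like from the client side, using the standard Hadoop FileSystem API (the NameNode address and file path are made-up examples):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical NameNode address; normally picked up from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode:8020");

            FileSystem fs = FileSystem.get(conf);

            // The client sees a single sequential stream; splitting into blocks
            // and the 3-DataNode replication pipeline happen under the hood.
            try (FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"))) {
                out.writeBytes("hello hdfs\n");
            } // close() completes only after outstanding packets are acknowledged
        }
    }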

If this is the case, isn't the time taken to write each block longer than a traditional file write, due to both the replication factor (default 3) and the write process happening sequentially, block after block?
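Note that the replication factor is not fixed at 3; it can be lowered per client or per file. A small sketch, assuming a file at a hypothetical path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Client-side default replication for files created by this client.
            conf.set("dfs.replication", "1");
            FileSystem fs = FileSystem.get(conf);

            // Replication can also be changed for an existing file; the
            // NameNode then adds or removes copies in the background.
            boolean accepted = fs.setReplication(new Path("/tmp/example.txt"), (short) 2);
            System.out.println("replication change accepted: " + accepted);
        }
    }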

Please correct me if my understanding is wrong. Also, my next question is below:

My understanding is that file reads/writes in Hadoop have no parallelism, and the best they can perform is the same as a traditional file read or write (i.e. if replication is set to 1), plus the overhead involved in the distributed communication mechanism. Parallelism is provided only during the data processing phase via MapReduce, not during file reads/writes by a client.
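To illustrate, a client read is likewise one sequential stream; each block is fetched from a single (ideally closest) replica at a time. A minimal sketch, with a made-up path:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // One stream, one block at a time: replication helps availability
            // and locality, but a single reader gets no extra parallelism.
            try (FSDataInputStream in = fs.open(new Path("/tmp/example.txt"));
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }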

Though the above explanation of a file write is correct, a DataNode can read and write data simultaneously. From the HDFS Architecture Guide:

"A DataNode can be receiving data from the previous one in the pipeline and at the same time forwarding data to the next one in the pipeline."

A write operation does take more time than on a traditional file system (due to bandwidth issues and general overhead), but not as much as 3x (assuming a replication factor of 3).
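A rough back-of-envelope model shows why. Since the block is forwarded packet by packet, the pipeline adds only a short "fill" delay per extra replica rather than multiplying the whole transfer time. The numbers below are illustrative assumptions, not HDFS measurements:

    public class PipelineEstimate {
        public static void main(String[] args) {
            // Assumed: a 64 MB block sent as 64 KB packets, one packet per
            // hop per time unit. 64 MB / 64 KB = 1024 packets.
            int packets = (64 * 1024) / 64;
            int hops = 3; // replication factor 3 => a 3-DataNode pipeline

            // Pipelined: after (hops - 1) steps to fill the pipe, one packet
            // completes per time unit => roughly packets + hops - 1.
            int pipelined = packets + (hops - 1);

            // Naive "3x" model: send the whole block three separate times.
            int sequential = packets * hops;

            System.out.println("pipelined  ~ " + pipelined + " time units");  // 1026
            System.out.println("sequential ~ " + sequential + " time units"); // 3072
        }
    }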

hadoop hdfs
