Monday, 15 February 2010

nic - Hadoop Cluster datanode network error -




I have a small cluster running a Cloudera Hadoop installation. After a few days, I noticed non-zero errors/dropped/frame counters when I run the ifconfig -a command. (From a high-level perspective, MapReduce jobs run smoothly without errors, and there are no errors from the end user's perspective. I am wondering whether performance would be much better if I did something about this.)

All nodes, including the namenode, were installed and configured by the same Red Hat kickstart server, following the same recipe, so I would say they are the "same". However, I did not notice any network errors on the namenode, while the network errors exist on the datanodes consistently.

For example, the namenode looks like this:

namenode.datafireball.com | success | rc=0 >>
eth4      Link encap:Ethernet  HWaddr ...
          inet addr:10.0.188.84  Bcast:10.0.191.255  Mask:...
          inet6 addr: xxxfe56:5632/64 Scope:Link
          BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:11711470 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6195067 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6548704769 (6.0 GiB)  TX bytes:12093046450 (11.2 GiB)

And a datanode:

datanode1.datafireball.com | success | rc=0 >>
eth4      Link encap:Ethernet  HWaddr ...
          inet addr:10.0.188.87  Bcast:10.0.191.255  Mask:...
          inet6 addr: xxxff24/64 Scope:Link
          BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:27474152 errors:0 dropped:36072 overruns:36072 frame:36072
          TX packets:28905940 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:158509736560 (147.6 GiB)  TX bytes:180857576718 (168.4 GiB)
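To put the datanode counters in proportion, here is a minimal Python sketch that pulls the RX counters out of ifconfig-style text and reports dropped packets as a percentage of received packets. The function names (rx_counters, drop_percent) and the trimmed SAMPLE text are my own illustration, not part of any tool, and the ratio is only a rough indicator since it depends on how the driver counts drops.

```python
import re

# Trimmed sample modeled on the datanode output quoted in this post
# (addresses omitted; only the counter lines matter here).
SAMPLE = """\
eth4 Link encap:Ethernet
     RX packets:27474152 errors:0 dropped:36072 overruns:36072 frame:36072
     TX packets:28905940 errors:0 dropped:0 overruns:0 carrier:0
"""

# Matches the classic net-tools "RX packets:... frame:..." line, in any case.
RX_LINE = re.compile(
    r"rx packets:(\d+)\s+errors:(\d+)\s+dropped:(\d+)\s+overruns:(\d+)\s+frame:(\d+)",
    re.IGNORECASE,
)


def rx_counters(ifconfig_text):
    """Return the first RX counter line as a dict of integers."""
    match = RX_LINE.search(ifconfig_text)
    if match is None:
        raise ValueError("no RX counter line found")
    names = ("packets", "errors", "dropped", "overruns", "frame")
    return dict(zip(names, map(int, match.groups())))


def drop_percent(counters):
    """Dropped packets as a percentage of RX packets (0.0 if none received)."""
    if counters["packets"] == 0:
        return 0.0
    return 100.0 * counters["dropped"] / counters["packets"]


if __name__ == "__main__":
    counters = rx_counters(SAMPLE)
    print(counters)
    print(f"dropped: {drop_percent(counters):.3f}% of RX packets")
```

The same function can be fed the full output of ifconfig eth4 from each node, which makes it easy to compare the namenode and the datanodes side by side.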

I ran a stress test following Michael's tutorial, and I can see the errors increasing as the job goes on. So these are not errors left over from when I first set things up.
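One way to confirm that the counters really grow while a job is running is to sample them at intervals and print the differences. The sketch below is only an assumption about how one might do that: it reads the per-interface counter files that Linux exposes under /sys/class/net/<iface>/statistics, and the helper names (read_counters, deltas, watch) are hypothetical. As far as I know, rx_fifo_errors and rx_frame_errors correspond to the "overruns" and "frame" fields that ifconfig prints, but treat that mapping as an assumption too.

```python
import time
from pathlib import Path

# Kernel counter files sampled for each snapshot.
COUNTERS = ("rx_packets", "rx_dropped", "rx_fifo_errors", "rx_frame_errors")


def read_counters(iface, root="/sys/class/net"):
    """Read the selected RX counters for one interface as integers."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def deltas(before, after):
    """Per-counter increase between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def watch(iface, interval=10.0, samples=6):
    """Print how much each counter grew during every sampling interval."""
    previous = read_counters(iface)
    for _ in range(samples):
        time.sleep(interval)
        current = read_counters(iface)
        print(deltas(previous, current))
        previous = current
```

Calling watch("eth4") on a datanode while the stress test runs would show whether rx_dropped keeps climbing; running the same call on the namenode gives the comparison.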

FYI, we have two NIC cards in one box. The first four ports are on the embedded NIC, 03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20), which we are not using at all. We are using 0e:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0), which is the 10Gb NIC.

This is the firmware and general info output for the NIC card:

$ ethtool -i eth4
driver: mlx4_en
version: 2.0 (Dec 2011)
firmware-version: 2.8.600
bus-info: 0000:0e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

I am surprised to find that the datanodes have network errors and the namenode doesn't, since they have the same setup and configuration. Can anyone give me some guidance?

hadoop nic ifconfig
