bash - GNU Parallel - which job failed? -
i'm running job on several different servers (up 25) using gnu parallel.
the shell script implements does:
parallel --tag --nonall -s $some_list_of_servers "some_command" state=$? echo -n "result: " if [ "$state" -eq "0" ] echo "all jobs successful" else echo "$state jobs failed" fi homecoming $state where some_list_of_servers array, , install_command is, instance, git fetch.
what want lot more info how many jobs failed. want know command, , server, failed.
i've been through man page, , google, , can't find switch(es) i'm looking for.
any help gratefully appreciated.
weedom
edit in response reply 1:
i tried that, , odd happening.
weedom@host1: ~/$ parallel --tag --nonall -j8 --joblog test.log -s host1,host2 uptime host2 10:41:17 36 days, 20:45, 1 user, load average: 0.00, 0.00, 0.00 host1 10:41:17 22:34, 3 users, load average: 0.06, 0.11, 0.04 weedom@host1: ~/$ cat test.log seq host starttime runtime send receive exitval signal command 1 host1 1403689277.067 0.519999980926514 0 0 0 0 uptime no matter how many hosts add together -s, seem lastly 1 finish test.log
i've added follow-up question here: gnu parallel - --joblog logging lastly job
you want utilize --joblog option, shown in docs. gnu parallel allows restarting failed ones --resume-failed.
eg, running script:
#!/bin/bash jobmod=$(( $1 % 3 )) if [ $jobmod == 0 ] exit 1 else exit 0 fi on several hosts this:
$ seq 1 10 | parallel --joblog out.log -s "srv01,srv02,srv03,srv04" ./failjob gives
$ more out.log seq host starttime runtime send receive exitval signal command 1 srv01 1403542514.713 0.267 0 0 0 0 ./failjob 1 3 srv02 1403542514.717 0.266 0 0 1 0 ./failjob 3 4 srv03 1403542514.719 0.266 0 0 0 0 ./failjob 4 2 srv04 1403542514.715 0.397 0 0 0 0 ./failjob 2 5 srv01 1403542514.983 0.231 0 0 0 0 ./failjob 5 6 srv02 1403542514.986 0.368 0 0 1 0 ./failjob 6 7 srv03 1403542514.988 0.388 0 0 0 0 ./failjob 7 8 srv04 1403542515.121 0.437 0 0 0 0 ./failjob 8 9 srv01 1403542515.221 0.343 0 0 1 0 ./failjob 9 10 srv02 1403542515.356 0.388 0 0 0 0 ./failjob 10 bash parallel-processing
No comments:
Post a Comment