
Scripting (bash currently) - check various process status on a cluster of other hosts quickly


You should use Ganglia or Ambari to monitor large clusters. Both are free and open source, and both provide monitoring as well as threshold-based alerting.


There are utilities like pdsh (parallel distributed shell): https://code.google.com/p/pdsh/wiki/UsingPDSH

It can be used to run the same process check on many nodes in parallel.
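As a sketch, a pdsh invocation to look for a ZooKeeper process across a group of hosts might look like this (the host range and grep pattern are placeholders, not taken from the question):

```shell
# -w names the target hosts; pdsh runs the command on all of them in
# parallel and prefixes each output line with the originating host.
pdsh -w node[01-04] "ps ax | grep -v grep | grep zookeeper"
```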

Parallel SSH was archived (read-only) on Google Code; for more up-to-date releases, see https://github.com/pkittenis/parallel-ssh .

Another option is Fabric: http://www.fabfile.org/


Found a working solution without scope-creeping into a major project:

Instead of going to the well each time to get process status from a node via SSH, grab the `ps ax` output once per node and store it in a local variable. Then interrogate that variable for each service's current process status.

Instead of making (number of nodes x number of services) SSH connections, it now makes only (number of nodes) SSH connections.
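The caching idea can be sketched like this (the hostless sample `ps` text and the process names are made up for illustration; in the real script the variable would come from `ssh "$node" "ps ax"`):

```shell
#!/usr/bin/env bash
# Grab "ps ax" once per node, keep it in a variable, then answer every
# per-service question from that cached text locally.

# Count processes matching a pattern in a cached ps listing.
count_matches () {
  # Quote the variable so the listing keeps its line breaks;
  # "|| true" keeps a zero-match grep from signalling failure.
  printf '%s\n' "$1" | grep -c -- "$2" || true
}

# In the real script this would be: psout=$(ssh "$node" "ps ax")
psout='  101 ?  Ss  0:01 /usr/lib/zookeeper/bin/zkServer.sh
  202 ?  Sl  0:09 /usr/lib/hadoop/bin/datanode'

count_matches "$psout" zookeeper   # prints 1
count_matches "$psout" tasktracker # prints 0
```

One SSH round trip fills `psout`; every service check after that is a local grep.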

From there, I may background / fork each SSH...
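Backgrounding each per-node grab could look like the sketch below: fork the grabs, `wait` for all of them, then read the snapshots back. The hostnames are placeholders, and a local `echo` stands in for the `ssh` call so the sketch runs on its own.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
nodes="node1 node2 node3"

for node in $nodes; do
  # Real version: ssh "$node" "ps ax" > "$tmpdir/$node" &
  echo "ps output for $node" > "$tmpdir/$node" &
done
wait  # block until every background grab has finished

# Collect all snapshots; each service check can now grep this locally.
allps=$(for node in $nodes; do cat "$tmpdir/$node"; done)
rm -r "$tmpdir"
echo "$allps"
```

Writing each node's snapshot to its own file avoids interleaving the parallel output, and `wait` guarantees every grab has finished before any checks run.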

    fun_grabps () {
      # One SSH round trip: cache the node's full process list.
      psout=$(ssh "$1" "ps ax")
    }

    fun_zookeeper () {
      # Quote $psout so the cached listing keeps its line breaks,
      # otherwise the count collapses to at most 1.
      zup=$(echo "$psout" | grep -v grep | grep -c zoo)
      if [ "$zup" -gt 0 ]; then
        zoo=online
      else
        zoo="_"
      fi
    }