Mar 27th, 2009

… and suddenly four nodes stopped accepting ssh connections, with each attempted connection resulting in yet another process accumulating (which is exactly what happened in the past with n0008). Tried everything I could think of, got nowhere. Finally gave up and risked an /etc/init.d/nfs restart, and voilà: all accumulated jobs disappeared and the nodes are back to normal. But are they? What does NFS do when it is restarted with several open file descriptors on it? We will have to wait and see whether anything strange appears in the trajectories during their analysis.

2009/03/27 21:44

Mar 25th, 2009

A thunderstorm caused a momentary power failure which killed the (non-UPSed) nodes 7 & 8. Node n0008 came back immediately; n0007 was left in a miserable non-responding state. The moral is that we need a fourth UPS box.

⇒ Which we got and installed on Mar 26th.

2009/03/26 13:00


The full maintenance archive is kept here

…and finally, the infamous MBG's Power Failure Log

about/maintenance.txt · Last modified: 2011/01/31 17:56 (external edit)