Feb 14th, 2009

  • First long job stopped after ~38 nsec. Start heating for next one.
  • Slurm woes: communication problems (as always).
  • n0002 must be cursed: two cores disappeared, never to be seen again (and dmesg contains the line "SMP: Allowing 4 CPUs, 2 hotplug CPUs"). Looks like hardware again.
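
For what it is worth, the dmesg line quoted for n0002 can be checked mechanically on every node. What follows is only a rough sketch, not something that was run on the cluster: the function names `parse_smp_line` and `missing_cores` are made up for illustration, and the only thing taken from the log is the format of the kernel message itself.

```python
import re

# Pattern for the kernel boot message quoted in the log entry,
# e.g. "SMP: Allowing 4 CPUs, 2 hotplug CPUs".
SMP_LINE = re.compile(r"SMP: Allowing (\d+) CPUs, (\d+) hotplug CPUs")


def parse_smp_line(dmesg_text):
    """Return (allowed, hotplug) from the first matching line, or None."""
    match = SMP_LINE.search(dmesg_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def missing_cores(dmesg_text, online_cpus):
    """Number of CPUs the kernel allowed for that are not online (hypothetical helper)."""
    parsed = parse_smp_line(dmesg_text)
    if parsed is None:
        return None
    allowed, _hotplug = parsed
    return max(allowed - online_cpus, 0)
```

Fed the output of dmesg and the number of cores currently online, `missing_cores` would report 2 for a node in the state described above (4 allowed, 2 online).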
2009/02/14 22:14

Feb 9th, 2009

A/C unit installed. Initially recorded temperatures quite good (that is, low). Later in the evening (after equilibration) not brilliant. Running a temperature monitoring script to see how it goes. Needless to say, nefeli's A/C was immediately switched off, with marked results:

Motherboard temperature for nefeli after A/C was turned off

Hopefully, even if the A/C in norma fails, the crontab should save the day:

crontab script for emergency cluster shutdown
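
The idea behind such a crontab job is simple enough to sketch. The code below is not the script used on the cluster, just a minimal illustration of the approach. Everything in it is an assumption: the sysfs path (the real script may well read lm-sensors or IPMI output instead), the 45 °C limit, the function names, and the use of `/sbin/shutdown -h now`.

```python
import subprocess


def read_temp_c(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read a temperature in degrees C from a sysfs file.

    Assumption: the file holds an integer in millidegrees Celsius,
    as the Linux thermal sysfs interface does.
    """
    with open(path) as handle:
        return int(handle.read().strip()) / 1000.0


def check_and_shutdown(temp_c, limit_c=45.0, run=subprocess.run):
    """Halt the machine when temp_c is at or above limit_c.

    Returns True if a shutdown was requested, False otherwise.
    The `run` parameter exists only so the command can be faked in tests.
    """
    if temp_c < limit_c:
        return False
    # Hypothetical command; needs root, which is the case for root's crontab.
    run(["/sbin/shutdown", "-h", "now"], check=False)
    return True
```

A small wrapper calling `check_and_shutdown(read_temp_c())` could then be run from root's crontab every few minutes, for example with an entry like `*/5 * * * * /usr/local/sbin/temp_guard.py` (hypothetical path). Checking often and shutting down on a single bad reading is the crude option. Requiring two or three consecutive readings above the limit would make it less trigger-happy.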

2009/02/09 23:59

The full maintenance archive is kept here

…and finally, The infamous MBG's Power Failure Log

about/maintenance.txt · Last modified: 2011/01/31 17:56 (external edit)