To see a job's state and the nodes it is running on, use the command squeue -j <jobID>, where <jobID> is the ID number of the job you want to debug.
Example Using squeue
maya$ squeue -j 4583521
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
4583521     batch   matlab   jcorn2  R      17:18      1 n67
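The squeue step can be wrapped in a small function. This is a minimal sketch that assumes bash and a standard Slurm installation. The function name job_nodes is hypothetical. It asks squeue for only the state and node list (-h drops the header, -o "%T %N" selects the long-form state and the compact node list). It then expands a compact list such as n[67-68] with scontrol show hostnames, which helps when a job spans several nodes.

```bash
#!/usr/bin/env bash
# Sketch only: job_nodes is a hypothetical helper name; it relies on
# standard squeue and scontrol options.

# job_nodes JOBID
#   Prints the job state (e.g. RUNNING) on the first line, then one
#   hostname per line for each node allocated to the job.
job_nodes() {
  local jobid=$1 state nodelist
  # -h drops the header line; %T is the long-form state and %N the compact
  # node list (e.g. "n[67-68]"). read fails if squeue prints nothing.
  if ! read -r state nodelist < <(squeue -h -j "$jobid" -o "%T %N"); then
    echo "job_nodes: no such job: $jobid" >&2
    return 1
  fi
  echo "$state"
  # A pending job has no nodes yet, so only expand a non-empty list.
  if [[ -n $nodelist ]]; then
    # Expand a compact list such as "n[67-68]" into "n67" and "n68".
    scontrol show hostnames "$nodelist"
  fi
}
```

For the job above, job_nodes 4583521 should print RUNNING followed by n67.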
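The ST column shows the job state (R means running). The NODELIST column shows the nodes the job occupies, which here is only n67. To check on the job's processes, log in to each of those nodes with ssh. Once you are logged in to a node, the following commands find your processes and attach a debugger to one of them. The example uses strace, which traces the system calls a process makes.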
How to Attach a Debugger
n67$ ps aux | grep jcorn2
jcorn2   17668  0.0  0.0 106108  1224 ?        S    11:09   0:00 /bin/bash /cm/local/apps/slurm/var/spool/job4583521/slurm_script
jcorn2   17672  0.0  0.0 502220  4836 ?        Sl   11:09   0:00 srun matlab
jcorn2   17673  0.0  0.0  26996   704 ?        S    11:09   0:00 srun matlab
jcorn2   17704  0.0  0.0 112476 11076 ?        SL   11:09   0:00 /home/jcorn2/bin/
jcorn2   17705  0.0  0.0 112476 11064 ?        SL   11:09   0:00 /home/jcorn2/bin/
jcorn2   17706  0.0  0.0 112476 11064 ?        SL   11:09   0:00 /home/jcorn2/bin/
jcorn2   17707  0.0  0.0 112476 11064 ?        SL   11:09   0:00 /home/jcorn2/bin/
root     26975  0.0  0.0 103308   868 pts/0    S+   11:54   0:00 grep jcorn2
n67$ strace -p 17705
Process 17705 attached
accept(3, ^CProcess 17705 detached
 <detached ...>
In this example, PID 17705 is blocked in accept(3, ...), so it is waiting for an incoming connection on file descriptor 3. Pressing Ctrl-C detaches strace and leaves the process running.
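The per-node steps can be sketched as a few bash functions. This is a minimal sketch and not part of the original instructions. The names my_procs, procs_on_nodes and trace_pid are hypothetical. The sketch assumes passwordless ssh to the compute nodes and that strace is installed on them. Listing processes with ps -u "$USER" instead of ps aux | grep also avoids the stray grep line seen in the output above.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; ps, ssh and strace
# options used here are standard.

# my_procs: list the current user's processes with PID, state, elapsed
# time and full command line. Selecting by user with ps -u avoids the
# extra "grep" line that "ps aux | grep" also matches.
my_procs() {
  ps -u "$USER" -o pid,stat,etime,args
}

# procs_on_nodes NODE...: run the same listing on each node over ssh,
# so every node of a multi-node job can be checked from one shell.
procs_on_nodes() {
  local node
  for node in "$@"; do
    echo "== $node =="
    ssh "$node" ps -u "$USER" -o pid,stat,etime,args
  done
}

# trace_pid PID: attach strace to a running process (run this on the
# node where the process lives). -f also attaches to its threads and
# any children it creates; -tt adds microsecond timestamps to each line.
# Press Ctrl-C to detach; the traced process keeps running.
trace_pid() {
  strace -f -tt -p "$1"
}
```

For example, after sourcing the file (debug_helpers.sh is only an example name), run procs_on_nodes n67 from the login node to find the PID. Then ssh n67, source the file again, and run trace_pid 17705.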