=== Check CPU workload ===
  - We have a simple script for checking the workload of all machines: run **/cs/home/hj/bin/available_computers.pl**. Every time you submit a new job, please use this command to find a free or lightly loaded machine. For a 6-core machine, the load should normally not go above 6 (see the example after this list).
  - Run the Linux '**htop**' command to check the CPU load and memory usage on each machine.
  - If your program consumes a lot of memory (over 10 GB), DON'T submit it more than once to a single machine.
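
As a rough sketch, the "workload" numbers above are load averages, which you can compare against a machine's core count using standard Linux commands. The hostname below is just a placeholder; **uptime** and **nproc** are ordinary Linux tools, not lab-specific scripts.

<code>
# List free / lightly loaded machines (lab script mentioned above):
/cs/home/hj/bin/available_computers.pl

# On a specific machine, compare the load average against the number of cores.
# "lab-machine-01" is a placeholder hostname; uptime and nproc are standard commands.
ssh lab-machine-01 'uptime; nproc'
# Rule of thumb from above: on a 6-core machine the 1-minute load should stay under 6.

# Interactive per-core CPU and memory view on the machine you are logged into:
htop
</code>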
  
=== Check GPU workload ===
  - We have a simple script for checking the GPU workload on all machines: run **/cs/home/hj/bin/AllGPUStat.sh**.
  - On a single GPU-equipped server, a GPU summary can be retrieved with "**nvidia-smi**" (see the example below). As long as the remaining GPU memory meets your program's needs, your program can run; however, it may make little progress if GPU utilization is already high. Also note that if two programs run on the same GPU and one of them allocates too much memory, BOTH programs crash. "nvidia-smi" is not available on OSX.
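
For example (the ''--query-gpu'' fields below are standard **nvidia-smi** options; this is just one convenient way to read the summary):

<code>
# Full summary of every GPU on the current server:
nvidia-smi

# Compact per-GPU view: index, name, memory used/total, and utilization:
nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv

# Which processes currently hold GPU memory (check before submitting a new job):
nvidia-smi --query-compute-apps=pid,used_memory --format=csv
</code>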

In most machine learning frameworks, the first GPU is picked by default. TensorFlow, for example, will pre-allocate a chunk of memory on EVERY SINGLE GPU unless you explicitly mask the ones you don't need. Masking can be done with, for example, "**setenv CUDA_VISIBLE_DEVICES 1**" if you only want to expose the second GPU (GPUs are 0-indexed); a short sketch follows below.
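
A minimal sketch of masking; **train.py** is a placeholder for your own program, and the bash lines are the equivalent of the csh **setenv** form above:

<code>
# csh/tcsh (as in the note above): expose only the second GPU (index 1)
setenv CUDA_VISIBLE_DEVICES 1
python train.py                        # "train.py" is a placeholder for your own program

# bash equivalent, set only for a single run:
CUDA_VISIBLE_DEVICES=1 python train.py

# Hide all GPUs (force a CPU-only run):
CUDA_VISIBLE_DEVICES="" python train.py
</code>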

{{:pastedgraphic-2.jpg?0x400|}}