check_workload_cpu_gpu
=== Check CPU workload: ===
  - Run the Linux '**htop**' command to check the CPU load and memory usage on each machine.
  - If your program consumes lots of memory (over 10G), DON'T submit it more than once to a single machine.
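Besides eyeballing 'htop', you can check a machine's load programmatically. A minimal sketch (the 0.8 threshold is an illustrative assumption, not a lab policy):

```python
import os

def machine_is_busy(threshold=0.8):
    """Return True if the 1-minute load average exceeds
    threshold * number of CPU cores.

    os.getloadavg() reports the same load figures that 'htop'
    shows in its header. The threshold is an assumption chosen
    for illustration only.
    """
    load_1min, _, _ = os.getloadavg()
    cores = os.cpu_count()
    return load_1min > threshold * cores
```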
=== Check GPU workload: ===
  - We have a simple script for you to check the workload of all machines; you may run: **/
  - To check a server equipped with a GPU, the GPU summary can be retrieved with "**nvidia-smi**". As long as the remaining memory meets your memory need, your job is runnable; however, it may make little progress if GPU utilization is already high. If two programs execute on the same GPU and one of them allocates too much memory, BOTH programs crash. "nvidia-smi" is not available on OSX.
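The "remaining memory meets your need" check can be scripted on top of `nvidia-smi --query-gpu=memory.free --format=csv,noheader`, which prints one line per GPU such as `10240 MiB` (the sample value here is illustrative):

```python
def free_mib(smi_csv_line):
    """Parse one line of
    nvidia-smi --query-gpu=memory.free --format=csv,noheader
    e.g. '10240 MiB' -> 10240."""
    return int(smi_csv_line.strip().split()[0])

def fits(smi_csv_line, need_mib):
    """True if the GPU's free memory covers the job's need in MiB."""
    return free_mib(smi_csv_line) >= need_mib
```

Remember that this only tells you the job can start; a GPU with high utilization may still make it crawl.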
+ | |||
+ | |||
In most machine learning frameworks, the first GPU is picked by default. TensorFlow, for example, will pre-allocate a chunk of memory on EVERY SINGLE GPU unless you explicitly mask the ones you don't need. Masking can be done with, for example, "**setenv CUDA_VISIBLE_DEVICES 1**" if you only want to expose the second GPU (GPUs are 0-indexed).
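The `setenv` form above is for csh; the same masking can be done from inside a Python script, as long as the variable is set before the framework is imported:

```python
import os

# Expose only the second GPU (GPUs are 0-indexed); equivalent to
# `setenv CUDA_VISIBLE_DEVICES 1` in csh. This must run BEFORE the
# framework (e.g. TensorFlow) is imported, or it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import tensorflow as tf  # would now see only the masked-in GPU
```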
+ | |||
+ | {{: |
check_workload_cpu_gpu.1465228385.txt.gz · Last modified: 2016/06/06 15:53 by hj