Setting GPU Stateful Set
A Stateful Set lets you deploy the LLM across multiple GPUs so that prompts can make use of more than one GPU.
Caution
When editing the file, add only what is needed. Deleting fields or adding irrelevant ones can disrupt Layar's ability to operate.
- SSH into your environment.
- Run sudo -i
- Run
vim /data/layar/layar.config
- Press i to start editing.
- Set
TGI_NUM_GPU: 1
- Set
VLLM_GPU_COUNT
to the desired number of GPUs to clone the model across, for example VLLM_GPU_COUNT: 6
- Ensure
ALL_MODEL_INFO
is populated without GPU IDs.
- An example of this would be as follows:
VLLM_GPU_COUNT: 6
ALL_MODEL_INFO: ' [{"model_name":"hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4", "model_server":"http://certara-tgi:9000/v1/completions", "model_args":"--max-model-len 100000 --quantization gptq --gpu-memory-utilization 0.97 --trust-remote-code --enforce-eager"}] '
TGI_NUM_GPU: 1
- Press ESC to exit input mode.
- Type
:wq
and press Enter to save the changes and exit.
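Before restarting anything, it can help to sanity-check the three settings edited above. The following is a minimal sketch, not part of Layar: the function names parse_config and check_gpu_settings are hypothetical, the parser assumes each setting sits on a single `KEY: value` line, and because this page does not name the field that would carry GPU IDs, the check assumes any key containing "gpu" inside an ALL_MODEL_INFO entry is one.

```python
import json

# Sample settings copied from the example above.
SAMPLE = """\
TGI_NUM_GPU: 1
VLLM_GPU_COUNT: 6
ALL_MODEL_INFO: ' [{"model_name":"hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4", "model_server":"http://certara-tgi:9000/v1/completions", "model_args":"--max-model-len 100000 --quantization gptq --gpu-memory-utilization 0.97 --trust-remote-code --enforce-eager"}] '
"""


def parse_config(text):
    """Parse single-line `KEY: value` settings into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def check_gpu_settings(settings):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if settings.get("TGI_NUM_GPU") != "1":
        problems.append("TGI_NUM_GPU should be 1")
    count = settings.get("VLLM_GPU_COUNT", "")
    if not count.isdigit() or int(count) < 1:
        problems.append("VLLM_GPU_COUNT should be a positive integer")
    # Drop the surrounding quotes and padding before decoding the JSON list.
    raw = settings.get("ALL_MODEL_INFO", "").strip("'\" ")
    try:
        models = json.loads(raw)
    except json.JSONDecodeError:
        problems.append("ALL_MODEL_INFO is not valid JSON")
        return problems
    if not isinstance(models, list) or not models:
        problems.append("ALL_MODEL_INFO should be a non-empty JSON list")
        return problems
    for model in models:
        if not isinstance(model, dict):
            problems.append("ALL_MODEL_INFO entries should be JSON objects")
            continue
        # Assumption: a GPU ID field would have "gpu" somewhere in its key name.
        gpu_keys = [key for key in model if "gpu" in key.lower()]
        if gpu_keys:
            problems.append(f"{model.get('model_name')} sets {gpu_keys}")
    return problems


print(check_gpu_settings(parse_config(SAMPLE)))  # prints []
```

To check the real file, the contents of /data/layar/layar.config could be read and passed to parse_config in place of SAMPLE. The script only reads the settings and never writes to the file, in keeping with the caution above.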