
Setting GPU Stateful Set

Stateful Set lets you place the LLM across multiple GPUs so that prompts can use more than one GPU.

🚧

Caution

When editing the configuration file, add only what is needed. Deleting fields or adding irrelevant ones can disrupt Layar's ability to operate.

  1. SSH into your environment.
  2. Run sudo -i
  3. Run vim /data/layar/layar.config
  4. Press i to start editing.
  5. Set TGI_NUM_GPU: 1
  6. Set VLLM_GPU_COUNT to the number of GPUs to clone the model across, for example VLLM_GPU_COUNT: 6
  7. Ensure ALL_MODEL_INFO is populated without GPU IDs.
    1. For example:
      VLLM_GPU_COUNT: 6
      ALL_MODEL_INFO: '
        [{"model_name":"hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4",
        "model_server":"http://certara-tgi:9000/v1/completions",
        "model_args":"--max-model-len 100000 --quantization gptq --gpu-memory-utilization 0.97 --trust-remote-code --enforce-eager"}]
        '
      TGI_NUM_GPU: 1
      
  8. Press ESC to exit insert mode.
  9. Type :wq and press Enter to save the changes and exit.
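
Before saving, it can help to confirm that the ALL_MODEL_INFO value is still valid JSON and carries no GPU IDs. The following is a minimal Python sketch of such a check, not part of Layar itself. The function name `check_model_info` is hypothetical. The three expected keys are taken from the example in step 7, and the rule that a GPU ID would show up as a key containing "gpu" is an assumption, since this guide does not name the field.

```python
import json


def check_model_info(raw: str) -> list[str]:
    """Return a list of problems found in an ALL_MODEL_INFO value."""
    try:
        entries = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(entries, list):
        return ["expected a JSON list of model entries"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected a JSON object")
            continue
        # Keys seen in the guide's example; assumed (not confirmed) to be required.
        for key in ("model_name", "model_server", "model_args"):
            if key not in entry:
                problems.append(f"entry {i}: missing {key}")
        # Assumption: a GPU ID setting would appear as a key containing "gpu".
        for key in entry:
            if "gpu" in key.lower():
                problems.append(f"entry {i}: unexpected GPU-related key {key!r}")
    return problems


# The value from step 7, pasted as a plain string.
example = """
[{"model_name":"hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4",
  "model_server":"http://certara-tgi:9000/v1/completions",
  "model_args":"--max-model-len 100000 --quantization gptq --gpu-memory-utilization 0.97 --trust-remote-code --enforce-eager"}]
"""

print(check_model_info(example))  # prints []
```

An empty list means the value parsed cleanly and no GPU-related key was found. Only keys are inspected, so the --gpu-memory-utilization flag inside model_args is not flagged.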