
Assigning Models to GPUs

Introduction

The Layar config file allows you to assign specific models to specific GPUs, so users can run multiple models side by side and choose the right one for each use case.

Accessing layar.config

  1. SSH into your environment.
  2. Run sudo -i to switch to the root user.
  3. Run vim /data/layar/layar.config
  4. Press i to enter insert mode and start editing.

⚠️ Be careful not to change anything other than what is outlined in this guide.

Assign GPUs to Model Server

TGI_NUM_GPU dictates how many GPUs will be used to service the model server. The number of GPUs assigned to the model server determines the throughput of the model.
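For example, on a host with two GPUs dedicated to the model server, the relevant line in layar.config might look like the sketch below (the value 2 is illustrative; set it to match your hardware):

```shell
# /data/layar/layar.config
# Illustrative value -- set to the number of GPUs the model server should use
TGI_NUM_GPU=2
```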

📘

Model GPU Requirements

Before proceeding, please review Model Requirements.

All_Model_Info

Dictates which models are used on which GPUs. The value is a string containing a JSON array of dictionaries.

All_Model_Info = '
                  [{"model_name":"mistralai/Mistral-7B-Instruct-v0.1",
                  "model_server":"http://certara-tgi:9000/v1/completions",
                  "model_args":"--max-model-len 8192 --gpu-memory-utilization 0.45 --trust-remote-code --enforce-eager --kv-cache-dtype fp8",
                  "gpu_ids":[0]
                  },
                  {"model_name":"ibm-granite/granite-3b-code-instruct",
                  "model_server":"http://certara-tgi:8000/v1/completions",
                  "model_args":"--max-model-len 2048 --gpu-memory-utilization 0.45 --trust-remote-code --enforce-eager --kv-cache-dtype fp8",
                  "gpu_ids":[0]
                  },
                  {"model_name":"casperhansen/llama-3-70b-instruct-awq",
                  "model_server":"http://certara-tgi:7000/v1/completions",
                  "model_args":"--max-model-len 8192 --quantization marlin --gpu-memory-utilization 0.97 --trust-remote-code --enforce-eager --kv-cache-dtype fp8",
                  "gpu_ids":[1]
                  }]
               '

The example above assumes you have 2 GPUs available to assign to models. Let's go over each individual field in the dictionary.
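Because the value is a single JSON string, a typo (a stray quote, a missing comma) will break every model entry at once. One low-risk way to check your edit before restarting Layar is to parse a copy of the value in Python. This is a sketch using the first two entries from the example above; it is not part of Layar itself:

```python
import json

# Paste the contents of All_Model_Info (between the outer single quotes) here.
all_model_info = '''
[{"model_name": "mistralai/Mistral-7B-Instruct-v0.1",
  "model_server": "http://certara-tgi:9000/v1/completions",
  "model_args": "--max-model-len 8192 --gpu-memory-utilization 0.45 --trust-remote-code --enforce-eager --kv-cache-dtype fp8",
  "gpu_ids": [0]},
 {"model_name": "ibm-granite/granite-3b-code-instruct",
  "model_server": "http://certara-tgi:8000/v1/completions",
  "model_args": "--max-model-len 2048 --gpu-memory-utilization 0.45 --trust-remote-code --enforce-eager --kv-cache-dtype fp8",
  "gpu_ids": [0]}]
'''

# json.loads raises a ValueError with a line/column if the string is malformed.
models = json.loads(all_model_info)
for m in models:
    # Every entry needs all four fields described below.
    assert {"model_name", "model_server", "model_args", "gpu_ids"} <= m.keys()
print(f"Parsed {len(models)} model entries")
```

If the script prints a parse error instead, fix the reported position in layar.config before restarting any services.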

model_name

A string value that contains the full name of the model. For example, the default Mixtral model's full name is casperhansen/mixtral-instruct-awq.

model_server

A string value that contains the URL of the model server. The port number needs to be different for each model added to the dictionary.
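The unique-port rule is easy to verify mechanically. This sketch (not part of Layar) pulls the port out of each model_server URL from the example above and confirms there are no duplicates:

```python
from urllib.parse import urlparse

# The three model_server URLs from the example All_Model_Info value.
servers = [
    "http://certara-tgi:9000/v1/completions",
    "http://certara-tgi:8000/v1/completions",
    "http://certara-tgi:7000/v1/completions",
]

ports = [urlparse(u).port for u in servers]
# Each model must listen on its own port, so duplicates are a config error.
assert len(ports) == len(set(ports)), "duplicate model_server ports"
print(ports)
```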

model_args

A string value that contains the model arguments to be used for that specific model.

📘

Important Consideration

If you are planning on using a GPU to service multiple models, you must include the gpu-memory-utilization argument. This value must be less than or equal to 1. In the example above, the Mistral and Granite models are each given 45 percent of the GPU's memory.
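Put differently: the gpu-memory-utilization fractions of all models sharing a GPU must sum to at most 1. This sketch checks that constraint for the example config; the entries are hypothetical mirrors of the values above, and the 0.9 fallback reflects vLLM's documented default when the flag is omitted:

```python
import re
from collections import defaultdict

# (gpu_ids, model_args) pairs mirroring the example All_Model_Info entries.
models = [
    ([0], "--max-model-len 8192 --gpu-memory-utilization 0.45"),
    ([0], "--max-model-len 2048 --gpu-memory-utilization 0.45"),
    ([1], "--max-model-len 8192 --gpu-memory-utilization 0.97"),
]

usage = defaultdict(float)
for gpu_ids, args in models:
    match = re.search(r"--gpu-memory-utilization\s+([\d.]+)", args)
    frac = float(match.group(1)) if match else 0.9  # vLLM default is 0.9
    for gid in gpu_ids:
        usage[gid] += frac

for gid, total in usage.items():
    # Models sharing a GPU must not claim more than 100% of its memory.
    assert total <= 1.0, f"GPU {gid} oversubscribed: {total:.2f}"
print(dict(usage))
```

Here GPU 0 ends up at 0.90 (0.45 + 0.45) and GPU 1 at 0.97, so both pass; adding a third model to GPU 0 without lowering the fractions would trip the assertion.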

gpu_ids

A list of integers. Each value correlates to a specific GPU ID, which can be found by running nvidia-smi. Adding more IDs to the list allows the model to run across multiple GPUs, increasing prompt throughput.