Assigning Models to GPUs
Introduction
The Layar config file allows you to assign specific models to specific GPUs, letting you run multiple models side by side depending on the use case.
Accessing layar.config
- SSH into your environment.
- Run sudo -i
- Run vim /data/layar/layar.config
- Press i to start editing
⚠️Be careful not to change anything other than what is outlined in this guide
Assign GPUs to Model Server
TGI_NUM_GPU
Dictates how many GPUs will be used to service the model server. The number of GPUs assigned to the model server determines the throughput of the model.
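For example, on a machine with two GPUs dedicated to the model server, the line in layar.config would look like the following (the value 2 is illustrative; use the number of GPUs in your environment):

```shell
TGI_NUM_GPU=2
```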
Model GPU Requirements
Before proceeding, please review Model Requirements.
All_Model_Info
Dictates which models are used on which GPUs. The value is a string containing a JSON array of dictionaries.
All_Model_Info = '
[{"model_name":"mistralai/Mistral-7B-Instruct-v0.1",
"model_server":"http://certara-tgi:9000/v1/completions",
"model_args":"--max-model-len 8192 --gpu-memory-utilization 0.45 --trust-remote-code --enforce-eager --kv-cache-dtype fp8",
"gpu_ids":[0]
},
{"model_name":"ibm-granite/granite-3b-code-instruct",
"model_server":"http://certara-tgi:8000/v1/completions",
"model_args":"--max-model-len 2048 --gpu-memory-utilization 0.45 --trust-remote-code --enforce-eager --kv-cache-dtype fp8",
"gpu_ids":[0]
},
{"model_name":"casperhansen/llama-3-70b-instruct-awq",
"model_server":"http://certara-tgi:7000/v1/completions",
"model_args":"--max-model-len 8192 --quantization marlin --gpu-memory-utilization 0.97 --trust-remote-code --enforce-eager --kv-cache-dtype fp8",
"gpu_ids":[1]
}]
'
The example above assumes you have 2 GPUs available to assign to models. Let's go over each individual field in the dictionary.
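Because the value of All_Model_Info is a JSON string, a stray comma or bracket in your edit will break model loading. One way to sanity-check an edit before saving is to paste the value (everything between the single quotes) into a short Python script and parse it; the script below is an illustrative sketch, not part of Layar, and uses a single abbreviated entry from the example above:

```python
import json

# Paste the value of All_Model_Info (without the surrounding single quotes) here.
all_model_info = """
[{"model_name":"mistralai/Mistral-7B-Instruct-v0.1",
"model_server":"http://certara-tgi:9000/v1/completions",
"model_args":"--max-model-len 8192 --gpu-memory-utilization 0.45 --trust-remote-code --enforce-eager --kv-cache-dtype fp8",
"gpu_ids":[0]
}]
"""

models = json.loads(all_model_info)  # raises json.JSONDecodeError if the JSON is malformed
for m in models:
    # Every entry needs all four fields described below.
    for field in ("model_name", "model_server", "model_args", "gpu_ids"):
        assert field in m, f"missing field {field!r} in entry {m}"
    print(m["model_name"], "->", m["model_server"], "on GPUs", m["gpu_ids"])
```

If the script prints each model without raising an error, the JSON itself is well-formed.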
model_name
A string value that contains the full name of the model. For example, the full name of the default Mixtral model is casperhansen/mixtral-instruct-awq.
model_server
A string value that contains the URL of the model server. The port number must be different for each model added to the list.
model_args
A string value that contains the model arguments to be used for that specific model.
Important Consideration
If you are planning on using a GPU to service multiple models, you must include the gpu-memory-utilization argument for each of them. Each value must be less than or equal to 1. In the example above, the Mistral and Granite models are each given 45 percent of GPU 0's memory.
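When several models share one GPU, it is easy to over-commit its memory. A quick check is to sum the gpu-memory-utilization values per GPU ID. The sketch below does this for the three example entries (with model_args abbreviated to just the relevant flag); it is an illustrative helper, not part of Layar:

```python
import json
import re
from collections import defaultdict

# Abbreviated versions of the three example entries from this guide.
models = json.loads("""
[{"model_name": "mistralai/Mistral-7B-Instruct-v0.1",
  "model_args": "--gpu-memory-utilization 0.45", "gpu_ids": [0]},
 {"model_name": "ibm-granite/granite-3b-code-instruct",
  "model_args": "--gpu-memory-utilization 0.45", "gpu_ids": [0]},
 {"model_name": "casperhansen/llama-3-70b-instruct-awq",
  "model_args": "--gpu-memory-utilization 0.97", "gpu_ids": [1]}]
""")

usage = defaultdict(float)
for m in models:
    match = re.search(r"--gpu-memory-utilization\s+([\d.]+)", m["model_args"])
    assert match, f"{m['model_name']} is missing --gpu-memory-utilization"
    for gpu in m["gpu_ids"]:
        usage[gpu] += float(match.group(1))

for gpu, total in sorted(usage.items()):
    # Models sharing a GPU must not claim more than 100% of its memory combined.
    assert total <= 1.0, f"GPU {gpu} is over-committed: {total:.2f}"
    print(f"GPU {gpu}: {total:.2f} of memory reserved")
```

Here GPU 0 ends up with 0.90 reserved and GPU 1 with 0.97, so both assignments fit.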
gpu_ids
A list of integers. Each value corresponds to a specific GPU ID, which can be found by running nvidia-smi. Adding more IDs to the list will allow the model to run across multiple GPUs, increasing prompt throughput.
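For example, to spread a model across both GPUs in a two-GPU system (a hypothetical variant of the entries above; remember to leave enough free memory on each GPU), its entry would use:

```shell
"gpu_ids":[0, 1]
```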