Model Arguments
Introduction
Layar launches models with default arguments. Some models, however, require additional arguments in the Layar config file. This guide lists the specific arguments for each of those models.
Lean GPU configuration
If you plan to run on small GPUs (A10 or smaller), review the GPU Considerations Guide.
Llama 3.1 70B
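Use the following arguments for the quantized (GPTQ and AWQ) and full versions of Llama 3.1 70B. Each TGI_MODEL line is paired with the MODEL_ARGS line directly below it. Note that the full version replaces the --quantization flag with --dtype half and uses a shorter --max-model-len (70000 instead of 128000).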
TGI_MODEL: "hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4"
MODEL_ARGS: "--max-model-len 128000 --quantization gptq --gpu-memory-utilization 0.99 --trust-remote-code --enforce-eager"
TGI_MODEL: "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4"
MODEL_ARGS: "--max-model-len 128000 --quantization awq --gpu-memory-utilization 0.99 --trust-remote-code --enforce-eager"
TGI_MODEL: "meta-llama/Meta-Llama-3.1-70B-Instruct"
MODEL_ARGS: "--max-model-len 70000 --dtype half --gpu-memory-utilization 0.99 --trust-remote-code --enforce-eager"
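Before copying one of these pairs into the config file, it can help to confirm that the model and its arguments agree. The following is a minimal Python sketch, not part of Layar: parse_model_args and check_quantization are hypothetical helper names, and the rule that a quantized model's name contains its quantization method is an assumption based only on the three examples above.

```python
import shlex


def parse_model_args(model_args: str) -> dict:
    """Split a MODEL_ARGS string into {flag: value}; bare flags map to True."""
    tokens = shlex.split(model_args)
    parsed = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("--"):
            raise ValueError(f"unexpected token: {token!r}")
        # A flag takes a value only if the next token is not itself a flag.
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
            parsed[token] = tokens[i + 1]
            i += 2
        else:
            parsed[token] = True
            i += 1
    return parsed


def check_quantization(model: str, model_args: str) -> bool:
    """Hypothetical check: the model name and --quantization must agree.

    Assumes quantized model names contain "gptq" or "awq", as in the
    examples in this guide; other naming schemes are not handled.
    """
    quant = parse_model_args(model_args).get("--quantization")
    name = model.lower()
    for method in ("gptq", "awq"):
        if method in name:
            return quant == method
    # Unquantized model: no --quantization flag should be present.
    return quant is None


if __name__ == "__main__":
    model = "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4"
    args = (
        "--max-model-len 128000 --quantization awq "
        "--gpu-memory-utilization 0.99 --trust-remote-code --enforce-eager"
    )
    print(parse_model_args(args))
    print(check_quantization(model, args))
```

The check only catches mismatched pairs, such as an AWQ model combined with --quantization gptq. It does not verify that the values fit your hardware.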