Model Arguments

Introduction

Layar launches with default model arguments. Some models, however, require additional arguments in the Layar config file. This guide lists the required arguments for each model.

📘
Lean GPU configuration
If you want to use small GPUs (A10s or smaller), please review the GPU Considerations Guide.

Llama 3.1 70B

Both the quantized and full versions of Llama 3.1 70B require additional arguments. Use the TGI_MODEL and MODEL_ARGS pair that matches the variant you are deploying:

TGI_MODEL: "hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4"
MODEL_ARGS: "--max-model-len 128000 --quantization gptq --gpu-memory-utilization 0.99 --trust-remote-code --enforce-eager"

TGI_MODEL: "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4"
MODEL_ARGS: "--max-model-len 128000 --quantization awq --gpu-memory-utilization 0.99 --trust-remote-code --enforce-eager"

TGI_MODEL: "meta-llama/Meta-Llama-3.1-70B-Instruct"
MODEL_ARGS: "--max-model-len 70000 --dtype half --gpu-memory-utilization 0.99 --trust-remote-code --enforce-eager"
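The three MODEL_ARGS strings above share most of their flags, so it can be easy to miss what actually changes between variants. The following is a minimal Python sketch that splits each string into flags and reports only the ones that differ. It is not part of Layar: the helper names `parse_model_args` and `differing_flags` are hypothetical, and the sketch assumes every flag is either a bare switch or takes exactly one value, which holds for the strings listed here.

```python
import shlex

# MODEL_ARGS strings copied verbatim from the entries above, keyed by TGI_MODEL.
MODEL_ARGS = {
    "hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4":
        "--max-model-len 128000 --quantization gptq "
        "--gpu-memory-utilization 0.99 --trust-remote-code --enforce-eager",
    "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4":
        "--max-model-len 128000 --quantization awq "
        "--gpu-memory-utilization 0.99 --trust-remote-code --enforce-eager",
    "meta-llama/Meta-Llama-3.1-70B-Instruct":
        "--max-model-len 70000 --dtype half "
        "--gpu-memory-utilization 0.99 --trust-remote-code --enforce-eager",
}


def parse_model_args(arg_string):
    """Split a MODEL_ARGS string into {flag: value}; bare switches map to True.

    Hypothetical helper: assumes each flag takes at most one value.
    """
    tokens = shlex.split(arg_string)
    parsed = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
        if has_value:
            parsed[flag] = tokens[i + 1]
            i += 2
        else:
            parsed[flag] = True
            i += 1
    return parsed


def differing_flags(configs):
    """Return {flag: {model: value}} for flags whose value is not identical
    across all models. A flag a model does not set is reported as None."""
    parsed = {model: parse_model_args(args) for model, args in configs.items()}
    all_flags = sorted({flag for flags in parsed.values() for flag in flags})
    diff = {}
    for flag in all_flags:
        values = {model: flags.get(flag) for model, flags in parsed.items()}
        if len(set(values.values())) > 1:
            diff[flag] = values
    return diff


if __name__ == "__main__":
    for flag, values in differing_flags(MODEL_ARGS).items():
        print(flag)
        for model, value in values.items():
            print(f"  {model}: {value}")
```

Run against the strings above, the only flags reported are `--max-model-len`, `--quantization`, and `--dtype`. The remaining flags (`--gpu-memory-utilization 0.99`, `--trust-remote-code`, `--enforce-eager`) are identical across all three variants.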