# Model Requirements
Learn how to change the system model for Composer to any supported LLM
## Models & Requirements
**Default Model:** Mixtral 8x7B AWQ
| Model | Name | Quantization | Minimum GPU Requirements | VRAM Requirements | Minimum Layar Version |
|---|---|---|---|---|---|
| Mistral 7B V0.1 | mistralai/Mistral-7B-Instruct-v0.1 | None | A10 x1 | 24 GB | 1.7 |
| Mixtral 8x7B | casperhansen/mixtral-instruct-awq | AWQ | A100 x1 | 80 GB | 1.7 |
| Llama 3 70B | casperhansen/llama-3-70b-instruct-awq | AWQ | A10 x4 | 80 GB | 1.8 |
| Llama 3 70B | meta-llama/llama-3-70b-instruct | None | A100 x2 | 160 GB | 1.8 |
| Llama 3.1 70B | meta-llama/Meta-Llama-3.1-70B-Instruct | None | A100 x2 | 160 GB | 1.9 |
| Llama 3.1 70B | hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 | GPTQ | A100 x1 | 80 GB | 1.9 |
| Llama 3.1 70B | hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 | AWQ | A100 x1 | 80 GB | 1.9 |
| Llama 3.1 8B | meta-llama/Meta-Llama-3.1-8B-Instruct | None | A10 x1 | 40 GB | 1.9.1 |
| Granite-34b | ibm-granite/Granite-34b-Instruct | None | A100 x1 | 80 GB | 1.10 |
| Granite-20b | ibm-granite/Granite-20b-Instruct | None | A100 x1 | 80 GB | 1.10 |
| Granite-8b-128k | ibm-granite/Granite-8b-128k-Instruct | None | A100 x1 | 80 GB | 1.10 |
| Granite-8b | ibm-granite/Granite-8b-Instruct | None | A100 x1 | 80 GB | 1.10 |
| Granite-3b | ibm-granite/Granite-3b-Instruct | None | A10 x1 | 24 GB | 1.10 |
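As a rough sanity check on the VRAM column above, you can estimate the memory needed just to hold a model's weights from its parameter count and quantization. The bytes-per-weight figures below are common rules of thumb, not Layar-specific numbers, and they exclude KV-cache and runtime overhead:

```python
# Rule-of-thumb bytes per weight (illustrative, not Layar-specific):
# fp16/bf16 weights use 2 bytes; AWQ/GPTQ INT4 weights use ~0.5 bytes.
BYTES_PER_WEIGHT = {
    "none": 2.0,   # unquantized fp16/bf16
    "awq": 0.5,    # 4-bit AWQ
    "gptq": 0.5,   # 4-bit GPTQ
}

def weight_vram_gb(params_billions: float, quantization: str = "none") -> float:
    """Approximate GB needed just to hold the model weights."""
    total_bytes = params_billions * 1e9 * BYTES_PER_WEIGHT[quantization.lower()]
    return total_bytes / 1e9

# Llama 3.1 70B unquantized: ~140 GB of weights, hence 2x A100 80GB.
print(weight_vram_gb(70, "none"))   # 140.0
# Llama 3.1 70B AWQ INT4: ~35 GB of weights, fits a single A100 80GB.
print(weight_vram_gb(70, "awq"))    # 35.0
```

The gap between these estimates and the table's figures is the headroom the serving stack needs for the KV cache and activations, which is why an 80 GB GPU is listed even for a ~35 GB quantized model.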
### Llama 3.1 VRAM Limitations
If you are on Layar 1.9, your GPUs must have 80 GB of VRAM. If you have further questions about this, please e-mail [email protected]
## Which Model Should I Choose?
Mixtral 8x7B AWQ is the default for ease of installation. However, a more robust model will generally produce higher-quality responses. The factors to consider are response quality, throughput, and the material the model was trained on. Larger models such as Llama 3.1 70B have a higher parameter count, which contributes to response quality, but throughput is lower at these model sizes because of the added computation. A smaller model such as Llama 3.1 8B runs on more modest hardware and delivers higher throughput; however, the lower parameter count can reduce response quality.
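To make the trade-off concrete, the sketch below filters a few rows of the table above by the VRAM you have available and the Layar version you run. The model names and numbers are taken from the table; the helper function itself is hypothetical and is not a Layar API:

```python
# A subset of the model table: (HF name, quantization, VRAM GB, min Layar version).
MODELS = [
    ("mistralai/Mistral-7B-Instruct-v0.1", "None", 24, (1, 7)),
    ("casperhansen/mixtral-instruct-awq", "AWQ", 80, (1, 7)),
    ("hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4", "AWQ", 80, (1, 9)),
    ("meta-llama/Meta-Llama-3.1-70B-Instruct", "None", 160, (1, 9)),
    ("meta-llama/Meta-Llama-3.1-8B-Instruct", "None", 40, (1, 9, 1)),
]

def candidates(available_vram_gb: int, layar_version: tuple) -> list:
    """Models that fit the VRAM budget and are supported by this Layar version."""
    return [name for name, _, vram, min_ver in MODELS
            if vram <= available_vram_gb and layar_version >= min_ver]

# On Layar 1.10 with a single A100 80GB, the unquantized 70B is excluded.
print(candidates(80, (1, 10)))
```

From the candidates that fit, the quality/throughput trade-off above then applies: pick the largest model whose throughput is acceptable for your workload.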
### Sources Matter
When choosing a model, it's important to understand what sort of information the model will be presented with. A smaller parameter count means the model may be unable to infer complex pieces of information.
The last critical factor is the data the model was trained on. The key difference between Granite and Llama is their training material. Granite is trained on a range of programming languages and code repositories, which allows users to prompt Granite to review or explain code. Llama is trained on a generalized set of data, which makes it suitable for many use cases but means it doesn't excel in any specific one.
## Setting Your System to a New Model
Please review Assigning Models to GPUs for steps on setting Layar to use a new model.
If you have issues, please contact Certara Support at [email protected]