Model Requirements

Models & Requirements

Default Model: Mixtral 8x7B AWQ

Model	Hugging Face Model ID	Quantization	Minimum GPU Requirements	VRAM Requirements	Minimum Layar Version
Mistral 7B V0.1	mistralai/Mistral-7B-Instruct-v0.1	None	A10 x1	24GB	1.7
Mixtral 8x7B	casperhansen/mixtral-instruct-awq	AWQ	A100 x1	80GB	1.7
Llama 3 70B	casperhansen/llama-3-70b-instruct-awq	AWQ	A10 x4	80GB	1.8
Llama 3 70B	meta-llama/llama-3-70b-instruct	None	A100 x2	160GB	1.8
Llama 3.1 70B	meta-llama/Meta-Llama-3.1-70B-Instruct	None	A100 x2	160GB	1.9
Llama 3.1 70B	hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4	GPTQ	A100 x1	80GB	1.9
Llama 3.1 70B	hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4	AWQ	A100 x1	80GB	1.9
Llama 3.1 8B	meta-llama/Meta-Llama-3.1-8B-Instruct	None	A10 x1	40GB	1.9.1
Llama 3.3 70B	meta-llama/Llama-3.3-70B-Instruct	None	A100 x2	80GB	1.12
Llama 3.3 70B	casperhansen/llama-3.3-70b-instruct-awq	AWQ	A100 x1	40GB	1.12
Llama 4 Scout	meta-llama/Llama-4-Scout-17B-16E	None	H100 x4	320GB	1.15
Llama 4 Maverick	meta-llama/Llama-4-Maverick-17B-128E-Original	None	H100 x8	640GB	1.15
Granite-34b	ibm-granite/Granite-34b-Instruct	None	A100 x1	80GB	1.10
Granite-20b	ibm-granite/Granite-20b-Instruct	None	A100 x1	80GB	1.10
Granite-8b-128k	ibm-granite/Granite-8b-128k-Instruct	None	A100 x1	80GB	1.10
Granite-8b	ibm-granite/Granite-8b-Instruct	None	A100 x1	80GB	1.10
Granite-3b	ibm-granite/Granite-3b-Instruct	None	A10 x1	24GB	1.10
GPT-oss-20b	openai/gpt-oss-20b	None	H100 x1	80GB	1.18
GPT-oss-120b	openai/gpt-oss-120b	None	H100 x1	80GB	1.18
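As a rough guide to reading the table, the memory needed just to hold a model's weights is about the parameter count times the bytes stored per parameter. The sketch below is an illustrative approximation, not Layar's sizing method. It assumes 16 bits per weight for unquantized models and 4 bits per weight for AWQ/GPTQ models, and it ignores the KV cache, activations, quantization scales, and runtime overhead, so real deployments need more VRAM than these weight-only estimates. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params_billion: float, bits_per_param: float) -> float:
    """Approximate weight-only memory in decimal GB.

    Bytes = params * (bits / 8); billions of params cancel with 1e9 bytes per GB.
    Ignores KV cache, activations, quantization scales, and runtime overhead.
    """
    return num_params_billion * bits_per_param / 8


if __name__ == "__main__":
    # Assumption: 16 bits per weight unquantized, 4 bits per weight for AWQ/GPTQ.
    examples = [
        ("Llama 3.1 8B, unquantized", 8, 16),
        ("Llama 3.3 70B, unquantized", 70, 16),
        ("Llama 3.3 70B, AWQ", 70, 4),
    ]
    for name, params_b, bits in examples:
        print(f"{name}: ~{weight_memory_gb(params_b, bits):.0f} GB for weights alone")
```

This prints roughly 16, 140, and 35 GB. The gap between the unquantized and AWQ estimates for the same 70B model illustrates why the quantized 70B variants in the table need fewer GPUs.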

🚧
Llama 3.1 VRAM Limitations
If you are on Layar 1.9, you will need GPUs with 80GB of VRAM to run Llama 3.1 models. If you have further questions, please email [email protected]
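If you are running Layar 1.9, you could confirm each GPU's memory from the host before switching to a Llama 3.1 model. The sketch below is a hypothetical helper, not part of Layar. It assumes the NVIDIA driver's `nvidia-smi` tool is installed and uses its `--query-gpu=memory.total --format=csv,noheader,nounits` query, which prints one total per GPU in MiB. Because cards sold as "80GB" report slightly different MiB totals, the sketch converts to decimal GB before comparing against the 80GB threshold. All function names are illustrative.

```python
import shutil
import subprocess

# nvidia-smi reports memory.total in MiB; convert to decimal gigabytes.
MIB_TO_GB = 1024**2 / 1e9


def parse_memory_totals_gb(nvidia_smi_output: str) -> list[float]:
    """Parse one MiB value per line (CSV, no header, no units) into GB per GPU."""
    return [
        int(line.strip()) * MIB_TO_GB
        for line in nvidia_smi_output.splitlines()
        if line.strip()
    ]


def all_gpus_meet_minimum(totals_gb: list[float], min_gb: float = 80.0) -> bool:
    """True only if at least one GPU was found and every GPU has >= min_gb."""
    return bool(totals_gb) and all(total >= min_gb for total in totals_gb)


def query_gpu_memory_gb() -> list[float]:
    """Run nvidia-smi and return total memory per GPU in GB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_totals_gb(result.stdout)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; cannot check GPU memory on this host.")
    else:
        totals = query_gpu_memory_gb()
        for index, total in enumerate(totals):
            print(f"GPU {index}: {total:.1f} GB")
        if all_gpus_meet_minimum(totals):
            print("All GPUs have at least 80GB of VRAM.")
        else:
            print("At least one GPU has less than 80GB of VRAM.")
```

Parsing is kept separate from the `nvidia-smi` call so the threshold logic can be checked against sample output without a GPU.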

Which Model Should I Choose?

Llama 3.1 8B is the default model because it is easy to install. However, a more capable model will generally produce higher-quality responses. When choosing a model, consider three factors: response quality, throughput, and the material the model was trained on. Larger models such as Llama 3.3 70B have a higher parameter count, which contributes to response quality, but their throughput is lower because of that increased parameter count. A smaller model such as Llama 3.1 8B runs on smaller hardware and delivers higher throughput, but its lower parameter count can reduce response quality.
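One way to see why larger models have lower throughput: when generating text one sequence at a time, each new token typically requires reading every model weight from GPU memory, so memory bandwidth divided by weight size gives a rough ceiling on tokens per second. This is a simplified, hypothetical estimate, not a Layar benchmark. The 2,000 GB/s figure is an assumed round number standing in for aggregate GPU memory bandwidth, held fixed for both models to isolate the effect of model size. The sketch ignores batching (which raises aggregate throughput), KV-cache reads, and mixture-of-experts models that read only some weights per token.

```python
# Hypothetical back-of-envelope estimate; not a Layar benchmark.

# Assumption: round aggregate memory bandwidth, used for both models.
ASSUMED_BANDWIDTH_GB_PER_S = 2000.0


def decode_tokens_per_second_ceiling(weight_gb: float, bandwidth_gb_per_s: float) -> float:
    """Upper bound on single-sequence decode speed if every token reads all weights once."""
    if weight_gb <= 0:
        raise ValueError("weight_gb must be positive")
    return bandwidth_gb_per_s / weight_gb


if __name__ == "__main__":
    # Weight sizes assume 16-bit weights: 8B params -> ~16 GB, 70B params -> ~140 GB.
    for name, weight_gb in [("Llama 3.1 8B", 16.0), ("Llama 3.3 70B", 140.0)]:
        ceiling = decode_tokens_per_second_ceiling(weight_gb, ASSUMED_BANDWIDTH_GB_PER_S)
        print(f"{name}: at most ~{ceiling:.0f} tokens/s per sequence")
```

Only the ratio is meaningful here: at the same bandwidth, the 70B model's ceiling is 8.75 times lower than the 8B model's, matching the throughput trade-off described above.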

📘
Sources Matter
When choosing a model, consider what kind of information the model will be given. A model with a smaller parameter count may not be able to draw inferences from complex information.

The last critical factor is the data the model was trained on, which is the main difference between Granite and Llama. Granite is trained on code from many programming languages and repositories, so users can prompt Granite to review or explain code. Llama is trained on a general-purpose dataset, which makes it suitable for many use cases but not specialized for any particular one.

Setting Your System to a New Model

See Assigning Models to GPUs for the steps to configure Layar to use a new model.

If you have issues, please contact Certara Support at [email protected]