Posted on 2025-3-10 14:46:38

Requirements: When deploying a large language model (e.g. DeepSeek, Qwen2.5), the GPU VRAM required varies with the model's parameter count, activation memory, batch size, and numerical precision.
VRAM Introduction
VRAM (Video RAM, i.e. Video Random Access Memory) is a type of computer memory dedicated to storing graphics data such as pixels. Used as the DRAM on a graphics card, it is a dual-port random access memory that allows the RAMDAC to read the frame buffer while the GPU is still writing to it. It generally comprises two parts: a digital electronics part, which accepts commands from the processor and formats the received data, and an image generator part, which turns that data into a video signal.
Manual calculation
VRAM usage can be estimated by hand: multiply the parameter count by the bytes per parameter at the chosen precision, then apply an overhead factor for activations, the KV cache, and framework buffers.
Reference address: (link visible after login)
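As a sketch of the manual calculation described above (the exact formula from the original post is behind a login, so the ~20% overhead factor here is an assumption based on common practice):

```python
def estimate_inference_vram_gb(params_billion: float,
                               bits_per_param: int = 16,
                               overhead: float = 1.2) -> float:
    """Estimate the GPU VRAM (GB) needed to serve a model for inference.

    params_billion : parameter count in billions (e.g. 7 for a 7B model)
    bits_per_param : precision (32=FP32, 16=FP16/BF16, 8=INT8, 4=INT4)
    overhead       : multiplier for activations, KV cache, and framework
                     buffers (1.2, i.e. ~20% extra, is a rough assumption)
    """
    bytes_per_param = bits_per_param / 8
    return params_billion * bytes_per_param * overhead

# Example: a 7B model in FP16 needs roughly 7 * 2 * 1.2 = 16.8 GB,
# while the same model quantized to INT4 needs roughly 4.2 GB.
print(round(estimate_inference_vram_gb(7, 16), 1))  # → 16.8
print(round(estimate_inference_vram_gb(7, 4), 1))   # → 4.2
```

This is why a 7B model that will not fit on a 16 GB card in FP16 usually runs comfortably after INT8 or INT4 quantization.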
VRAM Estimator
This tool estimates the GPU VRAM usage of transformer-based models for both inference and training. It accepts parameters such as model name, precision, maximum sequence length, batch size, and number of GPUs, and provides a detailed breakdown of the VRAM consumed by parameters, activations, outputs, and CUDA kernels.
Address: (link visible after login), as shown in the figure below:
Hugging Face Accelerate Model Memory Calculator
This tool calculates the memory a model needs for inference and training. Because it connects directly to the Hugging Face Hub, you can enter a model name or URL, and the tool will provide a comprehensive breakdown of memory usage, including data type, largest layer, total size, and training memory usage under different optimizers.
Address: (link visible after login)
Can I Run This LLM
This is a more comprehensive transformer-oriented tool that accepts a wide range of input parameters and provides a detailed breakdown of memory usage, giving insight into how memory is allocated and consumed during inference and training.
Address: (link visible after login), as shown in the figure below:
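The training-memory breakdowns these tools report can be approximated by the widely cited per-parameter accounting for mixed-precision Adam training (a sketch under that assumption, not the tools' exact method; activations are excluded because they depend on batch size and sequence length):

```python
def estimate_training_vram_gb(params_billion: float) -> dict:
    """Per-component VRAM (GB) for mixed-precision (FP16 + Adam) training.

    FP16 weights (2 B/param) + FP16 gradients (2 B/param) + FP32 master
    weights, Adam momentum, and Adam variance (4 B/param each) = 16 bytes
    per parameter, before activations.
    """
    b = params_billion  # billions of params, so bytes/param ~ GB directly
    return {
        "fp16_weights": 2 * b,
        "fp16_gradients": 2 * b,
        "fp32_optimizer_states": 12 * b,  # master weights + Adam m and v
        "total": 16 * b,
    }

# Example: full fine-tuning of a 7B model needs ~112 GB before activations,
# which is why training demands far more VRAM than inference.
print(estimate_training_vram_gb(7)["total"])  # → 112
```

This gap between the ~16.8 GB inference estimate and the ~112 GB training estimate for the same 7B model is exactly what these calculators are designed to surface before you provision GPUs.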