Hyperbus - Using System RAM to Eliminate VRAM
Makes onboard VRAM redundant and makes it possible to run LLMs against up to 1 TB of system RAM.
Memory technologies in the latest Nvidia GPUs, particularly the RTX 40 series:
- GDDR6/GDDR6X Memory: The primary VRAM in consumer-grade GPUs, with per-pin data rates up to 21 Gbps. The RTX 4090, for example, pairs GDDR6X with a 384-bit bus, achieving a bandwidth of 1008 GB/s at 21 Gbps.
- HBM2e Memory: Employed in data-center GPUs such as the A100, this memory type provides higher bandwidth through a much wider bus despite lower per-pin data rates of up to 3.2 Gbps.
- Bus Width Consideration: Bus speed alone does not determine throughput; total bandwidth is the product of bus width and per-pin data rate. For instance, a 384-bit bus at 21 Gbps delivers 384 × 21 ÷ 8 ≈ 1008 GB/s, far more than a narrower bus at the same speed.
In conclusion, the latest consumer-grade Nvidia GPUs, as of 2023, reach per-pin data rates up to 21 Gbps using GDDR6X, while data-center models leverage HBM2e for even higher total bandwidth through much wider buses.
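To make the bandwidth arithmetic above concrete, here is a minimal sketch of the calculation; the A100's 5120-bit HBM2e interface is assumed as the data-center example.

```python
# Peak memory bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte.
def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits * data_rate_gbps / 8

print(bandwidth_gbs(384, 21.0))   # RTX 4090, GDDR6X: 1008.0 GB/s
print(bandwidth_gbs(5120, 3.2))   # A100 (assumed 5120-bit HBM2e): 2048.0 GB/s
```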
HyperPath Bus: The Bandwidth Engine
The proposed HyperPath bus:
- is a 256-lane, bidirectional bus etched into the motherboard’s PCB layers, directly linking the GPU’s PCIe slot to the DDR5 memory controller and HBM cache.
- uses PAM-4 signaling at 32 Gbit/s per lane, achieving roughly 1 TB/s of aggregate bandwidth (256 lanes × 32 Gbit/s per lane ÷ 8 bits/byte = 1024 GB/s).
- is embedded into the PCIe slot as auxiliary pins (300 extra contacts) for simultaneous PCIe 6.0 x16 (128 GB/s) and HyperPath operation.
- GPU-Direct Access: Bypasses the CPU and PCIe protocol overhead. The GPU’s memory controller communicates directly with DDR5 via HyperPath, treating system RAM as its own memory pool.
- Priority-Based Arbitration: Critical GPU requests (e.g., texture fetches) override CPU tasks to minimize latency.
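As an illustration of priority-based arbitration, here is a minimal sketch that models the arbiter as a priority queue in which GPU-critical requests overtake queued CPU traffic; the class names, priority levels, and request fields are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority levels: lower value is served first.
GPU_CRITICAL = 0    # e.g. texture fetches from an active kernel
CPU_BACKGROUND = 1  # ordinary CPU traffic to the same DDR5 channels

@dataclass(order=True)
class MemRequest:
    priority: int
    seq: int                          # tie-breaker preserves FIFO order per level
    address: int = field(compare=False)
    size: int = field(compare=False)

class HyperPathArbiter:
    """Toy arbiter: critical GPU requests overtake queued CPU requests."""
    def __init__(self):
        self._queue = []
        self._seq = count()

    def submit(self, priority: int, address: int, size: int) -> None:
        heapq.heappush(self._queue, MemRequest(priority, next(self._seq), address, size))

    def next_request(self):
        return heapq.heappop(self._queue) if self._queue else None

arbiter = HyperPathArbiter()
arbiter.submit(CPU_BACKGROUND, address=0x1000, size=64)
arbiter.submit(GPU_CRITICAL, address=0x2000, size=4096)  # arrives later, served first
print(hex(arbiter.next_request().address))  # 0x2000
```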
Advanced Buffering for Low Latency
- Caching
- Lossless memory compression
- Distributed training and inference cache
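As one way to read the caching and lossless-compression bullets together, here is a minimal sketch of a page cache that keeps hot pages uncompressed in a small LRU buffer (standing in for on-GPU buffer SRAM) and writes evicted pages back to system RAM in losslessly compressed form; the class, page size, and choice of zlib are assumptions for illustration.

```python
import zlib
from collections import OrderedDict

PAGE_SIZE = 4096  # hypothetical page granularity for the HyperPath buffer

class CompressedPageCache:
    """Toy model: hot pages live uncompressed in a small LRU cache; evicted
    pages are losslessly compressed before being written to system RAM
    (represented here by a plain dict)."""

    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.hot = OrderedDict()   # page_id -> raw bytes, in LRU order
        self.system_ram = {}       # page_id -> zlib-compressed bytes

    def read(self, page_id: int) -> bytes:
        if page_id in self.hot:
            self.hot.move_to_end(page_id)      # mark as most recently used
            return self.hot[page_id]
        data = zlib.decompress(self.system_ram[page_id])
        self._install(page_id, data)
        return data

    def write(self, page_id: int, data: bytes) -> None:
        self._install(page_id, data)

    def _install(self, page_id: int, data: bytes) -> None:
        self.hot[page_id] = data
        self.hot.move_to_end(page_id)
        while len(self.hot) > self.capacity:
            victim, raw = self.hot.popitem(last=False)   # evict least recently used
            self.system_ram[victim] = zlib.compress(raw)
```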
GPU Card Redesign: "VRAM-Less" Architecture
The GPU card becomes a pure processor:
- No GDDR Chips: Removes all VRAM, reducing PCB complexity, cost, and power draw.
- HyperPath PHY Layer: A dedicated chip on the GPU converts memory requests into HyperPath signals.
- Unified Memory Controller (UMC): Directly maps system RAM addresses to GPU memory space.
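A minimal sketch of the kind of address translation a Unified Memory Controller might perform, assuming a flat page table from GPU virtual pages to system-RAM physical pages; the page size, bump allocator, and class name are hypothetical.

```python
PAGE_SIZE = 4096   # hypothetical translation granularity
PAGE_SHIFT = 12

class UnifiedMemoryController:
    """Toy model: translates GPU virtual addresses into system-RAM physical
    addresses so the GPU can treat host DDR5 as its own memory pool."""

    def __init__(self):
        self.page_table = {}           # GPU virtual page number -> host physical page number
        self.next_free_host_page = 0   # trivial bump allocator over system RAM

    def map_range(self, gpu_vaddr: int, num_pages: int) -> None:
        vpn = gpu_vaddr >> PAGE_SHIFT
        for i in range(num_pages):
            self.page_table[vpn + i] = self.next_free_host_page
            self.next_free_host_page += 1

    def translate(self, gpu_vaddr: int) -> int:
        vpn, offset = gpu_vaddr >> PAGE_SHIFT, gpu_vaddr & (PAGE_SIZE - 1)
        return (self.page_table[vpn] << PAGE_SHIFT) | offset

umc = UnifiedMemoryController()
umc.map_range(0x1000_0000, num_pages=4)
print(hex(umc.translate(0x1000_0040)))   # falls within the first mapped host page
```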