Hyperbus - Using System RAM to Eliminate VRAM

This revision is from 2025/02/02 16:16.

Makes onboard VRAM redundant; with system RAM serving as the GPU's memory pool, up to 1 TB of effective VRAM becomes possible for running LLMs.

Shared Graphics Memory: https://en.wikipedia.org/wiki/Shared_graphics_memory

The latest Nvidia GPUs, particularly the RTX 40 series:

  • GDDR6X Memory: The primary VRAM in current consumer-grade GPUs, with per-pin rates up to 21 Gbps. The RTX 4090, for example, pairs GDDR6X with a 384-bit bus, achieving a bandwidth of 1008 GB/s at 21 Gbps.
  • HBM2e Memory: Employed in data-center cards such as the A100, this memory type provides higher bandwidth through a much wider bus, with per-pin speeds up to 3.2 Gbps.
  • Bus Width Consideration: Total bandwidth is the product of per-pin speed and bus width. A 384-bit bus at 21 Gbps delivers far more bandwidth than a narrower bus at the same per-pin rate.

In conclusion, the latest consumer-grade Nvidia GPUs, as of 2023, reach per-pin speeds up to 21 Gbps using GDDR6X, while data-center models use HBM2e for even higher bandwidth through much wider buses.
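The bandwidth figures above follow from a simple formula: bus width times per-pin data rate, divided by eight bits per byte. A quick sanity check (the function name is illustrative, not any vendor API):

```python
def memory_bandwidth_gbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8

# RTX 4090: 384-bit bus at 21 Gb/s per pin
print(memory_bandwidth_gbps(384, 21))  # 1008.0 GB/s
```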

HyperPath Bus: The Bandwidth Engine

Physical Layer:

  • 384-lane bus embedded in the PCB, using low-cost PAM-3 signaling (3 bits per two transfers, i.e. 1.5 bits per transfer) at 24 GT/s.
  • Bandwidth: 384 lanes × 24 GT/s × 1.5 bits ÷ 8 = 1.728 TB/s (well above the RTX 4090's 1.008 TB/s of GDDR6X bandwidth).
  • Integrated into PCIe 5.0 x16 slots via 200 auxiliary pins, or uses a ZIF socket and bypasses PCIe entirely.
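The lane math can be checked the same way as the VRAM figures above; PAM-3 carries 3 bits per two transfers, i.e. 1.5 bits per transfer (a sketch, not a spec):

```python
def hyperpath_bandwidth_tbps(lanes: int, rate_gts: float, bits_per_transfer: float) -> float:
    """Aggregate bandwidth in TB/s for a parallel bus."""
    return lanes * rate_gts * bits_per_transfer / 8 / 1000

# 384 lanes at 24 GT/s with PAM-3 (3 bits per 2 transfers = 1.5 bits/transfer)
print(hyperpath_bandwidth_tbps(384, 24, 1.5))  # 1.728 TB/s
```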

Protocol:

  • Direct RAM Mapping: GPU sees system RAM as contiguous VRAM space.
  • GPU-Direct Access: Bypasses the CPU and PCIe protocol overhead. The GPU’s memory controller communicates directly with DDR5 via HyperPath, treating system RAM as its own memory pool.
  • Burst Mode: Aggregates small GPU requests into large 512 B packets to maximize bus efficiency.
  • Priority-Based Arbitration: Critical GPU requests (e.g., texture fetches) override CPU tasks to minimize latency.
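Burst aggregation and priority arbitration can be sketched together: requests are served lowest-priority-number first, and small ones are greedily packed into 512 B packets. All names here are hypothetical; this is an illustration of the idea, not firmware.

```python
import heapq

PACKET_SIZE = 512  # burst packet size in bytes, per the protocol above

def burst_pack(requests):
    """Greedily fill 512 B packets, serving lower priority numbers first.

    `requests` is a list of (priority, size_bytes) tuples, e.g. priority 0
    for critical texture fetches and 1 for background CPU traffic.
    Returns the fill level of each packet sent.
    """
    heap = [(prio, idx, size) for idx, (prio, size) in enumerate(requests)]
    heapq.heapify(heap)  # min-heap: critical (priority 0) requests pop first
    packets, current = [], 0
    while heap:
        _prio, _idx, size = heapq.heappop(heap)
        if current and current + size > PACKET_SIZE:
            packets.append(current)  # flush the filled packet onto the bus
            current = 0
        current += size
    if current:
        packets.append(current)
    return packets

# Texture fetches (priority 0) go out before CPU traffic (priority 1)
print(burst_pack([(1, 200), (0, 300), (0, 256), (1, 64)]))  # [300, 456, 64]
```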

Advanced Buffering for Low Latency

  • Caching
  • Lossless memory compression
  • Distributed training and inference cache
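Lossless compression in particular stretches effective bus bandwidth: a page is compressed before it crosses the bus and reconstructed bit-exactly on the other side. A minimal sketch using Python's standard `zlib` (the page size and function name are illustrative):

```python
import zlib

def compress_page(page: bytes) -> bytes:
    """Losslessly compress a memory page before it crosses the bus (sketch)."""
    return zlib.compress(page, 1)  # fastest level: latency matters more than ratio

page = bytes(4096)  # a zero-filled 4 KiB page compresses extremely well
packed = compress_page(page)
assert zlib.decompress(packed) == page  # round trip is exact, hence "lossless"
print(f"{len(page)} B -> {len(packed)} B on the wire")
```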

GPU Card Redesign: "VRAM-Less" Architecture

The GPU card becomes a pure processor:

  • Could still utilize an L1 cache.
  • No GDDR Chips: Removes all VRAM, reducing PCB complexity, cost, and power draw.
  • HyperPath PHY Layer: A dedicated chip on the GPU converts memory requests into HyperPath signals.
  • Unified Memory Controller (UMC): Directly maps system RAM addresses to GPU memory space.
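The UMC's job reduces to address translation: a flat GPU-visible address is offset into the region of system RAM reserved as the GPU's pool. A hypothetical sketch; the base address and pool size are illustrative assumptions, not part of any real memory map:

```python
GPU_POOL_BASE = 0x1_0000_0000  # assumed base of the system-RAM carve-out (illustrative)
GPU_POOL_SIZE = 1 << 40        # 1 TB pool, as proposed above

def umc_translate(gpu_addr: int) -> int:
    """Translate a GPU-visible address to a physical system-RAM address."""
    if not 0 <= gpu_addr < GPU_POOL_SIZE:
        raise ValueError("address outside the mapped pool")
    return GPU_POOL_BASE + gpu_addr

print(hex(umc_translate(0x2000)))  # 0x100002000
```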
  
