Hyperbus - Using System RAM, Eliminating VRAM

This revision is from 2025/02/02 15:50.

Makes onboard VRAM redundant and makes it possible to give the GPU 1 TB of system RAM as its memory pool for running LLMs.

The latest Nvidia GPUs, particularly the RTX 40 series, use the following memory:

  • GDDR6/GDDR6X Memory: The primary VRAM used in consumer-grade GPUs, with per-pin data rates around 21 Gbps. The RTX 4090, for example, uses GDDR6X at 21 Gbps on a 384-bit bus, giving a bandwidth of 1008 GB/s (384 × 21 ÷ 8).
  • HBM2e Memory: Used in data-center accelerators such as the A100 (the professional RTX A6000 uses GDDR6, not HBM2e). Each pin is slower, up to about 3.2 Gbps, but the bus is far wider, so total bandwidth is higher.
  • Bus Width Consideration: While bus speed is crucial, the total bandwidth is also influenced by the bus width. For instance, a 384-bit bus at 21 Gbps offers significant bandwidth compared to narrower buses.
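The arithmetic behind these figures is bus width times per-pin data rate, divided by 8 bits per byte. A minimal sketch follows; the function name is made up for illustration, and the 5120-bit HBM2e bus is an assumed example configuration, not a figure from this page.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: pins x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


# RTX 4090: 384-bit bus at 21 Gbps per pin.
rtx_4090 = memory_bandwidth_gb_s(384, 21)

# Assumed example: a 5120-bit HBM2e bus at 3.2 Gbps per pin.
hbm2e_example = memory_bandwidth_gb_s(5120, 3.2)

print(f"RTX 4090 (GDDR6X): {rtx_4090:.0f} GB/s")
print(f"HBM2e example:     {hbm2e_example:.0f} GB/s")
```

The slower but much wider HBM2e bus comes out ahead, which is the point of the bus width bullet above.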

In conclusion, the latest consumer-grade Nvidia GPUs, as of 2023, reach per-pin data rates in the 21 Gbps range using GDDR6X, while data-center accelerators use HBM2e to reach even higher bandwidth through much wider buses.

HyperPath Bus: The Bandwidth Engine

The proposed bus is:

  • a 256-lane, bidirectional bus etched into the motherboard’s PCB layers, directly linking the GPU’s PCIe slot to the DDR5 memory controller and HBM cache.
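  • driven by 32 GT/s PAM-4 signaling, achieving about 1 TB/s of raw bandwidth (256 lanes × 32 Gb/s per lane ÷ 8 bits per byte = 1024 GB/s, taking 32 GT/s as the per-lane bit rate, the same convention under which PCIe 6.0 x16 at 64 GT/s gives 128 GB/s).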
  • embedded into the PCIe slot as auxiliary pins (300 extra contacts) for simultaneous PCIe 6.0 x16 (128 GB/s) and HyperPath operation.
  • GPU-Direct Access: Bypasses the CPU and PCIe protocol overhead. The GPU’s memory controller communicates directly with DDR5 via HyperPath, treating system RAM as its own memory pool.
  • Priority-Based Arbitration: Critical GPU requests (e.g., texture fetches) override CPU tasks to minimize latency.

Advanced Buffering for Low Latency

  • Caching
  • Lossless memory compression
  • Distributed training and inference cache

GPU Card Redesign: "VRAM-Less" Architecture

The GPU card becomes a pure processor:

  • No GDDR Chips: Removes all VRAM, reducing PCB complexity, cost, and power draw.
  • HyperPath PHY Layer: A dedicated chip on the GPU converts memory requests into HyperPath signals.
  • Unified Memory Controller (UMC): Directly maps system RAM addresses to GPU memory space.
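The address mapping done by the Unified Memory Controller can be pictured as a page table from GPU addresses to system RAM addresses. A minimal sketch, assuming 4 KiB pages and a flat single-level table; the `UnifiedMemoryMap` class and its methods are hypothetical and not part of any real driver API.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class UnifiedMemoryMap:
    """Maps GPU page numbers to system RAM frame numbers (illustrative)."""

    def __init__(self) -> None:
        self._table: dict[int, int] = {}

    def map_page(self, gpu_page: int, ram_frame: int) -> None:
        self._table[gpu_page] = ram_frame

    def translate(self, gpu_addr: int) -> int:
        """Return the system RAM address backing a GPU address."""
        page, offset = divmod(gpu_addr, PAGE_SIZE)
        if page not in self._table:
            raise KeyError(f"GPU page {page} is not mapped")
        return self._table[page] * PAGE_SIZE + offset


umc = UnifiedMemoryMap()
umc.map_page(gpu_page=2, ram_frame=10)

gpu_addr = 2 * PAGE_SIZE + 5
ram_addr = umc.translate(gpu_addr)
print(hex(gpu_addr), "->", hex(ram_addr))
```

The offset within the page is preserved, and only the page number is remapped, which is what lets the GPU treat scattered system RAM frames as one contiguous pool.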
  
