HyperPath Bus - Using System RAM to Eliminate VRAM
This revision is from 2025/02/02 18:59.
Makes onboard VRAM redundant; with system RAM exposed as VRAM, up to 1 TB of memory becomes available for running LLMs.
Shared Graphics Memory: https://en.wikipedia.org/wiki/Shared_graphics_memory
Memory in the latest Nvidia GPUs, particularly the RTX 40 series:
- GDDR6X Memory: The primary VRAM in high-end consumer GPUs, offering per-pin speeds up to 21 Gbps. The RTX 4090, for example, uses GDDR6X on a 384-bit bus, achieving a bandwidth of 1008 GB/s at 21 Gbps.
- HBM2e Memory: Employed in data-center cards such as the A100, this memory type provides higher bandwidth through a much wider bus, with speeds up to 3.2 Gbps per pin.
- Bus Width Consideration: Total bandwidth is the product of bus width and per-pin speed, so a 384-bit bus at 21 Gbps delivers far more bandwidth than a narrower bus at the same per-pin rate.
In conclusion, the latest consumer-grade Nvidia GPUs, as of 2023, achieve per-pin speeds up to 21 Gbps using GDDR6X, while data-center models leverage HBM2e for even higher bandwidth through wider buses.
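The bandwidth arithmetic above can be checked with a one-line formula (bus width in bits times per-pin data rate, divided by 8 to get bytes); this is a minimal sketch, using the RTX 4090 figures quoted above:

```python
# Peak memory bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8.
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Return peak bandwidth in GB/s."""
    return bus_width_bits * data_rate_gbps / 8

# RTX 4090: 384-bit bus at 21 Gbps per pin.
print(peak_bandwidth_gbs(384, 21.0))  # 1008.0
```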
HyperPath Bus: The Bandwidth Engine
Physical Layer:
- 384-lane bus embedded in the PCB at 24 GT/s, carrying 3 bits per transfer per lane. (Note: strict PAM-3 signaling carries only log2 3 ≈ 1.58 bits per symbol; 3 bits per transfer implies a denser encoding such as PAM-8.)
- Bandwidth: 384 lanes × 24 GT/s × 3 bits ÷ 8 = 3.456 TB/s, well above the RTX 4090's 1.008 TB/s of GDDR6X bandwidth.
- Integrated into PCIe 5.0 x16 slots via 200 auxiliary pins, or mounted in a dedicated ZIF socket that bypasses PCIe entirely.
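A quick check of the bandwidth arithmetic, comparing the quoted 3 bits per transfer against strict PAM-3, which carries log2 3 ≈ 1.58 bits per symbol (a sketch; the lane count and transfer rate are the figures proposed above):

```python
import math

def bus_bandwidth_tbs(lanes: int, gts: float, bits_per_transfer: float) -> float:
    """Aggregate bus bandwidth in TB/s for a given lane count and encoding."""
    return lanes * gts * bits_per_transfer / 8 / 1000

# At 3 bits per transfer (the figure used above): 3.456 TB/s.
print(bus_bandwidth_tbs(384, 24, 3))
# Strict PAM-3 carries only log2(3) ~ 1.585 bits per symbol: ~1.83 TB/s.
print(bus_bandwidth_tbs(384, 24, math.log2(3)))
```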
Protocol:
- Direct RAM Mapping: GPU sees system RAM as contiguous VRAM space.
- GPU-Direct Access: Bypasses the CPU and PCIe protocol overhead. The GPU’s memory controller communicates directly with DDR5 via HyperPath, treating system RAM as its own memory pool.
- Burst Mode: Aggregates small GPU requests into large 512 B packets to maximize bus efficiency.
- Priority-Based Arbitration: Critical GPU requests (e.g., texture fetches) override CPU tasks to minimize latency.
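The Burst Mode idea above can be sketched as a greedy coalescer that packs small (offset, size) requests into 512-byte packets; all names here are illustrative, not part of any real HyperPath specification:

```python
# Hypothetical sketch of Burst Mode: coalesce small GPU reads into
# 512-byte packets before they hit the bus.
PACKET_SIZE = 512

def aggregate_requests(requests, packet_size=PACKET_SIZE):
    """Greedily pack (offset, size) requests into packets of <= packet_size bytes."""
    packets, current, used = [], [], 0
    for offset, size in requests:
        if used + size > packet_size and current:
            packets.append(current)  # flush the full packet
            current, used = [], 0
        current.append((offset, size))
        used += size
    if current:
        packets.append(current)
    return packets

# Eight 128-byte texture fetches fit into two 512-byte packets.
reqs = [(i * 128, 128) for i in range(8)]
print(len(aggregate_requests(reqs)))  # 2
```

A real controller would also merge adjacent offsets and respect alignment; this sketch only shows the size-based batching.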
Advanced Buffering for Low Latency
- Caching
- Lossless memory compression
- Distributed training and inference cache
- Layer-streamed inference (e.g., AirLLM) becomes practical: https://github.com/lyogavin/airllm
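AirLLM's core trick is keeping only one transformer layer's weights resident at a time, so peak memory is one layer rather than the whole model. A toy sketch of that loop, with hypothetical load/evict callbacks standing in for real weight I/O:

```python
# Toy sketch of layer-streamed inference in the spirit of AirLLM:
# only one layer is resident at a time. All names are illustrative.
def run_layered(x, num_layers, load_layer, evict_layer):
    for i in range(num_layers):
        layer = load_layer(i)   # pull layer i's weights from system RAM / disk
        x = layer(x)            # forward pass through this layer alone
        evict_layer(i)          # free it before loading the next
    return x

# Demo with trivial "layers" that each add 1 to the activation.
store = {i: (lambda v: v + 1) for i in range(4)}
result = run_layered(0, 4, lambda i: store[i], lambda i: None)
print(result)  # 4
```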
GPU Card Redesign: "VRAM-Less" Architecture
The GPU card becomes a pure processor:
- Could still utilize on-die caches (e.g., L1) to hide HyperPath latency.
- No GDDR Chips: Removes all VRAM, reducing PCB complexity, cost, and power draw.
- HyperPath PHY Layer: A dedicated chip on the GPU converts memory requests into HyperPath signals.
- Unified Memory Controller (UMC): Directly maps system RAM addresses to GPU memory space.
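The UMC's address mapping can be sketched as a fixed window of system RAM reserved as the GPU's pool, with GPU-visible addresses translated by a simple base offset; the base address and window size below are made-up illustrative values:

```python
# Minimal sketch of a Unified Memory Controller mapping: GPU addresses
# are offsets into a window of system RAM reserved for the GPU.
GPU_WINDOW_BASE = 0x1_0000_0000   # where the GPU's pool starts in system RAM
GPU_WINDOW_SIZE = 64 << 30        # 64 GiB carved out as "VRAM"

def gpu_to_system_addr(gpu_addr: int) -> int:
    """Translate a GPU-visible address to a system RAM physical address."""
    if not 0 <= gpu_addr < GPU_WINDOW_SIZE:
        raise ValueError("address outside GPU memory window")
    return GPU_WINDOW_BASE + gpu_addr

print(hex(gpu_to_system_addr(0x2000)))  # 0x100002000
```

A real UMC would translate through page tables (as IOMMUs do) rather than a single linear window, but the contiguous mapping matches the "Direct RAM Mapping" property described above.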