The most efficient data center accelerator for high-performance LLM and multimodal deployment
Tensor Contraction Architecture (TCA)
Tensor Contraction Architecture (TCA) is the architecture behind all Furiosa accelerators – designed to unlock powerful performance and unparalleled energy efficiency on the most capable AI models.
Llama 2 7B
Metric | Configuration | L40S | H100 | RNGD |
---|---|---|---|---|
Perf/Watt (tokens/sec/W) | B=16, IL=2K, OL=2K | 1.52 | - | 6.24 |
Perf/Watt (tokens/sec/W) | B=32, IL=2K, OL=2K | - | 3.19 | 8.62 |
Metric | Configuration | L40S | H100 | RNGD |
---|---|---|---|---|
Latency (ms) | B=1, L=128 | 14 | 7 | 8 |
Metric | Configuration | L40S | H100 | RNGD |
---|---|---|---|---|
Throughput (tokens/s) | B=16, IL=2K, OL=2K | 531 | - | 935 |
Throughput (tokens/s) | B=32, IL=2K, OL=2K | - | 2230 | 1293 |
Disclaimer: Measurements were made internally by FuriosaAI based on current specifications and/or internal engineering calculations. Nvidia results were retrieved from the Nvidia website, https://developer.nvidia.com/deep-learning-performance-training-inference/ai-inference, on February 14, 2024.
Specification | L40S | H100 | RNGD |
---|---|---|---|
Technology | TSMC 5nm | TSMC 4nm | TSMC 5nm |
BF16/FP8 (TFLOPS) | 362/733 | 989/1979 | 256/512 |
INT8/INT4 (TOPS) | 733/733 | 1979/- | 512/1024 |
Memory Capacity (GB) | 48 | 80 | 48 |
Memory Bandwidth (TB/s) | 0.86 | 3.35 | 1.5 |
Host I/F | Gen4 x16 | Gen5 x16 | Gen5 x16 |
TDP (W) | 350 | 700 | 150 |
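
The published Perf/Watt figures appear consistent with dividing throughput by TDP. The sketch below checks that arithmetic. It assumes (the page does not state this) that Perf/Watt means tokens/s per watt of TDP, and that each published figure belongs to the device it is paired with in the dictionary. The names `TDP_W`, `THROUGHPUT`, `perf_per_watt`, and `perf_table` are illustrative.

```python
# Assumption: Perf/Watt = throughput (tokens/s) / TDP (W). The page does not
# state the formula; this only checks that the published numbers are consistent.

# TDP in watts, from the specification table.
TDP_W = {"L40S": 350, "H100": 700, "RNGD": 150}

# Throughput in tokens/s for Llama 2 7B with IL=2K, OL=2K, keyed by batch size.
# Assumption: the device assignment of each figure is inferred from the arithmetic.
THROUGHPUT = {
    16: {"L40S": 531, "RNGD": 935},
    32: {"H100": 2230, "RNGD": 1293},
}


def perf_per_watt(tokens_per_s: float, tdp_w: float) -> float:
    """Return tokens/s per watt of TDP."""
    return tokens_per_s / tdp_w


def perf_table() -> dict:
    """Recompute Perf/Watt for every (batch size, device) pair listed above."""
    return {
        batch: {dev: perf_per_watt(tps, TDP_W[dev]) for dev, tps in row.items()}
        for batch, row in THROUGHPUT.items()
    }


if __name__ == "__main__":
    for batch, row in perf_table().items():
        for dev, value in row.items():
            print(f"B={batch:>2} {dev:<5} {value:.2f} tokens/s/W")
```

Under these assumptions the recomputed values agree with the published Perf/Watt figures to within 0.01 tokens/s/W; the small residual is presumably due to rounding in the published throughput numbers.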
Purpose-built for tensor contraction
How Furiosa TCA unlocks powerful performance and energy efficiency
AI models structure data in tensors of various dimensions. The architecture adapts to each tensor contraction via software-defined tactics.
Intermediate tensors are kept in on-chip memory (SRAM), akin to model-wise operator fusion.
This allows the chip to fully exploit parallelism and maximize data reuse, keeping utilization high in inference deployment.
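
To make the term concrete, here is a minimal pure-Python sketch of a tensor contraction computed under two different schedules. It is illustrative only: it does not represent Furiosa's hardware, compiler, or APIs, and the function names and tile size are hypothetical. Matrix multiplication is the simplest contraction, C[i][j] = sum over k of A[i][k] * B[k][j], where k is the contracted (shared) index.

```python
from typing import List

Matrix = List[List[float]]


def contract_naive(a: Matrix, b: Matrix) -> Matrix:
    """Compute C[i][j] = sum_k A[i][k] * B[k][j] with a plain i, j, k loop nest."""
    n, k_dim, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(k_dim):
                c[i][j] += a[i][k] * b[k][j]
    return c


def contract_tiled(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """Same contraction, but the contracted index k is processed in tiles.

    Each tile of A's columns and B's rows is used for every (i, j) output
    before the next tile is touched. The schedule changes; the math does not.
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for k0 in range(0, k_dim, tile):
        k1 = min(k0 + tile, k_dim)
        for i in range(n):
            for j in range(m):
                for k in range(k0, k1):
                    c[i][j] += a[i][k] * b[k][j]
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # shape (2, 3)
    b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]   # shape (3, 2)
    print(contract_naive(a, b))
    print(contract_tiled(a, b, tile=2))
```

Both functions return the same result for any tile size; only the order in which operands are visited differs. Choosing such an order per contraction is, loosely, the kind of decision a scheduling tactic makes.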
Meet Renegade
The most efficient data center accelerator for high-performance LLM and multimodal deployment
- 512 TFLOPS: 64 TFLOPS (FP8) x 8 Processing Elements
- 48 GB: HBM3 memory capacity
- 1.5 TB/s: memory bandwidth
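
The headline numbers in this list are related by simple arithmetic, sketched below. The aggregate FP8 figure follows directly from the list. The other two calculations are added illustrations, not taken from the page: the roofline ridge point (peak compute divided by memory bandwidth, a rough threshold in FLOP per byte above which a kernel tends to be compute-bound) and a weight-only footprint estimate for a nominal 7-billion-parameter model, assuming 1 byte per parameter at FP8 and 2 bytes at BF16 and ignoring KV cache and activations. All names are hypothetical.

```python
# Figures taken from the list above.
PE_COUNT = 8
FP8_TFLOPS_PER_PE = 64
MEMORY_GB = 48
BANDWIDTH_TBPS = 1.5


def aggregate_tflops(pe_count: int, tflops_per_pe: float) -> float:
    """Total peak throughput across all processing elements."""
    return pe_count * tflops_per_pe


def ridge_point_flop_per_byte(peak_tflops: float, bandwidth_tbps: float) -> float:
    """Roofline ridge point: peak FLOP/s divided by peak bytes/s."""
    return (peak_tflops * 1e12) / (bandwidth_tbps * 1e12)


def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight-only footprint in GB (1e9 params * bytes per param / 1e9 bytes per GB).

    Assumption: ignores KV cache, activations, and any runtime overhead.
    """
    return params_billion * bytes_per_param


if __name__ == "__main__":
    peak = aggregate_tflops(PE_COUNT, FP8_TFLOPS_PER_PE)
    print(f"Aggregate FP8 peak: {peak} TFLOPS")
    print(f"Ridge point: {ridge_point_flop_per_byte(peak, BANDWIDTH_TBPS):.1f} FLOP/byte")
    for name, nbytes in (("FP8", 1), ("BF16", 2)):
        gb = weight_footprint_gb(7, nbytes)
        print(f"7B weights at {name}: {gb} GB, fits in {MEMORY_GB} GB: {gb <= MEMORY_GB}")
```

These are back-of-the-envelope checks on peak figures; achieved performance depends on the workload and software stack.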
RNGD Series
RNGD-S
Leadership performance for creatives, media and entertainment, and video AI
RNGD
Versatile cloud and on-prem LLM and Multimodal deployment
RNGD-Max
Powerful cloud and on-prem LLM and Multimodal deployment