Performance Whitepaper
A complete performance comparison of the VXI SRAM-CIM processor against mainstream edge-AI solutions, plus an interactive performance simulator.
Benchmark Comparison
VXI SRAM-CIM vs RK3588 NPU vs Jetson Orin NX — LLM on-device inference benchmark data
| Metric | VXI | RK3588 | Orin NX |
|---|---|---|---|
| 1.5B Decode (tok/s) | 35-45 | 5-8 | 25-30 |
| 7B Decode (tok/s) | 15-20 | N/A | 8-12 |
| System Efficiency (TOPS/W) | 27.8 | 3.2 | 5.6 |
| Power (W) | 5 | 8 | 15 |
| KV Cache on-chip | 16 MB | 0 | 0 |
| 3-Year TCO (USD) | ~$15 | ~$45 | ~$120 |
* The figures above are pre-silicon design targets; actual performance is subject to measurement on production silicon.
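The 3-year TCO row above folds in energy cost, which follows directly from the power figures in the table. The sketch below shows one way such an energy estimate can be computed; the electricity price and duty cycle are illustrative assumptions, not figures from this whitepaper, so the totals will not match the TCO row exactly.

```python
# Hedged sketch: 3-year energy cost from average power draw.
# PRICE_PER_KWH and DUTY_CYCLE are illustrative assumptions.

HOURS_PER_YEAR = 24 * 365
YEARS = 3
PRICE_PER_KWH = 0.10   # USD per kWh (assumed)
DUTY_CYCLE = 0.25      # fraction of time at the listed power (assumed)

def three_year_energy_cost(power_w: float) -> float:
    """Energy cost in USD over three years of operation."""
    kwh = power_w / 1000 * HOURS_PER_YEAR * YEARS * DUTY_CYCLE
    return kwh * PRICE_PER_KWH

# Power figures from the comparison table above
for name, watts in [("VXI", 5), ("RK3588", 8), ("Orin NX", 15)]:
    print(f"{name}: ${three_year_energy_cost(watts):.2f} energy over 3 years")
```

Under these assumptions the energy gap between a 5 W and a 15 W part is roughly 3x, which is why sustained power dominates long-run edge deployment cost.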
Performance Simulator
Select a model, adjust parameters, and instantly estimate VXI processor inference performance and resource requirements.
Example simulation output:

| Result | Value |
|---|---|
| Inference Speed | ~37.4 tok/s |
| Throughput | ~37.4 tok/s (batch=1) |
| Weight Size | 715 MB |
| KV Cache | 14.0 MB |
| Recommended SKU | VXI-XC8 (~8.4 W estimated power) |

- Weights require batched loading (715 MB model vs. 64 MB on-chip).
- KV cache resides fully on-chip (14.0 MB of 32 MB).
* This is a simplified estimate. Actual performance varies with model architecture, quantization method, batch size, and other factors.
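The quantities the simulator reports can be approximated from first principles: weight footprint from parameter count and quantization width, KV cache from the standard keys-plus-values formula, and decode rate from the memory-bound assumption that every generated token streams the full weight set once. The sketch below illustrates this kind of first-order estimate; every model and hardware parameter in it is an illustrative assumption, not a published VXI specification, and it will not reproduce the simulator's numbers exactly.

```python
# Hedged sketch of a first-order LLM inference estimator.
# All model/hardware parameters below are illustrative assumptions.

def weight_size_mb(n_params: float, bits_per_weight: int = 4) -> float:
    """Quantized weight footprint in MB."""
    return n_params * bits_per_weight / 8 / 1e6

def kv_cache_mb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: int = 1) -> float:
    """KV cache in MB: keys + values for every layer and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1e6

def decode_tok_s(weight_mb: float, bandwidth_gb_s: float) -> float:
    """Memory-bound decode rate: each token streams all weights once."""
    return bandwidth_gb_s * 1000 / weight_mb

# Example: a hypothetical 1.5B-parameter model at 4-bit quantization
w = weight_size_mb(1.5e9)  # 750 MB
kv = kv_cache_mb(n_layers=28, n_kv_heads=2, head_dim=128, seq_len=2048)
print(f"weights ~{w:.0f} MB, KV cache ~{kv:.1f} MB, "
      f"~{decode_tok_s(w, 30):.1f} tok/s at an assumed 30 GB/s")
```

This memory-bound model explains why keeping the KV cache on-chip matters: attention reads the cache every token, so serving it from SRAM rather than external DRAM removes that traffic from the bandwidth budget entirely.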
Need More Detailed Performance Analysis?
Our technical team can provide customized performance reports for your specific models and scenarios.
Contact Technical Team