vllm.model_executor.layers.quantization.utils.mxfp8_utils ¶
dequant_mxfp8_to_bf16 ¶
Dequantize an MXFP8 (microscaling FP8) tensor to BF16.
Source code in vllm/model_executor/layers/quantization/utils/mxfp8_utils.py
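The MX format groups elements into fixed-size blocks that share a single power-of-two scale, so dequantization multiplies each block by its decoded scale. The following is a minimal NumPy sketch of that idea, not vLLM's implementation: it assumes 32-element blocks, an E8M0 scale with bias 127, and FP8 elements already decoded to float (real kernels decode e4m3 bit patterns and emit BF16).

```python
import numpy as np

BLOCK_SIZE = 32  # MX convention: 32 elements share one E8M0 scale (assumed here)


def dequant_mxfp8_sketch(elements: np.ndarray, scales_e8m0: np.ndarray) -> np.ndarray:
    """Dequantize a flat, simulated MXFP8 tensor to higher precision.

    `elements` holds FP8 element values already decoded to float32;
    `scales_e8m0` holds one biased power-of-two exponent (bias 127)
    per 32-element block.
    """
    assert elements.size % BLOCK_SIZE == 0
    assert scales_e8m0.size == elements.size // BLOCK_SIZE
    # Each block's scale is 2**(e8m0 - 127); broadcast it over the block.
    scale = np.exp2(scales_e8m0.astype(np.float32) - 127.0)
    blocks = elements.reshape(-1, BLOCK_SIZE) * scale[:, None]
    return blocks.reshape(-1)
```

A usage sketch: with all-ones elements and a block scale exponent of 128, every output value is scaled by 2**(128 - 127) = 2.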
mxfp8_e4m3_quantize_fake ¶
Fake (meta) implementation used during torch.compile tracing; it produces outputs with the correct shapes and dtypes without running the real quantization kernel.
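A fake implementation only has to report output metadata so the compiler can trace through the op. The sketch below mimics that contract in plain NumPy rather than the PyTorch fake-tensor API; the output layout (uint8 FP8 payload plus one uint8 E8M0 scale per 32-element block) is an assumption for illustration, not vLLM's actual signature.

```python
import numpy as np

BLOCK_SIZE = 32  # assumed MX block size


def mxfp8_e4m3_quantize_fake_sketch(x: np.ndarray):
    """Shape/dtype-only stand-in for a quantize kernel.

    Like a torch.compile fake impl, it allocates outputs with the
    right metadata and performs no real computation.
    """
    assert x.size % BLOCK_SIZE == 0
    elements = np.empty(x.shape, dtype=np.uint8)          # FP8 payload bytes (assumed)
    scales = np.empty(x.size // BLOCK_SIZE, dtype=np.uint8)  # one E8M0 scale per block
    return elements, scales
```

The key property is that the fake is cheap and side-effect free: the tracer only inspects the returned shapes and dtypes.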