vllm.model_executor.model_loader.utils
Utilities for selecting and loading models.
_MODEL_ARCH_BY_HASH module-attribute
Caches the outputs of _get_model_architecture.
ParamMapping dataclass
A class to handle parameter mapping for model weight loading. It creates a bidirectional mapping between packed parameters and their constituent parts.
Source code in vllm/model_executor/model_loader/utils.py
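As a rough illustration of such a bidirectional mapping, the sketch below uses a hypothetical ToyParamMapping. It holds a packed-to-parts dict and builds an inverse parts-to-(packed name, shard index) dict on construction. The class name, method names and shard-index convention are assumptions, not the fields or methods of vLLM's ParamMapping.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyParamMapping:
    """Hypothetical bidirectional map between packed params and their parts."""

    # Packed name -> ordered constituent names,
    # e.g. {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}.
    packed_mapping: dict[str, list[str]]
    # Constituent name -> (packed name, shard index); filled in __post_init__.
    inverse_mapping: dict[str, tuple[str, int]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Build the reverse direction once, so lookups either way are O(1).
        for packed_name, parts in self.packed_mapping.items():
            for index, part in enumerate(parts):
                self.inverse_mapping[part] = (packed_name, index)

    def parts_of(self, name: str) -> list[str]:
        """Constituent names of a packed parameter (empty if not packed)."""
        return list(self.packed_mapping.get(name, []))

    def packed_of(self, name: str) -> Optional[tuple[str, int]]:
        """Packed parameter and shard index for a constituent, if any."""
        return self.inverse_mapping.get(name)
```

Storing the shard index alongside the packed name lets a loader place a checkpoint tensor such as k_proj into the correct slice of the fused qkv_proj weight.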
configure_quant_config
configure_quant_config(
quant_config: QuantizationConfig,
model_class: type[Module],
)
Pass packed_modules_mapping by reference to quant_config so that quant_config can properly match fused modules.
Note that model attributes are passed by reference to quant_config, enabling them to be updated by model_class.__new__ (e.g. chatglm, qwen).
Once the SupportsQuant mixin has been added to all models, this function can be removed.
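The by-reference point can be seen with plain Python objects. In this minimal sketch, ToyQuantConfig, ToyModel and toy_configure_quant_config are hypothetical stand-ins, not vLLM classes. The model extends its class-level mapping when it is constructed, and because the quant config holds the same dict object rather than a copy, it sees the new entry.

```python
class ToyQuantConfig:
    """Hypothetical quant config that decides whether a module name is fused."""

    def __init__(self) -> None:
        self.packed_modules_mapping: dict[str, list[str]] = {}

    def is_fused(self, name: str) -> bool:
        return name in self.packed_modules_mapping


class ToyModel:
    # Class-level mapping shared by the class and all its instances.
    packed_modules_mapping: dict[str, list[str]] = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }

    def __init__(self) -> None:
        # Extend the shared dict in place (no reassignment), so every holder
        # of this same dict object observes the new entry.
        self.packed_modules_mapping["gate_up_proj"] = ["gate_proj", "up_proj"]


def toy_configure_quant_config(
    quant_config: ToyQuantConfig, model_class: type[ToyModel]
) -> None:
    # Store a reference to the model's dict, not a copy.
    quant_config.packed_modules_mapping = model_class.packed_modules_mapping
```

Had the function stored a copy (for example dict(model_class.packed_modules_mapping)), the quant config would miss the gate_up_proj entry added at construction time.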
initialize_model
initialize_model(
vllm_config: VllmConfig,
*,
prefix: str = "",
model_class: type[Module] | None = None,
model_config: ModelConfig | None = None,
) -> Module
Initialize a model with the given configurations.
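The optional keyword-only arguments read as overrides that fall back to values derived from vllm_config. The sketch below assumes that model_config defaults to vllm_config.model_config and that model_class, when omitted, is looked up by architecture name in a registry. Every Toy* name and TOY_REGISTRY is hypothetical, and the real function may perform additional setup not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyModelConfig:
    """Hypothetical stand-in for a model config: just the architecture name."""
    architecture: str


@dataclass
class ToyVllmConfig:
    """Hypothetical stand-in for a top-level config embedding a model config."""
    model_config: ToyModelConfig


class ToyLlama:
    def __init__(self, *, vllm_config: ToyVllmConfig, prefix: str = "") -> None:
        self.vllm_config = vllm_config
        self.prefix = prefix


# Hypothetical registry from architecture name to model class.
TOY_REGISTRY: dict[str, type] = {"ToyLlama": ToyLlama}


def toy_initialize_model(
    vllm_config: ToyVllmConfig,
    *,
    prefix: str = "",
    model_class: Optional[type] = None,
    model_config: Optional[ToyModelConfig] = None,
) -> object:
    # Fall back to the model config embedded in the top-level config.
    if model_config is None:
        model_config = vllm_config.model_config
    # Resolve the class from the architecture name unless one was passed in.
    if model_class is None:
        model_class = TOY_REGISTRY[model_config.architecture]
    return model_class(vllm_config=vllm_config, prefix=prefix)
```

Making the overrides keyword-only keeps call sites explicit, so a caller cannot accidentally pass a model class where a prefix was expected.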