vllm.model_executor.models.keye_vl1_5 ¶
KeyeVL1_5ImageEmbeddingInputs ¶
Bases: TensorSchema
Dimensions
- nf: Number of image features
- hs: Hidden size (must match the hidden size of the language model backbone)
- ni: Number of images
- g: Grid dimensions (3 for t, h, w)
Source code in vllm/model_executor/models/keye_vl1_5.py
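A minimal sketch of the expected shapes, using made-up sizes; the variable names (`image_embeds`, `image_grid_thw`) are illustrative assumptions, not necessarily the schema's field names. The video embedding schema below mirrors this with per-video grids.

```python
import torch

# Hypothetical sizes, chosen only to make the documented dimensions concrete.
nf, hs, ni = 20, 4096, 2                    # image features, hidden size, images
image_embeds = torch.randn(nf, hs)          # (nf, hs): one row per image feature
image_grid_thw = torch.tensor([[1, 4, 4],   # (ni, g) with g == 3: one [t, h, w]
                               [1, 2, 2]])  # row per image
assert image_embeds.shape == (nf, hs)
assert image_grid_thw.shape == (ni, 3)
```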
KeyeVL1_5ImagePixelInputs ¶
Bases: TensorSchema
Dimensions
- bnp: Batch size * Number of patches
- c: Number of channels
- ps: Patch size
- ni: Number of images
- g: Grid dimensions (3 for t, h, w)
Source code in vllm/model_executor/models/keye_vl1_5.py
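A minimal sketch of how the per-image [t, h, w] grid relates to the flattened patch dimension bnp, assuming (as in the `get_num_patches` example further down) that each grid row contributes t*h*w patches; the names and sizes are illustrative, and the exact layout of the pixel-value tensor is left to the source file.

```python
import torch

# Two hypothetical images described by their [t, h, w] patch grids.
image_grid_thw = torch.tensor([[1, 4, 4],
                               [1, 2, 2]])      # shape (ni, g) with g == 3
bnp = int(image_grid_thw.prod(dim=-1).sum())    # 16 + 4 = 20 patches in total
# The pixel-value tensor then carries bnp patches, each with c channels and a
# ps x ps spatial extent.
```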
KeyeVL1_5VideoEmbeddingInputs ¶
Bases: TensorSchema
Dimensions
- nf: Number of video features
- hs: Hidden size (must match the hidden size of the language model backbone)
- nv: Number of videos
- g: Grid dimensions (3 for t, h, w)
Source code in vllm/model_executor/models/keye_vl1_5.py
KeyeVL1_5VideoPixelInputs ¶
Bases: TensorSchema
Dimensions
- bnp: Batch size * Number of patches
- c: Number of channels
- ps: Patch size
- nv: Number of videos
- g: Grid dimensions (3 for t, h, w)
Source code in vllm/model_executor/models/keye_vl1_5.py
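For videos the grid rows typically have t > 1, and the helpers documented below operate on exactly this metadata. A small sketch with made-up sizes:

```python
import torch

# One hypothetical video whose grid spans 2 temporal steps of 4x4 patches.
video_grid_thw = torch.tensor([[2, 4, 4]])        # shape (nv, g) with g == 3
patches = int(video_grid_thw.prod(dim=-1).sum())  # 2 * 4 * 4 = 32 patches
# split_thw (below) would expand this row into two [1, 4, 4] rows, and
# get_num_patches aggregates rows back into per-video patch counts.
```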
get_num_patches ¶
Return num_patches per video.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| grid_thw | Tensor | Tensor with shape [N, 3] containing temporal, height, width dimensions | required |
| num_frames | list[int] \| Tensor | List or tensor indicating the number of frames per video | required |
Returns:
| Type | Description |
|---|---|
| list[int] | List of ints representing the number of patches for each video |
Examples:
>>> # Suppose there are 2 videos with a total of 3 grids
>>> grid_thw = torch.tensor(
...     [
...         [2, 2, 2],  # grid 0: 2*2*2=8 patches
...         [2, 2, 2],  # grid 1: 2*2*2=8 patches
...         [1, 1, 1],  # grid 2: 1*1*1=1 patch
...     ]
... )
>>> num_frames = [2, 1]  # The first video contains 2 grids, the second contains 1 grid.
>>> get_num_patches(grid_thw, num_frames)
tensor([16, 1])  # Total patches: 8+8=16 for the first video, 1 for the second.
Source code in vllm/model_executor/models/keye_vl1_5.py
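A minimal reference sketch consistent with the documented example (each grid row contributes t*h*w patches; consecutive rows are grouped per video by num_frames). This is an illustration of the documented behavior, not the actual implementation.

```python
import torch

def get_num_patches_sketch(grid_thw: torch.Tensor,
                           num_frames: list[int]) -> torch.Tensor:
    # Patches contributed by each grid row: t * h * w.
    patches_per_row = grid_thw.prod(dim=-1)
    # Group consecutive rows by video and sum within each group.
    per_video = torch.split(patches_per_row, list(num_frames))
    return torch.stack([p.sum() for p in per_video])

# Reproduces the example above: tensor([16, 1])
print(get_num_patches_sketch(
    torch.tensor([[2, 2, 2], [2, 2, 2], [1, 1, 1]]), [2, 1]))
```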
split_thw ¶
Split grid_thw along the t (temporal) dimension.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| grid_thw | Tensor | [N, 3] tensor of [t, h, w] | required |
Returns:
| Type | Description |
|---|---|
| Tensor | [Σt, 3] tensor where each row is [1, h, w] |
Example:
>>> grid_thw = torch.tensor([[2, 3, 4], [1, 5, 6]])
>>> split_thw(grid_thw)
tensor([[1, 3, 4],
        [1, 3, 4],
        [1, 5, 6]])
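A minimal sketch of the documented behavior using torch.repeat_interleave: each [t, h, w] row is expanded into t rows of [1, h, w]. Illustrative only; the source file may implement this differently.

```python
import torch

def split_thw_sketch(grid_thw: torch.Tensor) -> torch.Tensor:
    # Repeat each row t times, then force the t column to 1.
    out = grid_thw.repeat_interleave(grid_thw[:, 0], dim=0)
    out[:, 0] = 1
    return out

# Reproduces the example: two input rows expand to three [1, h, w] rows.
print(split_thw_sketch(torch.tensor([[2, 3, 4], [1, 5, 6]])))
```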