Pull requests: InternLM/lmdeploy
- #4511 Add explicit trust_remote_code controls to resolve the security issue (opened Apr 8, 2026 by lvhan028)
- #4510 [Draft] feat: Add TurboQuant (quant_policy=42) support for KV cache quantization (opened Apr 8, 2026 by windreamer)
- #4509 [enhancement] Enable TurboMind inference for FP8 models quantized by llm-compressor (opened Apr 8, 2026 by 43758726)
- #4505 [Fix] Handle None scales in generate_zero_point for mixed-format layers (opened Apr 7, 2026 by lingyezhixing)
- #4497 [Bug:P0] fix: Handle missing KV cache without crashing the engine (opened Apr 4, 2026 by lvhan028)
- #4496 [improvement] Reject requests on a stale session or a sleeping engine (opened Apr 4, 2026 by lvhan028)
- #4490 [enhancement] feat(turbomind): Integrate cublasGemmGroupedBatchedEx for Qwen3.5 MoE inference on Blackwell GPUs, with memory-copy optimizations (opened Apr 3, 2026 by hd9568)
- #4477 [enhancement] Integrate the deep-ep NCCL backend (opened Mar 27, 2026 by irexyc)
- #4468 [improvement] [refactor] [api_server] [1/N] Improve reasoning and tool-call parsers (opened Mar 26, 2026 by lvhan028)
- #4465 [enhancement] feat: TurboMind linear GDN prefix caching (opened Mar 25, 2026 by lapy)
- #4460 [enhancement] feat: Implement TurboMind vision encoder support for the Qwen3VL/3.5 families (opened Mar 24, 2026 by lapy)
- #4419 [improvement] [Feature] Support the n parameter in /v1/chat/completions and /v1/completions (opened Mar 17, 2026 by ziyangliu-666)