Added MDP generation to QEff Compile #930

Open
quic-mohmeh wants to merge 3 commits into quic:main from quic-mohmeh:mdp

@quic-mohmeh

This PR adds the MDP generation required for the Prefill stage in disaggregated serving. It supports Pipeline Prefill combined with Tensor Slicing, and it also supports passing custom cores to the MDP generator.

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>
@quic-mohmeh (Author)

Tested and working on the following models:

  • CodeLlama-7b-Instruct
  • falcon-7b-instruct
  • gemma-2-9b-it
  • gpt-oss-20b
  • granite-3.1-8b-instruct
  • Llama-3.2-1B-Instruct
  • Llama-3.2-3B
  • Phi-3-mini-4k-instruct

@quic-rishinr (Contributor)

@mamtsing @ochougul please review the PR

@quic-mohmeh (Author)

@quic-rishinr @mamtsing @ochougul A gentle reminder for review

Contributor

Add a warning that mdp_ts_num_partitions is ignored whenever seq_len==1.
Also, add a warning that it is ignored when ts_num_devices>1 and seq_len> and mdp_ts_num_partitions>1.
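A minimal Python sketch of the first requested check, assuming a hypothetical helper named resolve_mdp_ts_num_partitions (it is not part of QEff's actual API) that receives seq_len and mdp_ts_num_partitions before MDP generation runs. The second check is left out because its seq_len condition is truncated in the review comment.

```python
import warnings
from typing import Optional


def resolve_mdp_ts_num_partitions(
    seq_len: int, mdp_ts_num_partitions: Optional[int]
) -> Optional[int]:
    """Hypothetical helper: drop mdp_ts_num_partitions when seq_len == 1.

    Emits a UserWarning and returns None if a partition count was passed
    for a seq_len == 1 compile; otherwise returns the value unchanged.
    """
    if seq_len == 1 and mdp_ts_num_partitions is not None:
        # Warn instead of raising so the compile can still proceed.
        warnings.warn(
            f"mdp_ts_num_partitions={mdp_ts_num_partitions} is ignored "
            "because seq_len == 1",
            UserWarning,
        )
        return None
    return mdp_ts_num_partitions
```

Returning None instead of raising keeps seq_len == 1 compiles working while still surfacing the ignored setting to the user.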


3 participants