Commit b3f0b9f

Slow down extension_ai_analysis to reduce chance of hitting rate limits
1 parent 7f20af7 commit b3f0b9f

1 file changed

Lines changed: 2 additions & 2 deletions

scripts/extension_ai_analysis.py

@@ -19,9 +19,9 @@
 # It offers capable models for free with an OpenAI-compatible API.
 INFERENCE_URL = "https://inference.nebulablock.com/v1/chat/completions"
 INFERENCE_MODEL = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
-INFERENCE_RESPONSE_PER_MINUTE_LIMIT = 5
+INFERENCE_RESPONSE_PER_MINUTE_LIMIT = 4 # slow down to not exceed token per minute (tpm) limit of 60k
 INFERENCE_API_KEY = os.getenv("NEBULA_API_KEY")
-INFERENCE_MAX_CHARACTERS = 100000 # max characters in all files provided to the model, approximately 25k tokens
+INFERENCE_MAX_CHARACTERS = 100000 # max characters in all files provided to the model, approximately 25k tokens (limit is 32k)
 
 QUESTIONS = [
     ["Is there a EXTENSION_DESCRIPTION variable in the CMakeLists.txt file that describes what the extension does in a few sentences that can be understood by a person knowledgeable in medical image computing?", ["cmake"]],
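
The arithmetic behind the two changed constants can be sketched as follows. This is a minimal illustration, not code from the script: the helper names are hypothetical, `TOKENS_PER_MINUTE_LIMIT` is taken from the "60k" in the new comment, and the ratio of 4 characters per token is only what "100000 characters, approximately 25k tokens" implies. Output tokens are ignored.

```python
# Values taken from the diff above.
INFERENCE_RESPONSE_PER_MINUTE_LIMIT = 4
INFERENCE_MAX_CHARACTERS = 100000

# Assumptions for illustration only (not defined in the script):
TOKENS_PER_MINUTE_LIMIT = 60_000  # "tpm limit of 60k" from the new comment
CHARS_PER_TOKEN = 4               # implied by "100000 characters ~ 25k tokens"


def min_seconds_between_requests(responses_per_minute: int) -> float:
    """Smallest delay between requests that respects a per-minute request limit."""
    return 60.0 / responses_per_minute


def estimated_tokens(num_characters: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Rough token estimate for an input of the given character length."""
    return num_characters // chars_per_token


def worst_case_tokens_per_minute(responses_per_minute: int, max_characters: int) -> int:
    """Estimated input tokens per minute if every request carried the maximum input."""
    return responses_per_minute * estimated_tokens(max_characters)


if __name__ == "__main__":
    for limit in (5, INFERENCE_RESPONSE_PER_MINUTE_LIMIT):
        delay = min_seconds_between_requests(limit)
        worst = worst_case_tokens_per_minute(limit, INFERENCE_MAX_CHARACTERS)
        print(f"{limit}/min: wait {delay:.0f} s between requests, "
              f"worst case ~{worst} tokens/min (assumed limit {TOKENS_PER_MINUTE_LIMIT})")
```

Under these assumptions, lowering the limit from 5 to 4 lengthens the gap between requests from 12 to 15 seconds. A run of maximum-size inputs would still exceed the estimated 60k tokens per minute, which is consistent with the commit message describing the change as reducing the chance of hitting rate limits rather than ruling it out.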
