
Commit 2f91f45

UN-3230 [FEAT] Implement back-off retry mechanism for API deployment client
Add exponential back-off retry with full jitter to APIDeploymentsClient for improved reliability against transient 5xx errors and 429 rate limits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2511336 commit 2f91f45

8 files changed

Lines changed: 1066 additions & 29 deletions


README.md

Lines changed: 34 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -29,6 +29,8 @@ Then, create an instance of the `APIDeploymentsClient`:
2929
client = APIDeploymentsClient(api_url="url", api_key="your_api_key")
3030
```
3131

32+
> **Note:** Pass the raw API key **without** the `"Bearer "` prefix — the client adds it automatically.
33+
3234
Now you can use the client to interact with Unstract API deployments:
3335

3436
```python
@@ -61,10 +63,42 @@ except APIDeploymentsClientException as e:
6163

6264
## Parameter Details
6365

66+
`api_url`: The URL of the Unstract API deployment.
67+
`api_key`: Your raw API key. **Do not** include the `"Bearer "` prefix — the client adds it automatically.
6468
`api_timeout`: Set a timeout for API requests, e.g., `api_timeout=10`.
6569
`logging_level`: Set logging verbosity (e.g., "`DEBUG`").
6670
`include_metadata`: If set to `True`, the response will include additional metadata (cost, tokens consumed and context) for each call made by the Prompt Studio exported tool.
6771

72+
## Retry Configuration
73+
74+
The client includes built-in exponential backoff retry with the following behavior:
75+
76+
- **Async mode** (`api_timeout=0`): POST requests are retried on transient failures (5xx, 429) and connection errors, since the server returns immediately after queuing.
77+
- **Sync mode** (`api_timeout > 0`, the default): POST requests are **not** retried, because the server blocks during processing — a failure may mean the request was processed but the response was lost.
78+
- **Status polling** (`check_execution_status`): GET requests are always retried, as they are idempotent.
79+
80+
Retries are enabled by default and can be customized:
81+
82+
```python
83+
client = APIDeploymentsClient(
84+
api_url="url",
85+
api_key="your_api_key",
86+
max_retries=4, # Max retry attempts (default: 4, set to 0 to disable)
87+
initial_delay=2.0, # Initial delay in seconds (default: 2.0)
88+
max_delay=60.0, # Maximum delay cap in seconds (default: 60.0)
89+
backoff_factor=2.0, # Multiplier per retry (default: 2.0)
90+
)
91+
```
92+
93+
| Parameter | Default | Description |
94+
|-----------|---------|-------------|
95+
| `max_retries` | `4` | Maximum number of retry attempts. Set to `0` to disable retries. |
96+
| `initial_delay` | `2.0` | Initial delay in seconds before the first retry. |
97+
| `max_delay` | `60.0` | Maximum delay cap in seconds between retries. |
98+
| `backoff_factor` | `2.0` | Multiplier applied to the delay for each subsequent retry. |
99+
100+
The retry logic uses exponential backoff with full jitter and respects the `Retry-After` header on 429 responses.
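
As a rough illustration of the defaults above, the sketch below mirrors the delay formula in `_calculate_delay` from this commit as standalone functions. The names `backoff_ceiling` and `full_jitter_delay` are hypothetical and are not part of the client.

```python
import random


def backoff_ceiling(attempt, initial_delay=2.0, max_delay=60.0, backoff_factor=2.0):
    """Upper bound of the delay for a 0-indexed retry attempt."""
    return min(initial_delay * backoff_factor**attempt, max_delay)


def full_jitter_delay(attempt, **kwargs):
    """Full jitter: pick a uniformly random delay between 0 and the ceiling."""
    return random.uniform(0, backoff_ceiling(attempt, **kwargs))


# Ceilings for the four retries allowed by the default max_retries=4.
ceilings = [backoff_ceiling(attempt) for attempt in range(4)]
print(ceilings)  # [2.0, 4.0, 8.0, 16.0]
```

With the defaults, the four retries sleep for at most 2 + 4 + 8 + 16 = 30 seconds in total when no `Retry-After` header applies, and about half that on average, because each delay is drawn uniformly from zero up to its ceiling.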
101+
68102

69103
## Questions and Feedback
70104

src/unstract/api_deployments/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
__version__ = "1.1.0"
1+
__version__ = "1.2.0"
22

33
from .client import APIDeploymentsClient
44

src/unstract/api_deployments/client.py

Lines changed: 191 additions & 23 deletions
@@ -10,10 +10,12 @@
1010
import logging
1111
import ntpath
1212
import os
13+
import random
14+
import time
1315
from urllib.parse import urlparse
1416

1517
import requests
16-
from requests.exceptions import JSONDecodeError
18+
from requests.exceptions import ConnectionError, JSONDecodeError, Timeout
1719

1820
from unstract.api_deployments.utils import UnstractUtils
1921

@@ -54,14 +56,22 @@ def __init__(
5456
api_timeout: int = 300,
5557
logging_level: str = "INFO",
5658
include_metadata: bool = False,
57-
verify: bool = True
59+
verify: bool = True,
60+
max_retries: int = 4,
61+
initial_delay: float = 2.0,
62+
max_delay: float = 60.0,
63+
backoff_factor: float = 2.0,
5864
):
5965
"""Initializes the APIClient class.
6066
6167
Args:
6268
api_key (str): The API key to authenticate the API request.
6369
api_timeout (int): The timeout to wait for the API response.
6470
logging_level (str): The logging level to log messages.
71+
max_retries (int): Maximum number of retry attempts for failed requests.
72+
initial_delay (float): Initial delay in seconds before the first retry.
73+
max_delay (float): Maximum delay in seconds between retries.
74+
backoff_factor (float): Multiplier applied to delay for each retry.
6575
"""
6676
if logging_level == "":
6777
logging_level = os.getenv("UNSTRACT_API_CLIENT_LOGGING_LEVEL", "INFO")
@@ -88,6 +98,21 @@ def __init__(
8898
self.__save_base_url(api_url)
8999
self.include_metadata = include_metadata
90100
self.verify = verify
101+
self.max_retries = max_retries
102+
self.initial_delay = initial_delay
103+
self.max_delay = max_delay
104+
self.backoff_factor = backoff_factor
105+
106+
def _is_retryable_status(self, status_code: int) -> bool:
107+
"""Checks whether a status code should trigger a retry.
108+
109+
Args:
110+
status_code (int): The HTTP status code to check.
111+
112+
Returns:
113+
bool: True if the request should be retried.
114+
"""
115+
return status_code >= 500 or status_code == 429
91116

92117
def __save_base_url(self, full_url: str):
93118
"""Extracts the base URL from the full URL and saves it.
@@ -99,6 +124,124 @@ def __save_base_url(self, full_url: str):
99124
self.base_url = parsed_url.scheme + "://" + parsed_url.netloc
100125
self.logger.debug("Base URL: " + self.base_url)
101126

127+
def _calculate_delay(self, attempt: int) -> float:
128+
"""Calculates the delay before the next retry using exponential backoff
129+
with full jitter.
130+
131+
Args:
132+
attempt (int): The current retry attempt number (0-indexed).
133+
134+
Returns:
135+
float: The delay in seconds.
136+
"""
137+
exp_delay = min(
138+
self.initial_delay * (self.backoff_factor**attempt), self.max_delay
139+
)
140+
return random.uniform(0, exp_delay)
141+
142+
def _get_retry_delay(self, response, attempt: int) -> float:
143+
"""Returns the delay before the next retry.
144+
145+
For 429 responses, respects the Retry-After header if present.
146+
Otherwise falls back to exponential backoff with jitter.
147+
"""
148+
if response is not None and response.status_code == 429:
149+
retry_after = response.headers.get("Retry-After")
150+
if retry_after is not None:
151+
try:
152+
return float(retry_after)
153+
except (ValueError, TypeError):
154+
pass
155+
return self._calculate_delay(attempt)
156+
157+
@staticmethod
158+
def _rewind_files(files):
159+
"""Rewinds file objects so they can be re-sent on retry."""
160+
for file_tuple in files:
161+
file_obj = file_tuple[1]
162+
if hasattr(file_obj, "seek"):
163+
file_obj.seek(0)
164+
elif isinstance(file_obj, tuple) and len(file_obj) >= 2:
165+
if hasattr(file_obj[1], "seek"):
166+
file_obj[1].seek(0)
167+
168+
def _request_with_retry(self, method: str, url: str, **kwargs) -> requests.Response:
169+
"""Makes an HTTP request with exponential backoff retry logic.
170+
171+
Args:
172+
method (str): The HTTP method (e.g., "GET", "POST").
173+
url (str): The request URL.
174+
**kwargs: Additional keyword arguments passed to requests.request().
175+
176+
Returns:
177+
requests.Response: The response from the request.
178+
179+
Raises:
180+
ConnectionError: If a connection error persists after all retries.
181+
Timeout: If a timeout persists after all retries.
182+
"""
183+
response = None
184+
185+
for attempt in range(self.max_retries + 1):
186+
# Rewind file objects for retry attempts
187+
if attempt > 0:
188+
files = kwargs.get("files")
189+
if files:
190+
self._rewind_files(files)
191+
192+
try:
193+
response = requests.request(method, url, **kwargs)
194+
195+
if not self._is_retryable_status(response.status_code):
196+
return response
197+
198+
if attempt < self.max_retries:
199+
delay = self._get_retry_delay(response, attempt)
200+
self.logger.warning(
201+
"Request to %s returned %d. Retrying in %.1fs "
202+
"(attempt %d/%d).",
203+
url,
204+
response.status_code,
205+
delay,
206+
attempt + 1,
207+
self.max_retries,
208+
)
209+
time.sleep(delay)
210+
else:
211+
self.logger.warning(
212+
"Request to %s returned %d. Retries exhausted (%d/%d).",
213+
url,
214+
response.status_code,
215+
self.max_retries,
216+
self.max_retries,
217+
)
218+
219+
except (ConnectionError, Timeout) as exc:
220+
response = None
221+
if attempt < self.max_retries:
222+
delay = self._get_retry_delay(None, attempt)
223+
self.logger.warning(
224+
"%s during request to %s. Retrying in %.1fs "
225+
"(attempt %d/%d).",
226+
type(exc).__name__,
227+
url,
228+
delay,
229+
attempt + 1,
230+
self.max_retries,
231+
)
232+
time.sleep(delay)
233+
else:
234+
self.logger.warning(
235+
"%s during request to %s. Retries exhausted (%d/%d).",
236+
type(exc).__name__,
237+
url,
238+
self.max_retries,
239+
self.max_retries,
240+
)
241+
raise
242+
243+
return response
244+
102245
def structure_file(self, file_paths: list[str]) -> dict:
103246
"""Invokes the API deployed on the Unstract platform.
104247
@@ -115,7 +258,10 @@ def structure_file(self, file_paths: list[str]) -> dict:
115258
"Authorization": "Bearer " + self.api_key,
116259
}
117260

118-
data = {"timeout": self.api_timeout, "include_metadata": self.include_metadata}
261+
form_data = {
262+
"timeout": self.api_timeout,
263+
"include_metadata": self.include_metadata,
264+
}
119265

120266
files = []
121267

@@ -133,13 +279,28 @@ def structure_file(self, file_paths: list[str]) -> dict:
133279
except FileNotFoundError as e:
134280
raise APIDeploymentsClientException("File not found: " + str(e))
135281

136-
response = requests.post(
137-
self.api_url,
138-
headers=headers,
139-
data=data,
140-
files=files,
141-
verify=self.verify,
142-
)
282+
if self.api_timeout == 0:
283+
# Async mode: server returns immediately after queuing.
284+
# A 5xx means queuing failed — safe to retry.
285+
response = self._request_with_retry(
286+
"POST",
287+
self.api_url,
288+
headers=headers,
289+
data=form_data,
290+
files=files,
291+
verify=self.verify,
292+
)
293+
else:
294+
# Sync mode: server blocks during processing.
295+
# A 5xx may mean it processed but response was lost — don't retry
296+
# to avoid duplicate executions.
297+
response = requests.post(
298+
self.api_url,
299+
headers=headers,
300+
data=form_data,
301+
files=files,
302+
verify=self.verify,
303+
)
143304
self.logger.debug(response.status_code)
144305
self.logger.debug(response.text)
145306
# The returned object is wrapped in a "message" key.
@@ -194,14 +355,16 @@ def structure_file(self, file_paths: list[str]) -> dict:
194355
"extraction_result": extraction_result,
195356
}
196357

197-
# Check if the status is pending or if it's successful but lacks a result
198-
if 200 <= response.status_code < 300:
199-
if execution_status in self.in_progress_statuses or (
200-
execution_status == "SUCCESS" and not extraction_result
201-
):
202-
obj_to_return.update(
203-
{"status_check_api_endpoint": status_api_endpoint, "pending": True}
204-
)
358+
# Check if the status is pending or if it's successful but lacks a result.
359+
# Per the Unstract Status API migration guide (Option 1), we determine
360+
# pending state from the response body alone, ignoring the HTTP status
361+
# code — the server currently returns 422 for PENDING/EXECUTING.
362+
if execution_status in self.in_progress_statuses or (
363+
execution_status == "SUCCESS" and not extraction_result
364+
):
365+
obj_to_return.update(
366+
{"status_check_api_endpoint": status_api_endpoint, "pending": True}
367+
)
205368

206369
return obj_to_return
207370

@@ -221,7 +384,8 @@ def check_execution_status(self, status_check_api_endpoint: str) -> dict:
221384
}
222385
status_call_url = self.base_url + status_check_api_endpoint
223386
self.logger.debug("Checking execution status via endpoint: " + status_call_url)
224-
response = requests.get(
387+
response = self._request_with_retry(
388+
"GET",
225389
status_call_url,
226390
headers=headers,
227391
params={"include_metadata": self.include_metadata},
@@ -265,10 +429,14 @@ def check_execution_status(self, status_check_api_endpoint: str) -> dict:
265429
# If the execution status is pending, extract the execution ID from the response
266430
# and return it in the response.
267431
# Later, users can use the execution ID to check the status of the execution.
268-
if (
269-
200 <= response.status_code < 500
270-
and obj_to_return["execution_status"] in self.in_progress_statuses
271-
):
432+
if obj_to_return["execution_status"] in self.in_progress_statuses:
272433
obj_to_return["pending"] = True
434+
elif self._is_retryable_status(response.status_code):
435+
obj_to_return["pending"] = True
436+
self.logger.warning(
437+
"Status check returned %d after retries; "
438+
"marking as pending to continue polling.",
439+
response.status_code,
440+
)
273441

274442
return obj_to_return
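
`_get_retry_delay` in this commit parses `Retry-After` with `float()`, so a value sent as an HTTP-date (which the HTTP specification also permits) falls back to jittered backoff. The following is a minimal sketch of one way both forms could be handled. The helper name `parse_retry_after` and its `now` parameter are hypothetical and are not part of the commit.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the wait in seconds for a Retry-After value, or None if unparseable."""
    if value is None:
        return None
    # Form 1: delay in seconds, e.g. "120".
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if target.tzinfo is None:
        # Treat a date without an offset as UTC.
        target = target.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (target - now).total_seconds())
```

A caller could try this helper first and use the jittered delay when it returns `None`. Note that the commit returns a numeric `Retry-After` value uncapped, so whether to also limit the result to `max_delay` remains a separate design decision.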

tests/README.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
# Tests
2+
3+
## Unit Tests
4+
5+
Mocked tests that require no external setup:
6+
7+
```bash
8+
uv run pytest -s -v tests/
9+
```
10+
11+
## Integration Test (`client_test.py`)
12+
13+
This test runs against a live Unstract API deployment.
14+
15+
### Setup
16+
17+
1. Copy `tests/sample.env` to `.env` in the **project root**:
18+
```bash
19+
cp tests/sample.env .env
20+
```
21+
2. Fill in the values:
22+
- `API_URL` — your API deployment URL
23+
- `UNSTRACT_API_DEPLOYMENT_KEY` — your raw API key (**without** the `"Bearer "` prefix; the client adds it automatically)
24+
- `TEST_FILES` — comma-separated paths to files for structuring (e.g. `/path/to/test1.pdf,/path/to/test2.pdf`)
25+
26+
### Run
27+
28+
```bash
29+
uv run python tests/client_test.py
30+
```

tests/client_test.py

Lines changed: 3 additions & 3 deletions
@@ -16,12 +16,12 @@ def main():
1616
adc = APIDeploymentsClient(
1717
api_url=os.getenv("API_URL"),
1818
api_key=os.getenv("UNSTRACT_API_DEPLOYMENT_KEY"),
19-
api_timeout=10,
19+
api_timeout=0,
2020
logging_level="DEBUG",
2121
include_metadata=False,
2222
)
23-
# Replace files with pdfs
24-
response = adc.structure_file(["<files>"])
23+
file_paths = os.getenv("TEST_FILES", "").split(",")
24+
response = adc.structure_file(file_paths)
2525
print(response)
2626
if response["pending"]:
2727
while True:

tests/sample.env

Lines changed: 3 additions & 2 deletions
@@ -1,2 +1,3 @@
1-
API_URL=
2-
UNSTRACT_API_DEPLOYMENT_KEY=
1+
API_URL="http://localhost:8000/deployment/api/<org_id>/<api_name>/"
2+
UNSTRACT_API_DEPLOYMENT_KEY="your-api-key-without-bearer-prefix"
3+
TEST_FILES="/path/to/test1.pdf,/path/to/test2.pdf"
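
One caveat with a comma-separated value like `TEST_FILES`: in Python, `"".split(",")` returns `[""]`, so splitting an unset or empty variable yields a single empty path, and spaces after commas are preserved. The sketch below shows a more defensive parse. The function name `parse_test_files` is hypothetical and does not appear in the commit.

```python
def parse_test_files(raw):
    """Split a comma-separated list of paths, dropping blanks and stray spaces."""
    return [part.strip() for part in raw.split(",") if part.strip()]


print(parse_test_files("/path/to/test1.pdf, /path/to/test2.pdf"))
print(parse_test_files(""))
```

In `tests/client_test.py` this could be applied to `os.getenv("TEST_FILES", "")`, with an explicit error raised when the resulting list is empty.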
