[FC-0118] docs: add ADR for standardizing pagination across APIs #38300
- Proposes DefaultPagination from edx-drf-extensions as platform-wide standard
- Documents migration path for LimitOffsetPagination and unpaginated endpoints
- Includes code examples for ListAPIView, APIView, and mobile pagination
- Outlines rollout plan and alternatives considered
| * **User Accounts API** (``/api/user/v1/accounts/``) — pagination behavior differs from other user-related APIs, making it difficult for consumers to use a single data-loading pattern. | ||
| * **Course Members API** (``/api/courses/v1/.../members/``) — returns all enrollments without pagination, relying on a ``COURSE_MEMBER_API_ENROLLMENT_LIMIT`` setting (default 1000) to cap results and raising ``OverEnrollmentLimitException`` instead of paginating. | ||
| * **Enrollment API** (``/api/enrollment/v1/``) — some list endpoints return full result sets without pagination support. | ||
| * **Course Blocks API** (``/api/courses/v2/blocks/``) — intentionally returns unpaginated data for the entire course structure, which can result in very large response payloads. |
In general, pagination of tree structures is complicated, to say the least.
Does a "page" size of 10 refer to 10 top-level items, which may potentially have hundreds of children included? Or a "varying shape" response with 1 top-level item + 9 children, or 8 top-level items + 2 children, and even more complexities with grandchildren and great-grandchildren? Or do we limit to returning 1 depth level at a time to avoid this?
Claude suggests the following:
The most principled approach distinguishes between two different questions clients are asking:
"What is the shape of this tree?" — This is a structural query. The answer (IDs, types, parent-child relationships, display names) is typically small and bounded even for large courses. It should be returned in full, without pagination, at controlled depth. A course with 500 blocks has maybe 5–15KB of structure. Trying to paginate this creates more problems than it solves.
"What is the full data for these nodes?" — This is a content query. Node content (student view data, completion state, grade details) can be large per node. This is where you paginate — but over a flat list of node IDs, not the tree itself.
Specifically, for the ADR, that would mean stating something like this:
Tree-shaped endpoints must not apply standard item-count pagination to the full node set. Instead, they must choose one of:
- Return the complete structural representation (IDs, types, relationships) and paginate separately over node content when requested, or
- Return the tree to a fixed maximum depth and provide explicit child-fetch URLs for any subtrees beyond that depth.
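As a rough illustration of the first option, here is a minimal sketch that returns the whole structure in one response and paginates only over a flat list of node IDs. The in-memory `TREE`, the function names, and the response keys are all hypothetical and are not taken from the course blocks API or the ADR.

```python
from math import ceil

# Hypothetical in-memory course tree: block ID -> (type, child IDs).
TREE = {
    "course": ("course", ["ch1", "ch2"]),
    "ch1": ("chapter", ["seq1"]),
    "ch2": ("chapter", []),
    "seq1": ("sequential", ["v1", "v2"]),
    "v1": ("vertical", []),
    "v2": ("vertical", []),
}


def get_structure(root):
    """Structural query: return IDs, types, and child links, unpaginated."""
    structure = {}
    stack = [root]
    while stack:
        node = stack.pop()
        block_type, children = TREE[node]
        structure[node] = {"type": block_type, "children": list(children)}
        stack.extend(children)
    return structure


def get_content_page(node_ids, page=1, page_size=2):
    """Content query: paginate over a flat list of node IDs, not the tree."""
    start = (page - 1) * page_size
    chunk = node_ids[start:start + page_size]
    return {
        "count": len(node_ids),
        "num_pages": ceil(len(node_ids) / page_size),
        "current_page": page,
        # Placeholder payload; a real endpoint would load per-node data here.
        "results": [{"id": n, "content": f"<content of {n}>"} for n in chunk],
    }
```

A client would call `get_structure("course")` once, then request content pages for whichever node IDs it actually needs to render.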
CC @jesperhodge re taxonomy pagination.
Note: Claude also said:
The course blocks API is actually a reasonable example of getting this mostly right already: requested_fields lets you strip the response down to structural metadata, and you can fetch full block detail separately. Its main gap is that the approach isn't documented as an explicit standard, so other tree-shaped APIs have reinvented things differently. ADR 0036 should probably make this the pattern explicitly.
Hmm, I guess this is actually explored in #38305 - why not just combine that ADR into this one?
| Alternatives Considered | ||
| ----------------------- | ||
| * **Standardize on LimitOffsetPagination instead of PageNumberPagination**: Rejected because ``edx-drf-extensions`` already ships ``DefaultPagination`` based on ``PageNumberPagination``, and a significant portion of the platform already uses it. Additionally, ``limit``/``offset`` pagination degrades in performance with large offsets because the database must scan and skip all preceding rows, making it unsuitable for large Open edX datasets such as enrollments and completions. |
Additionally, limit/offset pagination degrades in performance with large offsets because the database must scan and skip all preceding rows, making it unsuitable for large Open edX datasets such as enrollments and completions.
This doesn't make any sense. limit/offset pagination and page number pagination have exactly the same database performance characteristics if implemented naively. But this is just the client-facing API shape; technically, there are ways to implement either page number pagination or limit/offset pagination using a cursor internally to improve performance.
The main reasons to prefer page number pagination are that it's already widely used, and it's much easier for humans to understand than limit/offset.
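To make the equivalence concrete, here is a small sketch of how a naive page-number request reduces to the same window as a limit/offset request. The helper names and the `enrollment` table are hypothetical; the SQL string only illustrates the shape of the query a slice-based paginator would typically produce.

```python
def page_to_window(page, page_size):
    """Translate 1-based page/page_size into (offset, limit)."""
    return (page - 1) * page_size, page_size


def window_sql(table, offset, limit):
    """Illustrative only: the query shape behind either client-facing style."""
    return f"SELECT * FROM {table} ORDER BY id LIMIT {limit} OFFSET {offset}"


# ?page=3&page_size=20 and ?limit=20&offset=40 end up as the same query.
by_page = window_sql("enrollment", *page_to_window(3, 20))
by_offset = window_sql("enrollment", 40, 20)
```

Either way the database skips the same 40 rows, so the choice between the two styles is about API ergonomics rather than query cost.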
| * **Adopt CursorPagination as the platform standard**: Rejected because cursor-based pagination, while performant for large and frequently-changing datasets, does not support random page access (jumping to page N). This would break existing MFE patterns that display numbered page controls. Cursor pagination also requires a stable, unique, sequential sort key on every queryset, which not all Open edX models guarantee today. |
Cursor pagination does not require the sort key to be sequential or unique. It only requires that you can define a deterministic ORDER BY on every QuerySet, and that the sort key is indexed (for performance).
While "basic" cursor-based pagination works like WHERE id > :last_seen_id ORDER BY id LIMIT :page_size, you could instead use WHERE (sort_key, id) > (last_value, last_id) ORDER BY sort_key, id to make cursor-based pagination work for any comparable, indexed type — timestamps, strings, UUIDs, whatever.
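The composite-key form can be sketched with the standard library's `sqlite3` module. The `enrollment` table, its columns, and `fetch_page` are hypothetical, and the tuple comparison is written in its expanded form (`a > x OR (a = x AND id > y)`) so it does not depend on row-value support in the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (id INTEGER PRIMARY KEY, created TEXT)")
# Index on (sort key, tiebreaker) so each page is an index range scan.
conn.execute("CREATE INDEX idx_created_id ON enrollment (created, id)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-02"),
     (4, "2024-01-02"), (5, "2024-01-03")],
)


def fetch_page(cursor=None, page_size=2):
    """Return (rows, next_cursor); the cursor is the last (created, id) seen."""
    if cursor is None:
        sql = "SELECT created, id FROM enrollment ORDER BY created, id LIMIT ?"
        params = (page_size,)
    else:
        last_created, last_id = cursor
        # Expanded form of (created, id) > (last_created, last_id).
        sql = (
            "SELECT created, id FROM enrollment "
            "WHERE created > ? OR (created = ? AND id > ?) "
            "ORDER BY created, id LIMIT ?"
        )
        params = (last_created, last_created, last_id, page_size)
    rows = conn.execute(sql, params).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = rows[-1] if len(rows) == page_size else None
    return rows, next_cursor
```

Note that `created` is neither unique nor sequential here; appending `id` as a tiebreaker is what makes the ordering deterministic.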
Currently, Open edX REST APIs implement pagination inconsistently across endpoints — some use page/page_size, others use limit/offset, and several return full unbounded result sets entirely. This forces every API consumer, whether an MFE, mobile client, or AI agent, to implement custom data-loading logic per endpoint, and risks overloading clients with large unpaginated payloads. This ADR proposes standardizing all list-type endpoints on the existing DefaultPagination class from edx-drf-extensions, enforcing a consistent response envelope across the platform and enabling consumers to implement a single reusable pagination loop for all Open edX APIs.
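As a sketch of what a single reusable pagination loop could look like on the consumer side, the snippet below follows `next` links until they run out. It assumes the response envelope carries `results` and `next` keys (the other keys shown are my understanding of the DefaultPagination envelope and should be checked against edx-drf-extensions); the URLs, `FAKE_PAGES`, and `fake_fetch` are hypothetical stand-ins for a real HTTP client.

```python
from typing import Callable, Iterator, Optional


def iter_results(fetch: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield every item from a paginated endpoint by following `next` links."""
    url: Optional[str] = first_url
    while url:
        body = fetch(url)
        yield from body["results"]
        url = body.get("next")  # None on the last page ends the loop


# Hypothetical canned responses standing in for real API calls.
FAKE_PAGES = {
    "/api/example/v1/items/?page=1": {
        "count": 3, "num_pages": 2, "current_page": 1,
        "next": "/api/example/v1/items/?page=2", "previous": None,
        "results": [{"id": 1}, {"id": 2}],
    },
    "/api/example/v1/items/?page=2": {
        "count": 3, "num_pages": 2, "current_page": 2,
        "next": None, "previous": "/api/example/v1/items/?page=1",
        "results": [{"id": 3}],
    },
}


def fake_fetch(url: str) -> dict:
    return FAKE_PAGES[url]


items = list(iter_results(fake_fetch, "/api/example/v1/items/?page=1"))
```

With a consistent envelope, the same `iter_results` helper would work against any list endpoint by swapping in a real `fetch` function.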
Issue: http://github.com/openedx/openedx-platform/issues/38266