
[spark] Add scan.max.records.per.partition config to split log table input partitions #3260

Status: Open
Yohahaha wants to merge 1 commit into apache:main from Yohahaha:spark-split-partition

Conversation

Yohahaha (Contributor) commented May 7, 2026

Purpose

Linked issue: close #3215

Brief change log

  • Introduce scan.max.records.per.partition config option for Spark log table reads. When set, each Fluss
    bucket whose offset range exceeds this value will be split into multiple Spark input partitions, improving
    read parallelism for large offset ranges.
  • Update BucketOffsetsRetrieverImpl to support fetching real earliest offsets when needed.
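To make the change-log entry concrete, here is a minimal sketch of the splitting arithmetic it describes: a bucket's offset range `[startOffset, stopOffset)` is cut into chunks of at most `maxRecords` records each. The class and method names below are illustrative only, not the PR's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the per-bucket split logic described above.
public class SplitSketch {
    static List<long[]> splitRange(long startOffset, long stopOffset, long maxRecords) {
        List<long[]> splits = new ArrayList<>();
        // Fall back to a single split when the range is invalid or already small enough.
        if (startOffset < 0 || stopOffset <= startOffset
                || stopOffset <= startOffset + maxRecords) {
            splits.add(new long[] {startOffset, stopOffset});
            return splits;
        }
        // Otherwise cut the range into chunks of at most maxRecords records.
        for (long from = startOffset; from < stopOffset; from += maxRecords) {
            splits.add(new long[] {from, Math.min(from + maxRecords, stopOffset)});
        }
        return splits;
    }

    public static void main(String[] args) {
        // A bucket covering offsets [0, 10) with maxRecords = 4 yields 3 splits:
        // 0..4, 4..8, 8..10
        for (long[] s : splitRange(0L, 10L, 4L)) {
            System.out.println(s[0] + ".." + s[1]);
        }
    }
}
```

Each resulting range would back one Spark input partition, which is how the config raises read parallelism for large offset ranges.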

Tests

SparkLogTableReadTest: "Spark Read: split partition by config"

API and Format

Documentation

@Yohahaha Yohahaha marked this pull request as ready for review May 7, 2026 02:52
Yohahaha (Contributor, Author) commented May 7, 2026

@YannByron

Yohahaha (Contributor, Author) commented May 7, 2026

@luoyuxia @fresh-borzoni PTAL!

fresh-borzoni (Member) left a comment:

@Yohahaha Ty for the PR, overall LGTM, left minor comments, PTAL

stopOffset: Long,
maxRecords: Long): Seq[InputPartition] = {
if (
startOffset < 0 || stopOffset <= startOffset || stopOffset <= (startOffset + maxRecords)
fresh-borzoni (Member):

For the earliest mode we have the sentinel -2L; I think that would cause a bug here, since we'd clamp to a single split.
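To illustrate the reviewer's concern: if the earliest-mode sentinel (-2L, per the review comment) is passed as `startOffset`, the `startOffset < 0` guard quoted above takes the single-split branch, so a large range that should be split is not. A minimal reproduction of that condition (names hypothetical):

```java
public class SentinelCheck {
    // Assumed sentinel for "earliest offset", as stated in the review comment.
    static final long EARLIEST_OFFSET_SENTINEL = -2L;

    // Mirrors the guard condition quoted in the review context above.
    static boolean takesSingleSplitBranch(long startOffset, long stopOffset, long maxRecords) {
        return startOffset < 0
                || stopOffset <= startOffset
                || stopOffset <= startOffset + maxRecords;
    }

    public static void main(String[] args) {
        // Even with a large range that ought to be split, the sentinel
        // forces the single-split branch. Prints "true".
        System.out.println(takesSingleSplitBranch(EARLIEST_OFFSET_SENTINEL, 1_000_000L, 100L));
    }
}
```

The fix implied by the review is to resolve the sentinel to the real earliest offset (via the retriever change) before the split computation runs.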

public class BucketOffsetsRetrieverImpl implements OffsetsInitializer.BucketOffsetsRetriever {
private final Admin flussAdmin;
private final TablePath tablePath;
private final Boolean fetchEarliestOffset;
fresh-borzoni (Member):

Maybe better to use the primitive boolean, since it doesn't allow null.
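The point behind this comment: a boxed `Boolean` field defaults to null, and auto-unboxing null throws a `NullPointerException`, whereas a primitive `boolean` defaults to false and can never be null. A small standalone illustration (not the PR's code):

```java
public class UnboxingPitfall {
    static Boolean boxed;       // boxed field: defaults to null
    static boolean primitive;   // primitive field: defaults to false

    public static void main(String[] args) {
        System.out.println(primitive); // prints "false"
        try {
            if (boxed) {               // auto-unboxing null -> NullPointerException
                System.out.println("unreachable");
            }
        } catch (NullPointerException e) {
            System.out.println("NPE from unboxing a null Boolean");
        }
    }
}
```

Using `boolean` for a flag like `fetchEarliestOffset` rules out this failure mode at the type level.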

) {
return Seq(
FlussAppendInputPartition(tableBucket, startOffset, stopOffset)
.asInstanceOf[InputPartition])
fresh-borzoni (Member):

nit: I think this cast is redundant.

.map {
from =>
FlussAppendInputPartition(tableBucket, from, math.min(from + step, stopOffset))
.asInstanceOf[InputPartition]
fresh-borzoni (Member):

ditto

}
}

test("Spark Read: split partition by config") {
fresh-borzoni (Member):

The test only checks row order and values, not the resulting partition count. Also, what about partitioned tables? And the earliest mode?

.asInstanceOf[InputPartition]
Seq(
FlussAppendInputPartition(tableBucket, startOffset, stopOffset)
.asInstanceOf[InputPartition])
fresh-borzoni (Member):

ditto



Development

Successfully merging this pull request may close these issues.

[spark] Add config to split input partition by input size

2 participants