ci: run PR builds in fork and mirror status to apache repo #4976

Open
Ma77Ball wants to merge 3 commits into apache:main from Ma77Ball:feat/auto-ci-run

Conversation

Contributor

@Ma77Ball Ma77Ball commented May 7, 2026

What changes were proposed in this PR?

Moves PR builds off the Apache repo and into the contributor's fork, so PRs from forks no longer need first-time-contributor approval and don't burn Apache's GitHub Actions minutes.

  • Adds fork-ci.yml that runs the full build matrix and license header check on every push to a branch in the contributor's fork. It is a no-op when run inside apache/texera.
  • Adds notify_test_workflow.yml that posts a Build status on the PR and links it to the matching fork CI run.
  • Adds update_build_status.yml that keeps the Build status in sync with the fork run as it progresses.
  • Removes the pull_request trigger from required-checks.yml and check-header.yml so they no longer duplicate fork CI on the Apache side.
  • Switches automatic-email-notif-on-ddl-change.yml from a pull_request: closed trigger to a push: main trigger so it stops queuing for approval on every fork PR.
  • Updates .asf.yaml so the required status checks are Build (mirrored from fork CI) and Validate PR title.
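
As a rough illustration of the mirroring described above, the sketch below maps a fork workflow run onto a `Build` commit status. It is not the PR's code: the helper names `toCommitStatusState` and `buildStatusPayload` are hypothetical, and the mapping of non-success conclusions to `failure` is an assumption. The `status`, `conclusion`, and `html_url` fields follow the GitHub REST API's workflow-run object.

```javascript
// Hypothetical helper: translate a fork workflow run into a commit status state.
// The mapping is an assumption, not the logic used in this PR.
function toCommitStatusState(run) {
  if (run.status !== 'completed') return 'pending' // queued or in_progress
  if (run.conclusion === 'success') return 'success'
  return 'failure' // failure, cancelled, timed_out, and so on
}

// Builds a payload shaped like the REST "create a commit status" request body.
function buildStatusPayload(run) {
  return {
    state: toCommitStatusState(run),
    context: 'Build',
    target_url: run.html_url, // links the PR status to the fork run
    description: 'Fork CI: ' + (run.conclusion || run.status),
  }
}
```

Keeping the mapping in a pure function makes it easy to check in isolation before wiring it into a workflow step.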

Any related issues, documentation, discussions?

Closes: #4290

How was this PR tested?

Tested manually on a fork:

  • Pushed branches and confirmed the full matrix ran in the fork.
  • Confirmed the Build status appears on the PR and updates to success or failure based on the fork run.
  • Confirmed pushes to main, release/**, and ci-enable/** in a fork are correctly skipped with a clear PR comment.
  • Confirmed required-checks.yml still runs on push to main, release/**, and ci-enable/** in the Apache repo.

Was this PR authored or co-authored using generative AI tooling?

Co-Authored with Claude Opus 4.7 in compliance with ASF

github-actions bot added the feature and ci (changes related to CI) labels on May 7, 2026
Contributor Author

Ma77Ball commented May 7, 2026

@Yicong-Huang, @aglinxinyuan, and @chenlica, please review.

Contributor

Copilot AI left a comment

Pull request overview

This PR restructures CI so that pull request builds run in contributors’ forks (avoiding first-time-contributor approval and conserving apache/texera Actions minutes) while mirroring results back to the PR via a single required Build status.

Changes:

  • Add a fork-owned CI workflow (fork-ci.yml) plus PR-side workflows to create and continuously update a Build commit status that links to the fork run.
  • Stop running duplicate PR CI in the Apache repo by removing pull_request triggers from existing workflows.
  • Adjust post-merge automation and ASF branch protection to key off the mirrored Build status.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| .github/workflows/fork-ci.yml | Runs the full build matrix + header check in contributor forks (no-op in apache/texera). |
| .github/workflows/notify_test_workflow.yml | On PR updates, posts/updates a Build commit status and PR comments pointing to the fork CI run (or explaining failure). |
| .github/workflows/update_build_status.yml | Periodically/after notify completion, syncs the Build status to match fork run progress. |
| .github/workflows/required-checks.yml | Removes PR triggering and ensures the base repo build matrix runs only for non-PR events. |
| .github/workflows/check-header.yml | Stops running license-header checks on PR events in the base repo. |
| .github/workflows/automatic-email-notif-on-ddl-change.yml | Moves DDL email notification to a push: main trigger scoped to sql/updates/**. |
| .asf.yaml | Updates required status checks to Build and Validate PR title. |

@@ -23,13 +23,6 @@ on:
- 'ci-enable/**'
- 'main'
- 'release/**'
Comment on lines +116 to +127
// Failure scenarios: red ✗ Build status, with target_url pointing
// somewhere useful for the committer to dig in. The detailed
// explanation lives in the postOnceComment on the PR conversation.
async function failCheck(title, target_url) {
  await setBuildStatus('failure', title, target_url || notifyRunUrl)
  core.setFailed(title)
}

// Success scenario (non-fork PR): leaves the workflow green.
async function passCheck(title, target_url) {
  await setBuildStatus('success', title, target_url || notifyRunUrl)
}
Comment on lines +20 to +53
on:
  workflow_run:
    workflows: ["On pull request update"]
    types: [completed]
  schedule:
    - cron: "*/5 * * * *"
  workflow_dispatch:

jobs:
  update:
    name: Update build status
    runs-on: ubuntu-latest
    # workflow_run mode polls up to 60 min waiting for fork CI completion.
    # Other modes (cron, workflow_dispatch) finish in seconds.
    timeout-minutes: 65
    permissions:
      actions: read
      statuses: write
    steps:
      - name: "Update build status"
        uses: actions/github-script@v8
        env:
          TRIGGER_EVENT: ${{ github.event_name }}
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const isLongPoll = process.env.TRIGGER_EVENT === 'workflow_run'
            const maxIterations = isLongPoll ? 120 : 1 // 120 * 30s = 60 min
            const sleepMs = 30000
            const statusContext = 'Build'

            console.log('=== update_build_status: ' + new Date().toISOString() + ' ===')
            console.log('trigger=' + process.env.TRIGGER_EVENT + ' mode=' + (isLongPoll ? 'long-poll up to ' + maxIterations + ' iters' : 'single pass'))
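
The excerpt above stops before the loop itself. A minimal sketch of how a bounded poll with these two knobs might look follows; it assumes a caller-supplied `check` callback that returns `undefined` until the fork run has finished. The names `pollUntil`, `pollBudgetMinutes`, and `sleep` are hypothetical and do not come from the workflow.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Upper bound on wall-clock time spent sleeping, in minutes.
function pollBudgetMinutes(maxIterations, sleepMs) {
  return (maxIterations * sleepMs) / 60000
}

// Call `check` up to `maxIterations` times, sleeping `sleepMs` between
// attempts, and stop early once it returns anything other than undefined.
async function pollUntil(check, maxIterations, sleepMs) {
  for (let i = 0; i < maxIterations; i++) {
    const result = await check(i)
    if (result !== undefined) return result
    if (i < maxIterations - 1) await sleep(sleepMs)
  }
  return undefined // budget exhausted; the caller decides what to report
}
```

With `maxIterations = 1` the same function degenerates to a single pass, which matches the cron and workflow_dispatch modes described in the comment.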

Comment on lines +73 to +79
       - name: Find added SQL update file
         if: steps.pr.outputs.skip == 'false'
         id: get_sql_file
         run: |
-          FILE=$(git diff --name-only --diff-filter=A \
-            ${{ github.event.pull_request.base.sha }} \
-            ${{ github.event.pull_request.merge_commit_sha }} \
-            -- 'sql/updates/')
+          FILE=$(git diff --name-only --diff-filter=A HEAD~1 HEAD -- 'sql/updates/')
           echo "sql_file=$FILE" >> $GITHUB_OUTPUT

Comment on lines +174 to +187
await postOnceComment(
  '<!-- fork-ci-not-applicable -->',
  ':information_source: **Fork CI is not applicable to this PR.**\n' +
  '\n' +
  'This PR is opened from a branch inside `' + baseRepo + '` itself, not from a fork. ' +
  'Fork CI is the system that runs the full build matrix in *contributor forks* and surfaces the result on PRs to `apache/texera`. ' +
  'Since there\'s no fork involved here, the `Build` status has been auto-passed.\n' +
  '\n' +
  'In-tree branches like this one are gated by the **Required Checks** status check (which runs builds directly in `' + baseRepo + '`).\n'
)
await passCheck(
  'Fork CI not applicable (in-tree PR)',
  context.serverUrl + '/' + baseRepo + '/actions/workflows/required-checks.yml'
)
Contributor

@aglinxinyuan aglinxinyuan left a comment

Two findings I think aren't already covered by Copilot's review — see inline. The bigger items Copilot caught (backport regression, in-tree PR auto-pass with no CI, missing pull-requests: read, failCheck marking notify run as failed, DDL diff edge cases) are all worth addressing too.

// points at the PR or notify run — never at the fork run.
if (current && current.state === 'failure') {
  const isForkTarget = current.target_url && current.target_url.includes('/' + pr.head.repo.full_name + '/actions/runs/')
  if (!isForkTarget) continue

Sticky-failure bug: this skip rule prevents recovery from notify's "detection failed" path.

In notify_test_workflow.yml lines 337-338, the "Fork CI run not detected" path sets:

const forkActionsUrl = 'https://github.com/' + headRepo + '/actions'
await failCheck('Fork CI run not detected for ' + head_sha.substring(0, 8), forkActionsUrl)

That target_url is the fork's Actions tab — it has no /runs/<id>/, so isForkTarget here is false, so the continue fires and update never re-checks this PR. Even if the fork CI run eventually registers and completes successfully, the Build status stays red until the contributor force-pushes.

This breaks the "5-min cron picks up late runs" recovery path that the 60s detection budget in notify implicitly assumes. Under heavy GitHub Actions load, queued-run registration past 60s is plausible.

Fix options:

  • Use a sentinel target_url for detection-failed that the updater treats as "keep looking" (e.g. point at .../actions/workflows/fork-ci.yml, which isForkTarget would still skip but a new isLookAgainTarget check would catch).
  • Or special-case detection-failed by description (current.description.startsWith('Fork CI run not detected')) — distinct from the genuinely terminal failures (blocked branch, fork inaccessible) the comment is justifying.
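
To make the second option concrete, here is a minimal sketch of the skip rule as a standalone predicate. It assumes the detection-failed description keeps the `Fork CI run not detected` prefix quoted above; the function name `shouldSkipFailure` and the constant `DETECTION_FAILED_PREFIX` are hypothetical and not part of the PR.

```javascript
// Assumed to match the description set by notify's detection-failed path.
const DETECTION_FAILED_PREFIX = 'Fork CI run not detected'

// `current` is the latest Build status on the head commit (or null);
// `forkFullName` stands in for pr.head.repo.full_name.
// Returns true when the updater should leave this PR alone.
function shouldSkipFailure(current, forkFullName) {
  if (!current || current.state !== 'failure') return false
  const isForkTarget = Boolean(current.target_url) &&
    current.target_url.includes('/' + forkFullName + '/actions/runs/')
  if (isForkTarget) return false // mirrored from a fork run: keep syncing
  const isDetectionFailed = (current.description || '').startsWith(DETECTION_FAILED_PREFIX)
  return !isDetectionFailed // terminal failures stay skipped; detection-failed is retried
}
```

In the updater loop this would replace the bare `if (!isForkTarget) continue` with `if (shouldSkipFailure(current, pr.head.repo.full_name)) continue`, so a late-registering fork run can still turn the status green.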

Comment on lines +55 to +60
run_frontend: true
run_amber: true
run_amber_integration: true
run_platform: true
run_python: true
run_agent_service: true

Hardcoding all run_*: true bypasses the label-driven CI gating documented in AGENTS.md → "CI labels & gating" and implemented by LABEL_STACKS in required-checks.yml.

Before this PR, a docs-only PR (label union → empty) skipped every build stack; a frontend-only PR ran only frontend; etc. After this PR, fork CI runs the full matrix on every push regardless of labels. Apache's runner budget is unaffected (public-fork minutes are free), but this means slower PR feedback for contributors and wasted compute on every docs-only change.

Two ways to keep parity:

  1. Mirror the LABEL_STACKS precheck logic into fork-ci.yml (re-fetch PR labels for head_ref via the GitHub API on push, set the run_* outputs accordingly). The fork has read access to PR labels on the upstream repo.
  2. Accept the regression for now and update the AGENTS.md "CI labels & gating" section to reflect that label gating now applies only to push events / backports, not PR builds.
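
For option 1, the label-to-flag step could look roughly like the sketch below. The real `LABEL_STACKS` table lives in required-checks.yml and is not shown in this thread, so the label names and their stack lists here are placeholders. Only the six `run_*` flag names are taken from the snippet above; `computeRunFlags` is a hypothetical helper.

```javascript
// Placeholder label-to-stack table; the real LABEL_STACKS may differ.
const LABEL_STACKS = new Map([
  ['frontend', ['frontend']],
  ['python', ['python']],
  ['docs', []],
])

// Stack names taken from the run_* inputs in fork-ci.yml.
const ALL_STACKS = ['frontend', 'amber', 'amber_integration', 'platform', 'python', 'agent_service']

// Union the stacks of every recognized label, then emit one run_* flag per stack.
function computeRunFlags(labels) {
  const selected = new Set()
  for (const label of labels) {
    for (const stack of LABEL_STACKS.get(label) || []) selected.add(stack)
  }
  const flags = {}
  for (const stack of ALL_STACKS) flags['run_' + stack] = selected.has(stack)
  return flags
}
```

Under this placeholder table, a docs-only PR yields every flag `false`, which matches the "label union → empty" behavior described above. How unlabeled PRs should be treated is left open here.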

Labels

ci changes related to CI feature

Development

Successfully merging this pull request may close these issues.

Auto-run CI build workflow for all PRs

3 participants