Remove the ldk-node integration build from CI until we have a better process #4564

Draft
joostjager wants to merge 1 commit into lightningdevkit:main from
joostjager:remove-ldk-node-integration-test

@joostjager
Contributor

@joostjager joostjager commented Apr 14, 2026

The LDK Node Integration Tests workflow is not currently providing a useful CI signal on main.

Since January 1, 2026, its decisive-run pass rate was 29.05% (43/148), with the first completed run in that window on January 5, 2026. At that level, a failure is not very informative, and it distracts from otherwise green builds.
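As a rough illustration of how a figure like this can be derived, the sketch below computes a pass rate over "decisive" runs. It assumes, hypothetically, that a decisive run is one that concluded as either `success` or `failure`, and it uses synthetic data chosen only to reproduce the 43/148 counts quoted above. The number of cancelled runs is made up.

```python
from collections import Counter

# Assumption: only clear pass/fail outcomes count as "decisive".
DECISIVE = {"success", "failure"}


def decisive_pass_rate(conclusions):
    """Return (passed, decisive_total, rate_percent) for a list of run conclusions."""
    counts = Counter(c for c in conclusions if c in DECISIVE)
    passed = counts["success"]
    total = passed + counts["failure"]
    rate = 100.0 * passed / total if total else 0.0
    return passed, total, rate


# Synthetic data: 43 passes and 105 failures match the quoted counts;
# the 7 cancelled runs are hypothetical and are excluded from the rate.
runs = ["success"] * 43 + ["failure"] * 105 + ["cancelled"] * 7
passed, total, rate = decisive_pass_rate(runs)
print(f"{passed}/{total} = {rate:.2f}%")  # prints "43/148 = 29.05%"
```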

Proposed change

Remove the ldk-node integration workflow from this repository until there is a process that makes it a trustworthy signal again.

This check likely belongs in ldk-node CI instead, since that is also where the fix often needs to be made.

This can always be reverted later if needed.

Before reintroducing it

  • Clear ownership for keeping the integration green.
  • A documented process for coordinating rust-lightning and ldk-node changes.
  • A reliability threshold that the job is expected to meet before returning to the main CI path.

[Image: ldk-node-ci-history-since-2026-01-01]

@ldk-reviews-bot

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

Contributor

@tnull tnull left a comment

It's not that we don't have a process for fixing it, and usually (at least recently) people are opening the fixes relatively quickly. However, they simply need time to land in their own right, which is why the CI is often failing.

We introduced the CI job for good reason, exactly to communicate API breakage. So just dropping it again doesn't make the problem go away, it will just reduce awareness (and hence motivation for people to act on it).

I agree we might need a better process for fixing it, but just dropping the CI job and going back to the previous status quo is not an improvement, IMO.

@joostjager
Contributor Author

I think the problem is simply that the time to land a fix in ldk-node is too long for this to be a useful rust-lightning CI signal. The check can stay red for an extended period even when the follow-up is already understood, which makes it mostly noise in otherwise green builds. Running the same check in ldk-node would preserve the integration signal while breaking the build in the repository where the fix usually needs to land.

@tnull
Contributor

tnull commented Apr 14, 2026

> running the same check in ldk-node would preserve the integration signal while breaking the build in the repository where the fix usually needs to land.

Well, but it would send the signal where the contributors that need to receive it are likely not listening.

@joostjager
Contributor Author

I think the current setup has the same problem: the check in rust-lightning also often ends up where the people who need to fix it are not the ones listening, except it additionally adds noise to rust-lightning CI. The status history supports that: it stays red for long stretches.

More broadly, I do not think we should keep low-signal jobs in main CI, regardless of whether the cause is technical or process-related.

@TheBlueMatt
Collaborator

Then we should probably move to automatically opening an issue and tagging the original PR author when ldk-node breaks after a merge, no? That seems like a process that is maybe more likely to work.

@joostjager
Contributor Author

That may help with awareness, but I do not think it fully solves the underlying problem either, because the breakage can stack up while fixes are pending, so it is still not a particularly clean or timely signal. In any case, I think this false flag should be moved out of the rust-lightning repo, even if we want a separate process for notifying authors when ldk-node breaks after a merge.

@tnull
Contributor

tnull commented Apr 14, 2026

> That may help with awareness, but I do not think it fully solves the underlying problem either, because the breakage can stack up while fixes are pending, so it is still not a particularly clean or timely signal. In any case, I think this false flag should be moved out of the rust-lightning repo, even if we want a separate process for notifying authors when ldk-node breaks after a merge.

Yup, just from my experience I can report that I usually ignore #4511, as 99% of the time I'm mentioned there it's pre-existing breakage.

@TheBlueMatt
Collaborator

I definitely mentally track it: if there's existing breakage, I look at it once every few days to make sure it's still just the fuzz job. Note that #4511 doesn't fire for ldk-node breakage.

@joostjager
Contributor Author

I think that is different, though, because those notifications are about breakage within the same repo and for checks that are not expected to fail very often; they mostly just exercise a broader platform matrix. ldk-node breakage is a cross-repo API signal that currently happens often enough that it no longer reads as exceptional, which is why I do not think it belongs in the main rust-lightning CI surface.

@joostjager
Contributor Author

I think the key question is whether we have a concrete path to making this a high-signal CI job again. From my perspective, the process so far has not achieved that, and the diagram makes that pretty clear. If the view is that ldk-node coverage should stay in rust-lightning, what specific process changes and reliability target would make it a trustworthy signal again, and on what timeline?

4 participants