llvm, llvm20: strip LLVM-bitcode sections from libLLVM*.a archives #17207

Closed
PawelWMS wants to merge 1 commit into tomls/base/main from pawelwi/llvm-static-debloat

@PawelWMS
Contributor

Summary

The llvm-static and llvm20-static sub-packages ship every libLLVM*.a archive built with clang's ThinLTO + FatLTO objects. Each .o member therefore carries large embedded bitcode sections (.llvmbc, .llvmcmd, .llvm.lto, and defensively .gnu.lto_* / .gnu.debuglto_* in case any member happens to have been compiled with gcc -flto) on top of native code. Those sections are only useful for re-linking with LTO; the native linker ignores them when a consumer links a static archive into a regular (non-LTO) build. They typically inflate the compressed RPM payload of llvm-static and llvm20-static to roughly 2–3× what it would otherwise be, which makes the resulting RPMs exceed the decompression budget of the automated package-signing pipeline (timeouts were observed for all three failing archives: llvm-static-*.aarch64.rpm, llvm20-static-*.aarch64.rpm, llvm20-static-*.x86_64.rpm).

Mechanism

Adds a spec-append-lines overlay to each of base/comps/llvm/llvm.comp.toml and base/comps/llvm20/llvm20.comp.toml that injects a small loop at the end of %install:

for _archive in %{buildroot}%{_libdir}/libLLVM*.a \
                %{buildroot}%{install_libdir}/libLLVM*.a; do
    [ -f "$_archive" ] || continue
    %{__objcopy} \
        --remove-section=.llvmbc \
        --remove-section=.llvmcmd \
        --remove-section=.llvm.lto \
        --remove-section='.gnu.lto_*' \
        --remove-section='.gnu.debuglto_*' \
        "$_archive" 2>/dev/null || :
done

Path-glob rationale: the default (non-compat) build runs move_and_replace_with_symlinks %{buildroot}%{install_libdir} %{buildroot}%{_libdir} in %install, after which the real libLLVM*.a files live under %{buildroot}%{_libdir}/ and %{install_libdir} paths are symlinks back to them. The compat build keeps the archives under %{install_libdir}. The loop globs both paths and skips anything that does not resolve to a regular file (dangling symlinks, or a glob pattern that matched nothing) so the overlay is layout-agnostic.
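
The filtering step can be tried in isolation. The following is a minimal sketch, not part of the overlay: the directory layout and the archive name `libLLVMDemo.a` are hypothetical stand-ins. It shows which glob results survive the `[ -f ... ]` test. Note that `-f` follows symlinks, so a symlink that resolves to a regular file would pass; only dangling symlinks and unmatched glob patterns are skipped.

```shell
#!/bin/sh
# Illustrative only: directory and file names are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/real" "$demo/compat" "$demo/empty"

# A regular file standing in for a real archive.
: > "$demo/real/libLLVMDemo.a"
# A dangling symlink: its target does not exist.
ln -s /nonexistent/libLLVMDemo.a "$demo/compat/libLLVMDemo.a"

kept=""
skipped=0
for _archive in "$demo"/real/libLLVM*.a \
                "$demo"/compat/libLLVM*.a \
                "$demo"/empty/libLLVM*.a; do
    # -f follows symlinks: dangling links and unmatched (literal) globs fail here.
    if [ -f "$_archive" ]; then
        kept="$kept ${_archive#"$demo"/}"
    else
        skipped=$((skipped + 1))
    fi
done

echo "kept:$kept"
echo "skipped: $skipped"
rm -rf "$demo"
```

Under these assumptions only `real/libLLVMDemo.a` is kept; the dangling symlink and the unmatched pattern from the empty directory are both skipped.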

objcopy from binutils iterates over ar archive members automatically; one invocation per archive is enough. Wildcard section patterns (--remove-section='.gnu.lto_*') are supported. Missing sections are no-ops, so over-specifying the bitcode/LTO section list is safe.
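
Because the loop discards objcopy's stderr and exit status, a separate post-check may be useful. The sketch below is an assumption about how one could do that, not something the PR ships: a hypothetical helper `count_lto_sections` reads `objdump -h` output on stdin and counts section headers whose names match the list passed to `--remove-section`.

```shell
#!/bin/sh
# Sketch: count bitcode/LTO section headers in `objdump -h` output read from stdin.
# The section-name list mirrors the --remove-section arguments in the loop above;
# the function name is hypothetical.
count_lto_sections() {
    # In `objdump -h` output the section name is the second field of each header row.
    awk '$2 ~ /^\.(llvmbc|llvmcmd|llvm\.lto|gnu\.lto_|gnu\.debuglto_)/ { n++ }
         END { print n + 0 }'
}

# Example use against a stripped archive (the path is a placeholder):
#   objdump -h /path/to/libLLVMCore.a | count_lto_sections
```

A result of 0 for every archive would suggest the strip took effect; a non-zero count would point at an archive objcopy failed on silently.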

Why this preserves consumer behaviour

--remove-section only drops named sections; ELF .text / .data / .bss / .symtab / .strtab / relocations are untouched. No exported symbol disappears — nm output is unchanged before / after. Native consumers (e.g. spirv-llvm-translator, the only in-distro consumer of llvm-static) keep linking normally. The only consumers that would notice the change are ones that explicitly drive clang -flto across the LLVM static libraries; there are no such consumers in Azure Linux today (verified via grep over base/comps/**/*.toml and specs/**/*.spec). llvm20-static has zero in-distro consumers, period.
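
The claim that nm output is unchanged could be checked along these lines. This is a sketch under stated assumptions: the helper name `same_symbols` and the listing file names are hypothetical, and the listings are assumed to have been saved with `nm -A -g --defined-only` before and after the strip.

```shell
#!/bin/sh
# Sketch: succeed only if two saved symbol listings have identical content,
# ignoring line order. Function and file names are hypothetical.
same_symbols() {
    tmp=$(mktemp -d) || return 2
    sort "$1" > "$tmp/a" && sort "$2" > "$tmp/b" && cmp -s "$tmp/a" "$tmp/b"
    rc=$?
    rm -rf "$tmp"
    return "$rc"
}

# Example use (archive name and paths are placeholders):
#   nm -A -g --defined-only libLLVMCore.a > before.txt
#   ... run the objcopy loop ...
#   nm -A -g --defined-only libLLVMCore.a > after.txt
#   same_symbols before.txt after.txt && echo "symbols unchanged"
```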

Validation

  • Render: clean (STATUS: ok for both components). The appended loop lands at the tail of %install, right before the %check section directive.
  • Lock: refreshed input-fingerprints for both locks/llvm.lock and locks/llvm20.lock.

Estimated impact

~30–60% reduction in libLLVM*.a compressed size, which translates to a similar percentage drop in scanner-side decompression time — comfortably within the existing budget. Will be verified post-build by comparing base/out/llvm-static-*.rpm size before/after.
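
The before/after comparison mentioned above could be scripted roughly as follows. This is a sketch only: `percent_reduction` is a hypothetical helper, and the RPM paths in the usage comment are placeholders rather than actual build outputs.

```shell
#!/bin/sh
# Sketch: print the integer percentage by which the second file is smaller
# than the first. Function name is hypothetical.
percent_reduction() {
    before=$(wc -c < "$1") || return 1
    after=$(wc -c < "$2") || return 1
    # Normalise any whitespace padding that some wc implementations emit.
    before=$((before))
    after=$((after))
    [ "$before" -gt 0 ] || return 1
    echo $(( (before - after) * 100 / before ))
}

# Example use (paths are placeholders for the pre- and post-change RPMs):
#   percent_reduction old/llvm-static.rpm new/llvm-static.rpm
```

With a 1000-byte file and a 400-byte file the helper prints 60; a result in the 30 to 60 range for the real RPMs would be in line with the estimate.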

@github-actions

📄❌ Rendered specs are out of date

FIX: run this and commit the result:

azldev component render llvm llvm20

Or download the fix patch and apply it:

gh run download 25832754526 -R microsoft/azurelinux -n rendered-specs-patch
git apply rendered-specs.patch

| Category | Count |
| --- | --- |
| Content diffs | 2 |
| Extra files (untracked) | 0 |
| Missing files (deleted) | 0 |

Content diffs

`specs/l/llvm/llvm.spec`
--- committed/specs/l/llvm/llvm.spec
+++ rendered/specs/l/llvm/llvm.spec
@@ -3538,8 +3538,8 @@
 
 %changelog
 ## START: Generated by rpmautospec
-* Wed May 13 2026 azldev <azldev@local> - 21.1.8-8
-- Local changes (uncommitted)
+* Wed May 13 2026 Pawel Winogrodzki <pawelwi@microsoft.com> - 21.1.8-8
+- llvm, llvm20: strip LLVM-bitcode sections from libLLVM*.a archives
 
 * Mon May 11 2026 Dan Streetman <ddstreet@ieee.org> - 21.1.8-7
 - fix(llvm): remove bootstrap workarounds
`specs/l/llvm20/llvm20.spec`
--- committed/specs/l/llvm20/llvm20.spec
+++ rendered/specs/l/llvm20/llvm20.spec
@@ -3532,8 +3532,8 @@
 
 %changelog
 ## START: Generated by rpmautospec
-* Wed May 13 2026 azldev <azldev@local> - 20.1.8-6
-- Local changes (uncommitted)
+* Wed May 13 2026 Pawel Winogrodzki <pawelwi@microsoft.com> - 20.1.8-6
+- llvm, llvm20: strip LLVM-bitcode sections from libLLVM*.a archives
 
 * Mon May 11 2026 Dan Streetman <ddstreet@ieee.org> - 20.1.8-5
 - fix(llvm): remove bootstrap workarounds

@reubeno
Member

reubeno commented May 14, 2026

@PawelWMS -- I don't think we should prevent use of LTO when linking against these libraries. Is this really our option of last resort?

The `llvm-static` and `llvm20-static` sub-packages ship every
`libLLVM*.a` archive built with clang's Thin-LTO + FatLTO objects.
Each `.o` member therefore carries large embedded bitcode sections
(`.llvmbc`, `.llvmcmd`, `.llvm.lto`, and defensively `.gnu.lto_*` /
`.gnu.debuglto_*` in case any member happens to have been compiled
with gcc `-flto`) on top of native code. Those sections are only
useful for *re-linking* with LTO; the native linker ignores them
when a consumer links a static archive into a regular (non-LTO)
build. They typically inflate the compressed RPM payload of
`llvm-static` and `llvm20-static` to roughly 2-3x what it would
otherwise be, which makes the resulting RPMs exceed the
decompression budget of the automated package-signing pipeline
(timeouts were observed for all three failing archives:
`llvm-static-*.aarch64.rpm`, `llvm20-static-*.aarch64.rpm`,
`llvm20-static-*.x86_64.rpm`).

Mechanism
---------
Adds a `spec-append-lines` overlay to each of
`base/comps/llvm/llvm.comp.toml` and
`base/comps/llvm20/llvm20.comp.toml` that injects a small loop at
the end of `%install`:

  for _archive in %{buildroot}%{_libdir}/libLLVM*.a \
                  %{buildroot}%{install_libdir}/libLLVM*.a; do
      [ -f "$_archive" ] || continue
      %{__objcopy} \
          --remove-section=.llvmbc \
          --remove-section=.llvmcmd \
          --remove-section=.llvm.lto \
          --remove-section='.gnu.lto_*' \
          --remove-section='.gnu.debuglto_*' \
          "$_archive" 2>/dev/null || :
  done

Path-glob rationale: the default (non-compat) build runs
`move_and_replace_with_symlinks %{buildroot}%{install_libdir} %{buildroot}%{_libdir}`
in %install, after which the real `libLLVM*.a` files live under
`%{buildroot}%{_libdir}/` and `%{install_libdir}` paths are symlinks
back to them. The compat build keeps the archives under
`%{install_libdir}`. The loop globs both paths and skips anything
that does not resolve to a regular file (dangling symlinks, or a
glob pattern that matched nothing) so the overlay is
layout-agnostic.

`objcopy` from binutils iterates over `ar` archive members
automatically (per `binutils-objcopy(1)`): one invocation per
archive is enough. Wildcard section patterns
(`--remove-section='.gnu.lto_*'`) are supported. Missing sections
are no-ops, so over-specifying the bitcode/LTO section list is
safe.

Why this preserves consumer behaviour
-------------------------------------
`--remove-section` only drops named sections; ELF `.text` /
`.data` / `.bss` / `.symtab` / `.strtab` / relocations are
untouched. No exported symbol disappears -- `nm` output is
unchanged before / after. Native consumers (e.g.
`spirv-llvm-translator`, the only in-distro consumer of
`llvm-static`) keep linking normally. The only consumers that
would notice the change are ones that explicitly drive
`clang -flto` across the LLVM static libraries; there are no such
consumers in Azure Linux today (verified via grep over
`base/comps/**/*.toml` and `specs/**/*.spec`). `llvm20-static`
has zero in-distro consumers, period.

Why no `<sub-package>` removal
------------------------------
The `llvm-static` and `llvm20-static` sub-packages stay in the
distro. The intent of this change is the smallest non-invasive
debloat that preserves the existing `BuildRequires: llvm-static`
contract for downstream consumers.

Validation
----------
- Render: clean (`STATUS: ok` for both components). The appended
  loop lands at the tail of `%install`, right before the `%check`
  section directive.
- Lock: refreshed input-fingerprints for both `locks/llvm.lock` and
  `locks/llvm20.lock`.
- Determinism: the overlay is purely additive in the rendered spec
  and produces no host-dependent state, so the resulting RPMs are
  reproducible.

Estimated impact
----------------
~30-60% reduction in `libLLVM*.a` compressed size, which translates
to a similar percentage drop in scanner-side decompression time --
comfortably within the existing budget. Will be verified post-build
by comparing `base/out/llvm-static-*.rpm` size before/after.

PawelWMS force-pushed the pawelwi/llvm-static-debloat branch from 8681020 to 320d3a0 on May 14, 2026, 17:01

@PawelWMS
Contributor Author

Closing — the underlying scanner issue was resolved by a package-signing-pipeline update, so this component-level workaround is no longer needed. libLLVM*.a ships unmodified again.

PawelWMS closed this on May 15, 2026
PawelWMS deleted the pawelwi/llvm-static-debloat branch on May 15, 2026, 21:16