
CI improvemens: General improvements #3984

Merged
BsAtHome merged 10 commits into LinuxCNC:master from hdiethelm:ci_improvemens
May 11, 2026

Conversation

@hdiethelm
Contributor

This is the follow-up to #3983.

These will be more in-depth changes, so riskier, and they will take some time.

There will be a few pushes, including some that should fail on purpose to test whether everything works as desired. Just tell me if I abuse the CI too much and I will find a different solution.

It is experimental for now but I need a PR so the CI runs.

@hdiethelm hdiethelm changed the title Ci improvemens: General cleanup Ci improvemens: General improvements Apr 30, 2026
@hdiethelm hdiethelm changed the title Ci improvemens: General improvements CI improvemens: General improvements Apr 30, 2026
@hdiethelm
Contributor Author

Removing eatmydata, tested:
Before: 4h 42m 44s
After: 4h 34m 41s + ~15m estimated for a stalled process -> 4h 49m 41s
So no big impact, and more readable.

@hdiethelm hdiethelm force-pushed the ci_improvemens branch 3 times, most recently from f59978a to 1921e21 Compare April 30, 2026 07:31
@BsAtHome
Contributor

Did you see this message at the "Complete job" stage:

Warning: Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v2. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

@hdiethelm
Contributor Author

Did you see this message at the "Complete job" stage:

Warning: Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v2. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

Yes, the commit is already there; I am just waiting for the last CI run to pass before the next push.

@hdiethelm
Contributor Author

@BsAtHome Feel free to cancel any pipeline that has failed jobs. I don't think I have the access rights to do so.

@hdiethelm
Contributor Author

hdiethelm commented Apr 30, 2026

So, first success: with 4 CPUs, down to 3h 32m.
Before: 4h 38m usage / 38m runtime
After: 3h 42m usage / 24m runtime

@hdiethelm
Contributor Author

hdiethelm commented Apr 30, 2026

Now the bigger change will be to run everything in prepared containers or runners. It probably won't reduce the runtime a lot, estimated 3-5 min. However, it is just nicer to Debian not to hammer their package repo just for fun. But let's see how well that works.

@BsAtHome Debian is the main target, right? So running rip-* inside a Debian container is also fine? At the moment, these jobs run on an ubuntu-24.04 runner.

@BsAtHome
Contributor

Debian is the primary target, yes.

Nice to see improvement and it is primarily in the independent packages building the documentation. Those were the slowest all the time.

The tests are not going to be significantly faster because they run in sequence. We've been discussing parallel execution, but that requires #2722 to be fully implemented. There are a significant number of issues not addressed in that PR (some noted, others implied, still others to be discovered).

@andypugh
Collaborator

#3983 is in.

@hdiethelm
Contributor Author

#3983 is in.

Thanks, rebased on top of master.

Debian is the primary target, yes.

Nice to see improvement and it is primarily in the independent packages building the documentation. Those were the slowest all the time.

The tests are not going to be significantly faster because they run in sequence. We've been discussing parallel execution, but that requires #2722 to be fully implemented. There are a significant number of issues not addressed in that PR (some noted, others implied, still others to be discovered).

It depends on what the target is. For local usage, #2722 is the most comfortable way for users. But it can also be done another way:

  • One build
  • Run multiple tests in parallel, each testing one part. These can also be separated using:
    • Multiple runners (this needs an artifact; the way LinuxCNC is built right now, this can be cumbersome, but it might work)
    • Possibly multiple namespaces in the same runner. If this works, it can also be used locally. Namespaces are one of the mechanisms Docker uses to separate itself from the host, but they can also be used directly. I would have to try it out.
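The namespace variant can be sketched without Docker using util-linux's unshare. This is only a hedged sketch under assumptions: the "test subsets" are placeholder echo commands, and it falls back to plain execution where unprivileged user namespaces are disabled.

```shell
# Hypothetical sketch: run two test subsets in separate network/IPC
# namespaces so sockets, shared memory segments, etc. cannot collide.
# Falls back to plain execution if user namespaces are unavailable.
run_isolated() {
    unshare --user --map-root-user --net --ipc sh -c "$1" 2>/dev/null \
        || sh -c "$1"
}

run_isolated "echo subset-A done" &
run_isolated "echo subset-B done" &
wait
```

Each subset then sees its own loopback interface and IPC objects, which is the same isolation primitive Docker builds on.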

Since a parallel test runner will only shorten the CI if package-indep gets below 12 min, I see this as low priority for the CI.

@andypugh Any objections if I push Docker images to the LinuxCNC GitHub? They will appear somewhere here: https://github.com/LinuxCNC/linuxcnc/packages I can probably do that in CI without any additional rights. However, I might need you if I mess something up and packages have to be deleted. I will try it in my account first with the free credits, but to do something meaningful, I have to do it in the LinuxCNC CI.

@BsAtHome
Contributor

Maybe the curl progress meter can be shut off at download.
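For reference, curl's progress meter can be silenced without losing error output. A small sketch, using a local file:// URL in place of the real download URL (which is not shown in this thread):

```shell
# -sS: silent (no progress meter) but still print errors.
# Recent curl (>= 7.67) also offers --no-progress-meter, which hides
# only the meter and leaves all other output intact.
printf 'demo payload\n' > /tmp/ci-src.txt
curl -sS -o /tmp/ci-dst.txt file:///tmp/ci-src.txt
cat /tmp/ci-dst.txt
```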

@NTULINUX, when do these kernels move to linuxcnc.org? Is there a procedure?

@grandixximo
Contributor

Opened #3992 as a small stopgap for the firefox snap flake (issue #3991). It adds one apt remove firefox line before each upgrade. If you would prefer the larger CI cleanup in #3984 to land first I can close mine; otherwise rebasing #3984 over my change should be a one-line conflict at most. Whichever lands first works for me.

@hdiethelm
Contributor Author

hdiethelm commented May 1, 2026

@grandixximo This is fine for me. The main goal here is to not have to do all these installs at all. But this will take some time.

I could extract the --cpu improvements (25% faster package build) and, if desired:

  • Actions update (a bit risky, I cannot test the release target, but I don't expect an issue there)
  • Split into dependency / build / test (I don't see any risks)
  • Remove apt-get upgrade
  • Remove eatmydata

into a separate PR, so these can already be merged while I am busy here.

@hdiethelm
Contributor Author

hdiethelm commented May 1, 2026

So, I have a container-based prototype running in my GitHub fork:
https://github.com/hdiethelm/linuxcnc-fork/tree/ci_docker_prototyping
hdiethelm/linuxcnc-fork@ci_improvemens...hdiethelm:linuxcnc-fork:ci_docker_prototyping
The containers are visible here and on the project page; it will look similar on the original repo:
https://github.com/hdiethelm?tab=packages&repo_name=linuxcnc-fork
Docker build:
https://github.com/hdiethelm/linuxcnc-fork/actions/runs/25216626662
LinuxCNC rip-and-test:
https://github.com/hdiethelm/linuxcnc-fork/actions/runs/25216626671

Do you think such an approach is viable? If yes, I will look into automatically updating the Docker images from time to time and then port all the other targets.

Advantage:

  • Faster: rip-and-test uses 3m 30s less time; it will be even more for the Debian package build
  • Will not download any packages (unless new dependencies are added) until the Docker images are updated
  • The rip-and-test targets could also use a matrix to test arm / all Debian variants
  • You can also use the built Docker images locally

Disadvantage:

  • Due to preinstalled packages, missing dependencies might be discovered later
  • Possibly a bit more difficult to maintain
  • Will use storage for images, but depending on your contract, this is free of charge

I will not yet merge this into this PR. As soon as I do, packages will most probably appear in the original LinuxCNC repo.

@hdiethelm
Contributor Author

@grandixximo

might want to look at: ROS2, Yocto, buildroot for inspiration

Let's continue the discussion here. LinuxCNC is a bit special because it needs a ton of dependencies to build. The container needed is about 3.4GB in size.

If you find something, just tell me. I took some inspiration from https://github.com/open-webui/open-webui because I know they build containers from CI, and from many different manuals / articles.

@BsAtHome
Contributor

BsAtHome commented May 1, 2026

The size of the container should be acceptable and manageable if it is in a local repo for quick download. If there need to be many different versions, that could be a problem.

However, I have no clue what deals have been made with github. @andypugh, do you know of any deals with github?

BTW, you need to rebase after I merged the snapped firefox killer.

@hdiethelm
Contributor Author

hdiethelm commented May 1, 2026

The size of the container should be acceptable and manageable if it is in a local repo for quick download. If there need to be many different versions, that could be a problem.

However, I have no clue what deals have been made with github. @andypugh, do you know of any deals with github?

BTW, you need to rebase after I merged the snapped firefox killer.

Looks like it is not an issue, since the Docker images are on GitHub. I could not let it rest and finalized my prototype. Now there is the same build as before, but everything uses Docker containers, and I am still not out of free credits:

https://github.com/hdiethelm/linuxcnc-fork/actions/runs/25235809082
https://github.com/hdiethelm/linuxcnc-fork/actions/runs/25234109241

The time is now down by the estimated 3 min per job. The build-arch jobs could be reduced by a further 30s because they install dependency packages (257MB / 268 packages). However, I don't yet have an idea how to feed this list back to the Docker build, and I am not sure it is worth the effort.

Master CI: 4h 41m usage / 38m runtime
Without docker: 3h 42m usage / 24m runtime
With docker: 3h 01m usage / 21m runtime

Nearly a factor of 2 is already nice.

@grandixximo
Contributor

If you find something, just tell me.

I was thinking of auto-installation of missing dependencies whenever one is added, with a warning requesting its addition to the CI.

I thought one of those projects had it; I guess I was wrong. I will look again.

@grandixximo
Contributor

Found something. rust-lang/rust does this nicely: src/ci/docker/run.sh hashes the Dockerfile plus every file it COPYs (plus the arch and a manual cache-version int) and uses the hash as the image tag. A PR that touches deps flips the tag, the registry lookup misses, and the script rebuilds transparently with --cache-from type=registry warmed by the previous image. No auto-install fallback is needed because a stale image cannot be picked up by construction, and the weekly rebuild becomes a cache warm-up rather than a correctness gate.

For linuxcnc the hash inputs would be debian/control plus the Dockerfile. LLVM pairs the same idea with a paths: filter on those files in build-ci-container.yml so the image rebuild and the dep change land atomically in the same PR.
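A minimal sketch of that tag computation, assuming the hash inputs are the Dockerfile and debian/control (the file paths and registry name below are placeholders, not the project's actual layout):

```shell
# rust-lang-style content-addressed image tag: hash all
# dependency-relevant inputs plus the arch and a manual cache-version int.
image_tag() {
    cache_version=1  # bump by hand to force a full rebuild
    hash=$(cat "$@" | sha256sum | cut -c1-16)
    echo "${hash}-$(uname -m)-v${cache_version}"
}

# CI would then do something like (pseudo):
#   TAG=$(image_tag .github/docker/Dockerfile debian/control)
#   docker pull "ghcr.io/linuxcnc/build-container:$TAG" ||
#     { docker build -t "...:$TAG" . && docker push "...:$TAG"; }
```

Any PR that changes one of the hashed inputs produces a new tag, so a stale image can never be pulled by construction.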

@grandixximo
Contributor

While you are restructuring the CI flow, would it be in scope to add a working-tree-clean gate after the build steps?

- name: Verify no untracked or modified files after build
  run: |
    if [ -n "$(git status --porcelain)" ]; then
      echo "Build produced untracked or modified files:"
      git status --porcelain
      exit 1
    fi

Motivation: docs/man/.gitignore enumerates ~450 generated man pages by hand. The list goes stale silently when a new component lands without a matching ignore entry. Recent recurrences include demux_generic.9 (fixed in 073f3a5) and output_buffer.9 (currently untracked on master after 3820604).

output_buffer.9 would actually make a convenient test case for the gate: drop the check into the workflow without fixing the gitignore first, and master should fail. Then a one-line addition to docs/man/.gitignore makes it green. Confirms the gate works end-to-end on a real existing oversight.

Happy to send it as a separate PR after this one merges if preferred.

@hdiethelm
Contributor Author

hdiethelm commented May 2, 2026

While you are restructuring the CI flow, would it be in scope to add a working-tree-clean gate after the build steps?

Makes sense. But as of now, the linuxcnc folder is quite a mess after a build, especially with RTAI. It will probably fail. But I can add it as a soft-failing step, more of a reminder that something should be cleaned up, like cppcheck / shellcheck. It can be changed to hard-fail after the full cleanup is done. The same should be done once for the tasks above.

@hdiethelm
Contributor Author

Found something. rust-lang/rust does this nicely: src/ci/docker/run.sh...

Good idea, but I have to see how I can use this in the GitHub CI. At the moment, there are three places where packages are installed in my Docker variant, with sizes (download/disk MB):

  1. Base dependencies; a hash of the Dockerfile will do it (207/902MB): https://github.com/hdiethelm/linuxcnc-fork/blob/ci_docker_prototyping/.github/docker/Dockerfile#L14
  2. Debian build dependencies; a hash of the debian folder will do it (870/2241MB): https://github.com/hdiethelm/linuxcnc-fork/blob/ci_docker_prototyping/.github/docker/Dockerfile#L21
  3. At test install. Which files cover these dependencies? Are they automatically generated by the package creation? (257/1158MB): https://github.com/hdiethelm/linuxcnc-fork/blob/ci_docker_prototyping/.github/workflows/ci-docker.yml#L209

Additionally, I left all build-dep steps in, so if a new package is added later, it doesn't fail. However, as of now, nothing is installed there because everything is already in Docker.

Step 3 is a bit annoying to pack into Docker. I see some options:

  • Find a way to get all packages for the install without actually building the package.
  • Build the Debian package before creating the Docker image as well, so it can be installed with all deps and then only LinuxCNC removed again.
  • Get the Debian packages from another build and do ^^. But I haven't yet found out how to get these from artifacts. I could push them as packages.
  • Save a manually created package list
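For the second and third options, the runtime dependencies can be read back out of an already built package with `dpkg-deb --field <pkg>.deb Depends`; turning that field into an installable list is a small text-processing step. A sketch (the sample Depends string below is made up for illustration):

```shell
# Strip version constraints and alternatives from a Debian Depends
# field, yielding one package name per line for apt-get install.
parse_depends() {
    echo "$1" | tr ',' '\n' \
        | sed 's/([^)]*)//; s/|.*//; s/^ *//; s/ *$//' \
        | sort -u
}

# In CI, the input would come from the built .deb:
#   parse_depends "$(dpkg-deb --field linuxcnc-uspace.deb Depends)"
parse_depends "libc6 (>= 2.34), python3 | python3-minimal, tcl8.6 (>= 8.6)"
```

For the sample input this prints libc6, python3, and tcl8.6, one per line; the first alternative of each `|` group is kept, which is what apt would pick by default anyway.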

@hdiethelm
Contributor Author

@BsAtHome So, I have redone the CI improvements as a series of logical commits, with some reasoning why in the commit messages. Do you think it is understandable this way?

Are the scripts travis-build-test.sh / travis-install-build-deps.sh used anywhere other than in the CI? I have now placed all CI scripts in .github/scripts. If the travis* scripts are unused now, I could delete them.

So this is ready for review.

After this CI run has finished, I will follow up with a few on-purpose-failing commits to see if the failures are properly handled.

@hdiethelm hdiethelm force-pushed the ci_improvemens branch 2 times, most recently from b6c38bc to b96df85 Compare May 10, 2026 18:36
@hdiethelm
Contributor Author

hdiethelm added 10 commits May 10, 2026 20:57
It fails sometimes and the build time doesn't increase.
If it fails, the error is:
ERROR: ld.so: object 'libeatmydata.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Retry should not have an effect: Either you are rate limited and it
fails anyway or it succeeds.
No --quiet so you see what is going on.
Ubuntu image from GitHub should be reasonably up to date.
No need to remove firefox any more.
In debian containers, there is only a minimal package set, so we can
upgrade. Especially for sid, the container is not always up to date.
cppcheck is now in test section.
DEBIAN_FRONTEND: noninteractive can be defined once on top.
set -e is not needed in CI, it stops anyway on any error.
LinuxCNC repo is added for everything except sid / bookworm / trixie,
which is all we build, so it can be removed.
All actions updated to latest release.
Checkout: Submodules not needed / fetch-depth 1 is fine for all targets
not needing history.
More steps help debugging and tuning ci.
Fetch not needed, checkout fetches already.
This argument limited the number of CPUs to 2. Without it, we have all 4
CPUs, reducing build time.
Fewer packages -> faster build. Recommends should not be needed.
Scripts are easier to test locally than if the shell code is in the
ci.yml. They are also reusable.
travis-install-build-deps.sh replaced by one script, only used in CI.
@BsAtHome
Contributor

@BsAtHome So, I have redone the CI improvements in a series of logical commits with some reasoning why in the commit message. Do you think this is understandable this way?

Yes, but it could have been one commit. This is a big change one way or the other. That said, the dev-chain for the changes is nice to follow, so it's fine.

Are the scripts travis-build-test.sh / travis-install-build-deps.sh used anywhere else than in the CI? I placed now all CI scripts in .github/scripts. If the travis* scripts are unused now, I could delete them.

Well, I know I use them once in a while...

FWIW, CI is well isolated with all the required scripts under .github/scripts.

So this is ready for review.
After this CI has finished, I will follow with a few on purpose failing commits to see if the failures are properly handled.

When done, remove the draft tag.

@hdiethelm hdiethelm marked this pull request as ready for review May 10, 2026 19:02
@hdiethelm
Contributor Author

So, the on-purpose-failing checks failed as they should; it is ready from my side.

I downloaded the trixie indep and amd64 artifacts, extracted them, and compared them to master. Most is binary-identical and there are no new/missing files in the .debs, except:

  • Timestamps
  • All generated images; however, they look equal
  • All generated docs, which have a version number inside
  • A few binaries; there might also be a version number or timestamp inside

There is only one thing I don't fully understand yet. Sometimes the package-indep sid job failed and I don't know why, for example here: https://github.com/LinuxCNC/linuxcnc/actions/runs/25428177491/job/74587110190

The log is really long, I did not find any relevant error message, and I was not able to reproduce it on my PC either.

Today it has not happened so far, but there might be a hidden issue. Or it might just be GitHub being GitHub... ;-)

@hdiethelm
Contributor Author

Are the scripts travis-build-test.sh / travis-install-build-deps.sh used anywhere else than in the CI? I placed now all CI scripts in .github/scripts. If the travis* scripts are unused now, I could delete them.

Well, I know I use them once in a while...

Now that the CI scripts are in place and there are Docker images in my fork, I started using the images and the CI scripts for testing. For example:

podman run --rm -it -v .:/linuxcnc ghcr.io/hdiethelm/linuxcnc/build-container:trixie-amd64-latest
cd linuxcnc/
.github/scripts/build-rip.sh --with-realtime=uspace

This is really nice for debugging issues with a Debian variant you don't have a VM for. With some tricks you can even start the UI:

xhost +"local:podman@linuxcnc"
podman run --net=host -e DISPLAY=$DISPLAY --rm -it -v .:/linuxcnc ghcr.io/hdiethelm/linuxcnc/build-container:trixie-amd64-latest
cd linuxcnc/
chown -R linuxcnc:linuxcnc .
su linuxcnc
.github/scripts/build-rip.sh --with-realtime=uspace
scripts/linuxcnc

Note: the chown to the linuxcnc user messes up your host user's access rights for this directory. You need to fix them with sudo chown afterwards.

@BsAtHome
Contributor

One problem is that the indep CI is using "make -j4" and is missing the -O option to keep the messages from each instance together. Now they may all be jumbled together.

It always seems to happen in "make docs". This has always been a bit fragile. Not sure where it comes from, but one of the usual suspects is a bad translation that trips something somewhere. When you look through the messages, you can discover all sorts of problems that are neither handled nor fixed. Fixing the docs build is another huge undertaking that is due.
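For illustration, GNU make's `-O` (`--output-sync`, make >= 4.0) groups each target's output into one contiguous block instead of interleaving it. A small self-contained demo with dummy targets (`.RECIPEPREFIX` is used only to avoid literal tabs here):

```shell
# Two parallel targets whose messages would normally interleave;
# with -O each target's output is emitted as one contiguous block.
cat > /tmp/demo.mk <<'EOF'
.RECIPEPREFIX = >
all: a b
a:
>@echo a-start; sleep 0.2; echo a-end
b:
>@echo b-start; sleep 0.2; echo b-end
EOF
make -f /tmp/demo.mk -j2 -O all
```

With `-O` the a-start/a-end pair always stays together in the log, which is what the indep job would need for the "make docs" failures to be readable.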

@BsAtHome
Contributor

BTW, one thing we need to get rid of is using (Xe)TeX for docs.

@BsAtHome BsAtHome merged commit f6bfe9c into LinuxCNC:master May 11, 2026
15 checks passed
@hdiethelm
Contributor Author

Thanks for merging!

For the Docker variant: just tell me if and when you would like to do this.

The code is there and it is easy to merge changes over in case the CI files change on master, so no urgency from my side. The whole main CI structure is pretty much identical in the Docker variant; it just uses containers.

Time-wise, the Docker variant has no big advantages. It just loads GitHub's servers instead of the Debian mirrors, so it is nicer to the Debian project.

If you decide against the Docker variant, that is fine with me. It was a good learning experience anyway... ;-)

@BsAtHome
Contributor

You are welcome, but it is all others who have to say thanks. You did a lot of work and now it works a lot faster.

The Docker variant needs to be integrated into the project in a different way. We need a separate repo to auto-build images when specific updates are done (not sure which trigger). It may not be better time-wise, but it does make us more independent of the provided images, and we can tune/test the way we'd like. The Docker variant may also solve some of the (GitHub) instabilities we see when the default images are run.

But for now, I think we should enjoy the current changes for a moment before going deeper into the CI pit ;-)

@grandixximo
Contributor

Many many thanks for the time spent on the CI, the effort does not go unnoticed, keep it up 💪

@hdiethelm
Contributor Author

You are welcome. One of those side projects that took longer than estimated... ;-) I know people who complain about CIs taking 5-10 min, and this one took 40 min, so I took a look...

Time-wise, the doc part is still the blocking path, but this is not CI-related. Ways to improve:

  • Buy runners with more cores at GitHub
  • Improve the doc build
  • Only build docs when needed, e.g.:
    • Changes detected (hard to do with the current way of collecting docs from all over the place)
    • After merge to master (will detect doc issues late)
    • Special tag
    • ...

But let it rest for a while and see how the current version behaves before making more changes. I also want to implement xenomai net and create some chips on the CNC.

@BsAtHome
Contributor

Improve the doc build

This is the way to proceed :-)

All other options are just dealing with symptoms.
