CI improvements: General improvements #3984
Conversation
|
Remove eatmydata tested: |
Force-pushed f59978a to 1921e21.
|
Did you see this message at the "Complete job" stage: |
Yes, the commit is already there; I just wait for the last CI run to pass before the next push. |
|
@BsAtHome Feel free to cancel any pipeline that has failed jobs. I don't think I have the access rights to do so. |
|
Now the bigger change will be to run everything in prepared Docker containers or runners. It probably won't reduce the runtime a lot, estimated 3-5 min. However, it is just nice for the Debian project to not hammer their package repo just for fun. But let's see how well that works. @BsAtHome Debian is the main target, right? So running rip-* under a Debian container is also fine? At the moment, they run on an ubuntu-24.04 runner. |
|
Debian is the primary target, yes. Nice to see the improvement; it is primarily in the independent packages building the documentation, which have always been the slowest. The tests are not going to be significantly faster because they run in sequence. We've been discussing parallel execution, but that requires #2722 to be fully implemented. There are a significant number of issues not addressed in that PR (some noted, others implied, still others to be discovered). |
|
#3983 is in. |
Force-pushed c617459 to 0c9aaf1.
Thanks, rebased on top of master.
It depends on what the target is. For local usage, #2722 is the most comfortable way for users. But it can also be done another way:
Since the test runner will only shorten the CI if package-indep gets below 12 min, I see this as low priority for the CI. @andypugh Objections if I push docker images to the linuxcnc GitHub? They will appear somewhere here: https://github.com/LinuxCNC/linuxcnc/packages I can probably do that in CI without any additional rights. However, I might need you if I mess something up and packages have to be deleted. I will try it in my account first with the free credits, but to do something meaningful, I have to do it in the linuxcnc CI. |
|
Maybe the curl progress meter can be shut off at download. @NTULINUX, when do these kernels move to linuxcnc.org? Is there a procedure? |
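For the progress meter, curl's `-sS` combination should do it. A sketch (a `file://` URL is used here only to keep the example self-contained; the real URL would be the kernel download):

```shell
# -sS silences the progress meter but keeps error messages;
# --fail makes curl exit non-zero on HTTP errors instead of
# saving the error page as if it were the download.
echo "kernel payload" > kernel.src
curl -sS --fail -o kernel.deb "file://$PWD/kernel.src"
cat kernel.deb
```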
|
Opened #3992 as a small stopgap for the firefox snap flake (issue #3991). It adds one |
|
@grandixximo This is fine for me. The main target here is to not have to do all these installs at all. But this will take some time. I could extract the --cpu improvements (25% faster package build) and, if desired, put them in a separate PR, so that part can already be merged while I am busy here. |
|
So, I have a container-based prototype running in my gitlab: Do you think such an approach is viable? If yes, I will work out how to automatically update the docker images from time to time and then port all other targets. Advantage:
Disadvantage:
I will not yet merge this into this PR. As soon as I do, packages will most probably appear in the original linuxcnc repo. |
Let's continue the discussion here. LinuxCNC is a bit special because it needs a ton of dependencies to build. The container needed is about 3.4 GB in size.
If you find something, just tell me. I took some inspiration from https://github.com/open-webui/open-webui, because I know they build containers from CI, and from many different manuals / articles. |
|
The size of the container should be acceptable and manageable if it is in a local repo for quick download. If there need to be many different versions, that could be a problem. However, I have no clue what deals have been made with github. @andypugh, do you know of any deals with github? BTW, you need to rebase after I merged the snapped firefox killer. |
Looks like it is no issue, since the docker images are on GitHub. I could not resist finalizing my prototype. Now there is the same build as before, but all using docker containers, and I am still not out of free credits: https://github.com/hdiethelm/linuxcnc-fork/actions/runs/25235809082 The time is now down by the estimated 3 min / job. The build-arch jobs could be reduced by a further 30 s because they still install dependency packages (257 MB / 268 packages). However, I don't have an idea how to pipe this list back to the docker build yet. And I am not sure if it is worth the effort. Master CI: 4h 41m / 38m runtime. Nearly a factor of 2 is already nice. |
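One possible way to get that list back (a rough sketch; the file names are invented): snapshot the installed package set before and after the build-dep step, and feed the delta into the next image build.

```shell
# In CI these lists would come from `dpkg-query -W -f '${Package}\n'`
# before and after `apt-get build-dep`; faked here for illustration.
printf 'gcc\nmake\n' > pkgs-before.txt
printf 'gcc\nlibfoo-dev\nmake\n' > pkgs-after.txt
sort pkgs-before.txt > pkgs-before.sorted
sort pkgs-after.txt > pkgs-after.sorted
# comm -13 prints lines only in the second file: the newly
# installed packages, ready to bake into the Dockerfile.
comm -13 pkgs-before.sorted pkgs-after.sorted > extra-packages.txt
cat extra-packages.txt
```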
I was thinking of auto-installation of missing dependencies, with a warning to request their addition to the CI whenever one is added. I thought one of those projects had it; guess I was wrong, I will look again. |
|
Found something. rust-lang/rust does this nicely: src/ci/docker/run.sh hashes the Dockerfile plus every file it For linuxcnc the hash inputs would be |
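A rough sketch of that scheme (all file names here are invented; this only illustrates the idea, not rust's actual script): hash the Dockerfile plus the scripts it copies in, and use the digest as the image tag, so the CI only rebuilds the image when one of the inputs changes.

```shell
# Fake hash inputs standing in for the real Dockerfile and the
# dependency-install script it copies into the image.
mkdir -p ci-docker
printf 'FROM debian:trixie\nCOPY install-deps.sh /\n' > ci-docker/Dockerfile
printf 'apt-get install -y build-essential\n' > ci-docker/install-deps.sh
# Concatenate all inputs in a fixed order and take a short digest;
# CI would `docker pull` this tag and only rebuild on a cache miss.
TAG=$(cat ci-docker/Dockerfile ci-docker/install-deps.sh | sha256sum | cut -c1-12)
echo "build-container:$TAG"
```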
|
While you are restructuring the CI flow, would it be in scope to add a working-tree-clean gate after the build steps?

```yaml
- name: Verify no untracked or modified files after build
  run: |
    if [ -n "$(git status --porcelain)" ]; then
      echo "Build produced untracked or modified files:"
      git status --porcelain
      exit 1
    fi
```

Motivation:
Happy to send it as a separate PR after this one merges if preferred. |
Makes sense. But as of now, the linuxcnc folder is quite a mess after a build, especially with RTAI. It will probably fail. But I can add it as a soft-failing step, so it's more a reminder that something should be cleaned up, like cppcheck / shellcheck. It can be changed to hard fail after the full cleanup is done. The same should be done once for the above tasks. |
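A soft-failing check could be sketched like this (the function name is invented): it reports leftovers but always exits 0, and the final `return 0` becomes `return 1` once the tree is expected to be clean.

```shell
# Report leftovers without breaking the pipeline.
check_clean() {
  # $1 simulates the output of `git status --porcelain`
  if [ -n "$1" ]; then
    echo "warning: build left untracked/modified files"
  else
    echo "working tree clean"
  fi
  return 0  # soft fail for now; flip to 1 after the cleanup
}
check_clean "?? rtai-leftover/"
check_clean ""
```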
Good idea, but I have to see how I can use this in Gitlab CI. At the moment, there are three places where packages are installed in my docker variant, with sizes (download/disk MB):
Additionally, I left all build-dep steps in, so if a new package is added later, it doesn't fail. However, as of now, nothing is installed there because everything is already in docker. The third step is a bit annoying to pack into docker. I see some options: |
|
|
@BsAtHome So, I have redone the CI improvements in a series of logical commits, with some reasoning why in the commit messages. Do you think it is understandable this way? Are the scripts travis-build-test.sh / travis-install-build-deps.sh used anywhere other than in the CI? I have now placed all CI scripts in .github/scripts. If the travis* scripts are unused now, I could delete them. So this is ready for review. After this CI run has finished, I will follow with a few on-purpose failing commits to see if the failures are properly handled. |
Force-pushed b6c38bc to b96df85.
|
Fail build&doc: https://github.com/LinuxCNC/linuxcnc/actions/runs/25636338081 |
It fails sometimes and the build time doesn't increase. If it fails, the error is: ERROR: ld.so: object 'libeatmydata.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
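For what it's worth, that ld.so message is only a warning: the loader ignores a missing preload object and the command still runs with its normal exit status, which is why the build time doesn't change. A quick demonstration (assuming libeatmydata is not installed):

```shell
# glibc prints the "cannot be preloaded ... ignored" message to
# stderr, but the command itself runs and exits normally.
LD_PRELOAD=libeatmydata.so /bin/echo "still runs" 2>preload-err.log
echo "exit status: $?"
```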
A retry should not have an effect: either you are rate limited and it fails anyway, or it succeeds. No --quiet, so you see what is going on.
The Ubuntu image from gitlab should be reasonably up to date. No need to remove firefox any more. In Debian containers, there is only a minimal package set, so we can upgrade. Especially for sid, the container is not always up to date.
cppcheck is now in the test section.
DEBIAN_FRONTEND: noninteractive can be defined once at the top. set -e is not needed in CI; it stops on any error anyway. The linuxcnc repo is added for everything except sid / bookworm / trixie, which is all we build, so it can be removed.
All actions updated to the latest release. Checkout: submodules not needed; fetch-depth 1 is fine for all targets not needing history. More steps help debugging and tuning the CI. Fetch not needed; checkout fetches already.
This argument limited the number of CPUs to 2. Without it, we have all 4 CPUs, reducing build time.
Fewer packages -> faster build. Recommends should not be needed.
Scripts are easier to test locally than shell code embedded in ci.yml. They are also reusable. travis-install-build-deps.sh is replaced by one script, only used in CI.
Yes, but it could have been one commit. This is a big change one way or the other. That said, the dev-chain for the changes is nice to follow, so it's fine.
Well, I know I use them once in a while... FWIW, CI is well isolated with all the required scripts under .github/scripts.
When done, remove the draft tag. |
|
So, the fail checks failed as they should, so it is ready from my side. I downloaded the trixie indep and amd64 artifacts, extracted them and compared them to master. Most is binary identical, and there are no new/missing files in the debs except: |
There is only one thing I don't yet fully understand. Sometimes the package-indep sid job failed and I don't know why, for example here: https://github.com/LinuxCNC/linuxcnc/actions/runs/25428177491/job/74587110190 The log is really long, I did not find any relevant error message, and I was also not able to reproduce it on my PC. Today it has not happened so far, but there might be a hidden issue. Or it might also be GitHub being GitHub... ;-) |
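The artifact comparison itself can be scripted with plain cmp, roughly like this (directory and file names are made up; `old/` and `new/` stand for the extracted master and PR artifacts):

```shell
# Fake trees standing in for two extracted artifact sets.
mkdir -p old new
echo same > old/librtapi.so; echo same > new/librtapi.so
echo v1   > old/changelog;   echo v2   > new/changelog
# cmp -s is silent and only sets the exit status, so only
# differing files get reported.
for f in librtapi.so changelog; do
  cmp -s "old/$f" "new/$f" || echo "differs: $f"
done
```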
Now that the CI scripts are there and there are dockers in my fork, I started using the dockers and the CI scripts for testing. For example:

```sh
podman run --rm -it -v .:/linuxcnc ghcr.io/hdiethelm/linuxcnc/build-container:trixie-amd64-latest
cd linuxcnc/
.github/scripts/build-rip.sh --with-realtime=uspace
```

This is really nice to debug issues with a Debian variant you don't have a VM for. With some tricks you can even start the UI:

```sh
xhost +"local:podman@linuxcnc"
podman run --net=host -e DISPLAY=$DISPLAY --rm -it -v .:/linuxcnc ghcr.io/hdiethelm/linuxcnc/build-container:trixie-amd64-latest
cd linuxcnc/
chown -R linuxcnc:linuxcnc .
su linuxcnc
.github/scripts/build-rip.sh --with-realtime=uspace
scripts/linuxcnc
```

Note: the chown to the linuxcnc user messes up your access rights to this directory under the host user. You need to fix them with sudo chown afterwards. |
|
One problem is that the indep CI is using "make -j4" and is missing the -O option to keep the messages from each instance together. Now they may all be gobbled up. It always seems to happen in "make docs". This has always been a bit fragile. Not sure where it comes from, but one of the usual suspects is a bad translation that trips something somewhere. When you look through the messages, you can discover all sorts of problems that are neither handled nor fixed. Fixing the docs build is another huge undertaking that is due. |
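GNU make's -O (--output-sync) does exactly that grouping: each recipe's output is buffered and emitted as one block, so parallel jobs can't interleave line by line. A toy Makefile shows the flag in isolation (GNU make 4.0+):

```shell
# Write a minimal two-target Makefile (targets invented for the demo)
# and run it in parallel with output synchronization enabled.
printf 'all: a b\na:\n\t@echo building a\nb:\n\t@echo building b\n' > Makefile.demo
make -f Makefile.demo -j4 -O
```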
|
BTW, one thing we need to get rid of is using (Xe)TeX for docs. |
|
Thanks for merging! For the docker variant: just tell me if and when you would like to do this. The code is there, and it is easy to merge changes over in case the CI files get changed on master, so no urgency from my side. The whole main CI structure is pretty much identical for the docker variant; it just uses containers. Time-wise, the docker variant has no big advantages. It just loads the gitlab servers instead of the Debian mirrors, so it is nicer on the Debian project. If you decide against the docker variant, fine for me. It was a good learning experience anyway... ;-) |
|
You are welcome, but it is all the others who have to say thanks. You did a lot of work and now it works a lot faster. The docker variant needs to be integrated into the project in a different way. We need a separate repo to auto-build images when specific updates are done (not sure which trigger). It may not be better time-wise, but it does make us more independent of provided images, and we can tune/test the way we'd like. The docker variant may also solve some of the (github) instabilities we see when the default images are run. But for now, I think we should enjoy the current changes for a moment before going deeper into the CI pit ;-) |
|
Many many thanks for the time spent on the CI, the effort does not go unnoticed, keep it up 💪 |
|
You are welcome. One of those side projects that took longer than estimated... ;-) I know people who complain about CIs taking 5-10 min, and this one took 40 min, so I took a look... Time-wise, the doc part is still the blocking path, but this is not CI related. Ways to improve:
But let it be for a while and see how the current version behaves before doing more changes. I also want to implement xenomai net and create some chips on the CNC. |
This is the way to proceed :-) All other options are just dealing with symptoms. |
This is the follow-up to #3983
These will be more in-depth changes, so riskier, and will take some time.
There will be a few pushes, including some that should fail on purpose to test if everything works as desired. Just tell me if I abuse the CI too much and I will find a different solution.
It is experimental for now, but I need a PR so the CI runs.