feat(hal,rtapi): add initf one-shot init functs + resync #4012

Merged
andypugh merged 1 commit into LinuxCNC:master from grandixximo:initf on May 12, 2026

Conversation

@grandixximo
Contributor

Summary

  • New hal_init_funct_to_thread() plus halcmd verb initf register a funct that runs exactly once in RT context, in a dedicated cycle before the thread's cyclic list, the first time start_threads is observed. Position semantics match addf (+N from head, -N from tail).
  • New rtapi_task_self_resync() re-anchors the calling task's periodic schedule to "one full period from now". The thread loop calls it right after the init pass so a long init does not trip rtapi_wait()'s "unexpected realtime delay" catch-up storm and the next cyclic cycle starts on a clean period boundary.
  • Late initf calls return -EALREADY (halcmd surfaces a warning) so config order does not depend on whether start_threads has run yet.
  • free_funct_struct() sweeps the new init_funct_list so unloading a comp does not leave dangling init entries.
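
The behaviour summarised above can be sketched in plain C. This is only an illustrative model, not the code in hal_lib.c: `sketch_thread`, `sketch_initf()` and `sketch_thread_cycle()` are hypothetical names, the fixed-size array stands in for the real init_funct_list, and the `resync_count` counter stands in for the call to rtapi_task_self_resync(). It also assumes the first cycle is always the dedicated init cycle.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_INIT 8

typedef void (*sketch_funct_t)(void *arg, long period);

/* Hypothetical, simplified stand-in for a HAL thread. */
struct sketch_thread {
    sketch_funct_t init_functs[SKETCH_MAX_INIT];
    void *init_args[SKETCH_MAX_INIT];
    size_t n_init;
    int init_done;     /* set once the one-shot init pass has run */
    int resync_count;  /* stands in for rtapi_task_self_resync() calls */
};

/* Register a one-shot init funct; late calls are rejected with -EALREADY. */
static int sketch_initf(struct sketch_thread *t, sketch_funct_t f, void *arg)
{
    if (t->init_done)
        return -EALREADY;
    if (t->n_init >= SKETCH_MAX_INIT)
        return -ENOMEM;
    t->init_functs[t->n_init] = f;
    t->init_args[t->n_init] = arg;
    t->n_init++;
    return 0;
}

/* One pass of the thread loop: a dedicated init cycle first, cyclic afterwards. */
static void sketch_thread_cycle(struct sketch_thread *t, sketch_funct_t cyclic,
                                void *cyclic_arg, long period)
{
    if (!t->init_done) {
        for (size_t i = 0; i < t->n_init; i++)
            t->init_functs[i](t->init_args[i], period);
        t->n_init = 0;
        t->init_done = 1;
        t->resync_count++;  /* resync once, after the whole batch */
        return;             /* the cyclic list does not run in this cycle */
    }
    assert(cyclic != NULL);
    cyclic(cyclic_arg, period);
}

/* Helper funct for experiments: increments the int that arg points to. */
static void sketch_count(void *arg, long period)
{
    (void)period;
    ++*(int *)arg;
}
```

Registering two functs with `sketch_initf()` and then calling `sketch_thread_cycle()` three times runs both init functs once, resyncs once, and runs the cyclic funct twice; a further `sketch_initf()` call then returns -EALREADY.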

Motivation

EtherCAT master activation (ecrt_master_activate()) must run in the RT thread immediately before cyclic comms, but the call takes far longer than a single period. Today drivers either activate at rtapi_app_main time (loses RT context, breaks DC SYNC0 phasing) or in the first cyclic cycle (overruns + "unexpected realtime delay" warning storm). initf + rtapi_task_self_resync() gives a clean RT-context one-shot init slot followed by a phase-clean first cyclic cycle.

Backends

  • uspace (Posix): Posix::task_self_resync() resets task->nextstart via clock_gettime(RTAPI_CLOCK, ...). Next rtapi_wait() advances by period + pll_correction and sleeps to it, giving exactly one fresh period.
  • RTAI: warn-once stub. Per-task period storage is not currently kept and the primary consumer runs on uspace. If RTAI support is needed later, store period_counts in rtapi_task_start() and call rt_task_make_periodic(rt_whoami(), rt_get_time() + period_counts, period_counts) in rtapi_task_self_resync().
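
To make the uspace arithmetic concrete, here is a minimal sketch of the re-anchoring step. It is not the code from uspace_posix.cc: `sketch_task`, `sketch_resync()` and `sketch_next_wakeup()` are hypothetical names, and the current time is passed in as a parameter (the real backend reads it with clock_gettime()) so the example stays deterministic.

```c
#include <assert.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Hypothetical, reduced task record: only the fields the resync touches. */
struct sketch_task {
    struct timespec nextstart;  /* deadline the task last woke up for */
    long period_ns;
    long pll_correction_ns;
};

/* Add a (possibly negative) nanosecond offset and renormalise tv_nsec. */
static void sketch_ts_add_ns(struct timespec *ts, long ns)
{
    ts->tv_nsec += ns;
    while (ts->tv_nsec >= SKETCH_NSEC_PER_SEC) {
        ts->tv_nsec -= SKETCH_NSEC_PER_SEC;
        ts->tv_sec++;
    }
    while (ts->tv_nsec < 0) {
        ts->tv_nsec += SKETCH_NSEC_PER_SEC;
        ts->tv_sec--;
    }
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < SKETCH_NSEC_PER_SEC);
}

/* Resync: forget the stale schedule and anchor it at "now". */
static void sketch_resync(struct sketch_task *t, const struct timespec *now)
{
    t->nextstart = *now;
}

/* What the next wait would sleep until: one period (plus PLL correction) on. */
static struct timespec sketch_next_wakeup(struct sketch_task *t)
{
    sketch_ts_add_ns(&t->nextstart, t->period_ns + t->pll_correction_ns);
    return t->nextstart;
}
```

With a 1 ms period and an init that ran from t = 10.000 s to t = 10.100 s, resyncing at 10.100 s puts the next wakeup at 10.101 s. Without the resync the next deadline would be 10.001 s, which is already in the past.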

@grandixximo
Contributor Author

grandixximo commented May 9, 2026

@BsAtHome @andypugh I need this in for EtherCAT
Probably can discuss tomorrow...
It will require users to add an initf line to have the lcec function be properly activated and synchronized, this could work for other use cases, but for now EtherCAT really needs it. If this passes my plan is to update linuxcnc-ethercat to send a warning/error if the initf is not called, forcing users to add it to the INI file.
Adding the initf line could be part of the new INI V1.2 automatic config updater?

@rodw-au
Contributor

rodw-au commented May 9, 2026

I've been a bit involved with this because Luca and I share a mutual client who has had major problems with this after upgrading from Debian 10 to Debian 13 Trixie. The same would apply if he used Debian 12.

There is no doubt this is a needed feature. I have worked with kernel tuning since the first prerelease of Debian 12 and running an Ethercat machine on Debian 11 (kernel 5.10). The modern kernels after 5.9 have progressively supported additional power saving features at the expense of latency if not tuned. This has been exacerbated by changes to the iGh Ethercat master we use that make it difficult to obtain the level of performance that was experienced with Debian 10, due to increased network latency and synchronisation issues.

From what I understand, the Ethercat and Linuxcnc threads need to be synchronised to avoid symptoms such as grinding noises and performance issues from expensive servos. This becomes more of an issue with an increased number of slaves on the bus.

Luca has been working tirelessly for a few months to resolve this problem experienced by some users. This PR will provide a "hook" point at which the Ethercat thread and the Servo thread can be properly synchronised as we start our servo thread. It will solve a very perplexing problem for a number of users.

As an aside, and this is not really a Linuxcnc issue: if the Trixie kernel is properly tuned, jitter is substantially reduced. One recent test of mine showed about 50% less jitter than a normally tuned kernel. Some of my earlier PCs (Celeron J1900s) can be coaxed into far better latency performance with each successive kernel. It's not really surprising that LinuxCNC needs some tweaks to keep abreast of developments at iGh, Debian and the kernel developers.

@BsAtHome
Contributor

BsAtHome commented May 9, 2026

If this passes my plan is to update linuxcnc-ethercat to send a warning/error if the initf is not called, forcing users to add it to the INI file. Adding the initf line could be part of the new INI V1.2 automatic config updater?

This is not an INI-file issue afaics. It is a HAL-file issue.

@BsAtHome
Contributor

BsAtHome commented May 9, 2026

Just me thinking...
If the added functions only run once, then why keep the list at all? Every entry that has run can be simply removed from the list. Then, when no entries are available, you run the normal cycle. In such case, you can test on the list being empty instead of requiring a variable init_done. There is only one exception where you want that variable if you want to detect someone adding an init function after you start the cycle runs (this is a valid case).

The whole HAL memory segment and all the components in RT need to be reloaded when you restart. There is no dynamic load/unload that works well in the current design because HAL memory is not freed. Bad Things™ may happen when you restart with an "old" HAL memory segment intact...
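
The "remove each entry once it has run" idea from this comment can be sketched as follows. This is a hedged illustration only: `init_entry`, `init_list`, `init_list_add()` and `run_and_drain()` are hypothetical names, and the real HAL list lives in shared memory with its own list primitives. The `init_done` flag is kept solely to reject late additions, which is the one case the comment calls out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void (*funct_t)(void *arg, long period);

/* Hypothetical singly linked init entry. */
struct init_entry {
    funct_t funct;
    void *arg;
    struct init_entry *next;
};

struct init_list {
    struct init_entry *head;
    int init_done;  /* only needed to detect registrations that come too late */
};

/* Append at the tail so init functs run in registration order. */
static int init_list_add(struct init_list *l, struct init_entry *e)
{
    struct init_entry **link = &l->head;

    if (l->init_done)
        return -EALREADY;
    while (*link != NULL)
        link = &(*link)->next;
    e->next = NULL;
    *link = e;
    return 0;
}

/* Run every entry once, unlinking each before it is called.
 * Returns the number of functs that ran; the list is empty afterwards. */
static int run_and_drain(struct init_list *l, long period)
{
    int ran = 0;

    while (l->head != NULL) {
        struct init_entry *e = l->head;
        l->head = e->next;
        e->next = NULL;
        assert(e->funct != NULL);
        e->funct(e->arg, period);
        ran++;
    }
    l->init_done = 1;
    return ran;
}

/* Helper funct for experiments: increments the int that arg points to. */
static void bump(void *arg, long period)
{
    (void)period;
    ++*(int *)arg;
}
```

After the first `run_and_drain()` call the list is empty, so a second call runs nothing, and a later `init_list_add()` is refused because `init_done` is set.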

@rodw-au
Contributor

rodw-au commented May 9, 2026

If this passes my plan is to update linuxcnc-ethercat to send a warning/error if the initf is not called, forcing users to add it to the INI file. Adding the initf line could be part of the new INI V1.2 automatic config updater?

This is not an INI-file issue afaics. It is a HAL-file issue.
I seem to remember that versioning was introduced to deal with the transition to what's been called joint axes which in my view should more correctly be called independent axes.
Doesn't the versioning also deal with the transitioning to multi spindle? eg from spindle.on to spindle.0.on?

@grandixximo
Contributor Author

There is only one exception where you want that variable if you want to detect someone adding an init function after you start the cycle runs (this is a valid case).

Interactive initf from halcmd on a running system is a mistake and is deliberately rejected loudly. Not having the init_done flag is certainly more elegant, but halcmd initf would IMO be a common enough first try for users updating that preventing it makes the ugly flag worth keeping.

The whole HAL memory segment and all the components in RT need to be reloaded when you restart. There is no dynamic load/unload that works well in the current design because HAL memory is not freed. Bad Things™ may happen when you restart with an "old" HAL memory segment intact...

Yes, I run into this sometimes while debugging in general, unrelated to this PR. I am not trying to oversell the free_funct_struct() sweep as making reloads safe; it just patches the obvious crash.

This is not an INI-file issue afaics. It is a HAL-file issue.

Sorry, I had a typo. You are correct: initf needs to be added in the HAL file, but as rodw argues, the INI updater patches HAL as well. I'm still not 100% sure how to handle the HAL file update requirement cleanly; a warning from linuxcnc-ethercat is what I'm landing on for now. Probably something to talk about at the meeting...

@hdiethelm
Contributor

@rodw-au I don't use Ethercat but I have a mesa Ethernet card so I had some issues at the beginning myself. I am using Debian 13.

I guess you tried similar things already and it might even be documented somewhere. If not, here is what I figured out so far:

  • The r8169 driver is crap: > 1000 us jitter, unusable. You need the r8168 driver if you have such a card:
    • apt-get install r8168-dkms
  • I use two timedelta instances (loadrt timedelta count=2) and call the first one as the first function and the last one as the last function in the list. Together with pyvcp, I now have a display of the start / end jitter always available in the main UI. If desired, I can share some snippets.
  • I discovered that latency and jitter are best if both the Ethernet card task and the IRQ are on the isolated core where the RT thread runs. It seems counterintuitive at first, but because the mesa thread sends data and waits for a response, it is probably more efficient if the Ethernet card task also runs on the same CPU, so no CPU->Memory->CPU transfer is needed. The impact was quite noticeable: 150us down to 15us jitter.
  • Later, I changed to an Intel I350 Ethernet card to be able to use Xenomai. However, this is still WIP. This card has several chains but the jitter is lower if I use just one and deactivate the others. A bit better than with the r8168-dkms but not much.
  • Packages to remove, since they sometimes trigger actions on the network card. I did not do extensive testing, but I assume this can add latency: network-manager, network-manager-applet, modemmanager
  • Xenomai has special network card drivers which promise even lower latency but this is still WIP on my side. This should also work with Ethercat.

I use this script as post-up script. It's not perfect but it does the job for me. MASK=1<<CPU has to be adjusted to be the same core as linuxcnc uses:
eth-rt-affinity.sh

I use htop / cat /proc/interrupts to check afterwards if all tasks / IRQ's run on the correct CPU.

All together with other RT tuning, I have now an Ethernet related jitter of only about 3 us first function / 15 us last function. The servo_thread.time is between 230 and 260us.

In my setup, the worst jitter happens if the GPU has to do something. Even moving a window is sometimes visible, from 3 to 8...10us and watching youtube results in rare spikes of 100+ us. I need to investigate but the theory for now is that Memory->GPU DMA transfers block the memory access for the isolated CPU.

@hdiethelm
Contributor

Mostly way overkill, but it gives insight into how the network stack can be tuned with PREEMPT_RT: https://www.youtube.com/watch?v=t4H1h9mm188

@rodw-au
Contributor

rodw-au commented May 10, 2026

@hdiethelm yes, using the isolated rt core for the NIC is best practice.
There isn't any difference tuning the hardware for Ethercat or mesa except Ethercat has some custom NIC drivers that avoid the overhead of interrupts (and thus your affinity) altogether by using polling.
Realtek NIC cards use about 30% more memory than Intel and should be avoided.
TSN (Time Sensitive Networking) from your YouTube video uses IEEE-1588v2 Precision Time Protocol (PTP), which is now supported in the Linux kernel. It requires hardware that supports it. Often entry level NICs in the same family (eg i225) don't. You have to use server grade cards, not the more common i225-V in this list. https://www.intel.com/content/www/us/en/ark/products/series/184686/intel-ethernet-controller-i225-series.html
Ethercat supports TSN but it needs support on the slave (and Mesa Ethernet cards) to be able to use it.
Agree with your thread time. My kernel traces showed about 200 ns so there is plenty of room to absorb minor latency spikes. But it's the intermittent services that can break this.
I have a few videos on my YouTube channel about tuning (there is a separate playlist for them)
https://www.youtube.com/@MrRodW
My scripted approach is described there. It has improved latency by 50% over manually tuned systems

Remember, for our application PREEMPT_RT is not fast, it's predictable (deterministic). We just need to lock the kernel down to avoid any intermittent surprises!

For fast, you may need userspace and the DPDK used in HF (High Frequency) trading, colocated with your market operator, and obscenely expensive NICs. There, every nanosecond of latency loses money.
https://www.dpdk.org/

@hdiethelm
Contributor

@rodw-au Thanks for the input, I have to go through it to see if I missed a possible optimisation. Jitter is fine with 15us but 200us for the tx/rx function seems to be a bit much. Do you mean 200 us or 200 ns?

Even though the video is about TSN/PTP, which makes no sense with linuxcnc when you use a point to point connection, there are also good hints about general RT Ethernet.

@grandixximo
Contributor Author

grandixximo commented May 11, 2026

@BsAtHome I have a whole bunch of stuff on the linuxcnc-ethercat side hinging on this making it into master, so I would like it merged ASAP. I was hoping for it to get merged yesterday at the meeting, but the RTAI V2 cleanup raised conflicts. I have refactored to match the new structure with CLOCK_MONOTONIC; let me know if anything else needs attention.

@BsAtHome
Contributor

You still didn't fix your indent problems on those parts you added. (and please, only fix the parts you added or we can't differentiate blame anymore).

My question, about the list being destructed when init is done, has not really been answered or I don't understand the answer. I conceded the use of the init_done flag, but still don't understand the logic when it has run and what you are doing with the list (line 3639+). You say "sweep" in the comment, but what use is the list when you are done? That is why I asked why the list isn't destructed in the same loop where you run your init code.

And, I am not holding this back. Andy said he wanted to have a look.

@andypugh
Collaborator

It seems that this is for functions that need an initialisation step once realtime is running and comms are active?
Some RT functions run an init, then switch to normal mode. I understand that the aim of this new HAL command is to avoid spurious "realtime delay" messages, but I wonder if a neater solution might not be to provide a message masking flag that a HAL component can set on first-run during the init stage? (Actually, would there be any harm in globally masking the message for the first cycle?)
If there are several "initf" registered, do they all run in the same thread cycle, or queue up, running in turn?

@grandixximo
Contributor Author

grandixximo commented May 11, 2026

@andypugh All initfs on the same thread run sequentially in one cycle, resync happens once after the batch, not per-funct. The resync is not about silencing diagnostics, it's about giving the next cyclic cycle a clean phase reference. Without it, EtherCAT users would still see sync drift even if the warning disappeared.

see src/rtapi/uspace_posix.cc:204-212

@BsAtHome I cleaned up the indentation, and did list-destruction after init_done=1

@BsAtHome
Contributor

@BsAtHome I cleaned up the indentation

Ehm... I think not (from your tree and the diff):
[screenshot: badindent]

@grandixximo
Contributor Author

adee71f
?

@grandixximo
Contributor Author

[two screenshots] Blame shows 22 years ago on my end.

@BsAtHome
Contributor

Because you have combined tabs and spaces.
Example: in lines 2485..2490 of hal/hal_lib.c. The if is indented by 4 spaces. The next line by a tab char \t. The default tab-width is 4 spaces and that makes the content of the if statement wrongly indented.
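
The effect described here can be checked with a few lines of C. This is a hypothetical helper written for illustration, not part of the PR: `indent_column()` computes the visual column at which a line's first non-blank character lands for a given tab width.

```c
#include <assert.h>

/* Visual column of the first non-blank character, with tabs advancing
 * to the next multiple of tab_width. */
static int indent_column(const char *line, int tab_width)
{
    int col = 0;

    assert(tab_width > 0);
    for (; *line == ' ' || *line == '\t'; line++) {
        if (*line == '\t')
            col += tab_width - (col % tab_width);  /* jump to next tab stop */
        else
            col++;
    }
    return col;
}
```

With a tab width of 4, a line indented by four spaces and a following line indented by one tab both land on column 4, so the body of the if appears at the same level as the if itself. With a tab width of 8 the tab-indented line lands on column 8 and the nesting looks right again.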

@grandixximo
Contributor Author

grandixximo commented May 11, 2026

I used 4 spaces everywhere in my changes, so it should be looking good now.
Going to bed now, hope to see this merged in the morning 🙏

@hdiethelm
Contributor

hdiethelm commented May 11, 2026

I just quickly looked into these changes. Sadly, I don't have the time right now to go into the details, so it is certainly possible that I have missed something. What you are doing, assuming the task time is 1ms, is the following, right?

t=0
init_ethercat()
//t is now 100ms
resync t=100ms
wait for t = t + 1ms
while(true){
   task_functions()
   wait for t = t + 1ms
}

You can achieve the same with way less code by using rtapi_task_resync() directly in the hal modules. At first glance, it looks like a work around. However, it has also an advantage. With this structure, you can reset and init ethercat again while linuxcnc is running in special states, for example when estop is active.

No changes are needed in hal_* files because no new function type is needed.

Now the question is, if it is good to allow any halmodule to do a resync(). I would say why not, anybody's own responsibility if he breaks RT for a few cycles and with resync() he can now do it the proper way.

I leave it to you which version you prefer, this or yours. Both have advantages and disadvantages.

Free-style pseudo-code matching the above:

t=0
while(true){
    task_functions(){
        if(!init){
            init_ethercat()
            init=true
            //t is now 100ms
            resync t=100ms
            wait for t = t + 1ms
        }
        other_task_functions()
    }
    wait for t = t + 1ms
}

This would look something like this in .hal:

addf ethercat_module.ethercat_init
addf other_stuff...

And ethercat:

int rtapi_app_main(void)
{
    hal_export_funct("ethercat_module.ethercat_init", ethercat_init, 0, 0, 0, comp_id);
}

static void ethercat_init(void *arg, long period)
{
    if(!init){
        do_init();
        init=true;
        rtapi_task_resync(); //This is your resync function
        rtapi_wait(); //This is the wait function, already available in hal objects. Waits for the next cycle
    }
}

@hdiethelm
Contributor

Multiple init would be a bit more cumbersome with ^, you would need an argument for each module that should not call wait.

Feel free to cherry-pick xenomai variants to your PR: hdiethelm@b5b684a

Tested with:

diff --git a/src/hal/components/timedelta.comp b/src/hal/components/timedelta.comp
index 19fabe45e8..e416da2074 100644
--- a/src/hal/components/timedelta.comp
+++ b/src/hal/components/timedelta.comp
@@ -49,6 +49,8 @@ if(last != 0) {
        count++;
        avg_err = err / (double)count;
 }
+rtapi_delay(1000000);
+rtapi_task_self_resync();
 
 if(reset) { first = 1; last = 0; out = 0; jitter = 0; max_ = 0; }
 else last = now;
diff --git a/src/rtapi/uspace_rtapi_main.cc b/src/rtapi/uspace_rtapi_main.cc
index 3a73c5e294..f96faf7fd0 100644
--- a/src/rtapi/uspace_rtapi_main.cc
+++ b/src/rtapi/uspace_rtapi_main.cc
@@ -1110,8 +1110,6 @@ long int rtapi_delay_max() {
 }
 
 void rtapi_delay(long ns) {
-    if (ns > rtapi_delay_max())
-        ns = rtapi_delay_max();
     App().do_delay(ns);
 }

../scripts/latency-test 1000000 now shows 2000000 as expected for posix / xenomai* without complaining about a delay.

@andypugh
Collaborator

The default tab-width is 4 spaces and that makes the content of the if statement wrongly indented.

A lot of the older LinuxCNC code assumes an 8-space tab, and uses 4 spaces for a half-tab.

I hate working in those files.

@andypugh
Collaborator

Is this good to go yet?

There is something about it I don't "like" but nothing rational. And I have absolutely no experience with EtherCAT (I haven't built a new CNC machine for rather a long time)

@grandixximo
Contributor Author

Well, I have not looked at hdiethelm's point yet

New hal_init_funct_to_thread() registers a funct that runs
exactly once in RT context before the thread's cyclic list,
on the first cycle after start_threads. After the init pass
the thread loop calls new rtapi_task_self_resync() to
re-anchor the periodic schedule, so a long init does not
trip rtapi_wait()'s "unexpected realtime delay" catch-up
loop and the next cyclic cycle starts on a clean period
boundary.

Primary use is EtherCAT master activation: ecrt_master_activate()
must run in the RT thread immediately before cyclic comms, but
the call itself takes far longer than a period.

Surfaced as halcmd verb 'initf' (same +N/-N position semantics
as addf). Late initf calls return -EALREADY so config order
does not depend on whether start_threads has run yet.

Posix and Xenomai backends resync by clock_gettime(CLOCK_MONOTONIC,
&task->nextstart); Xenomai EVL uses evl_read_clock(EVL_CLOCK_MONOTONIC,
...). RTAI backend is a warn-once stub: per-task period storage is
not currently kept and the primary consumer runs on uspace.

Co-authored-by: Hannes Diethelm <hannes.diethelm@gmail.com>

@mika4128

I just quickly looked into these changes. Sadly, I don't have the time right now to go into the details, so it is certainly possible that I have missed something. What you are doing, assuming the task time is 1ms, is the following, right?

t=0
init_ethercat()
//t is now 100ms
resync t=100ms
wait for t = t + 1ms
while(true){
   task_functions()
   wait for t = t + 1ms
}

You can achieve the same with way less code by using rtapi_task_resync() directly in the hal modules. At first glance, it looks like a work around. However, it has also an advantage. With this structure, you can reset and init ethercat again while linuxcnc is running in special states, for example when estop is active.

No changes are needed in hal_* files because no new function type is needed.

Now the question is, if it is good to allow any halmodule to do a resync(). I would say why not, anybody's own responsibility if he breaks RT for a few cycles and with resync() he can now do it the proper way.

I leave it to you which version you prefer, this or yours. Both have advantages and disadvantages.

Free-style pseudo-code matching the above:

t=0
while(true){
    task_functions(){
        if(!init){
            init_ethercat()
            init=true
            //t is now 100ms
            resync t=100ms
            wait for t = t + 1ms
        }
        other_task_functions()
    }
    wait for t = t + 1ms
}

This would look something like this in .hal:

addf ethercat_module.ethercat_init
addf other_stuff...

And ethercat:

int rtapi_app_main(void)
{
    hal_export_funct("ethercat_module.ethercat_init", ethercat_init, 0, 0, 0, comp_id);
}

static void ethercat_init(void *arg, long period)
{
    if(!init){
        do_init();
        init=true;
        rtapi_task_resync(); //This is your resync function
        rtapi_wait(); //This is the wait function, already available in hal objects. Waits for the next cycle
    }
}

This will cause a realtime error, because EtherCAT activation takes a long time.

@grandixximo
Contributor Author

@hdiethelm thanks for the alternative design and especially the Xenomai/EVL variants. Cherry-picked those into the latest force-push, credited you as co-author.

On the design choice, we went with the initf verb for these reasons:

  • Resync is automatic. With your pattern, every module that needs init in RT has to remember rtapi_task_self_resync() + rtapi_wait() inside the if-once branch. If a module author forgets or places them wrong, "unexpected realtime delay" fires every startup. With initf, thread_task drives the resync once after the init list drains, so modules can't get it wrong.

  • No stats pollution. thread_task measures funct.tmax / thread.tmax around every cyclic funct. If ecrt_master_activate() runs inside a normal cyclic funct, funct.tmax latches to ~100ms for the rest of the session. Anything monitoring thread.tmax reports a permanent overrun. initf's special cycle skips timing measurement, so stats stay clean.

rtapi_task_self_resync() is exported either way, so modules that prefer your in-funct pattern can still use it directly. The two approaches aren't mutually exclusive.
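
The stats-pollution argument is easy to see with a running maximum. The sketch below is a toy model, not the thread_task code: `funct_stats`, `record_runtime()` and `tmax_after()` are hypothetical names, and the cycle durations are made-up sample values in nanoseconds.

```c
#include <assert.h>

/* Hypothetical per-funct timing record with a latching maximum. */
struct funct_stats {
    long runtime;  /* last measured execution time, ns */
    long tmax;     /* largest execution time seen so far, ns */
};

/* Record one measured execution time the way a running-maximum stat would. */
static void record_runtime(struct funct_stats *s, long elapsed_ns)
{
    assert(elapsed_ns >= 0);
    s->runtime = elapsed_ns;
    if (elapsed_ns > s->tmax)
        s->tmax = elapsed_ns;
}

/* Feed a series of cycle durations into the stat. When skip_first is
 * nonzero, the first cycle is treated as an unmeasured init pass. */
static long tmax_after(const long *elapsed_ns, int n, int skip_first)
{
    struct funct_stats s = { 0, 0 };

    for (int i = skip_first ? 1 : 0; i < n; i++)
        record_runtime(&s, elapsed_ns[i]);
    return s.tmax;
}
```

For the sample series {100 ms, 250 us, 240 us, 260 us}, measuring every cycle leaves tmax stuck at 100 ms, while skipping the init cycle yields 260 us, which reflects the real cyclic behaviour.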

@grandixximo
Contributor Author

@andypugh this is ready

@andypugh merged commit 35a8b8f into LinuxCNC:master on May 12, 2026
15 checks passed

@hdiethelm
Contributor

@mika4128

This will cause a realtime error, because EtherCAT activation takes a long time.

No, due to the call to rtapi_task_self_resync(), which resets the time. Tested with: #4012 (comment)

@grandixximo
Agreed, I did not think about the stats, and your version feels a bit cleaner.

I already used the exported resync in #4012 (comment) to test the Xenomai implementations without having to create too much HAL code. Works fine.

If a resync is ever needed for EtherCAT at runtime (I remember having to do something like that after a drive error in a different project), the "do something breaking RT and then call rtapi_task_self_resync()" approach would still be an option.


6 participants