feat(hal,rtapi): add initf one-shot init functs + resync #4012
|
@BsAtHome @andypugh I need this in for EtherCAT |
|
I've been a bit involved with this because Luca and I share a mutual client who has had major problems after upgrading from Debian 10 to Debian 13 Trixie. The same would have happened with Debian 12. There is no doubt this is a needed feature. I have worked with kernel tuning since the first prerelease of Debian 12 and have been running an EtherCAT machine on Debian 11 (kernel 5.10). Kernels after 5.9 have progressively added power-saving features that cost latency unless the kernel is tuned. This has been exacerbated by changes to the iGh EtherCAT master we use, which make it difficult to reach the level of performance we had with Debian 10 because of increased network latency and synchronisation issues. From what I understand, the EtherCAT and LinuxCNC threads need to be synchronised to avoid symptoms such as grinding noises and performance issues from expensive servos. This becomes more of an issue as the number of slaves on the bus increases. Luca has been working tirelessly for a few months to resolve this problem experienced by some users. This PR will provide a "hook" point at which the EtherCAT thread and the servo thread can be properly synchronised as we start our servo thread. It will solve a very perplexing problem for a number of users. As an aside, and this is not really a LinuxCNC issue, if the Trixie kernel is properly tuned, jitter is substantially reduced: one recent test of mine showed jitter reduced by 50% relative to a normally tuned kernel. Some of my earlier PCs (Celeron J1900s) can be coaxed into far better latency performance with each successive kernel. It's not really surprising that LinuxCNC needs some tweaks to keep abreast of developments at iGh, Debian and the kernel developers. |
This is not an INI-file issue afaics. It is a HAL-file issue. |
|
Just me thinking... The whole HAL memory segment and all the components in RT need to be reloaded when you restart. There is no dynamic load/unload that works well in the current design because HAL memory is not freed. Bad Things™ may happen when you restart with an "old" HAL memory segment intact... |
|
Interactive
Yes I experience this sometimes while debugging in general not related to this, not try to oversell the
Sorry I had a typo, you are correct |
|
@rodw-au I don't use Ethercat but I have a mesa Ethernet card so I had some issues at the beginning myself. I am using Debian 13. I guess you tried similar things already and it might be even somewhere documented. If not, here what I figured out so far:
I use this script as a post-up script. It's not perfect but it does the job for me. MASK=1<<CPU has to be adjusted to match the core that LinuxCNC uses: I use htop / cat /proc/interrupts to check afterwards that all tasks / IRQs run on the correct CPU. Altogether, with other RT tuning, I now have an Ethernet-related jitter of only about 3 us (first function) / 15 us (last function). The servo_thread.time is between 230 and 260 us. In my setup, the worst jitter happens if the GPU has to do something. Even moving a window is sometimes visible, from 3 to 8...10 us, and watching YouTube results in rare spikes of 100+ us. I need to investigate, but the theory for now is that memory->GPU DMA transfers block memory access for the isolated CPU. |
|
Mostly way overkill, but it gives insight into how the network stack can be tuned with PREEMPT_RT: https://www.youtube.com/watch?v=t4H1h9mm188 |
|
@hdiethelm yes, using the isolated RT core for the NIC is best practice. Remember that for our application PREEMPT_RT is not fast, it's predictable (deterministic). We just need to lock the kernel down to avoid any intermittent surprises! For fast, you may need userspace and the DPDK used in HF (high-frequency) trading, colocated with your market operator and using obscenely expensive NICs. There, every nanosecond of latency loses money. |
|
@rodw-au Thanks for the input, I have to go through it to see if I missed a possible optimisation. Jitter is fine at 15 us, but 200 us for the tx/rx function seems to be a bit much. Do you mean 200 us or 200 ns? Even though the video is about TSN/PTP, which makes no sense with LinuxCNC when you use a point-to-point connection, it also has good hints about general RT Ethernet. |
|
@BsAtHome I have a whole bunch of stuff on the linuxcnc-ethercat side of things hinging on this making it into master, so I would like it merged ASAP. I was hoping for it to get merged yesterday at the meeting, but the RTAI V2 cleanup raised conflicts; I refactored to match the new structure with |
|
You still didn't fix your indent problems on those parts you added. (and please, only fix the parts you added or we can't differentiate blame anymore). My question, about the list being destructed when init is done, has not really been answered or I don't understand the answer. I conceded the use of the And, I am not holding this back. Andy said he wanted to have a look. |
|
It seems that this is for functions that need an initialisation step once realtime is running and comms are active? |
|
@andypugh All initfs on the same thread run sequentially in one cycle; the resync happens once after the batch, not per funct. The resync is not about silencing diagnostics, it's about giving the next cyclic cycle a clean phase reference. Without it, EtherCAT users would still see sync drift even if the warning disappeared. See src/rtapi/uspace_posix.cc:204-212. @BsAtHome I cleaned up the indentation, and did list-destruction after |
|
|
adee71f |
|
Because you have combined tabs and spaces. |
|
I did 4 spaces everywhere on my changes, should be looking good now. |
|
I just had a quick look at these changes. Sadly, I don't have the time right now to go into the details, so it is surely possible that I have missed something. What you are doing, assuming task time is 1ms is the following, right? You can achieve the same with way less code by calling rtapi_task_resync() directly in the HAL modules. At first glance, it looks like a workaround. However, it also has an advantage. With this structure, you can reset and init EtherCAT again while LinuxCNC is running in special states, for example when estop is active. No changes are needed in the hal_* files because no new function type is needed. Now the question is whether it is good to allow any HAL module to do a resync(). I would say why not: it is everybody's own responsibility if they break RT for a few cycles, and with resync() they can now do it the proper way. I leave it to you which version you prefer, this or yours. Both have advantages and disadvantages. Free-style pseudo-code matching the above: This would look something like this in .hal: And ethercat:
/* Pseudo-code: comp_id and do_init() are assumed to be defined elsewhere in the module. */
static bool init = false;   /* set once the one-shot init has run */
static void ethercat_init(void *arg, long period);

int rtapi_app_main(void)
{
    hal_export_funct("ethercat_module.ethercat_init", ethercat_init, 0, 0, 0, comp_id);
    return 0;
}

static void ethercat_init(void *arg, long period)
{
    if (!init) {
        do_init();
        init = true;
        rtapi_task_resync(); // This is your resync function
        rtapi_wait();        // This is the wait function, already available in hal objects. Waits for the next cycle
    }
} |
|
Multiple init would be a bit more cumbersome with ^, you would need an argument for each module that should not call wait. Feel free to cherry-pick xenomai variants to your PR: hdiethelm@b5b684a
Tested with:
diff --git a/src/hal/components/timedelta.comp b/src/hal/components/timedelta.comp
index 19fabe45e8..e416da2074 100644
--- a/src/hal/components/timedelta.comp
+++ b/src/hal/components/timedelta.comp
@@ -49,6 +49,8 @@ if(last != 0) {
count++;
avg_err = err / (double)count;
}
+rtapi_delay(1000000);
+rtapi_task_self_resync();
if(reset) { first = 1; last = 0; out = 0; jitter = 0; max_ = 0; }
else last = now;
diff --git a/src/rtapi/uspace_rtapi_main.cc b/src/rtapi/uspace_rtapi_main.cc
index 3a73c5e294..f96faf7fd0 100644
--- a/src/rtapi/uspace_rtapi_main.cc
+++ b/src/rtapi/uspace_rtapi_main.cc
@@ -1110,8 +1110,6 @@ long int rtapi_delay_max() {
}
void rtapi_delay(long ns) {
- if (ns > rtapi_delay_max())
- ns = rtapi_delay_max();
App().do_delay(ns);
}
|
A lot of the older LinuxCNC code assumes an 8-space tab, and uses 4 spaces for a half-tab. I hate working in those files. |
|
Is this good to go yet? There is something about it I don't "like" but nothing rational. And I have absolutely no experience with EtherCAT (I haven't built a new CNC machine for rather a long time) |
|
Well, I have not looked at hdiethelm's point yet
New hal_init_funct_to_thread() registers a funct that runs exactly once in RT context before the thread's cyclic list, on the first cycle after start_threads. After the init pass the thread loop calls new rtapi_task_self_resync() to re-anchor the periodic schedule, so a long init does not trip rtapi_wait()'s "unexpected realtime delay" catch-up loop and the next cyclic cycle starts on a clean period boundary.

Primary use is EtherCAT master activation: ecrt_master_activate() must run in the RT thread immediately before cyclic comms, but the call itself takes far longer than a period.

Surfaced as halcmd verb 'initf' (same +N/-N position semantics as addf). Late initf calls return -EALREADY so config order does not depend on whether start_threads has run yet.

Posix and Xenomai backends resync by clock_gettime(CLOCK_MONOTONIC, &task->nextstart); Xenomai EVL uses evl_read_clock(EVL_CLOCK_MONOTONIC, ...). RTAI backend is a warn-once stub: per-task period storage is not currently kept and the primary consumer runs on uspace.

Co-authored-by: Hannes Diethelm <hannes.diethelm@gmail.com>
This will cause a realtime error, because EtherCAT activation takes a long time. |
|
@hdiethelm thanks for the alternative design and especially the Xenomai/EVL variants. Cherry-picked those into the latest force-push, credited you as co-author. On the design choice, we went with the
rtapi_task_self_resync() is exported either way, so modules that prefer your in-funct pattern can still use it directly. The two approaches aren't mutually exclusive. |
|
@andypugh this is ready |
No, because of the call to rtapi_task_self_resync(), which resets the time. Tested with: #4012 (comment) @grandixximo I already used the exported resync in #4012 (comment) to test the Xenomai implementations without having to create too much HAL code. Works fine. If a resync is needed for EtherCAT (I remember having to do something like that after a drive error for a different project), the "do something breaking RT and then call rtapi_task_self_resync()" approach would still be an option. |



Summary
hal_init_funct_to_thread() plus halcmd verb initf register a funct that runs exactly once in RT context, in a dedicated cycle before the thread's cyclic list, the first time start_threads is observed. Position semantics match addf (+N from head, -N from tail).
rtapi_task_self_resync() re-anchors the calling task's periodic schedule to "one full period from now". The thread loop calls it right after the init pass so a long init does not trip rtapi_wait()'s "unexpected realtime delay" catch-up storm and the next cyclic cycle starts on a clean period boundary.
Late initf calls return -EALREADY (halcmd surfaces a warning) so config order does not depend on whether start_threads has run yet.
free_funct_struct() sweeps the new init_funct_list so unloading a comp does not leave dangling init entries.

Motivation
EtherCAT master activation (ecrt_master_activate()) must run in the RT thread immediately before cyclic comms, but the call takes far longer than a single period. Today drivers either activate at rtapi_app_main time (loses RT context, breaks DC SYNC0 phasing) or in the first cyclic cycle (overruns + "unexpected realtime delay" warning storm). initf + rtapi_task_self_resync() gives a clean RT-context one-shot init slot followed by a phase-clean first cyclic cycle.

Backends
Posix::task_self_resync() resets task->nextstart via clock_gettime(RTAPI_CLOCK, ...). Next rtapi_wait() advances by period + pll_correction and sleeps to it, giving exactly one fresh period.
period_counts in rtapi_task_start() and call rt_task_make_periodic(rt_whoami(), rt_get_time() + period_counts, period_counts) in rtapi_task_self_resync().