Attempt to make service fetching more efficient (using asyncio) by OCopping · Pull Request #228 · epics-containers/edge-containers-cli

OCopping · 2026-01-29T09:15:27Z

No description provided.

gilesknap · 2026-02-05T07:36:58Z

Sorry I missed this. Remind me to take a look at it during the epics-containers sprint next week.

gilesknap · 2026-03-25T08:17:30Z

@OCopping in testing I'm not seeing any improvement on the time to run ec ps.

Let me know when you are around and I'll drop by for a demo.

gilesknap

Review

Sorry to use Claude - but he is way better at reading other people's code than me! Claude agrees with our analysis that the subprocess calls are not concurrent. Look out for point 3 too. I think these points are generally pretty good.

Overview

This PR attempts to speed up service fetching by running argocd app manifests calls concurrently via asyncio. It also improves the TUI monitor with a loading indicator, batch updates, and selective cell updates.

Issues

1. asyncio provides no actual concurrency here (Critical)

The _extract_app_manifests method is async but calls shell.run_command() — a synchronous blocking call. asyncio.TaskGroup only provides concurrency for await-based I/O. Since nothing is awaited in the hot path, all tasks run sequentially on the event loop, giving zero speedup. This likely explains the observation that ec ps shows no improvement.

To get actual concurrency, you'd need either:

await asyncio.to_thread(shell.run_command, ...) to run each shell call in a thread pool
Or skip asyncio entirely and use concurrent.futures.ThreadPoolExecutor directly
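
A minimal sketch of the first option, under stated assumptions: `blocking_command` is a hypothetical stand-in for a blocking call such as shell.run_command (it only sleeps), and `fetch_all` is an illustrative name, not code from the PR. `asyncio.gather` is used here instead of `asyncio.TaskGroup` purely to keep the sketch short.

```python
import asyncio
import time


def blocking_command(name: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a blocking call such as shell.run_command."""
    time.sleep(delay)  # blocks the calling thread, like waiting on a subprocess
    return f"manifest for {name}"


async def fetch_all(names: list[str]) -> list[str]:
    """Run each blocking call in the default thread pool, concurrently."""
    # asyncio.to_thread returns an awaitable, so the event loop can overlap the calls.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_command, name) for name in names)
    )
```

Calling `asyncio.run(fetch_all(["a", "b", "c", "d"]))` should take roughly one delay rather than four, because the sleeps overlap in worker threads.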

2. `_get_services_df` in monitor.py doesn't actually await (Bug)

In monitor.py, _get_services_df is now async but the body just calls self.commands._get_services_df(running_only) synchronously — no await. The return value is a plain DataFrame, not a coroutine. The async/await wrappers are decorative here.

3. Shared mutable state without proper protection (Bug)

ArgoCommands stores results on self.services_df and self.app_dicts as instance attributes, and _extract_app_manifests mutates self.services_df via .extend(). The self.async_lock is declared but never used in _extract_app_manifests. If this were truly concurrent, multiple tasks would race on self.services_df. Also, services_df is never reset between calls, so repeated invocations would accumulate duplicate rows.

4. `_get_services_df` event loop detection is fragile (Design)

try:
    asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor() as pool:
        future = pool.submit(asyncio.run, self._get_service_data())
        future.result()
except RuntimeError:
    asyncio.run(self._get_service_data())

Spawning a new event loop inside a thread pool to work around an existing event loop is a code smell. This pattern can deadlock or cause subtle bugs with Textual's own event loop. Consider restructuring so the async boundary is cleaner.

5. `self.services_df` not reset between polls

Each call to _extract_app_manifests appends to self.services_df. On the second poll cycle, _get_service_data runs again but self.services_df still has data from the first cycle, leading to duplicate rows.
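
One possible shape that avoids both the accumulation described here and the race in point 3 is to have each task return its own rows and build a fresh result on every call. This is a sketch only: `ServiceFetcher`, `_extract_rows`, and `get_service_rows` are hypothetical names, and `asyncio.sleep(0)` stands in for the real awaited subprocess call.

```python
import asyncio


class ServiceFetcher:
    """Hypothetical stand-in for ArgoCommands; names are illustrative only."""

    def __init__(self, apps: list[str]) -> None:
        self.apps = apps

    async def _extract_rows(self, app: str) -> list[dict]:
        await asyncio.sleep(0)  # placeholder for the awaited subprocess call
        return [{"name": f"{app}-svc", "app": app}]

    async def get_service_rows(self) -> list[dict]:
        # Each task returns its own rows, so no shared attribute is mutated.
        per_app = await asyncio.gather(*(self._extract_rows(a) for a in self.apps))
        # A fresh list is built on every call, so repeated polls cannot accumulate.
        return [row for rows in per_app for row in rows]
```

With this structure no lock is needed, because nothing is written to `self` while tasks are running.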

6. `_check_service` no longer uses the same code path as `_ps`

The base class _check_service calls _get_services_df, but ArgoCommands._check_service now calls _get_services() and reads self.app_dicts directly. This divergence means _check_service and _ps could give inconsistent results, and the base class override is easy to miss.

7. Minor: `_get_services` on base class is not abstract

commands.py adds _get_services as a plain method raising NotImplementedError (not @abstractmethod), while _get_services_df keeps @abstractmethod. This inconsistency means subclasses aren't forced to implement _get_services.
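
To illustrate the difference, here is a simplified sketch (the class bodies are invented for the example and are not the project's code). With `@abstractmethod`, a subclass that forgets the method fails at instantiation rather than at the first call.

```python
from abc import ABC, abstractmethod


class Commands(ABC):
    @abstractmethod
    def _get_services(self) -> list[str]:
        """Return the names of the services known to this backend."""


class IncompleteCommands(Commands):
    pass  # forgets to implement _get_services


class DemoCommands(Commands):
    def _get_services(self) -> list[str]:
        return ["demo-ioc-01"]  # illustrative value only
```

Here `IncompleteCommands()` raises `TypeError` immediately, whereas a plain method that raises NotImplementedError would only fail when `_get_services` is eventually called.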

Positives

Loading indicator in the TUI is a good UX improvement — the app no longer hangs on startup.
batch_update() and selective cell updates (if str(current) != str(cell["contents"])) in populate_table are solid optimizations that reduce unnecessary redraws.
Caching Color.parse("white") as a module-level constant avoids repeated parsing.
Separating service list fetching (_get_services) from manifest extraction is a good structural direction.

Recommendation

The core async approach needs rework — the blocking shell.run_command calls need to be run via asyncio.to_thread() or a thread pool to achieve actual parallelism. I'd suggest addressing the concurrency issues before merging, since without them this adds complexity for no performance gain.

gilesknap · 2026-03-25T10:23:35Z

ec ps is now super fast but this has introduced a few issues:

Updated PR 228 Review

Good progress: the core issue from the first review (blocking shell.run_command) is addressed by converting it to use asyncio.create_subprocess_shell. However, the async conversion is incomplete, introducing several broken code paths.

Issues

CLI callers never await async methods (Critical — broken at runtime)

cli.py is unchanged but now calls methods that are async:
backend.commands.delete(service_name) # line 85 — returns unawaited coroutine
backend.commands.deploy(...) # line 132
backend.commands.log_history(service_name) # line 225
backend.commands.restart(service_name) # line 282
backend.commands.start(...) # line 297
backend.commands.stop(...) # line 315
These will silently do nothing — the coroutine is created but never executed. Every user-facing command except ps and logs is broken.
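
The failure mode can be reproduced in isolation. In this sketch `delete` is a hypothetical async command that only records its argument, and the two `cli_delete_*` functions are illustrative stand-ins for a synchronous CLI entry point.

```python
import asyncio

deleted: list[str] = []


async def delete(service_name: str) -> None:
    """Hypothetical async command; records the name instead of deleting anything."""
    deleted.append(service_name)


def cli_delete_broken(service_name: str) -> None:
    coro = delete(service_name)  # creates a coroutine object; the body has not run
    coro.close()  # closed here only to silence the "never awaited" warning


def cli_delete_fixed(service_name: str) -> None:
    asyncio.run(delete(service_name))  # drives the coroutine to completion
```

After `cli_delete_broken("ioc-01")` the `deleted` list is still empty. Only the `asyncio.run` variant executes the body.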

get_patches() calls async shell.run_command without await (Critical — broken at runtime)

get_patches() (line 39) is still a regular function but shell.run_command is now async. app_resp will be a coroutine object, not a string. YAML.load() will then fail or produce nonsense. Since get_patches is called inside push_remove_key, that entire path is also broken.

push_value and push_remove_key miss await on their first shell.run_command (Bug)

Both functions are async but their initial app_resp = shell.run_command(...) calls (lines ~87, ~107) are not awaited. Same issue — coroutine assigned to app_resp instead of the actual string result.

do_retry wrapper breaks async functions (Critical)

do_retry wraps async functions (patch_value, push_value, push_remove_key) but calls them with cmd(*args, **kwargs), which returns a coroutine without awaiting it. The _do_retry wrapper is sync, so it can never properly execute the async function body. The retry logic is effectively dead, and the wrapped functions do nothing.
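
For comparison, a retry decorator that works with coroutines has to be async itself and await the wrapped call inside the try block. The sketch below is an assumption about one workable shape, not the project's do_retry: `do_retry_async`, its parameters, and `flaky_push` are hypothetical names.

```python
import asyncio
import functools

calls = {"count": 0}


def do_retry_async(retries: int = 3, delay: float = 0.0):
    """Retry an async function; a sketch, not the project's do_retry."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    # Awaiting inside the try is what lets failures be caught.
                    return await func(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise  # out of attempts: propagate the last error
                    await asyncio.sleep(delay)

        return wrapper

    return decorator


@do_retry_async(retries=3)
async def flaky_push() -> str:
    """Hypothetical operation that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "pushed"
```

Running `asyncio.run(flaky_push())` executes the body up to three times and returns once an attempt succeeds.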

self.services_df still not reset between polls (Bug from v1 — unfixed)

_get_service_data calls _extract_app_manifests which appends to self.services_df, but it's never cleared before a new poll cycle. Each poll accumulates duplicate rows.

k8s_commands.py, helm.py, git.py all call shell.run_command without await (Critical)

These files have ~20+ calls to shell.run_command that are all non-awaited. The entire k8s backend and helm deployment are broken since shell.run_command is now unconditionally async.

do_polling is @work(thread=True) + async def (Bug)

@work(thread=True) runs the function in a thread. Making it async def means it returns a coroutine from that thread, which Textual's worker won't automatically await. The polling loop likely never executes.

asyncio.create_subprocess_shell passed shell=True (Minor)

create_subprocess_shell always runs through the shell, so passing shell=True is redundant (it is a subprocess.Popen-style parameter) and can be dropped.

Positives (carried over from v1, still good)

Loading indicator, batch updates, and selective cell updates in the TUI remain solid improvements.
The async subprocess approach is the right direction for achieving concurrency.

Recommendation

The async conversion needs to be completed across the entire codebase — right now only argo_commands.py methods are partially converted while all other callers are broken. Consider:

1. Keep shell.run_command synchronous and add a separate shell.run_command_async method. This way only the code that needs concurrency (the _extract_app_manifests TaskGroup) uses async, and everything else continues to work unchanged.
2. Alternatively, commit to full async but then cli.py, k8s_commands.py, helm.py, git.py, and do_retry all need updating too.

Option 1 is much less invasive and easier to get right.
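
A rough sketch of what the less invasive option could look like, assuming module-level functions rather than the project's actual shell module. The function bodies and `fetch_manifests` are illustrative, and `echo` stands in for the real per-app CLI call.

```python
import asyncio
import subprocess


def run_command(command: str) -> str:
    """Synchronous path, left as-is for existing callers."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout


async def run_command_async(command: str) -> str:
    """Async variant, used only where concurrency is needed."""
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"{command!r} failed: {stderr.decode()}")
    return stdout.decode()


async def fetch_manifests(apps: list[str]) -> list[str]:
    # "echo" is a placeholder for the real per-app command.
    return await asyncio.gather(*(run_command_async(f"echo {app}") for app in apps))
```

Existing synchronous callers keep using `run_command`, while the manifest-fetching code can call `asyncio.run(fetch_manifests([...]))` or await it from an already running loop.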


OCopping requested a review from gilesknap January 29, 2026 09:15

OCopping changed the base branch from Implement-k8s-service-labels to update-copier-template January 29, 2026 09:17

OCopping changed the base branch from update-copier-template to Implement-k8s-service-labels January 29, 2026 09:18

OCopping force-pushed the more-efficient-fetching branch from fe01f5b to 8524908 Compare January 29, 2026 09:24

OCopping mentioned this pull request Feb 6, 2026

ec monitor startup is very slow when namespace has a lot of services #232

Closed

OCopping force-pushed the Implement-k8s-service-labels branch from 6a283d7 to 65cf66c Compare March 4, 2026 08:58

OCopping added 5 commits March 24, 2026 13:24

Attempt to make service fetching more efficient (using asyncio)

b78b97b

Fix base Commands and DemoCommands

0288ec9

Fix argocd commands and tests

f91bb6e

Update k8s commands function

bc95467

Fix asynchronous fetching of services df

4f48a10

OCopping force-pushed the more-efficient-fetching branch from eb0f146 to 4f48a10 Compare March 24, 2026 13:26

OCopping added 3 commits March 24, 2026 13:49

Make it so that the App doesn't hang on initial start, and populating…

2eb5a1c

… the table is more efficient Also added a loading indicator

Fix ec ps

b710578

Remove duplicate test yaml block

cc72560

gilesknap reviewed Mar 25, 2026

OCopping added 2 commits March 25, 2026 09:19

Improve fetching massively (convert to use asyncio compatible subproc…

59e9cee

…ess calls)

Re-add async_lock to services_df assignment/extension

a3ccf6b

OCopping added 9 commits March 26, 2026 09:03

Add missing async/await to cli.py

be66d7b

Missing async/await

c1786f9

Missing await/async

Fix do_retry with custom type and await

61598fe

Clear services dataframe at beginning of new poll

8d94127

also add useful comment

Make demo commands async

31ff3c5

Fix k8s_commands to make async

175e336

Remove unnecessary shell=True from async subprocess call

577554e

Fix monitor do_polling (didn't need to be async)

448c014

Remove unnecessary asyncio.run

c48f346

OCopping added 14 commits March 26, 2026 11:35

Allow for _run_async to return result of coroutines

0583480

Fix async logs in monitor

b1d2a12

Fix helm.py async/await

1a5c282

Fix more missing async/await

79fdb13

Fix git.py async/await

53d51b2

Fix autocomplete.py async/await

0137372

Fix k8s commands? (Make it more consistent with new argo_commands str…

f1f41ee

…ucture)

Missing await for shell.run_command

15cf4fe

Add async_command wrapper to allow for typer @cli.command() compatibi…

dae2bdf

…lity

Fix missing await

7cb031f

Add missing await/async to logs

54f6a23

Remove unnecessary async_command decorators

c726a48

Add async/await to instances

823a165

Fix test await issue (need MockRun functions to be async)

68964dc

gilesknap merged commit 68964dc into Implement-k8s-service-labels Mar 27, 2026
5 of 8 checks passed

gilesknap merged 33 commits into Implement-k8s-service-labels from more-efficient-fetching