fix(ios-qa): resolve CoreDevice tunnel via devicectl + keep tunnel alive #1673

Open
sternryan wants to merge 1 commit into garrytan:main from sternryan:fix/ios-qa-tunnel-resolution

Problem

On macOS 26.x (Darwin 25.x), the /ios-qa daemon's bootstrapTunnel fails at the IPv6-resolution step with resolve_failed. The cause: getDeviceTunnelIPv6 calls dns.resolve6('<device>.coredevice.local'), and Node's dns.resolve6 sends the query straight to the configured DNS servers (Node implements the dns.resolve* family with c-ares) and does NOT consult mDNSResponder, so the CoreDevice mDNS name fails with ESERVFAIL even when the device is actively tunneled. dns.lookup, which goes through the OS resolver (getaddrinfo), is the API that reaches mDNS.

A deeper issue compounds this: even when resolution succeeds, Xcode 26's CoreDevice only keeps the USB tunnel session alive while a devicectl command is in-flight (or Xcode itself is attached). Within ~10–15 seconds of idle, the tunnel IPv6 ULA becomes unroutable — curl http://[fde4:…]:9999/healthz times out even though xcrun devicectl device info details still reports the same address. Proxy traffic from the daemon to the StateServer fails silently.

Repro

// Node 22 / Bun 1.2 on macOS 26.x (Darwin 25.3.0):
const dns = require('dns');
dns.resolve6('Crack-phone.coredevice.local', console.log);
// → Error: queryAaaa ESERVFAIL Crack-phone.coredevice.local

dns.lookup('Crack-phone.coredevice.local', { family: 6 }, console.log);
// → null 'fde4:2827:528e::1' 6
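
The working call above can also be wrapped for promise-based callers. This is a minimal sketch that assumes the standard `node:dns/promises` API; the name `lookupTunnelIPv6` is hypothetical and is not the PR's `defaultResolve`.

```typescript
import { lookup } from "node:dns/promises";

// Hypothetical helper: resolve a hostname to one IPv6 address through the OS
// resolver (getaddrinfo), the path that reaches mDNSResponder on macOS.
// Returns null instead of throwing so a caller can fall through to another strategy.
async function lookupTunnelIPv6(hostname: string): Promise<string | null> {
  try {
    const { address } = await lookup(hostname, { family: 6 });
    return address;
  } catch {
    // Covers ENOTFOUND and similar lookup failures.
    return null;
  }
}
```

Returning null rather than rejecting keeps the helper easy to slot into a fallback chain.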

Fix

Two-part, both inside ios-qa/daemon/src/:

1. Resolution order (devicectl.ts) — new resolveTunnelIPv6() tries strategies in decreasing reliability:

  • xcrun devicectl device info details --json-output and read result.connectionProperties.tunnelIPAddress directly (most reliable; also bumps the tunnel as a side effect)
  • mDNS via dns.lookup (getaddrinfo → mDNSResponder)
  • legacy dns.resolve6 as a last-ditch fallback (kept for backwards compat)

The new defaultResolve uses dns.lookup instead of dns.resolve6. bootstrapTunnel now calls resolveTunnelIPv6 instead of just getDeviceTunnelIPv6.
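
The "try each strategy in order" behaviour described above could be sketched as follows. This is an illustration only: `Strategy` and `resolveFirst` are hypothetical names, and the real `resolveTunnelIPv6(opts)` may be structured differently.

```typescript
// Hypothetical shape: each strategy either yields an address, yields null,
// or throws. None of these names come from the PR.
type Strategy = {
  name: string;
  resolve: () => Promise<string | null>;
};

// Walk the strategies in the order given and return the first usable address.
async function resolveFirst(strategies: Strategy[]): Promise<string | null> {
  for (const strategy of strategies) {
    try {
      const address = await strategy.resolve();
      if (address) return address; // first non-empty answer wins
    } catch {
      // A failing strategy (for example ESERVFAIL from dns.resolve6)
      // simply falls through to the next one.
    }
  }
  return null; // every strategy failed
}
```

In the daemon, the list would presumably hold the devicectl read, the `dns.lookup` path, and the legacy `dns.resolve6` path, in that order. Injecting the strategies keeps the ordering logic testable without a device attached.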

2. Tunnel keepalive (devicectl.ts + index.ts) — new startTunnelKeepalive(udid) spawns a periodic xcrun devicectl device info details (default 5s) to keep CoreDevice's tunnel session alive. We chose info details over device console because it's cheap (~10ms CPU per tick, no persistent child process, no stdout firehose, no backpressure risk) and the 5s interval is comfortably under the empirical teardown timeout. Started after a successful bootstrap, stopped on SIGINT / SIGTERM / exit. Returns a { stop } handle for clean teardown.
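
A minimal sketch of the keepalive pattern, assuming the periodic work is injected as a `tick` callback (in the daemon that callback would spawn the devicectl command). `startKeepaliveSketch` and `KeepaliveOptions` are hypothetical names, not the PR's `startTunnelKeepalive(udid, opts)`.

```typescript
// Hypothetical options; the 5000 ms default mirrors the 5s described above.
type KeepaliveOptions = {
  intervalMs?: number;
  tick: () => void; // injected work, e.g. spawning a short-lived devicectl call
};

function startKeepaliveSketch(opts: KeepaliveOptions): { stop: () => void } {
  const intervalMs = opts.intervalMs ?? 5000;

  let timer: ReturnType<typeof setInterval> | null = setInterval(() => {
    try {
      opts.tick();
    } catch {
      // One failed tick should not kill the keepalive loop.
    }
  }, intervalMs);

  // unref() (where the runtime provides it) stops the timer from holding
  // the process open during shutdown.
  (timer as unknown as { unref?: () => void }).unref?.();

  return {
    // Safe to call more than once: later calls are no-ops.
    stop(): void {
      if (timer !== null) {
        clearInterval(timer);
        timer = null;
      }
    },
  };
}
```

A caller would typically keep the returned handle and invoke `stop()` from its SIGINT / SIGTERM / exit handlers.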

Testing

bun test test/tunnel-bootstrap.test.ts
 17 pass
 0 fail
 36 expect() calls

bun test  (full daemon suite)
 82 pass
 0 fail
 1 error (pre-existing — see below)

New / updated tests in ios-qa/daemon/test/tunnel-bootstrap.test.ts:

  • getDeviceTunnelIPv6FromDevicectl (4 tests): extracts tunnelIPAddress, falls back to result.tunnel.ipAddress, handles non-zero exit, handles missing field
  • resolveTunnelIPv6 fallback chain (4 tests): each strategy preferred in order, all-fail → null
  • startTunnelKeepalive (2 tests): periodic spawn, stop() idempotent
  • Existing bootstrap tests updated to include the new device info details spawn step

Backwards compatibility

The legacy dns.resolve6 path is preserved as the third-tier fallback. The exported getDeviceTunnelIPv6(deviceName, resolve) signature is unchanged. New surface: getDeviceTunnelIPv6FromDevicectl(udid, spawn), resolveTunnelIPv6(opts), startTunnelKeepalive(udid, opts). Existing tests pass with the new resolution order (the test scripts now include the new spawn step).

Pre-existing test issue (not addressed)

test/daemon-integration.test.ts uses afterEach without including it in its bun:test import list. This failure already exists on main and is unchanged by this PR. Worth a follow-up.

Tested against

iPhone 12 Pro on iOS 26.x via Mac Mini M-series running macOS 26.x / Darwin 25.3.0.

Maintainer notes

  • 5s keepalive interval is conservative. If it's too aggressive for some environments, exposing GSTACK_IOS_TUNNEL_KEEPALIVE_MS would be one knob — happy to add if you'd like.
  • The keepalive uses setInterval(...).unref() so it never blocks daemon shutdown.
  • result.tunnel.ipAddress is accepted as a fallback to result.connectionProperties.tunnelIPAddress because some Xcode/CoreDevice JSON shapes use the former — defensive but cheap.
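
The two accepted JSON shapes can be handled with a small extractor. This is a sketch under the assumption that the devicectl JSON output has already been read into a string; `extractTunnelIPv6` is a hypothetical name, not the PR's `getDeviceTunnelIPv6FromDevicectl(udid, spawn)`.

```typescript
// Hypothetical helper: pull the tunnel address out of devicectl's JSON output.
// Only the two field paths named in this PR are consulted.
function extractTunnelIPv6(json: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return null; // not valid JSON
  }

  const result = (parsed as any)?.result;
  const candidates: unknown[] = [
    result?.connectionProperties?.tunnelIPAddress, // preferred shape
    result?.tunnel?.ipAddress,                     // alternate shape
  ];

  for (const candidate of candidates) {
    if (typeof candidate === "string" && candidate.length > 0) {
      return candidate;
    }
  }
  return null; // field missing in both shapes
}
```

Checking the type of each candidate guards against a shape where the field exists but holds something other than a string.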

The daemon's tunnel bootstrap used `dns.resolve6` to look up
`<device>.coredevice.local`, which fails with ESERVFAIL on macOS 26.x
(Darwin 25.x) because Node's resolve6 path queries the configured DNS
servers directly (via c-ares) and does NOT consult mDNSResponder.
`dns.lookup` (getaddrinfo) does.

Even when resolution works, CoreDevice in Xcode 26 only holds the
USB tunnel up while a devicectl command is in-flight, so the IPv6 ULA
becomes unroutable within ~10-15s of idle and subsequent proxy
requests time out.

Two-part fix:

  1. Resolution order is now (a) `xcrun devicectl device info details
     --json-output` to read `result.connectionProperties.tunnelIPAddress`
     directly, (b) mDNS via `dns.lookup`, (c) legacy `dns.resolve6` as
     a last-ditch fallback.
  2. After a successful bootstrap the daemon spawns a periodic
     `devicectl device info details` (~5s) to keep the tunnel session
     alive. Cleaned up on SIGINT/SIGTERM/exit.

Adds tests for `getDeviceTunnelIPv6FromDevicectl`, the
`resolveTunnelIPv6` fallback chain, and `startTunnelKeepalive`.
Existing bootstrap tests updated to include the new
`device info details` spawn step.

Tested against: iPhone 12 Pro on iOS 26.x via Mac Mini M-series
running macOS 26.x / Darwin 25.3.0.