I used Claude Opus 4.8 to debug the issue and produce the below writeup.
However, I did read every word and edited it before submitting 😄
Bug Description
Intermittent IAM authentication failures (ER_ACCESS_DENIED_ERROR, errno 1045, "using password: YES") on the first query of an invocation when the connector runs on Cloud Run with request-based billing (CPU throttled to ~0 between requests, no min-instances) and the service is invoked on a schedule (~every 5 minutes).
The vast majority of invocations succeed. Failures cluster for a stretch and then self-resolve without any deploy or config change.
My working theory: with automatic IAM database authentication, the connector uses an OAuth 2.0 access token as the password (which is "short-lived and valid for only one hour") plus the ephemeral client cert, and refreshes both on a background timer ("Cloud SQL connectors are able to request and refresh these tokens"). Under request-based billing, CPU is only allocated during request processing, and the docs note that running Node.js async/background work outside of a request requires instance-based billing, so that refresh timer is effectively suspended during the idle gap between scheduled invocations.

For IAM, the refresh appears to be scheduled close to token expiry (~duration − 4min), so when an idle gap straddles that window the next request reuses a now-expired token. The connect path (stream(), around connector.ts:226-265 in 1.10.0) appears to reuse the cached cert/token without an expiry check, so the expired token reaches the MySQL handshake → 1045.

Because this is a post-handshake auth rejection rather than a TLS error, it doesn't trigger the tlsSocket.once('error', () => forceRefresh()) recovery path, so it doesn't self-heal until a later scheduled refresh happens to run while CPU is allocated.
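To make the timing argument concrete, here is a small, simplified model of the theory (not the connector's actual scheduling code). It assumes a 60-minute token, a refresh timer due 4 minutes before expiry, and that the timer can only fire at the start of the next invocation. The function and constant names are hypothetical.

```javascript
// Simplified model of the theory above; all numbers are assumptions.
const MIN = 60 * 1000;
const TOKEN_LIFETIME_MS = 60 * MIN; // "valid for only one hour"
const REFRESH_LEAD_MS = 4 * MIN;    // refresh at ~duration - 4min (as described in this report)

// Given invocation start times (ms since the token was issued), find the
// first invocation at which the deferred refresh timer could fire, and
// report whether the token has already expired by then.
function firstInvocationAfterRefreshDue(invocationTimesMs) {
  const refreshDueAt = TOKEN_LIFETIME_MS - REFRESH_LEAD_MS;
  const t = invocationTimesMs.find((time) => time >= refreshDueAt);
  if (t === undefined) return null;
  return { atMs: t, tokenExpired: t >= TOKEN_LIFETIME_MS };
}

// Build a fixed-interval schedule of invocation start times.
function schedule(offsetMs, intervalMs, count) {
  return Array.from({ length: count }, (_, i) => offsetMs + i * intervalMs);
}

// Invocations at minutes 0, 5, ..., 55, 60: none falls inside [56, 60).
const aligned = firstInvocationAfterRefreshDue(schedule(0, 5 * MIN, 20));
// Invocations at minutes 2, 7, ..., 57: one falls inside [56, 60).
const shifted = firstInvocationAfterRefreshDue(schedule(2 * MIN, 5 * MIN, 20));

console.log(aligned); // first chance to refresh is minute 60: token already expired
console.log(shifted); // first chance to refresh is minute 57: token still valid
```

Because the assumed refresh window (4 minutes) is narrower than the invocation interval (5 minutes), some schedule phases skip the window entirely, which would be consistent with failures that cluster and then disappear.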
One possible direction: the Python connector supports a lazy refresh strategy (Connector(refresh_strategy="LAZY")) that validates/refreshes credentials on demand at connect time. An equivalent on-demand freshness check in the Node connector (or a public force-refresh API) would cover environments where background timers can't be relied upon.
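A minimal sketch of what an on-demand freshness check could look like, independent of the connector's internals. `createLazyCredentialCache`, `fetchCredentials`, `expiresAtMs`, and `skewMs` are hypothetical names for illustration; this is not an existing API of the Node connector. The key point is that expiry is compared against the wall clock at use time, so nothing depends on a timer firing while CPU is throttled.

```javascript
// Hypothetical lazy cache: validates credentials when they are requested
// instead of relying on a background refresh timer.
function createLazyCredentialCache(
  fetchCredentials, // async () => ({ ..., expiresAtMs }) - assumed shape
  { skewMs = 60_000, now = Date.now } = {}
) {
  let cached = null;
  let inflight = null;

  return async function getCredentials() {
    // Reuse the cached value only if it is still valid, with a safety margin.
    if (cached && now() < cached.expiresAtMs - skewMs) {
      return cached;
    }
    // Share one in-flight refresh among concurrent callers.
    if (!inflight) {
      inflight = fetchCredentials()
        .then((fresh) => {
          cached = fresh;
          return fresh;
        })
        .finally(() => {
          inflight = null;
        });
    }
    return inflight;
  };
}
```

A connect path built this way would call `getCredentials()` immediately before each handshake, so a process resumed after a long idle gap would fetch a new token rather than present a stale one.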
Example code (or command)
    const { Connector, AuthTypes } = require('@google-cloud/cloud-sql-connector');
    const mysql = require('mysql2/promise');

    const connector = new Connector();
    let pool; // created once and cached across invocations

    async function getPool() {
      if (!pool) {
        const opts = await connector.getOptions({
          instanceConnectionName: 'PROJECT:REGION:INSTANCE',
          authType: AuthTypes.IAM,
        });
        pool = mysql.createPool({
          ...opts,
          user: 'my-service-account', // IAM principal (no @domain)
          database: 'my_db',
        });
      }
      return pool;
    }

    // On Cloud Run with request-based billing the process is frozen between
    // requests, so the connector's background refresh timer does not run
    // during idle gaps. On a later scheduled invocation, the first query
    // intermittently fails:
    async function handleInvocation() {
      const [rows] = await (await getPool()).query('SELECT 1');
      return rows;
    }
Stacktrace
Error: Access denied for user '<service-account>'@'cloudsqlproxy~<ip>' (using password: YES)
at Packet.asError (.../node_modules/mysql2/lib/packets/packet.js)
at ClientHandshake.execute (.../node_modules/mysql2/lib/commands/command.js)
at Connection.handlePacket (.../node_modules/mysql2/lib/connection.js)
code: 'ER_ACCESS_DENIED_ERROR',
errno: 1045,
sqlState: '28000'
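How to reproduce
1. Deploy a Cloud Run service (request-based billing, no min-instances) that uses the connector with AuthTypes.IAM and a cached pool.
2. Invoke it on a schedule (~every 5 minutes) so there's an idle gap between requests during which CPU is throttled.
3. Over time, the first query of an invocation intermittently fails with errno 1045; failures cluster and then self-resolve.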
Environment details
OS: Cloud Run (Linux container), request-based billing, no min-instances
Node.js version: 24
@google-cloud/cloud-sql-connector version: 1.10.0
Database: Cloud SQL for MySQL, IAM database authentication, via mysql2
Steps to reproduce
See "How to reproduce" above.
Workaround I'm using
Tearing down and recreating the connector on each invocation (close() + recreate) so the first connection mints fresh credentials. This is fine for my use case: the cron jobs only run every few minutes, and I'd rather accept a small per-invocation penalty than turn on always-allocated CPU, which would cost more.
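For reference, the workaround can be sketched roughly as follows. `withFreshPool` is a hypothetical helper name, and the connector and pool factories are passed in so the sketch stays self-contained; it relies only on `Connector#getOptions`, `Connector#close`, and the mysql2 promise pool's `end()`.

```javascript
// Sketch of the per-invocation teardown workaround (hypothetical helper).
// A new connector and pool are created for each invocation and closed
// afterwards, so no cached token survives an idle gap.
async function withFreshPool(
  { createConnector, createPool, connectorOptions, poolOptions },
  work
) {
  const connector = createConnector();
  let pool;
  try {
    const opts = await connector.getOptions(connectorOptions);
    pool = createPool({ ...opts, ...poolOptions });
    return await work(pool);
  } finally {
    // Always release sockets and stop the connector's timers.
    if (pool) await pool.end();
    connector.close();
  }
}

// Example wiring with the real libraries (placeholder values as in the
// example above; not executed here):
//
// const { Connector, AuthTypes } = require('@google-cloud/cloud-sql-connector');
// const mysql = require('mysql2/promise');
// const rows = await withFreshPool(
//   {
//     createConnector: () => new Connector(),
//     createPool: (o) => mysql.createPool(o),
//     connectorOptions: {
//       instanceConnectionName: 'PROJECT:REGION:INSTANCE',
//       authType: AuthTypes.IAM,
//     },
//     poolOptions: { user: 'my-service-account', database: 'my_db' },
//   },
//   async (pool) => (await pool.query('SELECT 1'))[0]
// );
```

The cost is one extra credential fetch and connection setup per invocation, which is the small penalty mentioned above.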