Pooling by KoalaGeo · Pull Request #2345 · geopython/pygeoapi

KoalaGeo · 2026-05-19T14:18:41Z

Overview

Makes the SQLAlchemy connection pool of the SQL provider configurable per provider via the existing options: block, exposing pool_size, max_overflow, pool_recycle, pool_timeout and pool_pre_ping.

Previously get_engine() called create_engine(conn_str, connect_args=connect_args, pool_pre_ping=True) with no pool sizing or recycle, so the default QueuePool held pool_size connections open for the life of each worker process and never recycled them. In multi-process deployments this produces a large number of permanently-IDLE server-side connections (we saw connections idle for days, eventually exhausting max_connections). There was no way to bound or recycle the pool from configuration.
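To put a rough number on the problem described above, here is a back-of-the-envelope sketch. The worker count is a made-up figure and the helper name is hypothetical. The only facts relied on are that a QueuePool keeps up to pool_size connections open and permits up to max_overflow more on top of that.

```python
def connection_bounds(workers: int, pool_size: int, max_overflow: int) -> tuple[int, int]:
    """Return (persistent, peak) connection counts, assuming one engine per worker."""
    persistent = workers * pool_size
    peak = workers * (pool_size + max_overflow)
    return persistent, peak


# Hypothetical deployment: 8 worker processes, each with a single engine.
print(connection_bounds(8, 5, 10))  # (40, 120) with SQLAlchemy's default sizing
print(connection_bounds(8, 2, 3))   # (16, 40) with a smaller, tuned pool
```

If providers end up with distinct engines in the same worker, these figures grow by the number of engines, so they are best read as a lower bound.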

Changes:

store_db_parameters() now extracts the five pool keys from options, coerces them to their declared types, and stores them as a sorted, hashable tuple (self.db_pool_options). They are popped out of options so they are not forwarded to the DBAPI as connect_args.
get_engine() takes a pool_options tuple parameter and applies **dict(pool_options) to create_engine(). It stays @functools.cache-able because the parameter is a hashable tuple, so engine sharing per process is preserved; providers with differing pool config correctly get distinct engines.
pygeoapi/process/manager/postgresql.py also calls get_engine(); its call site is updated to pass self.db_pool_options so the manager does not lose pool_pre_ping or skip recycling.
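The extraction step listed above can be sketched roughly as follows. This is a simplified stand-in, not the PR's code: `split_options` is a hypothetical name, type coercion is deliberately left out, and `connect_timeout` is only an example of an option that should still reach the DBAPI.

```python
POOL_DEFAULTS = {
    'pool_size': 5,
    'max_overflow': 10,
    'pool_recycle': -1,
    'pool_timeout': 30,
    'pool_pre_ping': True,
}


def split_options(options: dict) -> tuple[tuple, dict]:
    """Separate pool settings from the options forwarded as connect_args."""
    remaining = dict(options)  # work on a copy; the caller's dict is untouched
    pool = tuple(sorted(
        (key, remaining.pop(key, default))
        for key, default in POOL_DEFAULTS.items()
    ))
    return pool, remaining


pool_options, connect_args = split_options(
    {'pool_size': 2, 'pool_recycle': 300, 'connect_timeout': 10}
)
print(dict(pool_options)['pool_size'])  # 2
print(connect_args)                     # {'connect_timeout': 10}
```

Because the pool keys are popped before the remainder is returned, none of them can end up in connect_args, and the sorted tuple is hashable, so it can serve as part of a cache key.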

Backward compatibility: defaults preserve current behaviour exactly — pool_size=5, max_overflow=10, pool_pre_ping=True, and pool_recycle=-1 (SQLAlchemy's default, i.e. the current effective behaviour).

This PR is therefore a pure, opt-in feature add with no behaviour change for existing users. (See the issue for discussion of whether a finite default pool_recycle should be adopted as a separate follow-up.)

New tests and documentation are included.

Related Issue / discussion

Closes #2344.

Additional information

Example configuration:

providers:
  - type: feature
    name: PostgreSQL
    data:
      host: 127.0.0.1
      port: 5432
      dbname: test
      user: postgres
      password: postgres
      search_path: [osm, public]
    options:
      pool_size: 2          # persistent connections per worker process
      max_overflow: 3       # short-lived burst capacity
      pool_recycle: 300     # recycle connections older than 5 minutes
      pool_timeout: 30
    id_field: osm_id
    table: hotosm_bdi_waterways
    geom_field: foo_geom

Note (documented): because get_engine() is @functools.cache-d on its full argument set, providers that share a database must use identical pool options to continue sharing a single engine per worker; differing options intentionally yield separate engines.
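The sharing rule in this note follows directly from how functools.cache keys on arguments. A minimal sketch, using a stand-in function rather than the real get_engine() (the connection string and option values are illustrative):

```python
import functools


@functools.cache
def get_engine_stub(conn_str: str, pool_options: tuple = ()) -> object:
    """Stand-in for get_engine(): one fresh object per distinct argument set."""
    return object()


shared = (('pool_recycle', 300), ('pool_size', 2))
a = get_engine_stub('postgresql://db/test', shared)
b = get_engine_stub('postgresql://db/test', shared)
c = get_engine_stub('postgresql://db/test', (('pool_recycle', 300), ('pool_size', 9)))

print(a is b)  # True: identical arguments hit the cache
print(a is c)  # False: differing pool options create a second entry
```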

Dependency policy (RFC2)

I have ensured that this PR meets RFC2 requirements

No new dependencies are introduced; only the standard library and the already-required SQLAlchemy are used.

Updates to public demo

I have ensured that breaking changes to the pygeoapi master demo server have been addressed
No changes required: defaults preserve existing behaviour, so the demo local.config.yml does not need to change.

Contributions and licensing

I'd like to contribute a bugfix/feature (configurable SQL connection pool) to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution
I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

test_sql_pool_options.py exercises `store_db_parameters()` directly, requires no database, and runs in standard CI. It asserts the zero-behaviour-change defaults, override + typing, no DBAPI leakage, the existing dict-filtering, hashable/deterministic cache keys, and coexistence with search_path.

webb-ben · 2026-05-20T22:37:20Z

Is there a reason not to pop the attributes from connect_args inside get_engine? That would consolidate some of the complications noted in the PR around hashing and the manager's use of get_engine. Maybe I am missing something.
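One way to read this suggestion is sketched below. It assumes that the engine factory receives driver options as keyword arguments, which may not match the real get_engine() signature, and it uses a stub that only reports how the arguments were split instead of creating an engine.

```python
import functools

POOL_KEYS = ('pool_size', 'max_overflow', 'pool_recycle',
             'pool_timeout', 'pool_pre_ping')


@functools.cache
def get_engine_stub(conn_str: str, **connect_args) -> dict:
    """Stand-in for get_engine() that separates the pool keys itself."""
    pool_kwargs = {k: connect_args.pop(k) for k in POOL_KEYS if k in connect_args}
    # A real implementation would call create_engine() here; the stub just
    # returns the two groups so the split is visible.
    return {'pool': pool_kwargs, 'connect_args': connect_args}


engine = get_engine_stub('postgresql://db/test', pool_size=2, connect_timeout=10)
print(engine)  # {'pool': {'pool_size': 2}, 'connect_args': {'connect_timeout': 10}}
```

With this shape the callers would not need a separate pool tuple, but every keyword value must be hashable for functools.cache to accept the call; a list-valued option would raise TypeError.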

ricardogsilva

Just leaving my two cents here - I'm not a core committer so take these with a grain of salt.

Overall I agree with the PR, as adding these connection-related options seems relevant - thanks for your work and I look forward to having it merged!

Personally, I would simplify the implementation a bit, by relying on pygeoapi's JSON Schema document for the validation of the config.

And I would not include most of these tests, which I see as not being relevant.

ricardogsilva · 2026-05-21T13:43:51Z

+    # Defaults keep SQLAlchemy's QueuePool sizing but, unlike SQLAlchemy's
+    # default of -1, recycle connections after an hour so that pooled
+    # connections cannot sit IDLE on the server indefinitely.


This part of the comment seems to be outdated, as you end up setting the default value of pool_recycle to -1.

ricardogsilva · 2026-05-21T14:02:09Z

+             # SQLAlchemy connection-pool tuning (optional). Defaults match
+             # SQLAlchemy's QueuePool and preserve previous behaviour.
+             # Persistent connections held open per worker process.
+             pool_size: 5
+             # Extra short-lived connections allowed above pool_size.
+             max_overflow: 10
+             # Recreate connections older than this many seconds. -1 (the
+             # default) never recycles; set a finite value (e.g. 300) so
+             # pooled connections cannot sit IDLE on the server indefinitely.
+             pool_recycle: -1
+             # Seconds to wait for a connection from the pool before erroring.
+             pool_timeout: 30
+             # Test connections with a lightweight ping before use.
+             pool_pre_ping: true


All of these new parameters need to be added to the config schema at

pygeoapi/resources/schemas/config/pygeoapi-config-0.x.yml

This will make it possible to test a pygeoapi configuration for correctness even before starting up the server.
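As an illustration of what declaring these parameters buys, here is a hypothetical schema fragment held as a Python dict, together with a deliberately tiny check that understands only the `type` keyword. The property layout in the real pygeoapi schema file may differ, the choice of `number` for pool_timeout is an assumption, and actual validation would go through a full JSON Schema validator rather than this helper.

```python
# Hypothetical fragment; not copied from the pygeoapi config schema.
POOL_SCHEMA = {
    'pool_size': {'type': 'integer'},
    'max_overflow': {'type': 'integer'},
    'pool_recycle': {'type': 'integer'},
    'pool_timeout': {'type': 'number'},
    'pool_pre_ping': {'type': 'boolean'},
}


def type_errors(options: dict) -> list[str]:
    """Check only the 'type' keyword; a real validator covers far more."""
    errors = []
    for key, rule in POOL_SCHEMA.items():
        if key not in options:
            continue
        value = options[key]
        is_bool = isinstance(value, bool)  # bool is a subclass of int in Python
        if rule['type'] == 'boolean':
            ok = is_bool
        elif rule['type'] == 'integer':
            ok = isinstance(value, int) and not is_bool
        else:  # 'number'
            ok = isinstance(value, (int, float)) and not is_bool
        if not ok:
            errors.append(f'{key}: expected {rule["type"]}')
    return errors


print(type_errors({'pool_size': 2, 'pool_pre_ping': 'False'}))
# ['pool_pre_ping: expected boolean']
```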

ricardogsilva · 2026-05-21T14:10:20Z

+        (key, type(default)(options.pop(key, default)))
+        for key, default in pool_defaults.items()
+    ))
+


In my opinion this could be made easier to read and less complex by:

Storing self.db_pool_options as a dict instead of a tuple, and defer tuple creation to when get_engine is called;

Relying on the types of passed options already being correct. Adding these new parameters to the config JSON Schema (as I mentioned in my other comment) would mean that the type of each parameter would already be documented and would be enforceable by doing a validation of the config.

Also, note that your implementation contains a subtle bug when trying to parse pool_pre_ping. If the original value had been:

{'pool_pre_ping': 'False'} # I'm passing a string with the value of "False"

then the outcome would be:

type(True)("False")  # True

In other words, bool("False") is actually True because non-empty strings are truthy.

-pool_defaults = {
-    'pool_size': 5,
-    'max_overflow': 10,
-    'pool_recycle': -1,  # SQLAlchemy default; preserves current behaviour
-    'pool_timeout': 30,
-    'pool_pre_ping': True,
-}
-self.db_pool_options = tuple(sorted(
-    (key, type(default)(options.pop(key, default)))
-    for key, default in pool_defaults.items()
-))
+self.pool_defaults = {
+    'pool_size': options.pop('pool_size', 5),
+    'max_overflow': options.pop('max_overflow', 10),
+    'pool_recycle': options.pop('pool_recycle', -1),  # SQLAlchemy default - never release connections
+    'pool_timeout': options.pop('pool_timeout', 30),
+    'pool_pre_ping': options.pop('pool_pre_ping', True),
+}
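The coercion pitfall raised in this comment can be reproduced without any database. The first part mirrors the `type(default)(value)` pattern from the PR on a reduced set of keys. The `parse_bool` helper is hypothetical and belongs to neither the PR nor the suggestion above; it only shows what a strict alternative could look like if string input had to be accepted.

```python
defaults = {'pool_size': 5, 'pool_pre_ping': True}
raw = {'pool_size': '2', 'pool_pre_ping': 'False'}

# int('2') behaves as hoped, but bool('False') is True: any non-empty string is truthy.
coerced = {key: type(default)(raw.get(key, default)) for key, default in defaults.items()}
print(coerced)  # {'pool_size': 2, 'pool_pre_ping': True}


def parse_bool(value) -> bool:
    """Accept real booleans and the strings 'true'/'false' only."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().lower() in ('true', 'false'):
        return value.strip().lower() == 'true'
    raise ValueError(f'not a boolean: {value!r}')


print(parse_bool('False'))  # False
```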

ricardogsilva · 2026-05-21T14:15:47Z

            self.db_user,
            self._db_password,
            self.db_conn,
+            self.db_pool_options,


As per my other comment, in my opinion it would be clearer if the tuple were generated here, perhaps accompanied by a comment mentioning that this is done to enable the use of functools.cache.

Also, in modern Python, a dict's insertion ordering is preserved, so I don't think sorting the tuple would be needed.

Suggested change

-            self.db_pool_options,
+            tuple(self.db_pool_options.items()),  # convert to hashable type, for using with functools.cache
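Whether sorting can be dropped depends on the dict always being built in the same key order. The sketch below uses stand-in functions and only two of the five keys for brevity. It shows that a dict literal fixes the order regardless of how the user ordered the config, and that a dict assembled in a different order would produce a different cache key.

```python
import functools


@functools.cache
def get_engine_stub(pool_options: tuple) -> object:
    """Stand-in for get_engine(): one fresh object per distinct key."""
    return object()


def pool_dict(options: dict) -> dict:
    """Build the dict in one fixed literal order, as in the suggestion."""
    return {
        'pool_size': options.get('pool_size', 5),
        'pool_recycle': options.get('pool_recycle', -1),
    }


# User-supplied key order differs, but the literal fixes the resulting order.
first = tuple(pool_dict({'pool_size': 2, 'pool_recycle': 300}).items())
second = tuple(pool_dict({'pool_recycle': 300, 'pool_size': 2}).items())
print(first == second)  # True

# A dict assembled in another order yields a different tuple and cache entry.
reordered = tuple({'pool_recycle': 300, 'pool_size': 2}.items())
print(first == reordered)  # False
print(get_engine_stub(first) is get_engine_stub(reordered))  # False
```

So the unsorted tuple is safe as long as every call site builds the dict through the same literal; sorting only adds protection if that could change.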

ricardogsilva · 2026-05-21T14:21:52Z

+def test_pool_options_defaults_preserve_current_behaviour():
+    obj = _Dummy()
+    store_db_parameters(obj, dict(CONN), {})
+    pool = dict(obj.db_pool_options)
+    # Defaults must match pre-existing effective behaviour:
+    # pool_pre_ping was hardcoded True; pool_recycle was unset (-1).
+    assert pool['pool_size'] == 5
+    assert pool['max_overflow'] == 10
+    assert pool['pool_timeout'] == 30
+    assert pool['pool_pre_ping'] is True
+    assert pool['pool_recycle'] == -1


This test seems unnecessary to me - when this PR gets merged, the behavior it implements will become the current behavior, so the test loses its relevance.

ricardogsilva · 2026-05-21T14:30:42Z

+def test_pool_options_are_overridable_and_typed():
+    obj = _Dummy()
+    store_db_parameters(
+        obj, dict(CONN),
+        {'pool_size': 2, 'max_overflow': 3, 'pool_recycle': 300},
+    )
+    pool = dict(obj.db_pool_options)
+    assert pool['pool_size'] == 2 and isinstance(pool['pool_size'], int)
+    assert pool['max_overflow'] == 3
+    assert pool['pool_recycle'] == 300
+    # untouched keys keep defaults
+    assert pool['pool_timeout'] == 30
+    assert pool['pool_pre_ping'] is True


This test would be unnecessary if you went with my suggestion above: store db_pool_options as a dict instead of a tuple, and rely on the configuration being valid once the missing JSON Schema bits have been added.

ricardogsilva · 2026-05-21T14:32:21Z

+def test_dict_valued_options_still_filtered():
+    obj = _Dummy()
+    store_db_parameters(
+        obj, dict(CONN),
+        {'pool_size': 2, 'zoom': {'min': 0, 'max': 22}},
+    )
+    assert 'zoom' not in obj.db_options
+    assert dict(obj.db_pool_options)['pool_size'] == 2
+


This test seems to be unnecessary, as it is not testing the changes you made in this PR. It verifies that the contents of obj.db_options are correct.

IMO this PR does not make any changes that would warrant this verification, unless you would not trust the behavior of dict.pop, which is a Python builtin.

ricardogsilva · 2026-05-21T14:35:05Z

+def test_pool_options_hashable_and_deterministic():
+    a, b = _Dummy(), _Dummy()
+    store_db_parameters(a, dict(CONN), {'pool_size': 2})
+    store_db_parameters(b, dict(CONN), {'pool_size': 2})
+    # identical config -> identical key -> shared engine via functools.cache
+    assert a.db_pool_options == b.db_pool_options
+    assert hash(a.db_pool_options) == hash(b.db_pool_options)
+
+    c = _Dummy()
+    store_db_parameters(c, dict(CONN), {'pool_size': 9})
+    # differing pool config -> distinct key (separate engine, by design)
+    assert c.db_pool_options != a.db_pool_options
+


This is testing Python's own implementation of how tuples are hashed, so I don't think it is relevant to include in pygeoapi.

ricardogsilva · 2026-05-21T14:39:17Z

+def test_pool_options_coexist_with_search_path():
+    obj = _Dummy()
+    store_db_parameters(
+        obj, dict(CONN),
+        {'search_path': ['published', 'public'], 'pool_size': 4},
+    )
+    assert obj.db_search_path == ('published', 'public')
+    assert dict(obj.db_pool_options)['pool_size'] == 4
+


This test seems unnecessary, as it does not test the functionality introduced in this PR

KoalaGeo added 5 commits May 19, 2026 14:56

Enhance SQL Alchemy engine with connection pool options

37f428c

Added connection pool options for SQL Alchemy engine.

Add db_pool_options to PostgreSQL connection

5841ed9

Update pool_recycle to SQLAlchemy default value

bc68af4

Change pool_recycle to -1 to preserve current behavior.

Enhance SQLAlchemy connection pooling settings

cd9c836

Added SQLAlchemy connection-pool tuning options to configuration.

tomkralidis requested review from francbartoli, tomkralidis and webb-ben May 20, 2026 12:01

tomkralidis added this to the 0.24.0 milestone May 20, 2026

ricardogsilva reviewed May 21, 2026

KoalaGeo wants to merge 5 commits into geopython:master from KoalaGeo:pooling

4 participants
