What happens?
We started seeing duplicated rows when extracting data from PostgresDB and noticed that it happens when the row is updated/created when different transaction starts, which seems to be similar to what was reported back in #163.
I checked our logs with the flag pg_debug_show_queries=true after I saw this change and validated all connections are using BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY as expected.
However, I noticed SET TRANSACTION SNAPSHOT 'some-snapshot-id' is never called and neither the expected query before that: SELECT pg_is_in_recovery(), pg_export_snapshot(), (select count(*) from pg_stat_wal_receiver).
This query works when executed directly in our database and provides the following result:
SELECT pg_is_in_recovery(), pg_export_snapshot(), (select count(*) from pg_stat_wal_receiver);
pg_is_in_recovery | pg_export_snapshot | count
-------------------+---------------------+-------
f | 0000102C-00400680-1 | 0
I suspect the code is not reaching the part where it gets pg_export_snapshot for some reason and therefore affecting the results of the reading operation even when the isolation level is correct. Our workflow has pg_connection_limit=4.
To Reproduce
A bit hard to reproduce since the update operation of a row needs to happen between the start of concurrent transactions (we see this issue happening a couple of times per week), but we're happy to test some code in our workflows.
OS:
Linux
PostgreSQL Version:
17.5
DuckDB Version:
1.4.1
DuckDB Client:
Python
Full Name:
Rina Sakurai
Affiliation:
Mentimeter
Have you tried this on the latest main branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
What happens?
We started seeing duplicated rows when extracting data from PostgresDB and noticed that it happens when the row is updated/created when different transaction starts, which seems to be similar to what was reported back in #163.
I checked our logs with the flag
pg_debug_show_queries=trueafter I saw this change and validated all connections are usingBEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLYas expected.However, I noticed
SET TRANSACTION SNAPSHOT 'some-snapshot-id'is never called and neither the expected query before that:SELECT pg_is_in_recovery(), pg_export_snapshot(), (select count(*) from pg_stat_wal_receiver).This query works when executed directly in our database and provides the following result:
I suspect the code is not reaching the part where it gets
pg_export_snapshotfor some reason and therefore affecting the results of the reading operation even when the isolation level is correct. Our workflow haspg_connection_limit=4.To Reproduce
A bit hard to reproduce since the update operation of a row needs to happen between the start of concurrent transactions (we see this issue happening a couple of times per week), but we're happy to test some code in our workflows.
OS:
Linux
PostgreSQL Version:
17.5
DuckDB Version:
1.4.1
DuckDB Client:
Python
Full Name:
Rina Sakurai
Affiliation:
Mentimeter
Have you tried this on the latest
mainbranch?Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?