Skip to content

Add Safe Ecto Migration guides#720

Open
dbernheisel wants to merge 5 commits intoelixir-ecto:masterfrom
dbernheisel:dbern/safe-ecto-migration
Open

Add Safe Ecto Migration guides#720
dbernheisel wants to merge 5 commits intoelixir-ecto:masterfrom
dbernheisel:dbern/safe-ecto-migration

Conversation

@dbernheisel
Copy link
Contributor

No description provided.


### Good

**Strategy 1**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call these **Option 1** and so on for consistency with the rest of the guide?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively rename Option 1 to Strategy. :)


Here's how we'll manage the backfill:

1. Create a "temporary" table. In this example, we're creating a real table that we'll drop at the end of the data migration. In Postgres, there are [actual temporary tables](https://www.postgresql.org/docs/12/sql-createtable.html) that are discarded after the session is over; we're not using those because we need resiliency in case the data migration encounters an error. The error would cause the session to be over, and therefore the temporary table tracking progress would be lost. Real tables don't have this problem. Likewise, we don't want to store IDs in application memory during the migration for the same reason.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a temporary table, couldn't we use a temporary column? Or is the issue that removing the column later would be expensive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It assumes the backfill is about one column, and if so that could be another option, but it doesn't have to be: the user might be doing multiple fixes and write to multiple columns.

The point is that we need to store a list of records that a generic operation needs to run on.

Copy link
Member

@greg-rychlewski greg-rychlewski Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Jose means you add a column to say "fix applied: yes or no". Then when you iterate through your table doing the updates you use that column to prevent yourself from applying the fix more than once in case of restarts. And whatever filter you used to populate your temporary table can be used to stop the iterating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I understand now. Yeah that could be a good approach but yes also have to consider the same gotchas with adding columns with defaults on large existing tables.

I find a separate temporary table safer and less coupled. Less likely to fool with application logic

@josevalim
Copy link
Member

Thank you @dbernheisel! This looks excellent and I do have some quick feedback:

  1. We should probably check the EOL version of the databases and remove advice that applies only to old versions (e.g. before PostgreSQL 12)

  2. Some part of the guides are too specific to PostgreSQL and I think we should either make them consistent for all databases or remove the PostgreSQL. My suggestion, to make this easier on everyone, is to remove the PG specifics bit for, pretty much the reference manual on Safe Migrations and some of the locking bits in the anatomy of a migration. If you agree, I can push those changes. Can I go ahead?

@dbernheisel
Copy link
Contributor Author

Absolutely. I'll go through feedback today

@dbernheisel
Copy link
Contributor Author

The gotchas are important however for each flavor.

The MyXQL adapter uses advisory locks so there are less issues with transactions.

The Postgres adapter does not default to advisory locks (maybe it should?) so that it can avoid some gotchas

MSSQL I have less experience so more verification is needed

Sqlite3 is different enough where I don't suspect these same gotchas apply, but again more verification is needed.

I think the recipes could be formatted better to have callouts per adapter? It does cover MySQL and Postgres, but is silent on the others.

@josevalim
Copy link
Member

The gotchas are important however for each flavor.

Yeah, I was not asking to remove those, those are really great! It was mostly the debugging locks bits and the reference material, so I pushed my changes already. We could have a separate document on debugging PG locks but as I said, easier to do in a future PR. :)

From my point of view, nothing else needs to removed. There are still some PG specific commands still but it is very easy for someone to consult those for other databases.

@dbernheisel
Copy link
Contributor Author

Sorry I left it stale. I took care of some feedback. Let me know what else could be improved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants