Fly.io runs apps close to users by transmuting Docker containers into micro-VMs that run on our own hardware around the world. This post is part of the Safe Ecto Migrations series guide. If you just want to ship your Phoenix app, the easiest way to learn more is to try it out; you can be up and running in just a couple minutes.
This is part 3 in a 4-part series on designing and running Safe Ecto Migrations:
- Part 1 - Anatomy of an Ecto migration
- Part 2 - How to migrate Mix Release projects
- Part 3 - Migration Recipes (you are here)
- Part 4 - Backfilling Data
This is a non-exhaustive guide on common migration scenarios and how to avoid trouble.
These migration recipes may evolve over time, so be sure to check the git edition of these recipes at https://github.com/fly-apps/safe-ecto-migrations with up-to-date information.
- Adding an index
- Adding a reference or foreign key
- Adding a column with a default value
- Changing the type of a column
- Removing a column
- Renaming a column
- Renaming a table
- Adding a check constraint
- Setting NOT NULL on an existing column
- Adding a JSON column
Adding an index
Creating an index blocks writes to the table while it builds (reads are still allowed). This scenario is used as an example in the “How to inspect locks in a query” section.
Bad ❌
def change do
create index("posts", [:slug])
# This obtains a ShareLock on "posts" which will block writes to the table
end
Good ✅
Instead, have Postgres create the index concurrently, which does not block writes. To do this, you have to disable the migration transactions.
@disable_ddl_transaction true
@disable_migration_lock true
def change do
create index("posts", [:slug], concurrently: true)
end
While the migration may still take some time to run, it does not block reads and updates on rows. For example, indexing 100,000,000 rows took 165 seconds (2.75 minutes) to run the migration, but it didn’t lock up the table for selects and updates while it was running!
When disabling transactions in a migration like this, do not make other changes in the same migration! Only create the index concurrently, and use separate migrations for any other changes. Separating these changes into their own migrations makes failures clear and helps prevent leaving the database in a mangled state. For example, if half of the steps in a migration succeed but the remaining half fail, re-running the migration will be confusing and cause problems.
Adding a reference or foreign key constraint
Adding a foreign key blocks writes on both tables.
When a foreign key constraint is added, two things happen:
- It creates a new constraint for changing records going forward
- It validates the new constraint for existing records
When both happen at the same time, the database obtains a lock on the table while it validates the constraint by scanning the entire table. To avoid a full table scan, we can separate the operations.
Bad ❌
def change do
alter table("posts") do
add :group_id, references("groups")
end
end
Good ✅
In the first migration:
def change do
alter table("posts") do
add :group_id, references("groups", validate: false)
end
end
In the next migration:
def change do
execute "ALTER TABLE posts VALIDATE CONSTRAINT posts_group_id_fkey", ""
end
Adding a column with a default value
Adding a column with a default value to an existing table may cause the table to be rewritten. While that happens, Postgres blocks both reads and writes, and MySQL and MariaDB block writes.
Bad ❌
Note: This becomes safe in Postgres 11+, MySQL 8.0.12+, and MariaDB 10.3.2+, where adding a column with a default no longer rewrites the table.
def change do
alter table("comments") do
add :approved, :boolean, default: false
# This took 34 seconds for 10 million rows with no fkeys,
# and 10 minutes for 100 million rows with no fkeys.
# Obtained an AccessExclusiveLock on the table, which blocks reads and
# writes.
end
end
Good ✅
Add the column first and then alter it to include the default.
First migration:
def change do
alter table("comments") do
add :approved, :boolean
# This took 0.27 milliseconds for 100 million rows with no fkeys.
end
end
Second migration:
def change do
alter table("comments") do
modify :approved, :boolean, default: false
# This took 0.28 milliseconds for 100 million rows with no fkeys.
end
end
Schema change to read the new column:
schema "comments" do
+ field :approved, :boolean, default: false
end
In the Ecto schema, the default: false isn’t necessary, but it matches what the database will do, so it avoids reading the field back on inserts and behaves consistently.
Changing the type of a column
Changing the type of a column may cause the table to be rewritten. While that happens, Postgres blocks both reads and writes, and MySQL and MariaDB block writes.
Some type changes don’t require a table rewrite, however:
Safe in Postgres:
- increasing length on varchar or removing the limit
- changing varchar to text
- changing text to varchar with no length limit
- Postgres 9.2+ - increasing precision (NOTE: not scale) of decimal or numeric columns. E.g., increasing (8,2) to (10,2) is safe. Increasing (8,2) to (8,4) is not safe.
- Postgres 9.2+ - changing decimal or numeric to be unconstrained
- Postgres 12+ - changing timestamp to timestamptz when session TZ is UTC
Safe in MySQL/MariaDB:
- increasing length of varchar from < 255 up to 255.
- increasing length of varchar from > 255 up to max.
Bad ❌
def change do
  alter table("posts") do
    modify :my_column, :text, from: :string
    # Changing the type may rewrite the entire table while holding
    # an AccessExclusiveLock.
  end
end
Good ✅
Multi deployment strategy:
- Create a new column
- In application code, write to both columns
- Backfill data from old column to new column
- In application code, move reads from old column to the new column
- In application code, remove old column from Ecto schemas.
- Drop the old column.
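The first two steps of the strategy above might look like the following sketch, where the `posts` table and the `:my_column`/`:my_column_text` names are hypothetical:

```elixir
# Sketch of step 1: first deploy adds the new column alongside the old.
defmodule MyApp.Repo.Migrations.AddMyColumnTextToPosts do
  use Ecto.Migration

  def change do
    alter table("posts") do
      add :my_column_text, :text
    end
  end
end
```

In application code, changesets would then write to both `:my_column` and `:my_column_text` until the backfill and read cutover are complete.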
Removing a column
Removing a column happens when we are “cleaning up” or restructuring the database as it evolves to solve different problems.
If your application deployment rolls out new versions while the old version is still running, then migrations that change tables out from under the older running app create problems: queries may fail because the database is no longer structured the way the code expects. This scenario is often forgotten or ignored. However, there are safe ways to remove columns that don’t temporarily break things as they roll out.
Bad ❌
# Without a code change to the Ecto Schema
def change do
alter table("posts") do
remove :no_longer_needed_column
# Obtained an AccessExclusiveLock on the table, which blocks reads and
# writes, but was instantaneous.
end
end
Good ✅
A “multi-stage” deployment makes removing a column go smoothly. First, remove references to the column from the application so it’s no longer loaded or queried. Then a second deploy safely removes the column from the table.
First deploy, edit the Ecto schema:
defmodule MyApp.Post do
schema "posts" do
- field :no_longer_needed_column, :text
end
end
Second deploy, run with the migration:
def change do
alter table("posts") do
remove :no_longer_needed_column
end
end
Renaming a column
Ask yourself: “Do I really need to rename a column?”. Probably not, but if you must, read on and be aware it requires time and effort.
Any running instances of your application that expect the field to still have the old name will fail when the database structure changes. This happens when you have multiple application instances running and you roll out new versions.
There is a shortcut: don’t rename the database column. Instead, rename the schema’s field name and configure it to point to the database column.
Bad ❌
In your schema:
schema "posts" do
field :summary, :text
end
In your migration
def change do
rename table("posts"), :title, to: :summary
end
Good ✅
Strategy 1
Rename the field in the Ecto schema only. Configure the schema field to point to the unchanged column name. Ensure all calling code referencing the old field name is updated to reference the new field name.
defmodule MyApp.MySchema do
use Ecto.Schema
schema "weather" do
field :temp_lo, :integer
field :temp_hi, :integer
- field :prcp, :float
+ field :precipitation, :float, source: :prcp
field :city, :string
timestamps(type: :naive_datetime_usec)
end
end
Update references in other parts of the codebase:
my_schema = Repo.get(MySchema, "my_id")
- my_schema.prcp
+ my_schema.precipitation
Strategy 2
Take a “multi-stage” approach:
- Create a new column
- In application code, write to both columns
- Backfill data from old column to new column
- In application code, move reads from old column to the new column
- In application code, remove old column from Ecto schemas.
- Drop the old column.
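For strategy 2, the dual-write step might look like the following changeset sketch, assuming the weather schema has both a hypothetical `:prcp` and `:precipitation` field while the migration is in flight:

```elixir
# Hypothetical dual-write during a column rename: any change to the
# new field is mirrored into the old column until cutover completes.
def changeset(weather, attrs) do
  weather
  |> Ecto.Changeset.cast(attrs, [:precipitation])
  |> mirror_to_old_column()
end

defp mirror_to_old_column(changeset) do
  case Ecto.Changeset.fetch_change(changeset, :precipitation) do
    {:ok, value} -> Ecto.Changeset.put_change(changeset, :prcp, value)
    :error -> changeset
  end
end
```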
Renaming a table
Ask yourself: “Do I really need to rename a table?”. Probably not, but if you must, read on and be aware it requires time and effort.
Any running instances of your application that expect the table to still have the old name will fail when the database structure changes. This happens when you have multiple application instances running and you roll out new versions.
There is a shortcut: rename the schema only, and do not change the underlying database table name.
Bad ❌
def change do
rename table("posts"), to: table("articles")
end
Good ✅
- Create the new table. This should include creating new constraints (checks and foreign keys) that mimic the behavior of the old table.
- In application code, write to both tables, continuing to read from the old table.
- Backfill data from old table to new table
- In application code, move reads from old table to the new table
- In application code, remove the old table from Ecto schemas.
- Drop the old table.
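Step 2 above (write to both tables) can be done atomically; a sketch with Ecto.Multi, where MyApp.Post and MyApp.Article are hypothetical schemas for the old and new tables:

```elixir
# Hypothetical dual-write during a table rename: insert into the old
# "posts" table and the new "articles" table in one transaction.
def create_post(attrs) do
  Ecto.Multi.new()
  |> Ecto.Multi.insert(:old, MyApp.Post.changeset(%MyApp.Post{}, attrs))
  |> Ecto.Multi.insert(:new, MyApp.Article.changeset(%MyApp.Article{}, attrs))
  |> MyApp.Repo.transaction()
end
```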
Adding a check constraint
Adding a check constraint blocks reads and writes to the table in Postgres, and blocks writes in MySQL/MariaDB while every row is checked.
When a check constraint is added, two things happen:
- It creates a new constraint for changing records going forward
- It validates the new constraint for existing records
When these happen at the same time, it obtains a lock on the table as it validates the constraint and fully scans the table. To avoid a full table scan, we can separate the operations.
Bad ❌
def change do
create constraint("products", :price_must_be_positive, check: "price > 0")
  # Creating the constraint with validate: true (the default when unspecified)
  # performs a full table scan and acquires a lock preventing updates
end
Good ✅
First migration, add the constraint but don’t let it validate:
def change do
  create constraint("products", :price_must_be_positive, check: "price > 0", validate: false)
  # Setting validate: false prevents the full table scan, so the
  # migration commits immediately.
end
Second migration, validate the constraint:
def change do
execute "ALTER TABLE products VALIDATE CONSTRAINT price_must_be_positive", ""
# Acquires SHARE UPDATE EXCLUSIVE lock, which allows updates to continue
end
These can be in the same deployment, but ensure they are 2 separate migrations.
Setting NOT NULL on an existing column
Setting NOT NULL on an existing column blocks reads and writes while every row is checked. Just like the Adding a check constraint scenario, two things are happening:
- It creates a new constraint for changing records going forward
- It validates the new constraint for existing records
To avoid the full table scan, we can separate these two operations.
Bad ❌
def change do
alter table("products") do
modify :active, :boolean, null: false
end
end
Good ✅
Add a check constraint without validating it, backfill data to satisfy the constraint, and then validate it. This is functionally equivalent to setting NOT NULL on the column.
In the first migration:
def change do
  create constraint("products", :active_not_null, check: "active IS NOT NULL", validate: false)
end
This enforces the constraint in all new rows, but does not care about existing rows until a row is updated. You’ll likely also need a data migration to ensure that the constraint is satisfied.
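That data migration could be as simple as the sketch below, assuming false is the right value for existing NULL rows; see Part 4 - Backfilling Data for batching strategies on large tables:

```elixir
# Hypothetical backfill so the active_not_null constraint can validate.
defmodule MyApp.Repo.Migrations.BackfillProductsActive do
  use Ecto.Migration

  def up do
    execute "UPDATE products SET active = false WHERE active IS NULL"
  end

  def down, do: :ok
end
```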
Then in the next deployment’s migration, enforce the constraint on all rows:
def change do
execute "ALTER TABLE products VALIDATE CONSTRAINT active_not_null", ""
end
If you’re using Postgres 12+, you can add the NOT NULL constraint to the column after validating the check constraint. From the Postgres 12 docs:

“SET NOT NULL may only be applied to a column provided none of the records in the table contain a NULL value for the column. Ordinarily this is checked during the ALTER TABLE by scanning the entire table; however, if a valid check constraint is found which proves no NULL can exist, then the table scan is skipped.”
# **Postgres 12+ only**
def change do
execute "ALTER TABLE products VALIDATE CONSTRAINT active_not_null", ""
alter table("products") do
modify :active, :boolean, null: false
end
drop constraint("products", :active_not_null)
end
If the constraint fails, then first consider backfilling data to cover the gaps in your desired data integrity, then revisit validating the constraint.
Adding a JSON column
In Postgres, there is no equality operator for the json column type, which can cause errors for existing SELECT DISTINCT queries in your application.
Bad ❌
def change do
alter table("posts") do
add :extra_data, :json
end
end
Good ✅
Use jsonb instead. Some say it’s just like “json” but “better.”
def change do
alter table("posts") do
add :extra_data, :jsonb
end
end
Where to next?
Next, finish up our journey by seeing how to safely fill holes in our data created by changing the structure of our tables in “Backfilling Data”!
References
These recipes took a lot of inspiration from Andrew Kane and his library strong_migrations.
- PostgreSQL at Scale by James Coleman
- Strong Migrations by Andrew Kane
- Adding a NOT NULL CONSTRAINT on PG Faster with Minimal Locking
- Postgres Runtime Configuration
- Automatic and Manual Ecto Migrations by Wojtek Mach
Reference Material
| Requested Lock ↓ \ Current Lock → | ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE |
|---|---|---|---|---|---|---|---|---|
| ACCESS SHARE | | | | | | | | X |
| ROW SHARE | | | | | | | X | X |
| ROW EXCLUSIVE | | | | | X | X | X | X |
| SHARE UPDATE EXCLUSIVE | | | | X | X | X | X | X |
| SHARE | | | X | X | | X | X | X |
| SHARE ROW EXCLUSIVE | | | X | X | X | X | X | X |
| EXCLUSIVE | | X | X | X | X | X | X | X |
| ACCESS EXCLUSIVE | X | X | X | X | X | X | X | X |
- SELECT acquires an ACCESS SHARE lock
- SELECT FOR UPDATE acquires a ROW SHARE lock
- UPDATE, DELETE, and INSERT acquire a ROW EXCLUSIVE lock
- CREATE INDEX CONCURRENTLY and VALIDATE CONSTRAINT acquire a SHARE UPDATE EXCLUSIVE lock
- CREATE INDEX acquires a SHARE lock
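To watch these locks in practice (see “How to inspect locks in a query” in Part 1), you can query pg_locks while a migration transaction is open; a sketch, assuming a MyApp.Repo and a hypothetical posts table:

```elixir
# Hypothetical lock inspection: list the lock modes currently held
# or awaited on the "posts" table.
{:ok, %{rows: rows}} =
  MyApp.Repo.query("""
  SELECT mode, granted, pid
  FROM pg_locks
  WHERE relation = 'posts'::regclass
  """)
```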