The standard database definition tool for Ruby on Rails is ActiveRecord’s Migration. It allows version control by writing database definition changes in Ruby scripts instead of SQL.
However, with migration, as the number of DB definition changes increases, innumerable files will be generated and the outlook will be poor.
By using
ridgepole, DB definitions can be aggregated into one configuration file using the same notation as migration, and it can be operated with idempotent settings.
Also, switching the RDBMS driver makes it easy to migrate schemas between databases such as PostgreSQL / MySQL / SQLite.
Cookpad uses ridgepole on production DB.
There is a similar tool sqldef with the same purpose. The choice between them depends on the project’s priorities.
Extended Type support
ridgepole can also take advantage of the extensions supported by ActiveRecord. There is an explanation in
PostgreSQL Guide to Rails.
There is a special type, but when I checked
UUID, Byta, and JSON, I was able to migrate.
Utilization for frameworks other than Rails
Although ridgepole is a tool targeted at Rails, it can be used for other purposes as it specializes in schema management of RDBMS.
For example, some PHP web frameworks manage schemas by migration, and the advantages and disadvantages are the same.
After all, DB management requires coding and management different from the mainline application, so there is room to use ridgepole.
However, you need to understand the terms of the web framework you are using and the schema on the DB before using it.
Installation and basic usage
First of all, you need to have backup of existing DB. Operation mistake can cause data loss.
The setup is bundle install
by adding gem "ridgepole"
to the Gemfile when using bundler.
The basic usage of ridgepole is to describe the database definition in a file called Schemafile and reflect the settings in the DBMS with the ridgepole command as shown below:
$ bundle exec ridgepole -c config/database.yml -f config/Schemafile -a
This command issues a DDL so that the database schema is in the same state as the Schemafile definition. (You can check the contents of DDL by adding the --dry-run
option.)
The argument -a is apply and -c specifies the Rails standard database connection configuration file database.yml. database.yml is exactly the same as the general Rails connection settings.
In addition to column definition, index and compound index can be defined in Schemafile. Since the table definition is organized, you can almost understand the class structure by reading the Schemafile without describing it in MySQL.
Specify environment variables for database.yml from the command line
In the docker / kubernetes environment, there are cases where you want to specify a variable in database.yml using erb because the connection host of database is dynamically determined.
In Rails, the connection information can be retrieved from the environment variable with the following description:
host: <%= ENV['PG1_SERVICE_HOST'] %>
password: <%= ENV['POSTGRES_PASSWORD'] %>
But I encountered a case where the ridgepole command cannot process the template variable.
It worked as expected by setting it with the -E option.
$ ridgepole -c config/database.yml -f config/Schemafile -E PG1_SERVICE_HOST=192.168.0.1 -E POSTGRES_PASSWORD=connectionpass -a
Usage for Rails
CREATE DATABASE
/DROP DATABASE
はRails標準のrails db
を利用して、以下のようなコマンドセットでライフサイクル管理が可能になります。(PostgreSQL/MySQLに直接接続してCREATE DATABASEする手順でもとくに問題なく動作します)
Elementary notes
When using ridgepole, it is important not to use migration together. At first glance, it is natural, for example, when you generate a model with the rails command, a migration file is also generated by default, so add the option not to create migration as follows:
$ rails g model SomeNewModel --skip-migration
If an unnecessary migration file exists, an error will occur at runtime. If you accidentally generate a migration file, simply delete the migration file in db / migrate / with rm.
Also, as a result of idempotent actions, if you write it in a straightforward manner with the intention of changing the table name / column name, the data may disappear, so be careful. The procedure for renaming is explained in Procedure for renaming table name / column name with ridgepole.
Instance restart after schema change
After changing the DB schema with the ridgepole command, it seems that the AP server instance should be restarted in order to be recognized by Rails.
After adding the column, when I accessed it without restarting, I faced an error like ActiveModel :: MissingAttributeError (can’t write unknown attribute.), And when I restarted it, it was fixed.
Introducing to the existing DB in operation
If you want to install ridgepole in the middle of the schema management of the existing database, you can use the schema dump of the -e option.
If you describe the database connection settings (host, ID, password, etc.) in config.yml and execute as follows, you can get the schema configuration file that defines the set of tables in the database.
$ ridgepole -c config.yml -e -o Schemafile
It was dangerous to divide the Schemafile
In version 1.0, even if an undefined table exists in the real DB, the table will not be deleted, which makes it more secure.
In version 0.9 or earlier, if a file is divided by table and only some files are referenced and applied, the table without description will be deleted as DROP TABLE works.
In general, it’s safe to make sure you’re using version 1.0 or later.
Add CONSTRAINT
For example, in order to use UPSERT in PostgreSQL, a unique constraint is required, but it is not simple to implement CONSTRAINT
in idempotent.
Discussion of Postgres: Add constraint if it doesn’t already exist when trying to implement in SQL It is necessary to devise something like.
With ridgepole, execution conditions can be defined in the execute
block, so ADD CONSTAINT can be executed conditionally in the same way as IF NOT EXIST.
An example of a foreign key constraint is posted on the official page, and if you rewrite it for PostgreSQL, it will look like the following:
execute("ALTER TABLE books ADD CONSTRAINT fk_author FOREIGN KEY (author_id) REFERENCES authors (id)") do |c|
c.raw_connection.query(<<-SQL).each.size.zero?
SELECT 1 FROM information_schema.table_constraints
WHERE table_name = 'books' AND constraint_name = 'fk_author' LIMIT 1
SQL
end
Also, in the case of PostgreSQL, the constraint and the index are different things related to each other, and the constraint requires an index, so if you do not create the index with the same name with add_index
, an error will occur at the second execution.
An example of a unique index is as follows:
add_index("books", ["author_id"], name: "unique_author", unique: true)
execute("ALTER TABLE books ADD CONSTRAINT unique_author UNIQUE USING INDEX unique_author") do |c|
c.raw_connection.query(<<-SQL).each.size.zero?
SELECT 1 FROM information_schema.table_constraints
WHERE table_name = 'books' AND constraint_name = 'unique_author' LIMIT 1
SQL
end
The technique of defining the index with add_index
and referencing it with USING INDEX
is required from the constraint.
If you use any other method, conflicts will occur between the first execution and the second and subsequent executions, resulting in an error.
How to write UPSERT using this constraint is explained in
INSERT duplicate record in PostgreSQL.
Since UPSERT can also be executed with execute ()
, it is also a good idea to set up the master record, which is a system requirement, with ridgepole.
The basic function of ridgepole is to manage DDL, but if records with a life cycle similar to DDL are also aggregated, the cohesiveness of the code will be improved.
Resources without REPLACE
Resources without the OR REPLACE
option, such as TYPE, can be executed idempotently with ridgepole by adding exception handling at the time of duplication as shown below:
execute("DO $$ BEGIN
CREATE TYPE comment AS (
id bigint,
description text
);
EXCEPTION
WHEN duplicate_object THEN null;
END $$;")
Keep in mind that if you want to make changes, you will have to delete and re-CREATE once as REPLACE is not done.
Chuma Takahiro