Ridgepole: Benefits and usage

By using ridgepole, DB definitions can be aggregated into one configuration file using the same notation as migration, and it can be operated with idempotent settings.
Also, switching the RDBMS driver makes it easy to migrate schemas between databases such as PostgreSQL / MySQL / SQLite.

Cookpad uses ridgepole on production DB.

There is a similar tool sqldef with the same purpose. The choice between them depends on the project’s priorities.

Extended Type support

ridgepole can also take advantage of the extensions supported by ActiveRecord. There is an explanation in PostgreSQL Guide to Rails.
There is a special type, but when I checked UUID, Byta, and JSON, I was able to migrate.

Utilization for frameworks other than Rails

Although ridgepole is a tool targeted at Rails, it can be used for other purposes as it specializes in schema management of RDBMS.
For example, some PHP web frameworks manage schemas by migration, and the advantages and disadvantages are the same.

After all, DB management requires coding and management different from the mainline application, so there is room to use ridgepole.
However, you need to understand the terms of the web framework you are using and the schema on the DB before using it.

Installation and basic usage

First of all, you need to have backup of existing DB. Operation mistake can cause data loss.

The setup is bundle install by adding gem "ridgepole" to the Gemfile when using bundler.

The basic usage of ridgepole is to describe the database definition in a file called Schemafile and reflect the settings in the DBMS with the ridgepole command as shown below:

$ bundle exec ridgepole -c config/database.yml -f config/Schemafile -a

This command issues a DDL so that the database schema is in the same state as the Schemafile definition. (You can check the contents of DDL by adding the --dry-run option.)

The argument -a is apply and -c specifies the Rails standard database connection configuration file database.yml. database.yml is exactly the same as the general Rails connection settings.

In addition to column definition, index and compound index can be defined in Schemafile. Since the table definition is organized, you can almost understand the class structure by reading the Schemafile without describing it in MySQL.

Specify environment variables for database.yml from the command line

In the docker / kubernetes environment, there are cases where you want to specify a variable in database.yml using erb because the connection host of database is dynamically determined.

In Rails, the connection information can be retrieved from the environment variable with the following description:

host: &lt;%= ENV['PG1_SERVICE_HOST'] %&gt;
password: &lt;%= ENV['POSTGRES_PASSWORD'] %&gt;

But I encountered a case where the ridgepole command cannot process the template variable.
It worked as expected by setting it with the -E option.

$ ridgepole -c config/database.yml -f config/Schemafile -E PG1_SERVICE_HOST=192.168.0.1 -E POSTGRES_PASSWORD=connectionpass -a

Usage for Rails

CREATE DATABASE/DROP DATABASEはRails標準のrails dbを利用して、以下のようなコマンドセットでライフサイクル管理が可能になります。（PostgreSQL/MySQLに直接接続してCREATE DATABASEする手順でもとくに問題なく動作します）

Elementary notes

When using ridgepole, it is important not to use migration together. At first glance, it is natural, for example, when you generate a model with the rails command, a migration file is also generated by default, so add the option not to create migration as follows:

$ rails g model SomeNewModel --skip-migration

If an unnecessary migration file exists, an error will occur at runtime. If you accidentally generate a migration file, simply delete the migration file in db / migrate / with rm.

Also, as a result of idempotent actions, if you write it in a straightforward manner with the intention of changing the table name / column name, the data may disappear, so be careful. The procedure for renaming is explained in Procedure for renaming table name / column name with ridgepole.

Instance restart after schema change

After changing the DB schema with the ridgepole command, it seems that the AP server instance should be restarted in order to be recognized by Rails.

After adding the column, when I accessed it without restarting, I faced an error like ActiveModel :: MissingAttributeError (can’t write unknown attribute.), And when I restarted it, it was fixed.

Introducing to the existing DB in operation

If you want to install ridgepole in the middle of the schema management of the existing database, you can use the schema dump of the -e option.

If you describe the database connection settings (host, ID, password, etc.) in config.yml and execute as follows, you can get the schema configuration file that defines the set of tables in the database.

$ ridgepole -c config.yml -e -o Schemafile

It was dangerous to divide the Schemafile

In version 1.0, even if an undefined table exists in the real DB, the table will not be deleted, which makes it more secure.
In version 0.9 or earlier, if a file is divided by table and only some files are referenced and applied, the table without description will be deleted as DROP TABLE works.

In general, it’s safe to make sure you’re using version 1.0 or later.

Add CONSTRAINT

For example, in order to use UPSERT in PostgreSQL, a unique constraint is required, but it is not simple to implement CONSTRAINT in idempotent.

Discussion of Postgres: Add constraint if it doesn’t already exist when trying to implement in SQL It is necessary to devise something like.

With ridgepole, execution conditions can be defined in the execute block, so ADD CONSTAINT can be executed conditionally in the same way as IF NOT EXIST.
An example of a foreign key constraint is posted on the official page, and if you rewrite it for PostgreSQL, it will look like the following:

execute("ALTER TABLE books ADD CONSTRAINT fk_author FOREIGN KEY (author_id) REFERENCES authors (id)") do |c|
  c.raw_connection.query(<<-SQL).each.size.zero?
    SELECT 1 FROM information_schema.table_constraints
    WHERE table_name = 'books' AND constraint_name = 'fk_author' LIMIT 1
  SQL
end

Also, in the case of PostgreSQL, the constraint and the index are different things related to each other, and the constraint requires an index, so if you do not create the index with the same name with add_index, an error will occur at the second execution.

An example of a unique index is as follows:

add_index("books", ["author_id"], name: "unique_author", unique: true)
execute("ALTER TABLE books ADD CONSTRAINT unique_author UNIQUE USING INDEX unique_author") do |c|
  c.raw_connection.query(<<-SQL).each.size.zero?
    SELECT 1 FROM information_schema.table_constraints
    WHERE table_name = 'books' AND constraint_name = 'unique_author' LIMIT 1
  SQL
end

The technique of defining the index with add_index and referencing it with USING INDEX is required from the constraint.
If you use any other method, conflicts will occur between the first execution and the second and subsequent executions, resulting in an error.

How to write UPSERT using this constraint is explained in INSERT duplicate record in PostgreSQL. Since UPSERT can also be executed with execute (), it is also a good idea to set up the master record, which is a system requirement, with ridgepole.

The basic function of ridgepole is to manage DDL, but if records with a life cycle similar to DDL are also aggregated, the cohesiveness of the code will be improved.

Resources without REPLACE

Resources without the OR REPLACE option, such as TYPE, can be executed idempotently with ridgepole by adding exception handling at the time of duplication as shown below:

execute("DO $$ BEGIN
    CREATE TYPE comment AS (
      id bigint,
      description text
    );
  EXCEPTION
    WHEN duplicate_object THEN null;
  END $$;")

Keep in mind that if you want to make changes, you will have to delete and re-CREATE once as REPLACE is not done.