INSERT duplicate records into PostgreSQL

DBMS has the unique index feature that protects against INSERTing duplicate records, and at a minimum, the primary key must be unique.

In cases like testing, where you might want to enter the same data, including the primary key, you need to be clever about how you use INSERT statement.

Not INSERT if it’s a duplicate

The approach of detecting duplicates and canceling INSERT is straightforward, resulting the existing record takes precedence.

INSERT INTO some_table
  (id, name) VALUES (1, 'test record')
  ON CONFLICT DO NOTHING;

This way, by adding the ON CONFLICT DO NOTHING clause, INSERT will only occur when there are no duplicate records.

An alternative method to achieve the same effect is to use WHERE NOT EXISTS for checking, but ON CONFLICT DO NOTHING is more convenient as it is a fixed clause.

UPSERT - UPDATE when duplicate

In the case of detecting duplicates and wanting to overwrite resulting the new record wins, commonly known as UPSERT, a query would be as follows:

INSERT INTO some_table
  (id, name) VALUES (1, 'test record')
  ON CONFLICT (id)
  DO UPDATE SET name='test record';

The query becomes lengthy because it is necessary to specify the condition for detecting duplicates (id) and UPDATE clause.

If you want to UPSERT on a column other than the primary key, you need to create a unique index in advance.
A composite unique key, which uses multiple columns for judgment, also works appropriately.

Additionally, starting from PostgreSQL 15, MERGE clause from SQL:2013 has also been implemented. The statement would look like the following:

MERGE INTO some_table
USING (VALUES (1)) AS i (id)
ON some_table.id = i.id
WHEN MATCHED THEN
UPDATE SET name='test record'
WHEN NOT MATCHED THEN
INSERT (id, name) VALUES (1, 'test record');

MERGE seems to have better performance, but ON CONFLICT is more stable in case of request races.
While a unique index is not necessary for MERGE, it is often implemented for data consistency.

CONSTRAINT

Generally, duplicate detection can be implemented using a unique index.
However, for more fine-grained control, such as conditions based on values, constraints can also be used.
When using constraints, use ON CONFLICT ON CONSTRAINT <constraint_name> clause.

In PostgreSQL, it’s important to note that unique indexes and constraints are separate objects related to each other, which can be confusing.

For setting up constraints idempotently using ridgepole, you can refer to Ridgepole: Benefits and usage.

中馬崇尋
Chuma Takahiro