DBMS has the unique index feature that protects against INSERTing duplicate records, and at a minimum, the primary key must be unique.
In cases like testing, where you might want to enter the same data, including the primary key, you need to be clever about how you use INSERT
statement.
Not INSERT if it’s a duplicate
The approach of detecting duplicates and canceling INSERT is straightforward, resulting the existing record takes precedence.
INSERT INTO some_table
(id, name) VALUES (1, 'test record')
ON CONFLICT DO NOTHING;
This way, by adding the ON CONFLICT DO NOTHING
clause, INSERT
will only occur when there are no duplicate records.
An alternative method to achieve the same effect is to use WHERE NOT EXISTS
for checking, but ON CONFLICT DO NOTHING
is more convenient as it is a fixed clause.
UPSERT - UPDATE when duplicate
In the case of detecting duplicates and wanting to overwrite resulting the new record wins, commonly known as UPSERT, a query would be as follows:
INSERT INTO some_table
(id, name) VALUES (1, 'test record')
ON CONFLICT (id)
DO UPDATE SET name='test record';
The query becomes lengthy because it is necessary to specify the condition for detecting duplicates (id)
and UPDATE
clause.
If you want to UPSERT on a column other than the primary key, you need to create a unique index in advance.
A composite unique key, which uses multiple columns for judgment, also works appropriately.
Additionally, starting from PostgreSQL 15, MERGE
clause from SQL:2013 has also been implemented. The statement would look like the following:
MERGE INTO some_table
USING (VALUES (1)) AS i (id)
ON some_table.id = i.id
WHEN MATCHED THEN
UPDATE SET name='test record'
WHEN NOT MATCHED THEN
INSERT (id, name) VALUES (1, 'test record');
MERGE
seems to have better performance, but ON CONFLICT
is more stable in case of request races.
While a unique index is not necessary for MERGE
, it is often implemented for data consistency.
CONSTRAINT
Generally, duplicate detection can be implemented using a unique index.
However, for more fine-grained control, such as conditions based on values,
constraints can also be used.
When using constraints, use ON CONFLICT ON CONSTRAINT <constraint_name>
clause.
In PostgreSQL, it’s important to note that unique indexes and constraints are separate objects related to each other, which can be confusing.
For setting up constraints idempotently using ridgepole, you can refer to Ridgepole: Benefits and usage.
Chuma Takahiro