INF: How to Remove Duplicate Rows From a TableID: Q139444
|
Microsoft SQL Server tables should never contain duplicate rows, nor
non-unique primary keys. For brevity, we will sometimes refer to primary
keys as "key" or "PK" in this article, but this will always denote "primary
key." Duplicate PKs are a violation of entity integrity, and should be
disallowed in a relational system. SQL Server has various mechanisms for
enforcing entity integrity, including indexes, UNIQUE constraints,
PRIMARY KEY constraints, and triggers.
Despite this, under unusual circumstances duplicate primary keys may occur,
and if so they must be eliminated. One way they can occur is if duplicate
PKs exist in non-relational data outside SQL Server, and the data is
imported while PK uniqueness is not being enforced. Another way they can
occur is through a database design error, such as not enforcing entity
integrity on each table.
Often duplicate PKs are noticed when you attempt to create a unique index,
which will abort if duplicate keys are found. This message is:
This article discusses how to locate and remove duplicate primary keys from a table. However you should closely examine the process which allowed the duplicates to happen in order to preven a recurrence.Msg 1505, Level 16, State 1 Create unique index aborted on duplicate key.
For this example, we will use the following table with duplicate PK values. In this table the primary key is the two columns (col1, col2). We cannot create a unique index or PRIMARY KEY constraint since two rows have duplicate PKs. This procedure illustrates how to identify and remove the duplicates.
create table t1(col1 int, col2 int, col3 char(50))
insert into t1 values (1, 1, "data value one")
insert into t1 values (1, 1, "data value one")
insert into t1 values (1, 2, "data value two")
The first step is to identify which rows have duplicate primary key values:
SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
This will return one row for each set of duplicate PK values in the table.
The last column in this result is the number of duplicates for the
particular PK value.col1 | col2 | |
---|---|---|
1 | 1 | 2 |
set rowcount 1
delete from t1
where col1=1 and col2=1
The rowcount value should be n-1 the number of duplicates for a given key value. In this example, there are 2 duplicates so rowcount is set to 1. The col1/col2 values are taken from the above GROUP BY query result. If the GROUP BY query returns multiple rows, the "set rowcount" query will have to be run once for each of these rows. Each time it is run, set rowcount to n-1 the number of duplicates of the particular PK value.
SELECT col1, col2, count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
SELECT col1, col2, count(*)
FROM holddups
GROUP BY col1, col2
should return a count of 1 for each row. If yes, proceed to step 5 below. If no, you have duplicate keys, yet unique rows, and need to decide which rows to save. This will usually entail either discarding a row, or creating a new unique key value for this row. Take one of these two steps for each such duplicate PK in the holddups table.
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
INSERT t1 SELECT * FROM holddups
Additional query words: sql6 dedupe entity intg
Keywords : kbusage SSrvProg
Version : winnt:6.0,6.5,7.0
Platform : winnt
Issue type : kbinfo
Last Reviewed: April 10, 1999