Mysql Handling Duplicates
# MySQL Handling Duplicates
Some MySQL tables may contain duplicate records. In some cases, we allow duplicate data to exist, but sometimes we also need to delete this duplicate data.
In this chapter, we will introduce how to prevent duplicate data in tables and how to delete duplicate data from tables.
* * *
## Preventing Duplicate Records in a Table
You can set specified fields in a MySQL table as **PRIMARY KEY** or **UNIQUE** index to ensure data uniqueness.
Let's try an example: The following table has no index or primary key, so it allows multiple duplicate records.
```sql
CREATE TABLE person_tbl (
first_name CHAR(20),
last_name CHAR(20),
sex CHAR(10)
);
If you want to ensure that the `first_name` and `last_name` fields in the table cannot have duplicate data, you can set a composite primary key to enforce data uniqueness. If you set a composite primary key, the default value for that key cannot be NULL and should be set to NOT NULL. As shown below:
```sql
CREATE TABLE person_tbl (
first_name CHAR(20) NOT NULL,
last_name CHAR(20) NOT NULL,
sex CHAR(10),
PRIMARY KEY (last_name, first_name)
);
If we set a UNIQUE index, then when inserting duplicate data, the SQL statement will fail to execute and throw an error.
The difference between `INSERT IGNORE INTO` and `INSERT INTO` is that `INSERT IGNORE INTO` will ignore data that already exists in the database. If there is no data in the database, it inserts new data; if there is data, it skips that data. This way, existing data in the database is preserved, achieving the purpose of inserting data into gaps.
The following example uses `INSERT IGNORE INTO`. After execution, it will not throw an error and will not insert duplicate data into the table:
```sql
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
-> VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
-> VALUES( 'Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)
`INSERT IGNORE INTO` does not return an error when inserting duplicate data after setting record uniqueness; it only returns a warning. `REPLACE INTO`, if a record with the same primary or unique key exists, deletes it first and then inserts the new record.
Another way to set data uniqueness is to add a UNIQUE index, as shown below:
```sql
CREATE TABLE person_tbl (
first_name CHAR(20) NOT NULL,
last_name CHAR(20) NOT NULL,
sex CHAR(10),
UNIQUE (last_name, first_name)
);
* * *
## Counting Duplicate Records
Below, we will count the number of duplicate records for `first_name` and `last_name` in the table:
```sql
mysql> SELECT COUNT(*) as repetitions, last_name, first_name
-> FROM person_tbl
-> GROUP BY last_name, first_name
-> HAVING repetitions > 1;
The above query will return the number of duplicate records in the `person_tbl` table. Generally, to query duplicate values, perform the following steps:
* Determine which column contains values that may be duplicated.
* Use `COUNT(*)` in the column selection list for those columns.
* List the columns in the `GROUP BY` clause.
* Set the `HAVING` clause to have a repetition count greater than 1.
* * *
## Filtering Duplicate Data
If you need to read non-duplicate data, you can use the `DISTINCT` keyword in the `SELECT` statement to filter duplicate data.
```sql
mysql> SELECT DISTINCT last_name, first_name
-> FROM person_tbl;
You can also use `GROUP BY` to read non-duplicate data from the table:
```sql
mysql> SELECT last_name, first_name
-> FROM person_tbl
-> GROUP BY (last_name, first_name);
* * *
## Deleting Duplicate Data
If you want to delete duplicate data from a table, you can use the following SQL statements:
```sql
mysql> CREATE TABLE tmp SELECT last_name, first_name, sex FROM person_tbl GROUP BY (last_name, first_name, sex);
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;
Of course, you can also use a simpler method by adding an INDEX and PRIMARY KEY to the table to delete duplicate records. The method is as follows:
```sql
mysql> ALTER IGNORE TABLE person_tbl
-> ADD PRIMARY KEY (last_name, first_name);
YouTip