Welcome to a quick tutorial on how to get and count duplicate records in MySQL. So you want to filter out all the duplicate records? Or maybe get a count of the number of duplicates?
SELECT `COLUMN` FROM `TABLE` GROUP BY `COLUMN` HAVING COUNT(*)>1
That covers the quick basics, but read on for detailed examples!
TABLE OF CONTENTS
DOWNLOAD & NOTES
Here is the download link to the example code, so you don’t have to copy-paste everything.
EXAMPLE CODE DOWNLOAD
Just click on “download zip” or do a git clone. I have released it under the MIT license, so feel free to build on top of it or use it in your own project.
SORRY FOR THE ADS...
But someone has to pay the bills, and sponsors are paying for it. I insist on not turning Code Boxx into a "paid scripts" business, and I don't "block people with Adblock". Every little bit of support helps.
Buy Me A Coffee Code Boxx eBooks
DEALING WITH DUPLICATE RECORDS IN MYSQL
All right, let us now get into the examples of getting and counting duplicate records in MySQL.
PART 1) DUMMY DATABASE TABLE
-- (A) FOOD TABLE
CREATE TABLE `food` (
`id` bigint(20) NOT NULL,
`name` varchar(255) NOT NULL,
`qty` bigint(20) NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
ALTER TABLE `food`
ADD PRIMARY KEY (`id`);
ALTER TABLE `food`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
-- (B) DUMMY ENTRIES
INSERT INTO `food` (`id`, `name`, `qty`) VALUES
(1, 'Candy', 57),
(2, 'Pasta', 123),
(3, 'Apple', 24),
(4, 'Orange', 67),
(5, 'Pasta', 8),
(6, 'Melon', 64),
(7, 'Apple', 31),
(8, 'Pasta', 12),
(9, 'Rice', 87),
(10, 'Candy', 247);
For this example, we will be using a simple food table with only 3 fields:
id
Primary key.name
Name of the food.qty
Food quantity.
For those who are new, this table structure is not exactly bad if it is for “warehouse stock tracking”. But if this is the “main products table” for an eCommerce site, it gets an immediate red flag – The product name should be unique to prevent duplicate entries.
PART 2) GET DUPLICATE RECORDS WITH COUNT
SELECT `name`, COUNT(*) AS `rows`
FROM `food`
GROUP BY `name`
HAVING `rows` > 1
So which foods have duplicate rows? As in the above introduction, a SELECT - GROUP BY - HAVING COUNT > 1
query is all we need to filter out the duplicate records.
PART 3) GET ALL RECORDS WITH DUPLICATES
SELECT *
FROM `food`
WHERE `name` IN (
SELECT `name`
FROM `food`
GROUP BY `name`
HAVING COUNT(*) > 1
)
The previous example only gave the name of the foods and the number of duplicate rows. Just how do we get all rows that have duplicates? Very simply – We nest the previous “get duplicates” query into another SELECT *
query.
PART 4) SUMMING DUPLICATE RECORDS
SELECT `name`, SUM(`qty`) AS `total`
FROM `food`
GROUP BY `name`
Want to get the sum of the quantity of all the duplicate entries? This should be pretty self-explanatory.
PART 5) REMOVING DUPLICATES
5A) CREATE A STAGING TABLE
-- (A) CREATE NEW "STAGING" TABLE
CREATE TABLE `food-stage` (
`id` bigint(20) NOT NULL,
`name` varchar(255) NOT NULL,
`qty` bigint(20) NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
ALTER TABLE `food-stage`
ADD PRIMARY KEY (`id`),
ADD UNIQUE KEY `name` (`name`);
ALTER TABLE `food-stage`
MODIFY `id` bigint(20) NOT NULL AUTO_INCREMENT;
This is one for you guys who are looking to “fix” and remove the duplicate entries. Some other tutorials give complicated crazy methods, I only have a simple one. The first and golden rule, be non-destructive and create a new “staging” table.
This is the same food
table, except that name
is now unique. Next, we write a simple script to “copy” the unique entries over.
5B) SCRIPT TO REMOVE DUPLICATES
<?php
// (A) CONNECT TO DATABASE
$pdo = new PDO(
"mysql:host=localhost;dbname=test;charset=utf8",
"USER", "PASSWORD", [
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC
]);
// (B) GET SUM OF FOODS
$stmtA = $pdo->prepare("SELECT `name`, SUM(`qty`) AS `total` FROM `food` GROUP BY `name`");
$stmtA->execute();
while ($row = $stmtA->fetch()) {
$stmtB = $pdo->prepare("INSERT INTO `food-stage` (`name`, `qty`) VALUES (?,?)");
echo $row['name'] ." - ". $row['total'] . " - ";
echo $stmtB->execute([$row['name'], $row['total']]) ? "OK" : "ERROR" ;
echo "<br>";
}
This is in PHP, but it can really be in any programming language of your choice… So long as it connects to MySQL. After running this script, the last step is to rename the tables:
food
tofood-old
.food-stage
tofood
.
Done – You still have the old table as backup and can switch back if anything goes wrong.
EXTRAS
That’s all for the tutorial, and here is a small section on some extras and links that may be useful to you.
REMOVING DUPLICATES IS SUBJECTIVE
Before the trolls get angry, here’s a disclaimer – The process of removing duplicates is subjective to the existing database structure… It can get very complicated with foreign keys and even restructuring the entire database. So yes, which is why I recommend the non-destructive approach, and also create a backup copy of the database before you commit the changes.
LINKS & REFERENCES
- Get the sum of columns for duplicate records in MySQL – Tutorials Point
- How To Find Duplicate Values in MySQL – MySQLTutorial
- Different ways to SQL delete duplicate rows from a SQL Table – SQLShack
THE END
Thank you for reading, and we have come to the end. I hope that it has helped you to better understand, and if you want to share anything with this guide, please feel free to comment below. Good luck and happy coding!
Great thank you very much