---
id: "duckdb-check-duplicate-rows"
title: "Check Duplicate Rows"
slug: "duckdb-check-duplicate-rows-query"
description: "Count the number of duplicate rows in a table using DuckDB."
code: |
  -- Count duplicate rows in the 'train' table
  SELECT
      (SELECT COUNT(*) FROM train)
      - (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM train)) AS duplicate_rows;
---
# DuckDB Check Duplicate Rows Query
This snippet counts the duplicate rows in a DuckDB table with a single SQL query.
```sql
-- Count duplicate rows in the 'train' table
SELECT
    (SELECT COUNT(*) FROM train)
    - (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM train)) AS duplicate_rows;
```
This query works by:
1. Counting all rows with `COUNT(*)`
2. Counting the distinct rows with `COUNT(*)` over `SELECT DISTINCT *`
3. Subtracting the second count from the first, which leaves the number of duplicate rows (every copy beyond the first)
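The count-minus-distinct arithmetic can be checked on a tiny inline table. This is a minimal sketch: the `demo` table and its `id` and `label` columns are made up for illustration, and the distinct-row count is written as a `SELECT DISTINCT *` subquery.

```sql
-- Hypothetical 5-row table: (1, 'a') and (2, 'b') each appear twice.
WITH demo(id, label) AS (
    VALUES (1, 'a'), (1, 'a'), (2, 'b'), (2, 'b'), (3, 'c')
)
SELECT
    (SELECT COUNT(*) FROM demo)                             -- 5 rows in total
    - (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM demo))  -- 3 distinct rows
    AS duplicate_rows;                                      -- 5 - 3 = 2
```

Only the extra copies are counted, so two pairs of identical rows give 2, not 4.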
To check a different table, replace `train` with that table's name.
Note: This query can be computationally expensive on large tables, because it must compare every column of every row to determine which rows are distinct.
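If only a few columns decide whether two rows count as duplicates, the comparison can be limited to those columns, which may reduce the work on wide tables. A sketch under that assumption, where `id` and `label` are hypothetical column names standing in for your own key columns:

```sql
-- Count rows whose (id, label) pair repeats an earlier row.
-- `id` and `label` are hypothetical; substitute the columns that define a duplicate.
SELECT
    (SELECT COUNT(*) FROM train)
    - (SELECT COUNT(*) FROM (SELECT DISTINCT id, label FROM train)) AS duplicate_keys;
```

This answers a narrower question than the full-row check: rows that differ in other columns but share the same `id` and `label` are counted as duplicates here.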