sql-snippets/src/snippets/checking-duplicate-dataset.md
metadata
id: duckdb-check-duplicate-rows
title: Check Duplicate Rows
slug: duckdb-check-duplicate-rows-query
description: Count the number of duplicate rows in a table using DuckDB.
code: |
  -- Count duplicate rows in the 'train' table
  SELECT COUNT(*) - (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM train)) AS duplicate_rows
  FROM train;

DuckDB Check Duplicate Rows Query

This snippet counts the duplicate rows in a DuckDB table with a single SQL query. A row counts as a duplicate when it repeats another row exactly across every column.

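-- Count duplicate rows in the 'train' table
SELECT COUNT(*) - (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM train)) AS duplicate_rows
FROM train;

This query works by:

  1. Counting all rows using COUNT(*)
  2. Counting the distinct rows with a subquery over SELECT DISTINCT *, which compares every column
  3. Subtracting the second count from the first, which leaves the number of duplicate rows (each extra copy beyond the first counts once)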
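To make the arithmetic concrete, here is a minimal sketch on a small throwaway table. The table name train_demo and its columns are hypothetical, chosen only for illustration, and the query uses a SELECT DISTINCT * subquery as one way to count whole distinct rows.

```sql
-- Hypothetical demo table: 6 rows, of which only 3 are distinct.
CREATE OR REPLACE TEMP TABLE train_demo AS
SELECT * FROM (VALUES
    (1, 'a'),
    (1, 'a'),
    (2, 'b'),
    (2, 'b'),
    (2, 'b'),
    (3, 'c')
) AS t(id, label);

-- Total rows minus distinct rows: 6 - 3.
SELECT COUNT(*) - (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM train_demo)) AS duplicate_rows
FROM train_demo;
-- duplicate_rows = 3
```

The row (1, 'a') contributes one extra copy and (2, 'b') contributes two, which is why the result is 3 and not the number of repeated groups.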

To check another table, replace train with that table's name everywhere it appears in the query.

Note: This query can be computationally expensive for large tables, as it needs to check all columns for uniqueness.
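If comparing every column is too slow, one option is to compare only the columns that define a duplicate for your data. The sketch below assumes a hypothetical table train_demo with columns id, question, and answer. None of these names come from the original snippet.

```sql
-- Hypothetical table standing in for 'train'; the column names are assumptions.
CREATE OR REPLACE TEMP TABLE train_demo AS
SELECT * FROM (VALUES
    (1, 'What is DuckDB?', 'A database'),
    (2, 'What is DuckDB?', 'A database'),
    (3, 'What is SQL?', 'A query language')
) AS t(id, question, answer);

-- Compare only 'question' and 'answer', ignoring 'id'.
SELECT COUNT(*) - (SELECT COUNT(*) FROM (SELECT DISTINCT question, answer FROM train_demo)) AS duplicate_rows
FROM train_demo;
-- duplicate_rows = 1 (3 rows, 2 distinct question/answer pairs)
```

Restricting the columns changes what counts as a duplicate: here the all-column check would report 0, because the id values differ.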