Show/Remove Duplicate Rows
Show Duplicates and Remove Duplicates and allow you to easily show and/or remove duplicates from your table. Select Show Duplicates to determine the duplicate rows. Select Remove Duplicates to eliminate duplicate rows if you think these are not needed.
For instance you start with a table like this, where record pairs 1 and 2, and 7 and 8 are duplicates :
Show Duplicates will create a table like this:
Remove Duplicates will create a table like this:
You can show or remove rows that have duplicate values for all columns or just specific columns. In our example for instance, you can remove only rows that have duplicate Customers, or duplicate Account Number and Customers.
Steps:
You can show only duplicates from the Prepare Window. To do so:
-
Go to the Prepare window and select the table you want to transform.
-
Select Transform on the Monarch Data Prep Studio Toolbar.
-
Select Duplicates.
-
Enter the name of the resulting table.
-
Select the Deduplicate Operation from the drop-down.
-
Remove Duplicates - Keep first row
-
Show Duplicates - Show all except first row
-
Flag Duplicates - Mark with new column
-
Unselect/select the columns you want to use in evaluating duplicates.
-
Click the Case Sensitive box if you want to differentiate values based on case.
-
Select the columns to be included in the resulting table.
-
Select Use All Columns if you want to display all columns.
-
Select Use Selected Columns, and then check the box beside the columns you want to include:
-
Select OK.
The dialog box that displays allows you to select a transformation.
The Duplicates dialog box displays:
To understand the different operations, let us use the following as an example:
We have 2 sets of duplicates: Betty's Music Store has 2 duplicate records and Big Shanty Music has 3 duplicate records.
With this operation, Monarch Data Prep Studio removes all duplicate records except one row for each duplicate set.
With the records above, the result will be:
With this operation, Monarch Data Prep Studio displays all duplicate records except the first row for each duplicate set.
With the records above, the result will be:
With this operation, Monarch Data Prep Studio marks all records that have duplicates.
With the records above, the result will be:
If a column is not selected, this column is ignored even if it has duplicated values.
For instance, if case sensitive is on, "Betty's Music Store" and "betty's Music Store" are considered distinct values and will not be considered as duplicate.
Monarch Data Prep Studio applies the deduplication and creates a new table.