Similar to the Distinct node, the Remove Duplicates node is used to find and remove duplicate rows from the given table. However, where the Distinct node removes identical rows, the Remove Duplicates node enables users to define duplicate rows according to specified columns. For instance, you may want to remove rows that contain the same customer name, even if the remaining columns in the rows are not identical.
The Remove Duplicates node will retain the first duplicate row and remove the remaining duplicate rows. For example, say the First Name column is selected as a column to compare, and the column contains three customers with the first name Fred, in the following order: Fred Johnson, Fred Jackson, and Fred Jason. The first row containing "Fred" will be retained, and the other two will be removed.
How to Configure the Remove Duplicates Node
- Connect the Remove Duplicates node to the table from which you want to remove duplicate rows.
- From the node's Properties panel, go to the Remove Duplicates window; select the columns that should be used to define duplicate rows under "Columns to Compare".
- Preview the node to see a preview of the table with the given duplicates removed.