On occasion we have all received the same piece of mail multiple times. Maybe we signed up to a service twice, made two purchases from a website, or donated to a charity over the phone and also by post. Regardless of the reason, it feels like a waste, and as consumers we know that somewhere down the line we are paying for that waste.

Failure to effectively remove duplicates from a mailing file can also have greater consequences. One high-profile charity donor received more than 10 duplicate appeal packs and immediately and publicly withdrew their support for the charity. Their viewpoint: ‘you obviously don’t need my support if you can afford to waste money sending me all this mail’. Lean DM is about highlighting and removing waste, and deduplication is a vital data cleaning process to employ for any mailing campaign.

Deduplication isn’t Simple

Removing duplicate records where the name and address match exactly is not much of a challenge. Unfortunately, there are many opportunities for duplicates to slip through the net of a basic deduplication process. The problem lies in the many different ways in which a name and address can be presented. Within the name, the title could be missing or present, the first name abbreviated to an initial, or the surname misspelled.

Furthermore, in the address block, lines of a street address might be removed in one version, a county abbreviated in another, and the postcode only partially given in a third. Cherished addresses, which preserve a house name, can present themselves as unique records next to duplicate PAF (Postcode Address File) verified, numbered addresses. Deduplication can therefore become complicated, and errors can be difficult to spot in an automated process, especially if the data set is large.
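To illustrate why exact matching falls short, here is a minimal Python sketch of a normalised match key. It is not how Cygnus or any other product works. The title list, function names and the choice of "initial plus surname plus postcode" are all assumptions made for the example.

```python
import re

# Hypothetical list of titles to ignore; extend it for your own data.
TITLES = {"MR", "MRS", "MS", "MISS", "DR", "PROF"}


def name_key(full_name: str) -> str:
    """Reduce a name to 'INITIAL SURNAME', ignoring titles and punctuation."""
    tokens = re.sub(r"[^A-Za-z\s'-]", " ", full_name).upper().split()
    tokens = [t for t in tokens if t not in TITLES]
    if not tokens:
        return ""
    if len(tokens) == 1:
        return tokens[0]  # surname only, no initial available
    return f"{tokens[0][0]} {tokens[-1]}"


def postcode_key(postcode: str) -> str:
    """Remove whitespace and case so 'sw1a 1aa' and 'SW1A1AA' compare equal."""
    return re.sub(r"\s+", "", postcode).upper()


def match_key(full_name: str, postcode: str) -> tuple[str, str]:
    """Combine the two keys; records sharing a key are candidate duplicates."""
    return name_key(full_name), postcode_key(postcode)
```

With this key, "Mr. John Smith" at "sw1a 1aa" and "J Smith" at "SW1A1AA" become candidate duplicates. A misspelled surname or a partial postcode would still slip through, which is where fuzzy matching and address verification come in.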


Good data cleaning software such as Cygnus from The Software Bureau provides an advanced platform to remove duplicates. However, the first step should be a human one. Take a moment to consider the intended recipient of the campaign. Some basic questions can help set the parameters for the deduplication step, such as:

Are we Mailing the Same Household?

Consider whether the pack is relevant to all individuals within the household. If the mailing is a generic catalogue, there might be a good argument for sending only one per household. In this case, deduping at the surname and address level would be appropriate.
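As a rough sketch of a household-level dedupe, the Python below groups records by surname and address and keeps the first record seen for each household. The field names (`surname`, `address`, `postcode`) are hypothetical, and the "keep the first" rule is only a placeholder for whatever retention rule suits the campaign.

```python
def household_key(record: dict) -> tuple[str, str, str]:
    """Build a key from surname and address, ignoring case and extra spaces."""
    surname = record["surname"].strip().upper()
    address = " ".join(record["address"].upper().split())
    postcode = "".join(record["postcode"].upper().split())
    return surname, address, postcode


def one_per_household(records: list[dict]) -> list[dict]:
    """Keep the first record encountered for each household key."""
    seen: dict[tuple[str, str, str], dict] = {}
    for record in records:
        seen.setdefault(household_key(record), record)
    return list(seen.values())
```

Two members of the Smith family at the same address would collapse to one mail piece, while a lodger with a different surname would still receive their own.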

Is There a Gender Bias?

If the target of the mailing is clearly directed to a specifically male or specifically female audience, then favouring that gender within a surname or address level dedupe makes sense.
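A preference like this can be expressed as a rule for choosing which record survives within each group of duplicates. The sketch below assumes each record carries a hypothetical `gender` field and that the caller supplies the grouping key. If nobody in the group matches the preferred value, it falls back to the first record.

```python
from collections import defaultdict
from typing import Callable, Hashable


def pick_preferred(
    records: list[dict],
    key_fn: Callable[[dict], Hashable],
    preferred_gender: str,
) -> list[dict]:
    """Keep one record per group, favouring the preferred gender if present."""
    groups: dict[Hashable, list[dict]] = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record)

    survivors = []
    for group in groups.values():
        # Fall back to the first record when no one matches the preference.
        chosen = next(
            (r for r in group if r.get("gender") == preferred_gender),
            group[0],
        )
        survivors.append(chosen)
    return survivors
```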

Intelligent Deduplication

Once you have decided the rules of the deduplication process, there are some best practices to consider. Use data anchors such as email address or date of birth wherever you have them to improve the confidence of the match. Always dedupe data before you cleanse against a suppression file to ensure you don’t incur unnecessary royalty fees for duplicate records. Take the time to make sure you retain the best quality address from the two or more matching records. Cygnus from The Software Bureau provides comprehensive deduplication options to make all of the nuances of the deduplication process fast, flexible and accurate.
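Two of these practices lend themselves to a short Python sketch: confirming a match with data anchors, and keeping the most complete address from a set of matches. The field names and the "count the populated address lines" score are assumptions for illustration only. A real process might instead favour the PAF-verified version.

```python
# Hypothetical address fields used to score completeness.
ADDRESS_FIELDS = ("line1", "line2", "town", "county", "postcode")
# Hypothetical anchor fields that can confirm a match.
ANCHOR_FIELDS = ("email", "dob")


def anchors_agree(a: dict, b: dict) -> bool:
    """True if at least one anchor is populated in both records and equal."""
    for field in ANCHOR_FIELDS:
        value_a = (a.get(field) or "").strip().lower()
        value_b = (b.get(field) or "").strip().lower()
        if value_a and value_b and value_a == value_b:
            return True
    return False


def completeness(record: dict) -> int:
    """Count how many address fields are populated."""
    return sum(1 for f in ADDRESS_FIELDS if (record.get(f) or "").strip())


def best_record(matches: list[dict]) -> dict:
    """Return the match with the most complete address (first one on a tie)."""
    return max(matches, key=completeness)
```

Running a step like this before the suppression check means each individual is screened once rather than once per duplicate.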

Free Introduction to Lean DM

Deduplication is just one component of Lean DM. To learn more, why not download your free 30-page guide from The Software Bureau? Should you wish to arrange a demonstration of Cygnus or undertake a free trial, please contact us.