
Why is data cleansing important?

“Clean up your act!”

When you first hear the term “data cleansing”, you would hardly think of it as challenging, let alone frightening.

...The truth is that it can be like the killer rabbit of Caerbannog in Monty Python and the Holy Grail: a mere bunny that looks utterly harmless to a band of knights in armour. The knights soon find out, to their cost, that it is no ordinary rabbit!

Oops! … What happens when you make a data mistake in space?

A units mix-up caused NASA to lose its $125 million Mars Climate Orbiter in 1999! A Lockheed Martin engineering team worked in English units of measurement, while NASA used the metric system that is standard for space operations.

To cut a long story short, the orbiter completely missed its target!
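One common safeguard against this kind of mistake is never to store a bare number: every value carries its unit, and conversions happen explicitly. The minimal Python sketch below illustrates the idea. The `Impulse` class and the unit strings are hypothetical; the only fact it relies on is that one pound-force equals 4.4482216152605 newtons.

```python
from dataclasses import dataclass

# 1 pound-force = 4.4482216152605 newtons, so the same factor converts
# pound-force seconds to newton seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A hypothetical measurement type that never lets a value lose its unit."""
    value: float
    unit: str  # "N*s" or "lbf*s" in this sketch

    def in_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        # An unknown unit is treated as bad data rather than silently assumed.
        raise ValueError(f"unknown unit: {self.unit!r}")


def total_impulse(readings: list[Impulse]) -> float:
    """Sum readings after converting each one to newton seconds."""
    return sum(r.in_newton_seconds() for r in readings)


readings = [Impulse(10.0, "N*s"), Impulse(10.0, "lbf*s")]
print(round(total_impulse(readings), 3))  # 54.482
```

Had the two readings simply been added as plain numbers, the total would have been 20, with nothing to warn anyone.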

What is data cleansing? It is the process of detecting and correcting (or removing) inaccurate, incomplete or duplicate data. Enterprises should attach great importance to data quality, because DATA is the lifeblood of INFORMATION, on which all DECISIONS are based.

Don’t be duped into duplicating

Duplicate entries for contacts, products and raw materials are very common. A company or product name can be misspelled or abbreviated, and the next person who looks for it and cannot find it will simply create a new entry.
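Many spelling and abbreviation variants can be caught by reducing every name to a simple "match key" before comparing. Here is a minimal Python sketch, assuming a hypothetical abbreviation table; a real one would need to be agreed inside the company, and this only catches variants that normalise to exactly the same key.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; each company would maintain its own.
ABBREVIATIONS = {"ltd": "limited", "co": "company", "intl": "international"}


def match_key(name: str) -> str:
    """Lower-case the name, turn punctuation into spaces, expand
    abbreviations and collapse whitespace."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def find_duplicates(names: list[str]) -> list[list[str]]:
    """Group names that share a match key; only groups of two or more
    are potential duplicates."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


entries = ["Acme Ltd.", "ACME Limited", "Acme Intl", "Globex Co."]
print(find_duplicates(entries))  # [['Acme Ltd.', 'ACME Limited']]
```

Note that "Acme Intl" is not grouped with the other two: whether it is the same company is a business question, not a spelling one.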

Confusion caused by multiple systems

Often, different departments will have standalone solutions or spreadsheets. Sales, production and accounting have different needs and may record the same data in different ways.

  • One department needs accurate legal names, while short-form names will do for others.

  • One refers to products by their marketing names, others by their technical specifications.

  • One needs bank account details, the others don’t.

Here’s how this can lead to problems...

Because each department reaches customers in its own way, there is a medley of records across the individual systems. This is already duplication, but it may not cause any harm at the outset.

However, contact data are rarely static.

Every day, people move and change their details and preferences. A department that learns about a change may duly update its own records, but it is unlikely to inform other departments or to follow up and check.

This way the company can end up with three different entries for the same customer...

… and no one knows which one is accurate.
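A first step towards untangling this is simply to line the departmental records up side by side and list where they disagree. The minimal Python sketch below assumes the systems share a customer ID, which in practice is often not the case; the system names, IDs and fields are all hypothetical.

```python
# Hypothetical extracts from three departmental systems, keyed by a
# shared customer ID (an assumption: many companies do not have one).
systems = {
    "sales": {"C001": {"name": "Acme", "city": "Leeds"},
              "C002": {"name": "Globex", "city": "Hull"}},
    "accounts": {"C001": {"name": "Acme Limited", "city": "York"}},
    "production": {"C001": {"name": "Acme", "city": "Leeds"}},
}


def find_conflicts(
    systems: dict[str, dict[str, dict[str, str]]]
) -> dict[str, dict[str, dict[str, str]]]:
    """For every customer ID and field, collect the value each system holds.
    Only fields on which the systems disagree are returned."""
    conflicts: dict[str, dict[str, dict[str, str]]] = {}
    all_ids = set().union(*(recs.keys() for recs in systems.values()))
    for cid in sorted(all_ids):
        all_fields = set().union(*(recs.get(cid, {}).keys() for recs in systems.values()))
        for field_name in sorted(all_fields):
            values = {
                system: recs[cid][field_name]
                for system, recs in systems.items()
                if field_name in recs.get(cid, {})
            }
            if len(set(values.values())) > 1:
                conflicts.setdefault(cid, {})[field_name] = values
    return conflicts


for cid, fields in find_conflicts(systems).items():
    for field_name, values in fields.items():
        print(cid, field_name, values)
```

The script only surfaces disagreements; a person still has to decide which value is right. Some differences are intended, such as a legal name in accounting and a short name in sales.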

Integrated systems are not immune either ...

Entering data into any system can be tedious, which naturally breeds impatience and haste. An entry may be good enough there and then, but later prove insufficient for use elsewhere in the company.

This may not be a problem straight away: when the entry next turns up in another department, the staff there will simply add the missing pieces or correct the faulty data.

However, it can also mean that they WON’T find the entry.

Because they THINK the data doesn’t exist, they enter it again, creating duplication.
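One way to reduce this is to make the system search for look-alikes before it accepts a new entry. Here is a minimal Python sketch using the standard library's `difflib.get_close_matches`; the function name `add_if_new`, the product names and the 0.8 similarity cutoff are assumptions for illustration.

```python
import difflib


def add_if_new(name: str, catalogue: list[str], cutoff: float = 0.8) -> list[str]:
    """Add `name` to `catalogue` only if nothing similar already exists.

    Returns the similar existing entries (compared case-insensitively).
    An empty list means no look-alike was found and the entry was added."""
    by_lower = {entry.lower(): entry for entry in catalogue}
    close = difflib.get_close_matches(name.lower(), list(by_lower), n=3, cutoff=cutoff)
    if close:
        # Show these to the user instead of silently creating a duplicate.
        return [by_lower[c] for c in close]
    catalogue.append(name)
    return []


products = ["Hex bolt M8 x 40", "Hex bolt M10 x 50", "Washer M8"]
print(add_if_new("hex bolt m8x40", products))    # suggests the existing M8 bolt
print(add_if_new("Cable tie 200 mm", products))  # nothing similar, so it is added
```

The cutoff is a trade-off: set it too high and variants slip through; set it too low and users are nagged with irrelevant suggestions.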

Importing corrupted data

Previously corrupted data can seep into a system when it is imported from an outside source.

It can also happen when originally good-quality data passes through an inadequate interface, for example one that truncates fields or mangles special characters.
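A simple defence is to validate every imported row and set aside the bad ones, instead of loading everything blindly. The minimal Python sketch below assumes a CSV file with hypothetical columns `customer_id`, `name` and `email`, and uses only a rough e-mail format check.

```python
import csv
import io
import re

REQUIRED = ("customer_id", "name", "email")  # hypothetical mandatory columns
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # rough shape check only


def validate_import(csv_text: str) -> tuple[list[dict], list[tuple[int, str]]]:
    """Split imported rows into accepted rows and (line number, reason) rejects.

    Line numbers assume one line per record (no embedded newlines)."""
    accepted: list[dict] = []
    rejected: list[tuple[int, str]] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Short rows give None for missing columns, hence the `or ""`.
        missing = [f for f in REQUIRED if not (row.get(f) or "").strip()]
        if missing:
            rejected.append((line_no, "missing " + ", ".join(missing)))
        elif not EMAIL_RE.match(row["email"].strip()):
            rejected.append((line_no, "malformed email"))
        else:
            accepted.append(row)
    return accepted, rejected


sample = (
    "customer_id,name,email\n"
    "C1,Acme,info@acme.example\n"
    "C2,,sales@globex.example\n"
    "C3,Initech,not-an-email\n"
)
accepted, rejected = validate_import(sample)
print(len(accepted), rejected)
```

Rejected rows go back to whoever owns the source data, so the error is fixed once at the source instead of repeatedly downstream.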

Companies can lose money, painfully...

One-off data errors and ongoing discrepancies alike can haemorrhage money!

“Computer says no.”

Have you ever thought you’re out of stock, when you’re really not?!

For example, it is wasteful and costly to buy stock you already have, just because variations in names or categories stopped you from finding it in the inventory.

For the same reason, customers can be turned away because of a stock shortage that does not really exist.

Inaccurate data can also cause miscommunication during marketing, leaving potential buyers distrustful and frustrated.

Questions to determine whether one needs data cleansing

Data cleansing means aligning the data in your systems with reality.

If you are not sure how accurate your current data is, here are a few questions to ask yourself:

  • Are there duplicate account records of the same customer or supplier?

  • Are there "islands" of data in your current un-integrated systems which in reality should be aligned?

  • Are there old or obsolete “product identity codes” in your systems?

  • Are there significant blank or incomplete fields in your product, process or staff records?

  • Can your data be trusted enough to give reliable information?

  • Are there checks in place to manage the problem of data quality?

And also:

  • What are some of the most challenging problems that the company is facing when trying to use data?
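Some of these questions can be answered with numbers rather than gut feeling. Here is a minimal Python sketch of a quick audit over records exported as dictionaries; the field names and sample data are hypothetical, and duplicates are detected only by an exact, case-insensitive key match.

```python
from collections import Counter


def audit(records: list[dict[str, str]], key_field: str) -> dict:
    """Report repeated key values and the share of blank values per field."""
    total = len(records)
    keys = Counter((r.get(key_field) or "").strip().lower() for r in records)
    duplicates = {k: n for k, n in keys.items() if k and n > 1}

    fields = sorted({f for r in records for f in r})
    blank_rate: dict[str, float] = {}
    for f in fields:  # empty when there are no records, so no division by zero
        blanks = sum(1 for r in records if not (r.get(f) or "").strip())
        blank_rate[f] = blanks / total
    return {"records": total, "duplicate_keys": duplicates, "blank_rate": blank_rate}


customers = [
    {"name": "Acme Ltd", "email": "info@acme.example", "phone": ""},
    {"name": "acme ltd", "email": "", "phone": "0113 496 0000"},
    {"name": "Globex", "email": "", "phone": ""},
]
report = audit(customers, key_field="name")
print(report["duplicate_keys"])  # {'acme ltd': 2}
```

Running a check like this regularly also gives a simple answer to "are there checks in place?".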

Strict procedures to avoid data corruption

One way to fight data duplication and incorrect entries is to create “data gates”.

This means that only a very small, highly trained group of people is allowed to record certain strategic data.

For example, only one person in the warehouse may create a new product, or only an accounting administrator may add a new customer or supplier to the system. These people would know what to look out for and could be trusted to follow data entry procedures carefully.

In a system like this, other employees could request these entries, or maybe even record them for approval.
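The data gate described above can be sketched as a small permission check plus a queue of requests. In this minimal Python example, the role names, record kinds and the `MasterData` class are hypothetical; a real ERP would enforce this through its own user and permission settings.

```python
from dataclasses import dataclass, field

# Hypothetical gatekeeper roles per kind of master data.
GATEKEEPERS = {"product": {"warehouse_master"}, "customer": {"accounts_admin"}}


@dataclass
class MasterData:
    records: dict[str, list[str]] = field(
        default_factory=lambda: {"product": [], "customer": []})
    # Requests waiting for a gatekeeper: (kind, value, requested_by)
    pending: list[tuple[str, str, str]] = field(default_factory=list)

    def submit(self, kind: str, value: str, user: str, role: str) -> str:
        """Gatekeepers write directly; everyone else files a request."""
        if role in GATEKEEPERS.get(kind, set()):
            self.records[kind].append(value)
            return "created"
        self.pending.append((kind, value, user))
        return "pending approval"

    def approve(self, index: int, role: str) -> None:
        """Only a gatekeeper for that kind of data may approve a request."""
        kind, value, _ = self.pending[index]
        if role not in GATEKEEPERS.get(kind, set()):
            raise PermissionError(f"{role} may not approve {kind} entries")
        self.records[kind].append(value)
        del self.pending[index]


md = MasterData()
print(md.submit("customer", "Acme Ltd", user="bob", role="sales"))  # pending approval
md.approve(0, role="accounts_admin")
print(md.records["customer"])  # ['Acme Ltd']
```

The `pending` list is exactly where the bottleneck mentioned below shows up: if the gatekeepers fall behind, it keeps growing.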

But this creates some problems of its own.

It creates bottlenecks that can slow things down. Because these people are more highly trained than average, it also adds direct cost. And if they leave the company, they are harder to replace, and newcomers will need to be trained.

When does data cleansing become a particular necessity?

Our desire for pristine data usually peaks during data migration. When a new system is being installed, we naturally don’t want to corrupt it from day one.

Having correct, unambiguous data for future decisions is one of the main purposes of a new ERP (enterprise resource planning) system. Data cleansing is a vital part of moving to the next stage in a company’s evolution. What served well for years may not be sufficient once the company grows larger or operates at a higher level.

So we have data cleansing projects...

Every ten years or so, a company suddenly faces a data cleansing problem. The process of cleansing data can be a lengthy one.

It is a PROJECT in its own right and should have its own budget, milestones and monitoring. Key decisions often need to involve top management.

Here’s an example:

Imagine having 10,000 customers and 30,000 products or materials, with multiple versions of the same entity, even within one department: for example, J. Smith Ltd., John Smith Ltd., J. Smith and Jim Smith Ltd. ...

... how do you find the duplications?
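Software can at least narrow the search down to a shortlist for a person to review. Comparing every pair of 10,000 customers means almost 50 million comparisons, so a common trick is "blocking": only compare names that share something, here the last word after normalising. The minimal Python sketch below uses the standard library's `difflib.SequenceMatcher`; the blocking rule, the 0.75 threshold and the suffix handling are assumptions, not a recommended setting.

```python
import difflib
import re
from collections import defaultdict
from itertools import combinations


def normalise(name: str) -> str:
    """Lower-case, turn punctuation into spaces, drop a trailing legal suffix."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    if words and words[-1] in {"ltd", "limited"}:
        words = words[:-1]
    return " ".join(words)


def candidate_pairs(names: list[str], threshold: float = 0.75) -> list[tuple[str, str, float]]:
    """Return pairs of names similar enough to be reviewed by a person.

    Names are first grouped ('blocked') by the last word of the normalised
    name, and only names within the same block are compared."""
    blocks: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for name in names:
        key = normalise(name)
        if key:
            blocks[key.split()[-1]].append((name, key))
    pairs = []
    for members in blocks.values():
        for (a, ka), (b, kb) in combinations(members, 2):
            score = difflib.SequenceMatcher(None, ka, kb).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs


names = ["J.Smith Ltd.", "John Smith Ltd.", "J.Smith", "Jim Smith Ltd.", "Acme Ltd"]
for a, b, score in candidate_pairs(names):
    print(f"{score:.2f}  {a!r} vs {b!r}")
```

The output is a list of suspects, not a verdict. Whether "J. Smith" is John or Jim is exactly the kind of question a script cannot answer.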

It is a common and easy mistake to consider data cleansing a straightforward, low-level job.

If it is seen that way, it gets delegated to a junior administrator. But even the simplest questions, such as “what should this product be called?”, can lead to HUGE conceptual questions.

The person in charge should either be able to decide alone or have the authority to bring in higher management.

… It can indeed become such a daunting task to untangle this mess that it’s sometimes cheaper to re-enter everything than to sort it out!

A possible solution to aid data cleansing

Sometimes it is a good idea to build a data migration gateway page into the ERP for the installation period. This introduces a small workflow in which every item has to be approved on dedicated pages before it is accepted into the new system.
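Behind such a gateway page, the workflow can be as simple as a staging area where every legacy record waits for a decision. Here is a minimal Python sketch, assuming hypothetical names (`StagedItem`, `review`, `ready_to_load`) and that only approved records are ever passed on to the new system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class StagedItem:
    """One legacy record waiting on the (hypothetical) migration gateway page."""
    legacy_id: str
    data: dict
    status: Status = Status.PENDING
    reviewer: Optional[str] = None
    note: str = ""


def review(item: StagedItem, approve: bool, reviewer: str, note: str = "") -> None:
    """Record a reviewer's decision and who made it."""
    item.status = Status.APPROVED if approve else Status.REJECTED
    item.reviewer = reviewer
    item.note = note


def ready_to_load(staging: list[StagedItem]) -> list[dict]:
    """Only approved items are passed on to the new system."""
    return [item.data for item in staging if item.status is Status.APPROVED]


def outstanding(staging: list[StagedItem]) -> int:
    """How many items still need a decision (handy for tracking milestones)."""
    return sum(1 for item in staging if item.status is Status.PENDING)


staging = [
    StagedItem("OLD-001", {"name": "Acme Ltd"}),
    StagedItem("OLD-002", {"name": "ACME Limited"}),
    StagedItem("OLD-003", {"name": "Globex"}),
]
review(staging[0], approve=True, reviewer="accounts_admin")
review(staging[1], approve=False, reviewer="accounts_admin", note="duplicate of OLD-001")
print(ready_to_load(staging), outstanding(staging))
```

Keeping the reviewer and a note on every decision leaves an audit trail, which helps when someone later asks why a record was merged or dropped.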

Mike Harsanyi

