Your IIoT equipment is installed. You can see the devices on your network. Your log files are being written to when the machine runs. What next?
Initial excitement can soon turn to dismay when you attempt to produce your first reports from the data you are gathering.
Why are there so many missing values? What does this mean?
Those readings look far too high. What is going on here?
While frustrating, this is actually progress. The questions are forcing us to understand the process better, and that will eventually lead to improved insight. We can’t optimise a process if we don’t understand it.
Since we will likely want to make predictions about future performance using analytics, we need to be conscious that statistical methods are only as good as the data they are fed.
If we tell our statistical model that temperature readings of 35°C are acceptable, then we must expect its predictions to accommodate them. If the readings should never be that high, we are setting our analytics capability up to fail.
So, to be sure that any subsequent processing of the IIoT data is correct, we need to make sure that our data is ‘clean’: free of errors, missing values, and inconsistencies within the stream of data that is to be scrutinised.
The process of data cleansing is essential for consistent post-processing. It starts with a set of rules that can be automatically applied to the data as it is produced. These rules enforce a level of quality that you understand, so that you are confident that any calculations that are based on that data are sound.
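One way to picture such rules is as a set of small checks applied to each reading as it arrives. The sketch below is illustrative only: the record layout (a dictionary with `sensor`, `value` and `timestamp` keys) and the 30°C upper limit are assumptions, not anything your equipment mandates.

```python
# A minimal sketch of rule-based cleansing. The record layout and the
# temperature limits are hypothetical; substitute your own rules.
def has_required_fields(reading):
    """Reject records with missing fields."""
    return all(k in reading for k in ("sensor", "value", "timestamp"))

def within_range(reading, low=0.0, high=30.0):
    """Reject out-of-range temperature values (limits are illustrative)."""
    return low <= reading["value"] <= high

RULES = [has_required_fields, within_range]

def cleanse(stream):
    """Split a stream of readings into accepted and rejected records."""
    accepted, rejected = [], []
    for r in stream:
        (accepted if all(rule(r) for rule in RULES) else rejected).append(r)
    return accepted, rejected

readings = [
    {"sensor": "t1", "value": 21.5, "timestamp": 1},
    {"sensor": "t1", "value": 35.0, "timestamp": 2},  # too high
    {"sensor": "t1", "timestamp": 3},                 # missing value
]
ok, bad = cleanse(readings)
```

Keeping the rejected records, rather than silently discarding them, is worthwhile: the rejects are often the starting point for the "why is this happening?" conversations described above.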
For instance, data from sensors can be noisy. There may be out-of-range values that are irrelevant, or that indicate a problem with a sensor. Similarly, an operating hitch with the equipment may mean that a duplicate value is recorded for the same event.
A process that requires human input may have introduced an error. Or the system itself may have corrupted some of the data during its storage or transmission to other computational nodes.
It is therefore important to look closely at the data that is produced from an IIoT device, and to evaluate how relevant it is to the objective of your analysis.
An important task is to provide a means by which the data can be visualised, as this significantly aids comprehension. A scatter plot is an effective way of illustrating outlying values, which may not be obvious in tabular data.
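The same outliers that a scatter plot makes visible can also be flagged automatically with a simple statistical screen, such as a z-score test (a technique not discussed above, offered here as a complement to visualisation). A minimal sketch using only the standard library:

```python
import statistics

def flag_outliers(values, threshold=2.0):
    """Return indices of values more than `threshold` standard
    deviations from the mean (a simple z-score screen)."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]

# Hypothetical temperature readings with one suspiciously high value.
temps = [21.1, 21.3, 20.9, 21.0, 35.0, 21.2]
outliers = flag_outliers(temps)  # index 4 (the 35.0 reading) is flagged
```

Note that with small samples a single extreme value inflates the standard deviation, so the threshold here is deliberately modest; the right cut-off is something to agree with whoever knows the process.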
Missing data is often an interesting phenomenon. It immediately raises the question of why the value is absent in the first place, and we need to distinguish between a genuine absence (because nothing was happening) and an error in the dataset. Errors in the dataset can cause issues with subsequent processing, so it might be useful to interpolate a value and have it inserted automatically so that the record is complete. This is something best decided in the presence of a domain expert (usually the machine operator), who can qualify what the appropriate course of action is.
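Where the domain expert agrees that interpolation is appropriate, linear gap-filling is one straightforward option. A minimal sketch in plain Python, assuming missing readings appear as `None` in an otherwise ordered series:

```python
def fill_gaps(series):
    """Linearly interpolate None entries between known neighbours.
    Leading and trailing gaps are left as-is, since there is nothing
    to interpolate from on one side."""
    filled = list(series)
    for i, v in enumerate(filled):
        if v is None:
            # Nearest known value on the left (may itself be interpolated).
            left = next((j for j in range(i - 1, -1, -1)
                         if filled[j] is not None), None)
            # Nearest originally-known value on the right.
            right = next((j for j in range(i + 1, len(series))
                          if series[j] is not None), None)
            if left is not None and right is not None:
                frac = (i - left) / (right - left)
                filled[i] = filled[left] + frac * (series[right] - filled[left])
    return filled

result = fill_gaps([10.0, None, None, 16.0])
```

Whether interpolation is honest depends on the process: for a slowly varying temperature it may be reasonable, whereas for an event count it would fabricate activity that never happened.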
Clean data is important, and the presence of dirty data can scupper the successful implementation of IIoT equipment. Try to see cleansing as part of the process of understanding your system, and it will build your organisation’s capability to understand its processes much faster than any training course.