Good data visualisation communicates meaning to a wide audience in an unambiguous way.
Unfortunately, a lot of data visualisation is not so effective at communicating meaning. There can be many reasons for this, from not understanding the data to not understanding visualisation.
This article is the first of a collection of posts that attempt to explain the basics of data visualisation. These are the fundamentals; I won’t cover everything, and there are plenty of examples of elegant or more sophisticated ways of communicating the meaning in data.
But if you are armed with the basics you won’t make common mistakes, and that will put you ahead of the error-makers straight away!
To start with, we have some data that we eventually want to present to an audience. That might be in person, to an individual or group, or perhaps through a report or website. All of these methods of publication require a sound representation of the underlying data and so it is important that we have an understanding of the basics of data.
When we measure things, like the weight of something, we record quantitative data. Apart from weight, it might be height, cost, temperature, RGB colour scale, density, etc. Quantitative data is generally easy to collect and analyse, and therefore we might put some effort into looking for such data if we want to compare it with something else.
Within the category of quantitative data there are two sub-categories: discrete data and continuous data.
Discrete data refers to the situation where something can only exist as a whole. There cannot be 2.75 dogs for instance; there are either 2 or 3 dogs.
Continuous data can take any value within a range and can, in principle, be measured to an arbitrarily fine degree of precision. Weight is an example of continuous data. Time is also treated as a continuous variable, whereas the days of the week are clearly discrete.
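The distinction between discrete and continuous data can be sketched in a few lines of Python. This is only a rough heuristic under an obvious assumption: the function name `looks_discrete` and the sample values are made up for illustration, and a continuous measurement that happens to land on a whole number would fool it.

```python
def looks_discrete(values):
    """Heuristic: treat values as discrete if every one is a whole number."""
    return all(float(v).is_integer() for v in values)


# Hypothetical sample data, not taken from any real survey.
dog_counts = [2, 3, 1, 4]                 # counts of whole things
weights_kg = [12.4, 30.25, 7.8, 22.05]    # measurements on a continuum

print(looks_discrete(dog_counts))   # True
print(looks_discrete(weights_kg))   # False
```

In practice you would decide this from what the variable means, not from the values alone; the check is just a quick sanity test.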
The classification of data assumes that we have a means of representing the measurement that generated the data in the first place. How a measurement reflects the data that is captured is expressed by a scale. When we represent data, we are interested in four scales: nominal, ordinal, interval and ratio (or NOIR).
The nominal scale refers to discrete qualitative data that has a label, such as eye colour: green, grey, blue, brown, pink. None of the labels gives any indication of a quantitative value.
Ordinal data is similar to nominal data in that there is a label and the data is qualitative. However, there is a relationship between the individual data points that enables them to be ranked in order, for example: gold, silver, bronze. Whilst the data is ranked, there is no information as to the distance between data points. A gold medal gymnastics performance might score 14 points, whereas silver scores 12.5 and bronze scores 9. The gap between gold and silver (1.5 points) is therefore not the same as the gap between silver and bronze (3.5 points), even though each pair is one rank apart.
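The medal example can be made concrete with a small Python sketch. The rank numbers in `MEDAL_RANK` are an assumption introduced here purely to encode the order; the scores are the ones quoted above.

```python
# Rank numbers only encode order; they are an illustrative choice.
MEDAL_RANK = {"bronze": 1, "silver": 2, "gold": 3}

# Scores from the gymnastics example in the text.
scores = {"gold": 14.0, "silver": 12.5, "bronze": 9.0}

# Ordinal data can be sorted by rank...
results = ["silver", "gold", "bronze"]
ordered = sorted(results, key=MEDAL_RANK.get, reverse=True)
print(ordered)  # ['gold', 'silver', 'bronze']

# ...but equal rank gaps hide unequal score gaps.
rank_gap_gold_silver = MEDAL_RANK["gold"] - MEDAL_RANK["silver"]
rank_gap_silver_bronze = MEDAL_RANK["silver"] - MEDAL_RANK["bronze"]
score_gap_gold_silver = scores["gold"] - scores["silver"]
score_gap_silver_bronze = scores["silver"] - scores["bronze"]

print(rank_gap_gold_silver, rank_gap_silver_bronze)    # 1 1
print(score_gap_gold_silver, score_gap_silver_bronze)  # 1.5 3.5
```

The ranks say who came first, second and third, and nothing more. Plotting them on a numeric axis would suggest equal spacing that the underlying scores do not have.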
Interval data has a ranked order where there is a quantifiable, equal difference between adjacent points on the scale. The oft-used example of interval data is temperature; the numerical difference between 25°C and 30°C is the same as that between 15°C and 20°C. However, the zero point on an interval scale is arbitrary rather than absolute (0°C does not mean "no temperature"), so it is not possible to compare magnitudes directly. A Celsius reading in Dubai that is double the reading in London does not mean that Dubai is twice as hot.
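A quick way to see why ratios fail on an interval scale is to convert the same two temperatures to Fahrenheit and watch the ratio change. The sketch below uses made-up readings of 40°C and 20°C purely for illustration; they are not real measurements for any city.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


# Hypothetical readings chosen for illustration only.
hot_c, mild_c = 40.0, 20.0
hot_f, mild_f = celsius_to_fahrenheit(hot_c), celsius_to_fahrenheit(mild_c)

# Differences are meaningful: equal gaps in Celsius stay equal in Fahrenheit.
gap_a_f = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(25)
gap_b_f = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(15)
print(gap_a_f == gap_b_f)  # True

# Ratios are not: the "twice as hot" claim depends on the unit chosen.
ratio_c = hot_c / mild_c
ratio_f = hot_f / mild_f
print(ratio_c)             # 2.0
print(round(ratio_f, 2))   # 1.53
```

If a ratio changes just because the unit changed, it was never telling you anything about the thing being measured.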
If an absolute zero does exist (no weight, no length, etc.), then the data has both difference and magnitude, and can thus be represented by a continuous numeric scale on which ratios are meaningful: 50 kg is twice as much as 25 kg. This is referred to as ratio data and is quantitative.
The scientists amongst us may recognise that, unlike the Celsius and Fahrenheit scales, the Kelvin scale does have an absolute zero and is therefore considered a ratio scale.
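To illustrate, here is a minimal sketch that converts two Celsius readings to Kelvin before comparing them. The readings of 40°C and 20°C are hypothetical values picked for the example.

```python
def celsius_to_kelvin(celsius):
    """Kelvin has an absolute zero at -273.15 degrees Celsius."""
    return celsius + 273.15


# Hypothetical readings chosen for illustration only.
hot_c, mild_c = 40.0, 20.0

ratio_celsius = hot_c / mild_c
ratio_kelvin = celsius_to_kelvin(hot_c) / celsius_to_kelvin(mild_c)

print(ratio_celsius)            # 2.0
print(round(ratio_kelvin, 3))   # 1.068
```

On the Kelvin scale the hotter reading is only about 7% higher than the milder one, which is the physically meaningful comparison.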
To summarise, there are two types of data, quantitative and qualitative.
- Qualitative data can be numerical or not, but both sub-types are represented by either nominal or ordinal scales.
- Quantitative data is numerical and is represented by interval or ratio scales.
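The summary above can be written down as a small lookup table. This is just one way to sketch it; the dictionary names and the `data_type` helper are invented for illustration, and the example variables are the ones used earlier in this article.

```python
# Which broad data type each of the four NOIR scales belongs to.
SCALE_TO_TYPE = {
    "nominal": "qualitative",
    "ordinal": "qualitative",
    "interval": "quantitative",
    "ratio": "quantitative",
}

# Example variables from this article and the scale each one sits on.
EXAMPLES = {
    "eye colour": "nominal",
    "medal": "ordinal",
    "temperature in Celsius": "interval",
    "weight in kg": "ratio",
}


def data_type(variable):
    """Look up whether an example variable is qualitative or quantitative."""
    return SCALE_TO_TYPE[EXAMPLES[variable]]


for variable, scale in EXAMPLES.items():
    print(f"{variable}: {scale} scale, {data_type(variable)} data")
```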
Having some familiarity with these different characteristics will help inform your subsequent choice of analysis approach and presentation type, so it is worthwhile understanding them from the beginning.
You may now find that you identify errors in the next presentation you see!