Color Usage In Data Analysis
Data visualization is an integral part of data science and data analysis. It presents information graphically instead of relying only on traditional spreadsheets and reports.
Humans process information more easily when it is presented visually. With the right visualization, we can group chunks of data into categories, highlight areas that need our attention, or show the progressive growth or decay of our products.
More importantly, as David McCandless describes in his famous TED Talk “The Beauty Of Data Visualization”, visualization lets you see patterns and connections between numbers that would otherwise be scattered across multiple reports.
One of the most important aspects of data visualization is the use of color, and specifically colors that fit the context of your analysis. Without the right choice of colors, your visualization could turn into a nasty-looking color eruption.
In this post I will talk about choosing the right kind of colors for data visualization, using the ColorBrewer palettes that are available in R through the RColorBrewer package.
Typically, color usage can be categorized into three different types based on our data analysis needs.
1) Sequential: When you want to show growth or an increase in something, pick a sequential color scheme. It suits ordered numeric data and shows progression from very small to very large values. In the picture above, the first section of colors relates to sequential usage: the colors start light and get progressively darker.
2) Qualitative: When you want to show different kinds of something without giving any emphasis to the numbers behind them, pick a qualitative color scheme. These schemes are used to show distinct categories. So, if you have a bunch of different political parties, you might show each one in a different color. The same goes for different countries on a map or different species of animals. The colors here usually have similar light/dark values, so no category looks more important than another.
3) Divergent: Finally, when you want to show values that run toward two opposite extremes, pick a divergent (also called diverging) color scheme. This scheme has a very light shade in the middle, and the colors get darker and darker in a different hue toward each side. That makes the highs, the lows, and the neutral values in between easy to see.
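The light/dark patterns described in the three types above can be checked with a few lines of code. The following is only a rough sketch in Python, not something taken from the RColorBrewer package itself. The hex values are my transcription of the five-class ColorBrewer palettes Blues, Set2 and RdBu, so treat them as illustrative and compare them with what the package gives you. The function names and the classification rule are assumptions made up for this example. Lightness is measured with the standard relative luminance formula for sRGB colors.

```python
# Illustrative five-color palettes (hex values transcribed by hand; verify
# against your own ColorBrewer output before relying on them).
PALETTES = {
    "Blues": ["#EFF3FF", "#BDD7E7", "#6BAED6", "#3182BD", "#08519C"],
    "Set2": ["#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#A6D854"],
    "RdBu": ["#CA0020", "#F4A582", "#F7F7F7", "#92C5DE", "#0571B0"],
}


def relative_luminance(hex_color):
    """Return the relative luminance of an sRGB hex color (0 = black, 1 = white)."""
    digits = hex_color.lstrip("#")
    channels = [int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve so the channels are linear light.
    linear = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def lightness_spread(palette):
    """Difference between the lightest and darkest color in a palette."""
    values = [relative_luminance(color) for color in palette]
    return max(values) - min(values)


def guess_scheme_type(palette):
    """Hypothetical heuristic: guess the scheme type from the lightness pattern."""
    values = [relative_luminance(color) for color in palette]
    pairs = list(zip(values, values[1:]))
    # Sequential: lightness moves in one direction only.
    if all(a > b for a, b in pairs) or all(a < b for a, b in pairs):
        return "sequential"
    # Diverging: lightest color sits inside, darker toward both ends.
    peak = values.index(max(values))
    rises = all(a < b for a, b in zip(values[:peak], values[1:peak + 1]))
    falls = all(a > b for a, b in zip(values[peak:], values[peak + 1:]))
    if 0 < peak < len(values) - 1 and rises and falls:
        return "diverging"
    # Anything else is treated as a set of unordered category colors.
    return "qualitative"


if __name__ == "__main__":
    for name, palette in PALETTES.items():
        print(name, guess_scheme_type(palette), round(lightness_spread(palette), 2))
```

The heuristic only looks at lightness and ignores hue, so it is a way to see the idea rather than a reliable classifier. A qualitative palette whose colors happen to be sorted from light to dark would be reported as sequential.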
Those are the basics of picking the right colors for data visualization based on the context of your analysis. I hope this helps you create beautiful visualizations in your reports!