Divergent Color

Color Usage In Data Analysis

Data Visualization is an integral part of Data Science and Data Analysis. It is a way of beautifully presenting information rather than using traditional spreadsheets and reports.

Humans, by nature, can more easily process information when provided with beautiful visualization as our brains are designed this way. By the use of right visualization, we can group chunks of data into categories, highlight areas that need our attention, or show the progressive growth/decay of our products.

More importantly, as David McCandless describes in his famous Ted Talk “The Beauty Of Data Visualization”, you start to see patterns and connections between numbers which would otherwise be scattered across multiple reports with the help of data visualization.

Now, the most important aspect of Data Visualization is of course the use of colors. Most importantly, good colors that fit the context of your analysis. Without the right choice of colors, your visualization could turn into nasty looking color eruptions.

In this post I will talk about choosing the right kind of colors for data visualization purpose and I will do so by taking help from a Color Brewer package used in R for data analysis.

R Color Brewer

All color choices from Color Brewer

Typically, color usage can be categorized into three different types based on our data analysis needs.

1) Sequential: When you want to show growth or increase in something, you should pick sequential color scheme. Basically, this relates to sequentially ordered numbers and so it can be used to show progression from very small to the very big. In the picture above, the first section of colors relate to Sequential usage. You see they just get darker and darker starting with lighter values.

Sequential Color Usage

Sequential Color Usage

2) Qualitative: When you want to show the different variety of something without giving any emphasize to the numbers behind them, you should pick qualitative color scheme. These are essentially used to show different categories. So, if you have a bunch of different political parties, you might just show each one of them with different colors. Or if you want to show a different countries in a map or different species of animals, you would use different colors. The colors here are usually of the same light/dark values.

Qualitative Color

3) Divergent: Finally, when you want to show two extreme values in your data, you should pick divergent color scheme. This scheme has very light shade in the middle, and then they get darker and darker to different colors going out each side. That’s a way of showing high and low values on something. The highs and the lows and the neutrals are easily visualized here.

Divergent Color

That’s the basics of picking the right colors for data visualization as per the context of data analysis. I hope it helps you to create beautiful visualizations in your reports!

 

Connecting to SQL Server using RODBC package in R

RODBC is a useful package for making SQL queries to SQL Server database for R users.

To start using this package, first we need to make sure it is loaded into R studio.


is.installed <- function(mypkg) is.element(mypkg, installed.packages()[,1])

if(!is.installed('RODBC'))
{
install.packages("RODBC")
}

library(RODBC)

Now that the package is installed/loaded into our environment, let’s start making SQL queries.

First we will need to provide credentials for accessing our SQL Server Database for R. Mainly we will need to have correct info for:

  1. Server name
  2. Database name
  3. Login Id
  4. Password

Providing connection info:


conn <- odbcDriverConnect('driver={SQL Server};
server=SOVITHOME-PC\\RAY;
database=EmergencyRMDb;
uid=sa;
pwd=myPwd***')

Now the “conn” object holds connection for SQL Server. Using this object we can make different kinds of queries supported by RODBC. Let’s perform some common queries like:

  • SELECT statements
  • SELECT statements with WHERE Clause
  • UPDATE statements
  • SQL JOIN statements

# print row count of tables
sqlQuery(conn, "select count(*) from EQData", as.is=T)

# print name of columns
colnames(tblData)

# select all data from a table
tblData &lt;- sqlQuery(conn, "select * from EQData", as.is=T)
#preview data
head(tblData)

<img class="wp-image-542" src="http://sovitpoudel.com.np/wp-content/uploads/2017/03/result1-300x68.png" alt="" width="850" height="193"> Result from query


# Viewing columns and rows

# [x,y] -> x represents the row number and y represents the column number

tblData[,1:5] # view columns 1 through 5

tblData[2,] # view first two rows

tblData$Latitude[2] # view second value for specified column
# finding by matching row(s) value
tblData[tblData$Epicentre == "Kathmandu",]

# update tables
sqlQuery(conn, "update EQData set latitude = '27.75' where epicentre = 'kathmandu'")

SQL JOIN statements can also be easily performed using “merge” function.


# perform inner join
newT <- merge(tblA, tblB, by = "key_id")
dim(newT)

# perform outer join
newOuterT <- merge(tblA, tblB, by = "key_id", all = TRUE)
dim(newOuterT)

# perform left join
newLeftT <- merge(tblA, tblB, by = "key_id", all.x = TRUE)
dim(newLeftT)

# perform right join
newRightT <- merge(tblA, tblB, by = "key_id", all.y = TRUE)
dim(newRightT)

These are some common operations that we can easily perform using R and RODBC on SQL Server tables. As you can see, most of the common operations can be performed using built-in R functions for Data Frames on R. I am just making use of “sqlQuery” function provided by RODBC for making queries on the Server.

Please find the full list of RODBC supported queries at: https://cran.r-project.org/web/packages/RODBC/RODBC.pdf

Hope this was helpful. If you got stuck on any of the steps, above please do not hesitate to leave comments below.

Thanks!