Connecting to SQL Server using RODBC package in R

RODBC is a useful package for making SQL queries to SQL Server database for R users.

To start using this package, first we need to make sure it is loaded into R studio.


is.installed <- function(mypkg) is.element(mypkg, installed.packages()[,1])

if(!is.installed('RODBC'))
{
install.packages("RODBC")
}

library(RODBC)

Now that the package is installed/loaded into our environment, let’s start making SQL queries.

First we will need to provide credentials for accessing our SQL Server Database for R. Mainly we will need to have correct info for:

  1. Server name
  2. Database name
  3. Login Id
  4. Password

Providing connection info:


conn <- odbcDriverConnect('driver={SQL Server};
server=SOVITHOME-PC\\RAY;
database=EmergencyRMDb;
uid=sa;
pwd=myPwd***')

Now the “conn” object holds connection for SQL Server. Using this object we can make different kinds of queries supported by RODBC. Let’s perform some common queries like:

  • SELECT statements
  • SELECT statements with WHERE Clause
  • UPDATE statements
  • SQL JOIN statements

# print row count of tables
sqlQuery(conn, "select count(*) from EQData", as.is=T)

# print name of columns
colnames(tblData)

# select all data from a table
tblData &lt;- sqlQuery(conn, "select * from EQData", as.is=T)
#preview data
head(tblData)

<img class="wp-image-542" src="http://sovitpoudel.com.np/wp-content/uploads/2017/03/result1-300x68.png" alt="" width="850" height="193"> Result from query


# Viewing columns and rows

# [x,y] -> x represents the row number and y represents the column number

tblData[,1:5] # view columns 1 through 5

tblData[2,] # view first two rows

tblData$Latitude[2] # view second value for specified column
# finding by matching row(s) value
tblData[tblData$Epicentre == "Kathmandu",]

# update tables
sqlQuery(conn, "update EQData set latitude = '27.75' where epicentre = 'kathmandu'")

SQL JOIN statements can also be easily performed using “merge” function.


# perform inner join
newT <- merge(tblA, tblB, by = "key_id")
dim(newT)

# perform outer join
newOuterT <- merge(tblA, tblB, by = "key_id", all = TRUE)
dim(newOuterT)

# perform left join
newLeftT <- merge(tblA, tblB, by = "key_id", all.x = TRUE)
dim(newLeftT)

# perform right join
newRightT <- merge(tblA, tblB, by = "key_id", all.y = TRUE)
dim(newRightT)

These are some common operations that we can easily perform using R and RODBC on SQL Server tables. As you can see, most of the common operations can be performed using built-in R functions for Data Frames on R. I am just making use of “sqlQuery” function provided by RODBC for making queries on the Server.

Please find the full list of RODBC supported queries at: https://cran.r-project.org/web/packages/RODBC/RODBC.pdf

Hope this was helpful. If you got stuck on any of the steps, above please do not hesitate to leave comments below.

Thanks!

Uploading Unknown Format Excel File To A SQL Database

Introduction

Usually it is expected from client side to upload only structured Excel sheets (sheet with defined Column Names and data types for Rows) to load to server. But sometimes you might also require to dump whatever is in the Excel directly to your database. While there is a tool that is provided by SQL Management Studio for importing data from Excel files, this tutorial covers how you can automate such task using C# and .NET framework.

Steps

Two important pieces for uploading excel file of unknown column format to SQL database are:

  1. Reading Data from Excel into a Data Table
  2. Creating Table in SQL Server based on the Data Table

Once these two steps are done, the final step is simply to bulk load the data table into the table in SQL server.

Reading Data from Excel into a Data Table

I am using Excel Data Reader nuget package for this purpose.


/// <summary>
/// Gets a data table based on provided Excel File
/// </summary>
/// <param name="pathToExcelFile"></param>
/// <returns></returns>
public static DataTable GetTableFromExcel(string pathToExcelFile)
{
FileStream stream = File.Open(pathToExcelFile, FileMode.Open, FileAccess.Read);

IExcelDataReader excelReader;
if (Path.GetExtension(pathToExcelFile) == ".xls")
{
//1. Reading from a binary Excel file ('97-2003 format; *.xls)
excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
}
else
{
//2. Reading from a OpenXml Excel file (2007 format; *.xlsx)
excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
}

excelReader.IsFirstRowAsColumnNames = true;

//...
//3. DataSet - The result of each spreadsheet will be created in the result.Tables
DataSet result = excelReader.AsDataSet();
DataTable dt = null;
if (result != null && result.Tables.Count > 0)
{
dt = result.Tables[0];//get first sheet only
}
return dt;
}

Excel Data Reader package supports multiple sheets of Excel to be read into a Data Set at once and also allows other configurations such as: Supporting multiple file formats and Checking Column Names in First Row.

Creating Table in SQL Server based on the Data Table

Now, to generate a “Create Table” query, I found a very useful code on the Stack Overflow site for this purpose.


/// <summary>
/// Generates sql create table query for given data table
/// </summary>
/// <param name="tableName">name of the table to be created</param>
/// <param name="table">data table</param>
/// <returns></returns>
public static string GenerateCreateTableQuery(string tableName, DataTable table)
{
var sqlsc = "CREATE TABLE " + tableName + "(";
for (int i = 0; i < table.Columns.Count; i++)
{
sqlsc += "\n [" + table.Columns[i].ColumnName + "] ";
string columnType = table.Columns[i].DataType.ToString();
switch (columnType)
{
case "System.Int32":
sqlsc += " int ";
break;
case "System.Int64":
sqlsc += " bigint ";
break;
case "System.Int16":
sqlsc += " smallint";
break;
case "System.Byte":
sqlsc += " tinyint";
break;
case "System.Decimal":
sqlsc += " decimal ";
break;
case "System.DateTime":
sqlsc += " datetime ";
break;
case "System.String":
default:
sqlsc += string.Format(" nvarchar({0}) ", table.Columns[i].MaxLength == -1 ? "max" : table.Columns[i].MaxLength.ToString());
break;
}
if (table.Columns[i].AutoIncrement)
sqlsc += " IDENTITY(" + table.Columns[i].AutoIncrementSeed.ToString() + "," + table.Columns[i].AutoIncrementStep.ToString() + ") ";
if (!table.Columns[i].AllowDBNull)
sqlsc += " NOT NULL ";
sqlsc += ",";
}
return sqlsc.Substring(0, sqlsc.Length - 1) + "\n)";
}

Once we have the data and create table script, uploading to database is quite simple and can be achieved with following steps:

  1. Decide upon a table name
  2. Check if it exists or not in the server
  3. Create table if it doesn’t exist
  4. Bulk load data table into your newly created table.

 

A Knight’s Watch

Recently I came across this interesting problem through Toptal (on Codility) which kept me thinking hard for a few days. Finally, I now have a solution to this problem and would like to share it here.

Problem:

Basically, the problem deals with a knight piece on an infinite chess board. Assuming the knight is positioned at (0,0) you are asked to calculate and print the shortest path that the knight can take to get to the target location.

knight1

My Approach:

So what is given here?

  1. The Knight’s movement is well defined i.e. it can only move in a ‘L’ shape.
  2. The Knight has option to move to any of the 8 different locations from it’s current position.

 

knight4

Now with these key points in mind, we can calculate which move will take the Knight closest to the given target i.e. move the knight in the direction of shortest distance.

For example: Suppose our target is located at (6,7).

cap1.PNG

Now, from (0,0) the Knight can move to following points: (1,2), (2,1), (-1,2), (-2,1), (-1,-2), (-2,-1), (1,-2) and (2,-1).

But out of these 8 points, the closest one is (1,2). So we move the knight to this position in first move. For each step we can use the same logic to move the Knight.

Hence, the Knight moves to (1,2) in first step, (3,3) in second, (4,5) in third and (6,6) in fourth move.

However, once the Knight reaches a proximity distance of 1 unit, we will have to keep in mind a separate logic to hit the target.

This is because if we stick to our logic of moving the knight towards the shortest distance in all places, what will happen is once the knight reaches a close proximity, the Knight will start going round and round the target but never actually hit the target.

From (6,6), the Knight could jump to (7,8). From (7,8) it could jump to (5,7) and to (7,5) and all but never actually jump to (6,7).

So, we have to create a separate rule for this scenario for our Knight.

Close Proximity Rule:

There are again 8 different close proximity points from the target location’s view point : (7,7), (7,8), (6,8), (5,8), (5,7), (5,6), (6,6) and (7,6).

These eight points can be categorized into 2 types. Either they lie on the axes or they lie on the diagonals from the given target.

Points on Axes: (7,7), (6,8), (5,7) and (6,6)

cart1

Points on Diagonals: (7,8), (5,8), (5,6) and (7,6)

cart2

Based on these two types, the Knight can hit the target in either 2 or 3 moves.

If the Knight was at a point on the diagonal say (5,6), from there it can jump (7,5) and then to (6,7) in 2 two moves. All diagonal points can access the target in the center in two steps.

Similarly, all points on the x-y axes can hit the target in 3 steps. For example, if the Knight was at (6,6), it can jump to points (8,7), then to (7,9) and finally t (6,7).

Now with this much knowledge we can create a program that can calculate the shortest path that our Knight has to take to hit any target on an infinite chess board!

My Implementation:

I have created a console application in C# to calculate the shortest path for the Knight to reach any stated target point with the above mentioned logic.

The application/project is available in git hub @ https://github.com/psovit/knightswatch

Please feel free to like, share and comment.

Thanks!

vitChess.jpg

7 Handy SQL Scripts for SQL Developers

Finally got my article published at Tuts+

I had written this article “7 Handy SQL Scripts for SQL Developers” referencing some of the common scripts that I most frequently use @ work. Some of these scripts were collected from different sites while some were developed by myself.

————————————————————————————

“A lot of things that we do depend upon the knowledge that we possess. If we are aware of what can be done, only then we can make smarter and more effective decisions. That is why it is always good to have quick tips and tricks handy in your pocket. This principle applies everywhere, including for MS-SQL developers.
Through this article I would like to share a few SQL scripts which have proven to be very useful for my daily job as a SQL developer. I’ll present a brief scenario about where each of these scripts can be used along with the scripts below. ……………………”

Read full article at:
http://code.tutsplus.com/tutorials/7-handy-sql-scripts-for-sql-developers–cms-25834—————————————————