Web Scraping Quotes From Goodreads

Introduction

Goodreads is a great resource for information about books, authors, and interesting quotations.

In this post, I will share a piece of code that lets you scrape quotations from the site. The code is written for Python’s Scrapy framework.

Getting Started

To scrape quotes from your favorite author, first search for the author’s name in the quotes section of the site.

Quote Search Section

Once you type the author’s name, inspect the displayed results to find the CSS selectors and XPath expressions that point to the data you want to scrape.

Looking For Xpaths

Code For Spider

Now that we have a page to scrape, the next step is to create a spider for it. A spider in Scrapy is basically a class that defines how to scrape data from a location. You can find more info on Scrapy here.

Basically, we want to loop over each “quoteDetails” section to get the author and the quote text.


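# Runs inside the spider's parse(self, response) callback
for sel in response.css('div.quoteDetails'):
    # extract() returns every text node; extract_first() returns the first match or None
    quote = sel.css('div.quoteText::text').extract()
    author = sel.css('div.quoteText a::text').extract_first()
    item = GoodreadsItem()
    item['author'] = author
    item['quote'] = quote
    yield item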

Each quote gets extracted as a “GoodreadsItem” object.

Next, to find the link to the next page of results, the following code can be used:


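# extract_first() returns None when the page has no "next" link,
# so test for None instead of calling len() on the result
nextPageLink = response.xpath('//a[@class="next_page"]/@href').extract_first()
if nextPageLink is not None:
    nextPageFullUrl = response.urljoin(nextPageLink)
    print(nextPageFullUrl)
    # To actually crawl the next page from a spider callback:
    # yield scrapy.Request(nextPageFullUrl, callback=self.parse)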
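
To see what response.urljoin does with a relative link, here is a minimal sketch using the standard library’s urllib.parse.urljoin, which behaves similarly in simple cases. The page URL, the href and the helper name absolute_next_url are made-up examples, not values taken from the site.

```python
from urllib.parse import urljoin

# Hypothetical values, for illustration only
page_url = "https://www.goodreads.com/quotes/search?q=tolkien"
next_href = "/quotes/search?page=2&q=tolkien"


def absolute_next_url(base_url, href):
    """Resolve a possibly relative href against the URL of the current page."""
    return urljoin(base_url, href)


print(absolute_next_url(page_url, next_href))
```

An href that is already absolute is returned unchanged, so the same call is safe for both kinds of links.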

Conclusion

That’s all the code needed for scraping. It’s quite easy and fun to scrape with Scrapy. Good luck!

 

Divergent Color

Color Usage In Data Analysis

Data visualization is an integral part of data science and data analysis. It presents information visually instead of through traditional spreadsheets and reports.

Humans naturally process information more easily when it is presented as a well-designed visualization. With the right visualization, we can group chunks of data into categories, highlight areas that need our attention, or show the growth or decay of our products.

More importantly, as David McCandless describes in his famous TED Talk “The Beauty Of Data Visualization”, visualization lets you see patterns and connections between numbers that would otherwise be scattered across multiple reports.

Now, the most important aspect of data visualization is of course the use of colors, and specifically colors that fit the context of your analysis. Without the right choice of colors, your visualization can turn into a nasty-looking eruption of color.

In this post I will talk about choosing the right kind of colors for data visualization, with help from the Color Brewer package used in R for data analysis.

R Color Brewer

All color choices from Color Brewer

Typically, color usage can be categorized into three different types based on our data analysis needs.

1) Sequential: When you want to show growth or increase in something, pick a sequential color scheme. It suits sequentially ordered numbers, so it can show a progression from very small to very big. In the picture above, the first section of colors relates to sequential usage. The colors start light and get darker and darker.

Sequential Color Usage

2) Qualitative: When you want to show different varieties of something without giving any emphasis to the numbers behind them, pick a qualitative color scheme. These schemes are essentially used to show different categories. So, if you have a bunch of different political parties, you might show each one in a different color. Likewise, to show different countries on a map or different species of animals, you would use different colors. The colors here usually have similar light/dark values.

Qualitative Color

3) Divergent: Finally, when you want to show two extremes in your data, pick a divergent color scheme. This scheme has a very light shade in the middle, and the colors get darker and darker, in a different hue, going out to each side. That’s a way of showing high and low values. The highs, the lows and the neutrals are all easy to see here.

Divergent Color
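
To make the difference between sequential and divergent schemes concrete, here is a minimal Python sketch that builds both by plain linear interpolation between RGB colors. It is only an illustration: these are not the actual Color Brewer palettes, and the function names and example colors are my own assumptions. Qualitative schemes are left out because they are hand-picked distinct hues rather than interpolated ones.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples; t runs from 0.0 to 1.0."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def to_hex(rgb):
    """Format an (r, g, b) tuple as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential(light, dark, n):
    """Return n colors (n >= 2) running from light to dark."""
    return [to_hex(lerp_color(light, dark, i / (n - 1))) for i in range(n)]


def diverging(low, mid, high, n):
    """Return n colors (odd n >= 3): dark `low`, light `mid` in the center, dark `high`."""
    half = n // 2
    left = [to_hex(lerp_color(low, mid, i / half)) for i in range(half)]
    right = [to_hex(lerp_color(mid, high, i / half)) for i in range(1, half + 1)]
    return left + [to_hex(mid)] + right


# Hypothetical example colors
WHITE, NAVY, RED = (255, 255, 255), (0, 0, 128), (200, 0, 0)
print(sequential(WHITE, NAVY, 5))
print(diverging(NAVY, WHITE, RED, 5))
```

The sequential list runs in one direction from light to dark, while the divergent list puts the lightest color in the center and darkens towards a different hue on each side.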

Those are the basics of picking the right colors for data visualization according to the context of your analysis. I hope it helps you create beautiful visualizations in your reports!

 

Becoming An Indie Game Developer From A Programmer Background

As Wikipedia defines it,

“Independent video game development is the video game development process of creating indie games; these are video games, commonly created by individual or small teams of video game developers and usually without significant financial support of a video game publisher or other outside source.”

While game development and regular software development have a few things in common, the differences weigh a lot more.

Unlike regular software, a game mixes many components besides programming logic. Graphics, sound and animation play vital roles in any game, and a software programmer does not usually work with them. It takes years to master each of these arts. Plus, a game requires advanced knowledge of math and Newtonian physics!

So it’s not an easy switch from software development to game development.

Here’s a very good read on Quora on this topic:

https://www.quora.com/Can-one-developer-make-a-successful-indie-game

As you can read in the answers on Quora, some people have “moderately” successfully managed to jump into indie game development from a programming background.

Revenue-wise, the first few years do not look as good as a regular full-time software development job, but for someone who keeps at it for a long time, there could be a brighter future.

If all goes well, as Joe Cassavaugh dreams, maybe someday indie game developers could stop being a one-man shop and turn their games into something like Clash Royale, which generates daily revenues of $1,992,870.

Checklist For Xamarin Forms Development

Here’s a list to help you get started with cross-platform mobile app development using Xamarin Forms.

  1. Xamarin
  2. Visual Studio 2015
  3. Android SDK
  4. Windows 10 OS
    1. While you can work with Windows 8 as well, you will need Windows 10 if you are going to develop Windows 10 Universal apps.
  5. Mac with Xcode and Xamarin installed
    1. Again, this is not mandatory to get started with Xamarin, but without a Mac you will not be able to develop applications for iPhone.

If you have all the items on this list installed/setup then you are ready to get started with Xamarin Forms Development!

Daily Reflection

Daily Reflection is designed to draw your attention towards internal movement within yourself.

It is a collection of thought-provoking questions aimed at raising your level of awareness. Use the app towards the end of each day to reach a contemplative state of mind.

Reflect daily upon these questions to bring more awareness in your life.

Download link: https://play.google.com/store/apps/details?id=com.pso.dailyreflection

A small Android app to nudge you to raise your awareness level daily.

Thanks!

Getting File Content using GitHub API

GitHub itself has fairly complete documentation for accessing files uploaded to GitHub.

https://developer.github.com/v3/

However, I think it could still use a little extra explanation.

Make sure you get the following steps right for this:

  • Depending on whether you want master or another branch, get the URL for your repository ready first.
  • Next, insert the correct header into your HTTP request.
  • Make sure you are requesting data in ‘jsonp’ format.
  • Extracting the content of your response can also get tricky. Make sure to look into the ‘data.content’ value.
  • Finally, decode the encoded content.

Here’s my implementation of the API call:


var gitURL = 'https://api.github.com/';
var gitAPIServiceOptions = {
    url: gitURL + yourGitHubFilePath,
    requestType: "GET",
    dataType: "JSONP",
    httpHeader: 'Accept',
    headerValue: 'application/vnd.github.v3.raw+json',
    callBack: function (result) {
        if (result.success === true) {
            var decodedResult = atob(result.data.data.content);
        }
    }
};
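
The final decoding step can also be sketched on its own. The Python snippet below is a minimal illustration, assuming the parsed response is a dictionary with a Base64 `content` field and an `encoding` field; the function name decode_github_content and the sample data are hypothetical and built by hand, not real API output.

```python
import base64


def decode_github_content(file_info):
    """Decode the Base64 `content` field of a contents-style response dict."""
    if file_info.get("encoding") != "base64":
        raise ValueError("unexpected encoding: {!r}".format(file_info.get("encoding")))
    # b64decode ignores the newlines that may be embedded in the Base64 text
    return base64.b64decode(file_info["content"]).decode("utf-8")


# Hand-built stand-in for a parsed JSON response (not real API output)
sample = {
    "encoding": "base64",
    "content": base64.encodebytes(b"Hello from a file").decode("ascii"),
}
print(decode_github_content(sample))
```

Checking the `encoding` field first avoids silently decoding a payload that was never Base64 in the first place.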

Thanks!

A Knight’s Watch

Recently I came across this interesting problem through Toptal (on Codility) which kept me thinking hard for a few days. Finally, I now have a solution to this problem and would like to share it here.

Problem:

Basically, the problem deals with a knight piece on an infinite chess board. Assuming the knight is positioned at (0,0) you are asked to calculate and print the shortest path that the knight can take to get to the target location.

My Approach:

So what is given here?

  1. The Knight’s movement is well defined, i.e. it can only move in an ‘L’ shape.
  2. The Knight has the option to move to any of 8 different locations from its current position.

 

Now with these key points in mind, we can calculate which move will take the Knight closest to the given target, i.e. move the Knight in the direction of the shortest distance.

For example: Suppose our target is located at (6,7).

Now, from (0,0) the Knight can move to the following points: (1,2), (2,1), (-1,2), (-2,1), (-1,-2), (-2,-1), (1,-2) and (2,-1).

Of these 8 points, the one closest to the target is (1,2), so we move the Knight there in the first move. We can apply the same logic at each step.

Hence, the Knight moves to (1,2) in the first step, (3,3) in the second, (4,5) in the third and (6,6) in the fourth.
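
The greedy step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the C# implementation: the function names are hypothetical, and the order of KNIGHT_MOVES is an assumption that decides which candidate wins when two are equally close (for example (3,3) and (2,4) on the second move).

```python
# Order is an assumption: it only matters for breaking ties between equally close squares
KNIGHT_MOVES = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]


def squared_distance(a, b):
    """Squared straight-line distance between two board points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def greedy_step(position, target):
    """Return the reachable square closest to the target (first one wins on ties)."""
    candidates = [(position[0] + dx, position[1] + dy) for dx, dy in KNIGHT_MOVES]
    return min(candidates, key=lambda p: squared_distance(p, target))


def greedy_approach(start, target):
    """Repeat greedy steps until the Knight is within one square of the target."""
    path = [start]
    while max(abs(path[-1][0] - target[0]), abs(path[-1][1] - target[1])) > 1:
        path.append(greedy_step(path[-1], target))
    return path


print(greedy_approach((0, 0), (6, 7)))
```

With this tie-breaking order, the sketch follows the same squares as the example, stopping at (6,6), one square away from the target.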

However, once the Knight is within one square of the target, we need separate logic to hit the target.

This is because if we keep moving the Knight towards the shortest distance everywhere, then once it gets close it will start going round and round the target without ever actually hitting it.

From (6,6), the Knight could jump to (7,8). From (7,8) it could jump to (5,7), and so on, but it would never actually jump to (6,7).

So, we have to create a separate rule for this scenario for our Knight.

Close Proximity Rule:

There are again 8 close proximity points around the target location: (7,7), (7,8), (6,8), (5,8), (5,7), (5,6), (6,6) and (7,6).

These eight points can be categorized into 2 types. Either they lie on the axes or they lie on the diagonals from the given target.

Points on Axes: (7,7), (6,8), (5,7) and (6,6)

Points on Diagonals: (7,8), (5,8), (5,6) and (7,6)

Based on these two types, the Knight can hit the target in either 2 or 3 moves.

If the Knight is at a point on a diagonal, say (5,6), it can jump to (7,5) and then to (6,7), reaching the target in two moves. All diagonal points can reach the target in the center in two moves.

Similarly, all points on the axes can hit the target in three moves. For example, if the Knight is at (6,6), it can jump to (8,7), then to (7,9) and finally to (6,7).
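
The two- and three-move counts can be double-checked with a small breadth-first search. Note that this is a different technique from the greedy approach in this post, used here only as a check; the function name and the `limit` parameter are hypothetical.

```python
from collections import deque

KNIGHT_MOVES = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]


def min_knight_moves(start, target, limit=10):
    """Breadth-first search for the fewest knight moves, giving up after `limit` moves."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        (x, y), moves = frontier.popleft()
        if (x, y) == target:
            return moves
        if moves == limit:
            continue  # do not expand beyond the search limit
        for dx, dy in KNIGHT_MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, moves + 1))
    return None  # not reachable within the limit


target = (6, 7)
diagonals = [(7, 8), (5, 8), (5, 6), (7, 6)]
axes = [(7, 7), (6, 8), (5, 7), (6, 6)]
print([min_knight_moves(p, target) for p in diagonals])
print([min_knight_moves(p, target) for p in axes])
```

Because breadth-first search explores squares in order of move count, the first time it reaches the target is guaranteed to be a shortest route.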

Now with this much knowledge we can create a program that can calculate the shortest path that our Knight has to take to hit any target on an infinite chess board!

My Implementation:

I have created a console application in C# to calculate the shortest path for the Knight to reach any stated target point with the above mentioned logic.

The project is available on GitHub at https://github.com/psovit/knightswatch

Please feel free to like, share and comment.

Thanks!

Revealing Module Pattern

Using closures in JavaScript, we can create public and private methods.

Only the methods and variables that are specifically returned are available publicly and we can provide references to privately declared methods and variables inside the public methods.
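
The same closure idea can be sketched in Python, used here only as an illustration of the mechanism rather than the JavaScript layout itself. The names make_counter, increment and current are hypothetical. In JavaScript, the returned object literal would play the role of the SimpleNamespace below.

```python
from types import SimpleNamespace


def make_counter():
    count = 0  # private: only reachable through the closures below

    def _log(message):
        """Private helper; it is never returned, so callers cannot reach it."""
        return "[counter] " + message

    def increment():
        nonlocal count
        count += 1
        return _log("count is now {}".format(count))

    def current():
        return count

    # Reveal only the public members
    return SimpleNamespace(increment=increment, current=current)


counter = make_counter()
print(counter.increment())
print(counter.current())
```

Only `increment` and `current` are visible on the returned object, while `count` and `_log` stay hidden inside the closure.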

Layout for RMP:

Automating Development in ASP.NET MVC

Inspired by Artisan, the rapid development tool in Laravel for PHP, I am thinking maybe we (.NET developers) can do something similar in our environment as well.

http://laravel.com/docs/5.0/artisan

Basically, I am thinking of automating the whole process of creating Repository classes (including the required entries for properties and fields), Model classes, Service layer (the layer where you put your business logic) interfaces and classes, and maybe even Controllers and Views! Wouldn’t it be a real treat if all of this code writing could be automated? Unless you have already seen this automation implemented somewhere else, I am sure you are excited by this idea.

So this is the big idea. How do I plan to achieve this?

Well, I do all my projects in the MVC pattern, using Entity Framework and the Unit of Work pattern for the Data Layer and Service Layer. Basically, I keep a project for the Data Access Layer, where the .edmx file generated by EF lives along with a repository for each database object. Then I expose the repository for each entity through a central repository using the Unit of Work pattern.

Then I keep another project for Model classes. For every entity generated by EF, I have a model in this project.

Then I have a third project for the Service (or Business Logic) Layer, where I have interfaces and their implementations for all CRUD activities.

Finally, for displaying the View to the user, I use an ASP.NET MVC project where I have a controller for each entity. These controllers receive requests from views and process data using the interfaces exposed by the Service Layer. The response sent from controllers to the View is usually in JSON format.

The big realization is that no matter what type of entity we are dealing with, the whole process of creating Repositories, Models, Services, Controllers and even Views stays the same. The only changes are in the members (properties) of our entities.

If we are dealing with a User entity, we will have properties like Username, Password and Email, and if we are dealing with a Product entity, we will have properties like ProductName, ProductPrice, Category and so on.

Hence, if we just replace the properties and entity names, everything stays the same. This opens up the possibility of automating the whole process, as I mentioned at the beginning of the post. Heck, if I am able to automate only 70% of the process and make the necessary adjustments for the remaining 30%, I would be more than happy.
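
As a rough sketch of the “replace the entity name and properties” idea, the Python snippet below fills a text template to produce a C# model class. It is only an illustration under my own assumptions: the template text, the `Model` suffix and the names MODEL_TEMPLATE and render_model are hypothetical, and a real generator would need one template per layer.

```python
from string import Template

# Hypothetical template for one generated file; ${entity} and $properties are the only variable parts
MODEL_TEMPLATE = Template("""public class ${entity}Model
{
$properties
}
""")


def render_model(entity, properties):
    """Render C# source for a model class from (name, type) property pairs."""
    lines = ["    public {} {} {{ get; set; }}".format(ptype, name) for name, ptype in properties]
    return MODEL_TEMPLATE.substitute(entity=entity, properties="\n".join(lines))


print(render_model("User", [("Username", "string"), ("Password", "string"), ("Email", "string")]))
print(render_model("Product", [("ProductName", "string"), ("ProductPrice", "decimal")]))
```

The same function produces both classes; only the entity name and the property list change between the two calls.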

I will be giving this a try and will write about it once it’s done. What are your thoughts on this?