Scraping Jobs From MeroJob

This is a snippet of Python code, written with the Scrapy framework, that I recently wrote to crawl and scrape job listings from MeroJob, Nepal’s leading job listing site.

I am planning to use the data scraped from this site to analyze current job market trends in Nepal.

Web Scraping Quotes From Good Reads

Introduction

Goodreads is a very good resource for information about books, authors and interesting quotations.

In this post, I will share a piece of code that will allow you to scrape quotations from this site. The code is written for Python’s Scrapy framework.

Getting Started

To get started with scraping quotes from your favorite author, first search for the author’s name in the quotes section of the site.

Quote Search Section

Once you have searched for the author’s name, inspect the HTML of the displayed results to find the CSS selectors and XPath expressions that point to the data you want to scrape.

Looking For Xpaths

Code For Spider

Now that we know where the data is, the next step is to create a spider that will scrape it from this page. A spider in Scrapy is basically a class that defines how to crawl a site and extract data from its pages. You can find more info in the Scrapy documentation.

Basically, we want to loop over each “quoteDetails” section to get the author and quote text.


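for sel in response.css('div.quoteDetails'):
    # extract() returns every text fragment inside the quote block as a list
    quote = sel.css('div.quoteText::text').extract()
    # extract_first() returns the first match, or None if nothing matched
    author = sel.css('div.quoteText a::text').extract_first()
    item = GoodreadsItem()
    item['author'] = author
    item['quote'] = quote
    yield item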

Each quote gets extracted as a “GoodreadsItem” object.
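
Because `extract()` returns a list of text fragments rather than a single string, the `quote` field can end up holding pieces with extra whitespace and a trailing dash. Here is a minimal sketch of a cleanup step using only the standard library. The helper name `clean_quote` and the sample fragments are hypothetical, and the assumption that each quote is wrapped in curly quotes and followed by a horizontal bar should be checked against the actual page markup.

```python
def clean_quote(fragments):
    """Join the text fragments of one quote into a single clean string."""
    # Collapse runs of whitespace inside each fragment.
    parts = [" ".join(fragment.split()) for fragment in fragments]
    # Drop fragments that were whitespace only, then join the rest.
    text = " ".join(part for part in parts if part)
    # Assumption: the quote is wrapped in curly quotes and followed by a
    # dash-like bar before the author name; strip both if present.
    return text.rstrip("\u2015\u2014 ").strip("\u201c\u201d ")


# Made-up fragments shaped like what extract() might return for one quote.
sample = ["\n      \u201cAn example quote.\u201d\n  ", "\n    \u2015\n  "]
print(clean_quote(sample))
```

With a helper like this, the spider could store `clean_quote(quote)` in `item['quote']` instead of the raw list.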

Next, to find the link to the next page of results, the following code can be used:


nextPageLink = response.xpath('//a[@class="next_page"]/@href').extract_first()
# extract_first() returns None when there is no "next page" link
if nextPageLink is not None:
    nextPageFullUrl = response.urljoin(nextPageLink)
    print(nextPageFullUrl)
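
`response.urljoin` resolves the link’s `href` against the URL of the current page, much like `urljoin` from Python’s standard library. The sketch below shows that resolution step on its own. The page URL and `href` are made-up values for illustration, not taken from the real site.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration only; in the spider these come from
# response.url and the href of the "next_page" link.
current_page_url = "https://www.example.com/quotes/search?q=author&page=1"
next_page_href = "/quotes/search?page=2&q=author"

# A root-relative href replaces the path and query of the current URL.
next_page_full_url = urljoin(current_page_url, next_page_href)
print(next_page_full_url)
```

In a spider, one common way to actually visit that URL is to yield `scrapy.Request(url, callback=self.parse)` with the full URL instead of printing it.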

Conclusion

That’s all the code needed for scraping. It’s quite easy and fun to scrape with Scrapy. Good luck!

 

Scraping Data From Wikipedia

Web Scraping Data For Hollywood Actors/Actresses Info

A friend of mine and I were discussing divorce trends among Hollywood actors. She claimed that most Hollywood actors have been married several times and have gone through a divorce. I did not quite agree, and that is how this project came about.

I created a project using Scrapy, a web scraping framework in Python, to browse through each actor’s or actress’s Wikipedia page and look up info about their marital status.

Just for fun 🙂

Here’s the link to the project code if anyone else is interested in scraping and getting similar data from Wikipedia.

Git

Do let me know if you are interested and need help in running the scripts.

YouTube Crawler

Here’s a web crawling script to crawl videos from YouTube:

https://github.com/psovit/yt_crawler.git

The script is written with Python’s Scrapy framework and can be used to find a full list of videos using various filters such as ‘keyword’, ‘view count’, ‘channel’ and ‘votes’.

Feel free to clone/fork/modify and let me know if you need any help running it.

Thanks!