Scraping For Scooters From Hamrobazaar

Hamrobazaar is one of the most popular forum-based e-commerce websites in Nepal. You can find all kinds of items on the site, but tracking down the right listing that is on sale, say a scooter, can be quite challenging because of the sheer number of entries and spelling variations.

So, here is a Scrapy script to the rescue!


import scrapy
from scrapy.http import Request

from hb_scrape.items import HbScrapeItem


class ScootySpider(scrapy.Spider):
    name = "scooty"

    def start_requests(self):
        # Search common misspellings too, since sellers label listings inconsistently.
        filters = ["scooty", "scooter", "scoter", "scotty"]
        linkUrl = '{0}&Search.x=0&Search.y=0&catid_search=0'

        for keyword in filters:
            yield Request(linkUrl.format(keyword), self.parse)

    def parse(self, response):
        # Listing rows alternate between two background colours.
        for sel in response.xpath('//td[@bgcolor="#ECF0F6" or @bgcolor="#F2F4F9"]/a'):
            aLink = sel.xpath('@href').extract_first()
            if aLink and 'useritems' not in aLink:
                yield Request(response.urljoin(aLink), self.parseAdLink)

        # Follow the "Next" pagination link, if there is one.
        nextAlink = response.xpath('//u[contains(text(),"Next")]/../../@href').extract_first()
        if nextAlink:
            yield Request(response.urljoin(nextAlink), self.parse)

    def parseAdLink(self, response):
        item = HbScrapeItem()
        item['adTitle'] = ''.join(response.xpath('//span[@class="title"]//text()').extract())

        # Each detail value lives in the <td> next to its label cell.
        adPostDateLabel = response.xpath('//td[contains(text(),"Ad Post Date:")]')
        item['adPostDate'] = adPostDateLabel.xpath('../td[2]/text()').extract_first()

        adViewsLabel = response.xpath('//td[contains(text(),"Ad Views:")]')
        item['adViewsCount'] = adViewsLabel.xpath('../td[2]/text()').extract_first()

        sellerLabel = response.xpath('//td[contains(text(),"Sold by:")]')
        item['seller'] = sellerLabel.xpath('../td[2]/text()').extract_first()

        sellerPhoneLabel = response.xpath('//td[contains(text(),"Mobile Phone:")]')
        item['sellerPhone'] = sellerPhoneLabel.xpath('../td[2]/text()').extract_first()

        sellerAddressLabel = response.xpath('//td[contains(text(),"Location:")]')
        item['address'] = ' '.join(sellerAddressLabel.xpath('../td[2]/text()').extract())

        priceLabel = response.xpath('//td[contains(text(),"Price:")]')
        item['price'] = priceLabel.xpath('../td[2]//text()').extract_first()

        makeYearLabel = response.xpath('//td[contains(text(),"Make Year:")]')
        item['makeYear'] = makeYearLabel.xpath('../td[2]/text()').extract_first()

        lotNumLabel = response.xpath('//td[contains(text(),"Lot No:")]')
        item['lotNumber'] = lotNumLabel.xpath('../td[2]/text()').extract_first()

        featuresLabel = response.xpath('//td[contains(text(),"Features:")]')
        item['features'] = featuresLabel.xpath('../td[2]/text()').extract_first()

        item['adUrl'] = response.url

        yield item

Scraping Jobs From MeroJob

This is a snippet of Python Scrapy code that I recently wrote to crawl and scrape job listings from MeroJob, Nepal's leading job-listing site.

I am planning to use the data scraped from this site to analyze current job-market trends in Nepal.
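As a rough sketch of that analysis step, the snippet below tallies the most common job categories from a scraped JSON-lines export. The `category` field and the sample records are assumptions of mine, not fields MeroJob guarantees; adjust them to whatever your spider actually exports.

```python
import json
from collections import Counter


def top_categories(jl_lines, n=5):
    """Count the most common job categories in scraped JSON-lines records.

    Assumes each record has a 'category' field (an assumption; rename to
    match the fields your own spider exports).
    """
    counts = Counter()
    for line in jl_lines:
        record = json.loads(line)
        counts[record.get("category", "unknown")] += 1
    return counts.most_common(n)


# Made-up sample records standing in for a real scraped export:
sample = [
    '{"title": "Backend Developer", "category": "IT"}',
    '{"title": "Accountant", "category": "Finance"}',
    '{"title": "QA Engineer", "category": "IT"}',
]
print(top_categories(sample))  # → [('IT', 2), ('Finance', 1)]
```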

Web Scraping Quotes From Goodreads


Goodreads is a great resource for information about books, authors, and interesting quotations.

In this post, I will share a piece of code that lets you scrape quotations from the site. The code is written for Python's Scrapy framework.

Getting Started

To get started with scraping quotes from your favorite author, first search for the author's name in the site's quotes section.

Quote Search Section

Once you type in the author's name, you can inspect the CSS classes and XPaths in the displayed results to find pointers to the data you want to scrape.

Looking For Xpaths

Code For Spider

Now that we have a page to scrape from, the next step is to create a spider that will extract the data. A spider in Scrapy is basically a class that you use to scrape data from a location. You can find more info on Scrapy here.

Basically, we want to loop over each "quoteDetails" section to get the author and the quote text.

for sel in response.css('div.quoteDetails'):
    quote = sel.css('div.quoteText::text').extract()
    author = sel.css('div.quoteText a::text').extract_first()
    item = GoodreadsItem()
    item['author'] = author
    item['quote'] = quote
    yield item

Each quote gets extracted as a “GoodreadsItem” object.
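Note that `div.quoteText::text` returns a list of raw text fragments, with the curly quotation marks and surrounding whitespace still attached. A small helper can tidy those fragments into a single string before they go into the item; this is a stdlib-only sketch and the function name is my own:

```python
def clean_quote(fragments):
    """Join raw ::text fragments into one readable quote string."""
    text = " ".join(f.strip() for f in fragments if f.strip())
    # Strip the curly quotation marks Goodreads wraps quotes in.
    return text.strip('\u201c\u201d ')


fragments = ["\u201cSo many books, so little time.\u201d", "\n    "]
print(clean_quote(fragments))  # → So many books, so little time.
```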

Next, to follow the pagination link and scrape the remaining pages, the following code can be used:

checkNextPage = response.xpath('//a[@class="next_page"]').extract_first()
if checkNextPage:
    nextPageLink = response.xpath('//a[@class="next_page"]/@href').extract_first()
    nextPageFullUrl = response.urljoin(nextPageLink)
    yield scrapy.Request(nextPageFullUrl, callback=self.parse)


That’s all the code needed for scraping. It’s quite easy and fun to scrape with Scrapy. Good luck!


Scraping Data From Wikipedia

Web Scraping Data For Hollywood Actors/Actresses Info

A friend of mine and I were discussing the divorce rates and trends of Hollywood actors. Basically, she claimed that most Hollywood actors have had multiple spouses and gone through a divorce. I was not so sure, and so this project came about.

I created a project using Scrapy, a web scraping framework in Python, to browse each actor's or actress's Wikipedia page and look up info about their marital status.
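The core of that lookup is pulling the "Spouse(s)" row out of the article's infobox and counting the marriages listed there. Here is a minimal, hedged sketch using only the standard library; the HTML in the sample is a big simplification of real Wikipedia infobox markup, and the function name is my own:

```python
import re


def count_spouses(infobox_html):
    """Count spouse entries in a (simplified) Wikipedia infobox row.

    Assumes the row looks like <th>Spouse(s)</th><td>...</td> and that each
    marriage carries a parenthesised marker such as "(m. 1990; div. 1995)".
    Real infobox markup is messier -- treat this as a starting point only.
    """
    match = re.search(r"<th[^>]*>Spouses?(?:\(s\))?</th>\s*<td[^>]*>(.*?)</td>",
                      infobox_html, re.S)
    if not match:
        return 0
    # Each marriage tends to carry one "(m. YYYY ...)" marker.
    return len(re.findall(r"\(m\.\s*\d{4}", match.group(1)))


sample = ('<tr><th scope="row">Spouse(s)</th>'
          '<td>Jane Doe (m. 1990; div. 1995)<br>John Roe (m. 2001)</td></tr>')
print(count_spouses(sample))  # → 2
```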

Just for fun only 🙂

Here's the link to the project code if anyone else is interested in scraping similar data from Wikipedia.


Do let me know if you are interested and need help in running the scripts.

YouTube Crawler

Here's a web crawling script to crawl videos from YouTube:

The script is written in Python's Scrapy framework and can be used to easily build a full list of videos filtered by keyword, view count, channel, votes, and so on.
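One wrinkle worth knowing: YouTube renders its search results from a JSON blob (`ytInitialData`) embedded in the page rather than from plain HTML, so the scraping step usually boils down to pulling that blob out and walking it. Below is a minimal, hedged sketch of just the extraction step; the variable name and page structure are undocumented YouTube internals that can change at any time, and the sample page is a toy stand-in:

```python
import json
import re


def extract_yt_initial_data(page_html):
    """Pull the embedded ytInitialData JSON object out of a YouTube page.

    YouTube embeds its result data as `var ytInitialData = {...};` inside a
    <script> tag. This pattern is undocumented and may break without notice;
    the non-greedy match is also fragile if "};" appears inside the JSON.
    """
    match = re.search(r"var ytInitialData\s*=\s*(\{.*?\});", page_html, re.S)
    if not match:
        return None
    return json.loads(match.group(1))


# Toy stand-in for a downloaded results page, with a much-simplified payload:
sample_page = '<script>var ytInitialData = {"estimatedResults": "1000"};</script>'
data = extract_yt_initial_data(sample_page)
print(data["estimatedResults"])  # → 1000
```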

Feel free to clone/fork/modify and let me know if you need any help running it.