A useful snippet for replacing certain text in every file in a given folder:
Python: Replace Text in All Files in a Folder and Rewrite Them
import os

# Get the current working directory (cwd)
thisdir = os.getcwd()
folder = os.path.join(thisdir, 'path_to_your_folder')

# r = root, d = directories, f = files
for r, d, f in os.walk(folder):
    for file in f:
        fPath = os.path.join(r, file)
        with open(fPath, 'r') as fp:
            read_lines = fp.readlines()
        read_lines = [line.rstrip('\n') for line in read_lines]
        with open(fPath, 'w') as fw:
            for line in read_lines:
                newText = line.replace('your text 1 to replace', 'replace with') \
                              .replace('your text 2 to replace', 'replace with') \
                              .replace('you got the idea', 'your text to replace with') \
                              .strip()
                # do something like append/prepend
                newText = newText + ';'
                # skip certain lines if you want before writing
                if not newText.startswith('--'):
                    fw.write(newText + '\n')
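To see the loop in action without touching real files, here is a minimal, self-contained sketch that runs the same walk/read/rewrite pattern against a temporary folder. The sample file contents and the replacement strings are just placeholders:

```python
import os
import tempfile

# Create a throwaway folder with one sample file.
tmpdir = tempfile.mkdtemp()
sample = os.path.join(tmpdir, 'sample.txt')
with open(sample, 'w') as f:
    f.write('hello old world\n-- a comment line\n')

# Same walk/read/rewrite pattern as above.
for r, d, files in os.walk(tmpdir):
    for name in files:
        fPath = os.path.join(r, name)
        with open(fPath, 'r') as fp:
            read_lines = [line.rstrip('\n') for line in fp.readlines()]
        with open(fPath, 'w') as fw:
            for line in read_lines:
                newText = line.replace('old', 'new').strip() + ';'
                if not newText.startswith('--'):
                    fw.write(newText + '\n')

with open(sample) as f:
    print(f.read())  # prints "hello new world;" -- the '--' line is skipped
```

Note that the file is read fully before being reopened in `'w'` mode, which truncates it; reading and writing the same file in one pass would lose data.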
Goodreads is a very good resource for information about books, authors, and interesting quotations.
In this post, I will share a piece of code that will allow you to scrape quotations from this site. The code is written for Python's Scrapy framework.
To get started scraping quotes from your favorite author, first search for the author's name in the quotes section.
Once the results are displayed, you can inspect the page's CSS and XPath to find pointers to the data you want to scrape.
Code for the Spider
Now that we have a page to scrape from, the next step is to create a spider that will extract the data. A spider in Scrapy is basically a class that you can use to scrape data from a location. You can find more info on Scrapy here.
Basically, we want to loop over each “quoteDetails” section to get the author and quote text.
for sel in response.css('div.quoteDetails'):
    quote = sel.css('div.quoteText::text').extract()
    author = sel.css('div.quoteText a::text').extract_first()
    item = GoodreadsItem()
    item['author'] = author
    item['quote'] = quote
    yield item
Each quote gets extracted as a “GoodreadsItem” object.
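Note that `extract()` returns a list of raw text fragments rather than a single clean string, so you will usually want a small post-processing step before storing the quote. A rough sketch in plain Python (the sample fragments below are made up to mimic what the Goodreads markup tends to yield):

```python
# Hypothetical fragments, like what sel.css('div.quoteText::text').extract()
# might return: the quote body plus whitespace and the dash that precedes
# the author name.
fragments = [
    '\n      “Be yourself; everyone else is already taken.”\n  ',
    '\n    ―\n  ',
]

# Join the pieces, drop empty or dash-only fragments, and trim the curly quotes.
text = ' '.join(p.strip() for p in fragments if p.strip() and p.strip() != '―')
quote = text.strip('“”')
print(quote)  # Be yourself; everyone else is already taken.
```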
Next, to follow the link to the next page of results, the following code can be used:
checkNextPage = response.xpath('//a[@class="next_page"]').extract_first()
if checkNextPage:
    nextPageLink = response.xpath('//a[@class="next_page"]/@href').extract_first()
    nextPageFullUrl = response.urljoin(nextPageLink)
    yield scrapy.Request(nextPageFullUrl, callback=self.parse)
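`response.urljoin` resolves the (usually relative) `href` against the URL of the page currently being parsed; it behaves like the standard library's `urllib.parse.urljoin`. For example (the URLs here are illustrative, not taken from a live crawl):

```python
from urllib.parse import urljoin

# The page currently being parsed (illustrative URL).
base = 'https://www.goodreads.com/quotes/search?page=2&q=einstein'
# A relative next_page href as it might appear in the markup.
href = '/quotes/search?page=3&q=einstein'

full = urljoin(base, href)
print(full)  # https://www.goodreads.com/quotes/search?page=3&q=einstein
```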
That’s all the code needed for scraping. It’s quite easy and fun to scrape with Scrapy. Good luck!