Day 15: Python – Notes on Webscraping – Adventure @ Learning & Creativity!

A summary from the enticing post “How to Web Scrape with Python in 4 Minutes”:

LIBRARIES:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

Accessing the target URL:

url = 'http://web.mta.info/developers/turnstile.html'
response = requests.get(url)
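The request above assumes the page loads without trouble. As a minimal sketch (not part of the original post), the call could be wrapped so that timeouts and HTTP error codes are handled in one place. The helper name `fetch_html` and the 10-second timeout are assumptions made for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the request fails.

    fetch_html is a hypothetical helper; the timeout value is an assumption.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Raise requests.HTTPError for 4xx/5xx status codes
        response.raise_for_status()
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, bad URLs and HTTP errors
        print(f"Request failed: {exc}")
        return None
    return response.text
```

With this helper, `html = fetch_html(url)` would stand in for `response.text` in the parsing step, after a check that `html` is not None.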

Parse the HTML into BeautifulSoup's nested data structure (see the BeautifulSoup documentation).

soup = BeautifulSoup(response.text, "html.parser")

Search for links:

soup.findAll('a')

Extract the link:

one_a_tag = soup.findAll('a')[36]
link = one_a_tag['href']
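Picking a tag by a fixed index such as 36 depends on the exact page layout. A more general sketch, under stated assumptions, is to collect every link that matches a pattern. The sample HTML, the file names in it, the `.txt` filter and both function names below are hypothetical; `find_all` is the current spelling of `findAll`. The `urllib.request` and `time` imports are not used in the snippets above, so the `download_all` helper shows one plausible use for them: saving each file and pausing between requests.

```python
import time
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for response.text
SAMPLE_HTML = """
<html><body>
  <a name="top">Anchor without href</a>
  <a href="data/nyct/turnstile/file_a.txt">File A</a>
  <a href="data/nyct/turnstile/file_b.txt">File B</a>
  <a href="http://example.com/about.html">About</a>
</body></html>
"""

BASE_URL = "http://web.mta.info/developers/"


def extract_links(html, base_url, suffix=".txt"):
    """Return absolute URLs of all <a href> links ending in `suffix`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips <a> tags that have no href attribute
    for tag in soup.find_all("a", href=True):
        href = tag["href"]
        if href.endswith(suffix):
            # Resolve relative paths against the page's base URL
            links.append(urljoin(base_url, href))
    return links


def download_all(links, pause_seconds=1.0):
    """Save each link to the current directory, pausing between requests."""
    for link in links:
        filename = link.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(link, filename)
        # Pause so the server is not flooded with requests
        time.sleep(pause_seconds)


links = extract_links(SAMPLE_HTML, BASE_URL)
print(links)
```

`download_all(links)` is deliberately not called here, since it would hit the network and write files.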

Another approach is described in “A Beginner’s Guide to learn web scraping with python!”. It uses Selenium (a web testing library for automating browser activities), BeautifulSoup (for parsing HTML and XML documents), and Pandas (for data manipulation and analysis, here to extract the data and store it in the desired format).
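A rough sketch of the BeautifulSoup and Pandas half of that pipeline, assuming the page source has already been obtained (with Selenium it would come from `driver.page_source`, which is left out here because it needs a browser). The markup, class names, column names and the `items_to_frame` function are all hypothetical and are not taken from the linked guide.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page source; with Selenium this would be driver.page_source
PAGE_SOURCE = """
<div class="item"><span class="name">Widget</span><span class="price">9.99</span></div>
<div class="item"><span class="name">Gadget</span><span class="price">19.50</span></div>
"""


def items_to_frame(page_source):
    """Parse each div.item into a row and return the rows as a DataFrame."""
    soup = BeautifulSoup(page_source, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="item"):
        name = item.find("span", class_="name").get_text(strip=True)
        price = item.find("span", class_="price").get_text(strip=True)
        rows.append({"name": name, "price": float(price)})
    return pd.DataFrame(rows, columns=["name", "price"])


df = items_to_frame(PAGE_SOURCE)
# With no path argument, to_csv returns the CSV as a string
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name, as in `df.to_csv("items.csv", index=False)`, would write the table to disk instead.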

More approaches:

Practical Introduction to Web Scraping in Python
Beginner’s guide to Web Scraping in Python using BeautifulSoup
Web Scraping using Python
Tutorial: Python Web Scraping Using BeautifulSoup
Ultimate Guide to Web Scraping with Python Part 1: Requests and BeautifulSoup

Happy Scraping!
