A summary from the enticing post “How to Web Scrape with Python in 4 Minutes”:
LIBRARIES:
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
Accessing the target URL:
url = 'http://web.mta.info/developers/turnstile.html'
response = requests.get(url)
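As written, `requests.get` has no timeout and does not check the HTTP status. The sketch below is a more defensive fetch. It assumes a 10-second timeout is acceptable. The helper name `fetch_html` and its `session` parameter are hypothetical; the parameter lets the function be exercised without a network connection.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Fetch a page and return its HTML text, raising on HTTP errors (hypothetical helper)."""
    if session is None:
        # Reusing one Session across many requests keeps the TCP connection alive.
        session = requests.Session()
    # Without a timeout, requests can wait indefinitely on an unresponsive server.
    response = session.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into requests.HTTPError instead of silently parsing an error page.
    response.raise_for_status()
    return response.text
```

Usage would be `html = fetch_html(url)`, followed by `BeautifulSoup(html, "html.parser")` in place of `response.text`.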
Parse the HTML into a nested BeautifulSoup data structure (see the BeautifulSoup documentation):
soup = BeautifulSoup(response.text, "html.parser")
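The word "nest" matters here. The parsed `soup` mirrors the HTML tree, so you can walk down and up it. In the sketch below, a small hard-coded document stands in for `response.text`; its title, `id`, and file names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text.
html = """
<html>
  <head><title>Turnstile Data</title></head>
  <body>
    <div id="content">
      <p>Data files:</p>
      <a href="data/file1.txt">File 1</a>
      <a href="data/file2.txt">File 2</a>
    </div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute access walks down the tree to the first matching tag.
title = soup.title.string
# find() returns the first tag matching the name and attribute filters (or None).
content = soup.find("div", id="content")
first_paragraph = content.p.get_text()
# .parent walks back up the tree.
parent_name = content.parent.name

print(title, "|", first_paragraph, "|", parent_name)
```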
Search for links:
soup.find_all('a')
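`find_all` (the current name for the older `findAll` alias) returns every matching tag, including anchors that have no `href`. The sketch below compares that with two narrower searches: the `href=True` filter and a CSS selector via `select`. The markup is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first anchor is a named anchor with no href.
html = """
<p>
  <a name="top"></a>
  <a href="data/file1.txt">File 1</a>
  <a href="https://example.com/about">About</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

all_anchors = soup.find_all("a")            # every <a> tag, in document order
with_href = soup.find_all("a", href=True)   # only tags that actually carry an href
txt_links = soup.select('a[href$=".txt"]')  # CSS selector: href ending in .txt

print(len(all_anchors), len(with_href), len(txt_links))  # 3 2 1
```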
Extract the URL from one link (here, the tag at index 36 of the list):
one_a_tag = soup.find_all('a')[36]
link = one_a_tag['href']
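The hard-coded index `[36]` only works while the page layout stays the same. The sketch below selects links by file extension instead, and resolves relative `href` values against the page URL with `urllib.parse.urljoin`. It also puts the otherwise unused `urllib.request` and `time` imports to work, on the assumption that the next step is downloading each file with a pause between requests. The `.txt` suffix, the one-second delay, and the helper names `data_links` and `download_all` are all assumptions.

```python
import os
import time
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def data_links(html, base_url, suffix=".txt"):
    """Return absolute URLs of every <a href> ending in `suffix` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base_url, a["href"])  # resolve relative paths against the page URL
        for a in soup.find_all("a", href=True)
        if a["href"].endswith(suffix)
    ]


def download_all(urls, dest_dir=".", delay=1.0):
    """Save each URL into dest_dir, pausing `delay` seconds between requests (hypothetical helper)."""
    paths = []
    for url in urls:
        filename = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        urllib.request.urlretrieve(url, filename)  # download the file to disk
        paths.append(filename)
        time.sleep(delay)  # avoid hammering the server
    return paths
```

Chaining them, `download_all(data_links(response.text, url))` would fetch every `.txt` file linked from the page.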
Another approach is in “A Beginner’s Guide to learn web scraping with python!”. It uses Selenium (a web-testing library for automating browser activities), BeautifulSoup (for parsing HTML and XML documents), and Pandas (for data manipulation and analysis, used to store the extracted data in the desired format).
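The BeautifulSoup-to-Pandas half of that workflow can be sketched without a browser. In the Selenium version, the HTML would come from `driver.page_source` after the page has rendered. Here a hard-coded table stands in for it; the table `id`, column names, and values are made up for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source.
html = """
<table id="results">
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Laptop A</td><td>999</td></tr>
  <tr><td>Laptop B</td><td>1299</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table", id="results").find_all("tr")

# The first row holds the headers; the remaining rows hold data cells.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows[1:]]

df = pd.DataFrame(records, columns=headers)
# Cell text arrives as strings, so convert numeric columns explicitly.
df["Price"] = df["Price"].astype(int)
# To store the result, e.g. df.to_csv("products.csv", index=False)
print(df)
```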
More approaches:
- Practical Introduction to Web Scraping in Python
- Beginner’s guide to Web Scraping in Python using BeautifulSoup
- Web Scraping using Python
- Tutorial: Python Web Scraping Using BeautifulSoup
- Ultimate Guide to Web Scraping with Python Part 1: Requests and BeautifulSoup
Happy Scraping!