Day 15: Python – Notes on Web Scraping

A summary from the enticing post “How to Web Scrape with Python in 4 Minutes”:

LIBRARIES:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

Accessing the target URL:

url = 'http://web.mta.info/developers/turnstile.html'
response = requests.get(url)
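
The request above assumes the server answers successfully. A minimal sketch of a more defensive fetch follows; the helper name fetch_html and the 10-second timeout are my own assumptions, not something from the post:

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, raising on HTTP errors.

    `fetch_html` and the default timeout are illustrative choices.
    """
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of
    # silently handing an error page to the parser.
    response.raise_for_status()
    return response.text


# Usage (needs network access):
# html = fetch_html('http://web.mta.info/developers/turnstile.html')
```

Failing early like this makes it easier to tell a network problem apart from a parsing problem later on.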

Parse the response into a BeautifulSoup object, which represents the page as a nested data structure (see the BeautifulSoup documentation).

soup = BeautifulSoup(response.text, "html.parser")

Search for links:

soup.findAll('a')
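
To see what this call returns without touching the network, here is a small sketch on a made-up HTML snippet (the sample markup and file names are invented for illustration). It uses find_all, the current spelling of findAll in bs4, and passes href=True to skip anchors that have no href attribute:

```python
from bs4 import BeautifulSoup

# Invented sample page: two real links and one anchor without an href.
sample_html = """
<html><body>
  <a href="data/file_one.txt">First file</a>
  <a name="top">Just an anchor</a>
  <a href="data/file_two.txt">Second file</a>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Every <a> tag, in document order.
a_tags = soup.find_all("a")

# Only the <a> tags that carry an href attribute.
hrefs = [tag["href"] for tag in soup.find_all("a", href=True)]

print(len(a_tags))
print(hrefs)
```

Filtering on href=True avoids a KeyError when a tag is indexed with ['href'] but has no such attribute.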

Extract the link:

one_a_tag = soup.findAll('a')[36]
link = one_a_tag['href']
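
The extracted href may be relative to the page, so it usually has to be turned into a full URL before anything can be downloaded. The following is a rough sketch of that step using only the standard library; the helper names and the example href are hypothetical, and the one-second pause is simply a polite default I picked:

```python
import posixpath
import time
import urllib.request
from urllib.parse import urljoin, urlparse

PAGE_URL = "http://web.mta.info/developers/turnstile.html"  # URL from the notes


def resolve_link(page_url, href):
    """Turn a (possibly relative) href into an absolute URL."""
    return urljoin(page_url, href)


def filename_from_url(file_url):
    """Use the last path segment of the URL as the local file name."""
    return posixpath.basename(urlparse(file_url).path)


def download(file_url, pause_seconds=1.0):
    """Save file_url into the current directory, then pause briefly."""
    target = filename_from_url(file_url)
    urllib.request.urlretrieve(file_url, target)
    time.sleep(pause_seconds)  # avoid hammering the server between requests
    return target


# Hypothetical href, standing in for the value of one_a_tag['href'].
example_href = "data/nyct/turnstile/turnstile_example.txt"
download_url = resolve_link(PAGE_URL, example_href)
print(download_url)
print(filename_from_url(download_url))

# download(download_url)  # uncomment to fetch; needs network access
```

Note that a fixed index such as [36] depends on the page layout at the time of writing, so it is worth printing the link before downloading it.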

Another approach is described in “A Beginner’s Guide to learn web scraping with python!”. It uses Selenium (a web testing library for automating browser activities), BeautifulSoup (for parsing HTML and XML documents), and Pandas (for data manipulation and analysis, used here to extract the data and store it in the desired format).

More approaches:

Happy Scraping!

 

 
