MCA
MEDIUM

WEB SCRAPING

BeautifulSoup and requests library

CONCEPTS

01 HTTP requests (GET, POST)
02 Parsing HTML with BeautifulSoup
03 Finding elements by tag/class/id
04 Extracting attributes and text
05 Handling pagination
06 Respecting robots.txt
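
The last concept, respecting robots.txt, can be sketched with Python's standard-library `urllib.robotparser`. The robots.txt content and the `demo-scraper` user-agent name below are assumptions made up for illustration, and the rules are parsed from a string so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "demo-scraper"  # assumed name, not taken from the course material


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

for page in (
    "https://example.com/index.html",
    "https://example.com/private/data.html",
):
    verdict = "allowed" if parser.can_fetch(USER_AGENT, page) else "blocked"
    print(page, "->", verdict)

# Seconds to wait between requests, if the site declares a delay.
print("Crawl delay:", parser.crawl_delay(USER_AGENT))
```

Against a live site, the same parser would typically be pointed at the real file with `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` before calling `can_fetch`.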

SYNTAX_DEMO

Extracting data from the web
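import requests
from bs4 import BeautifulSoup

url = "https://example.com"
# A timeout keeps the script from hanging on an unresponsive server
response = requests.get(url, timeout=10)

if response.status_code == 200:
    soup = BeautifulSoup(response.text, "html.parser")
    # Get the title; soup.title is None when the page has no <title> tag
    title = soup.title.string if soup.title else None
    print("Page Title:", title)

    # Find all links that carry an href attribute
    for link in soup.find_all("a", href=True):
        print(link["href"])
else:
    print("Request failed with status code:", response.status_code)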
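
The demo above covers GET requests, parsing, and link extraction. The remaining concepts (finding elements by class and id, and handling pagination) might look roughly like the sketch below. The page markup, the `item` and `next` class names, the `heading` id, and the `fetch` helper are all hypothetical; `fetch` reads from an in-memory dictionary so the example runs offline, and in a real scraper it would wrap `requests.get`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical pages keyed by URL, standing in for real HTTP responses.
PAGES = {
    "https://example.com/page/1": """
        <h1 id="heading">Catalog</h1>
        <div class="item"><a href="/item/1">First</a></div>
        <div class="item"><a href="/item/2">Second</a></div>
        <a class="next" href="/page/2">Next</a>
    """,
    "https://example.com/page/2": """
        <h1 id="heading">Catalog</h1>
        <div class="item"><a href="/item/3">Third</a></div>
    """,
}


def fetch(url: str) -> str:
    """Stand-in for requests.get(url).text; reads from PAGES instead."""
    return PAGES[url]


def scrape_all(start_url: str) -> list:
    """Follow 'next' links and collect (text, absolute href) for each item."""
    results = []
    seen = set()  # guards against pagination loops
    url = start_url
    while url and url not in seen:
        seen.add(url)
        soup = BeautifulSoup(fetch(url), "html.parser")

        # Find by tag and class, then read text and an attribute.
        for div in soup.find_all("div", class_="item"):
            link = div.find("a")
            if link is not None and link.get("href"):
                text = link.get_text(strip=True)
                results.append((text, urljoin(url, link["href"])))

        # The last page has no 'next' link, which ends the loop.
        next_link = soup.find("a", class_="next")
        url = urljoin(url, next_link["href"]) if next_link else None
    return results


# Find a single element by id.
first_page = BeautifulSoup(fetch("https://example.com/page/1"), "html.parser")
heading = first_page.find(id="heading").get_text(strip=True)

items = scrape_all("https://example.com/page/1")
print(heading)
for text, href in items:
    print(text, href)
```

Passing the current page URL to `urljoin` turns relative links such as `/item/1` into absolute URLs, which matters once the scraper moves beyond the first page.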