Parsing Web Pages with Python

You can scrape text from websites with the urlopen() function. Python's standard-library package urllib contains tools for working with URLs, and its urllib.request module provides urlopen().
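As an illustration of those URL tools, here is a small sketch using urllib.parse, which ships alongside urllib.request. The relative link "/wiki/Anfield" is a hypothetical example of an href you might find on a page.

```python
from urllib.parse import urljoin, urlparse

url_link = "https://en.wikipedia.org/wiki/Liverpool_F.C."

# Split the URL into its components (scheme, host, path, ...)
parts = urlparse(url_link)
print(parts.scheme)  # https
print(parts.netloc)  # en.wikipedia.org
print(parts.path)    # /wiki/Liverpool_F.C.

# Resolve a relative link (hypothetical href) against the page URL
print(urljoin(url_link, "/wiki/Anfield"))  # https://en.wikipedia.org/wiki/Anfield
```

urljoin() is handy when scraping, because most links inside a page are relative and need the page URL to become fetchable again.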

You can fetch a page's entire HTML source by passing its URL to urlopen() and reading the response:

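from urllib.request import urlopen

url_link = "https://en.wikipedia.org/wiki/Liverpool_F.C."

# urlopen() returns an HTTPResponse; the with-block closes the connection afterwards
with urlopen(url_link) as page_response:
    print(page_response.status)  # HTTP status code, e.g. 200 on success
    html_bytes = page_response.read()  # raw page body as bytes

# Decode the bytes into text; Wikipedia serves its pages as UTF-8
html = html_bytes.decode("utf-8")
print(html)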
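The code above hard-codes UTF-8 and sends urllib's default "Python-urllib/x.y" User-Agent, which some sites refuse. A slightly more defensive sketch wraps the fetch in a helper that sets its own User-Agent and decodes with whatever charset the server declares. The function name fetch_html and the User-Agent string are hypothetical choices for this example, not part of urllib.

```python
from urllib.request import Request, urlopen

# Hypothetical User-Agent string; replace with something that identifies your script
USER_AGENT = "example-scraper/0.1 (contact: you@example.com)"


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset the server declares."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=timeout) as response:
        body = response.read()
        # Charset from the Content-Type header, falling back to UTF-8 if none is given
        charset = response.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going if the page contains a few bad bytes
    return body.decode(charset, errors="replace")
```

For example, html = fetch_html(url_link) replaces the read-and-decode steps above. Because urlopen() also understands data: URLs, you can try the helper offline: fetch_html("data:text/plain;charset=utf-8,hello") returns "hello".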

Once you have the page's HTML as a string, you can write logic to extract specific strings and text from it.
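A minimal sketch of that kind of extraction logic, pulling out the page title with plain string methods and then with a regular expression. The sample HTML is a made-up stand-in for the downloaded html string, so the snippet runs without a network connection.

```python
import re

# Made-up stand-in for the html string fetched from the page
html = """<html><head><title>Liverpool F.C. - Wikipedia</title></head>
<body><h1 id="firstHeading">Liverpool F.C.</h1></body></html>"""

# Approach 1: string methods. find() returns -1 when the tag is missing,
# so real code should check for that before slicing.
start = html.find("<title>") + len("<title>")
end = html.find("</title>")
title = html[start:end]
print(title)  # Liverpool F.C. - Wikipedia

# Approach 2: a regular expression that tolerates attributes on the tag
match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
if match:
    print(match.group(1).strip())  # Liverpool F.C. - Wikipedia
```

Both approaches work for small, predictable snippets like this, but they break easily on real-world HTML, which is where a proper parser helps.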

For parsing structured data, you can use more advanced Python libraries such as Beautiful Soup and Scrapy.
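Beautiful Soup and Scrapy are third-party packages that must be installed separately. To show the idea of structured parsing without extra installs, here is a sketch that swaps in the standard library's html.parser.HTMLParser (not Beautiful Soup's API) to collect every link on a page. The class name LinkCollector and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical sample markup; in practice, feed the html string fetched earlier
sample_html = '<p>See <a href="/wiki/Anfield">Anfield</a> and <a href="/wiki/Premier_League">the league</a>.</p>'

parser = LinkCollector()
parser.feed(sample_html)
parser.close()
print(parser.links)  # ['/wiki/Anfield', '/wiki/Premier_League']
```

With Beautiful Soup installed, a roughly equivalent lookup is [a.get("href") for a in BeautifulSoup(html, "html.parser").find_all("a")].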
