How can I get a Wikipedia article's text using Python 3 with Beautiful Soup?


There is a much, much easier way to get information from Wikipedia: the Wikipedia API.

The wikipediaapi Python wrapper lets you do it in a few lines, with zero HTML parsing:

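    import wikipediaapi

    # Recent releases require a descriptive user agent (the value here is a
    # placeholder); older releases accepted just wikipediaapi.Wikipedia('en').
    wiki_wiki = wikipediaapi.Wikipedia(
        user_agent='MyProjectName (me@example.com)', language='en')
    page = wiki_wiki.page('Mathematics')
    print(page.summary)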

Prints:

    Mathematics (from Greek μάθημα máthēma, "knowledge, study, learning") includes
    the study of such topics as quantity, structure, space, and
    change...(omitted intentionally)

And, in general, try to avoid screen-scraping if there's a direct API available.
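If you would rather not depend on a wrapper, the MediaWiki API can also be queried directly. The following is only a minimal sketch, assuming the `extracts` property is available on the wiki (it is on English Wikipedia). The helper names `build_extract_url`, `extract_text` and `fetch_extract` are made up for this example, and the User-Agent string is a placeholder you should replace with your own.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, intro_only=True):
    """Build a query URL asking for the plain-text extract of one page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,      # plain text instead of HTML
        "titles": title,
        "format": "json",
        "formatversion": 2,    # "pages" comes back as a list
    }
    if intro_only:
        params["exintro"] = 1  # only the text before the first section
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_text(api_response):
    """Pull the extract out of a decoded API response (empty string if absent)."""
    pages = api_response.get("query", {}).get("pages", [])
    if not pages:
        return ""
    return pages[0].get("extract", "")


def fetch_extract(title, intro_only=True):
    """Fetch the extract over HTTP; the User-Agent value is a placeholder."""
    request = urllib.request.Request(
        build_extract_url(title, intro_only),
        headers={"User-Agent": "ExampleScript/0.1 (me@example.com)"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))


# Usage (needs network access):
# print(fetch_extract("Mathematics"))
```

Keeping URL building and response parsing in separate functions means both can be checked without touching the network.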


Select the <p> tags. There are 52 of them on this page. I'm not sure whether you want the whole article, but you can iterate over those tags and store the text however you like. I just chose to print each of them to show the output.

    import bs4
    import requests

    response = requests.get("https://en.wikipedia.org/wiki/Mathematics")

    # requests.get() never returns None, so check the HTTP status instead
    if response.ok:
        html = bs4.BeautifulSoup(response.text, 'html.parser')
        title = html.select("#firstHeading")[0].text
        paragraphs = html.select("p")

        # print every paragraph on the page
        for para in paragraphs:
            print(para.text)

        # just grab the text up to contents as stated in question
        intro = '\n'.join([para.text for para in paragraphs[0:5]])
        print(intro)
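The `paragraphs[0:5]` slice is a guess at where the introduction ends, and the right number differs from article to article. One possible alternative, sketched below, is to collect paragraphs until the first `<h2>` heading. This assumes the article body lives in the element with id `mw-content-text` and that the first section starts with an `<h2>`, which matches Wikipedia's current markup but is not guaranteed to stay that way. `intro_paragraphs` is a name invented for this example.

```python
import bs4


def intro_paragraphs(html_text):
    """Return the text of the <p> tags that appear before the first <h2>."""
    soup = bs4.BeautifulSoup(html_text, "html.parser")

    # Assumption: the article body is inside #mw-content-text;
    # fall back to the whole document if that element is missing.
    root = soup.select_one("#mw-content-text")
    if root is None:
        root = soup

    intro = []
    # find_all() with a list of names returns matches in document order
    for tag in root.find_all(["p", "h2"]):
        if tag.name == "h2":
            break  # first section heading reached, the intro is over
        text = tag.get_text().strip()
        if text:  # skip empty placeholder paragraphs
            intro.append(text)
    return "\n".join(intro)


# Usage with the response fetched above:
# print(intro_paragraphs(response.text))
```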


Use the wikipedia library:

    import wikipedia

    # print(wikipedia.summary("Mathematics"))
    # wikipedia.search("Mathematics")
    print(wikipedia.page("Mathematics").content)