Selenium python: get all the <li> text of all the <ul> from a <div>

Try waiting for the page to load fully before parsing it. One way is to use time.sleep():

from time import sleep
...
for url in listURL:
    driver.get(url)
    sleep(5)
    ...
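A fixed sleep(5) always costs the full five seconds and still fails if the page needs six. A possible alternative is to poll a condition until it holds or a timeout expires, which is roughly the idea behind Selenium's WebDriverWait. The sketch below is plain Python; wait_until and page_is_ready are hypothetical names used for illustration and are not part of Selenium.

```python
import time


def wait_until(predicate, timeout=5.0, poll=0.1):
    """Call predicate() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Hypothetical stand-in for "the page has finished loading":
# it becomes true on the third check.
checks = {"count": 0}


def page_is_ready():
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(page_is_ready, timeout=2.0, poll=0.01)
print(checks["count"])  # 3
```

With a real driver, the predicate could be something like lambda: driver.execute_script("return document.readyState") == "complete", though that only covers the initial document load, not content added later by scripts.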

EDIT: Using BeautifulSoup:

import requests
from bs4 import BeautifulSoup

listURL = [
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_1",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_2",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Phrases_1",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Negative_1",
]

list_text = []
for url in listURL:
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    print("Link:", url)

    for tag in soup.select("[id*=Lesson]:not([id*=Lessons])"):
        print(tag.text)
        print()
        print(tag.find_next("ul").text)
        print("-" * 80)
    print()

Output (truncated):

Link: https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_1
Lesson 1

man = man
vrouw = woman
jongen = boy
ik = I
ben = am
een = a/an
en = and
--------------------------------------------------------------------------------
Lesson 2

meisje = girl
kind = child/kid
hij = he
ze = she (unstressed)
is = is
of = or
--------------------------------------------------------------------------------
Lesson 3

appel = apple
... and so on
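The selector [id*=Lesson]:not([id*=Lessons]) picks elements whose id contains the substring "Lesson" but not "Lessons", presumably to keep the per-lesson headings and skip a general "Lessons" heading. A small plain-Python sketch of the same test follows; the ids are made up for illustration, and the real ids on the wiki pages are an assumption here.

```python
def matches_lesson_selector(element_id):
    """Mimic [id*=Lesson]:not([id*=Lessons]) as a substring test on an id."""
    return "Lesson" in element_id and "Lessons" not in element_id


# Hypothetical ids, for illustration only.
sample_ids = ["Lessons", "Lesson_1", "Lesson_2", "Grammar_notes"]
print([i for i in sample_ids if matches_lesson_selector(i)])
# ['Lesson_1', 'Lesson_2']
```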

If you want the output as a list:

for url in listURL:
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    print("Link:", url)
    print([tag.text for tag in soup.select(".mw-parser-output > ul li")])
    print("-" * 80)

python selenium xpath html-parsing

Your script seems to be OK, but I'd add an explicit or implicit wait. Try waiting until all the elements on the page are visible:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument('headless')  # start chrome without opening window
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver', options=options)

listURL = [
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_1",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_2",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Phrases_1",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Negative_1",
]

list_text = []
for url in listURL:
    driver.get(url)
    # explicit wait: block for up to 15 s until the matching <ul> elements are visible
    WebDriverWait(driver, 15).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="mw-content-text"]/div/ul')))
    # Selenium 3 API; in Selenium 4.3+ use driver.find_elements(By.XPATH, ...)
    elem = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div/ul')
    for each_ul in elem:
        all_li = each_ul.find_elements_by_tag_name("li")
        for li in all_li:
            list_text.append(li.text)

print(list_text)

Also, you can add driver.implicitly_wait(15) right after you declare driver.

Output:

['man = man', 'vrouw = woman', 'jongen = boy', 'ik = I', 'ben = am', 'een = a/an', 'en = and', 'meisje = girl', 'kind = child/kid', 'hij = he', 'ze = she (unstressed)', 'is = is', 'of = or', 'appel = apple', 'melk = milk', 'drinkt = drinks (2nd and 3rd person singular)', 'drink = drink (1st person singular)', 'eet = eat(s) (singular)', 'de = the', 'sap = juice', 'water = water', 'brood = bread', 'het = it, the', 'je = you (singular informal, unstressed)', 'bent = are (2nd person singular)', 'Zijn (to be)', 'Hebben (to have)', 'Mogen (to be allowed to)', 'Willen (to want)', 'Kunnen (to be able to)', 'Zullen ("will")', 'boterham = sandwich', 'rijst = rice', 'we = we (unstressed)', 'jullie = you (plural informal)', 'eten = eat (plural)', 'drinken = drink (plural)', 'vrouwen = women', 'mannen = men', 'meisjes = girls', 'krant = newspaper', 'lezen = read (plural)', 'jongens = boys', 'menu = menu', 'dat = that', 'zijn = are (plural)', 'ze = they (unstressed)', 'heb = have (1st person singular)', 'heeft = has (3rd person singular)', 'hebt = have (2nd person singular)', 'hebben = have (plural)', 'boek = book', 'lees = read (1st person singular)', 'leest = read(s) (2nd and 3rd person singular)', 'kinderen = children', 'spreken = speak (plural)', 'spreek = speak (1st person singular)', 'spreekt = speak(s) (2nd and 3rd person singular)', 'hallo = hello', 'bedankt = thanks', 'doei = bye', 'dag = goodbye', 'tot ziens = see you later', 'hoi = hi', 'goedemorgen = good morning', 'goededag = good day', 'goedenavond = good evening', 'goedenacht = good night', 'welterusten = good night', 'ja = yes', 'dank je wel = thank you very much', 'alsjeblieft = please', 'sorry = sorry', 'het spijt me = I am sorry', 'oké = okay', 'pardon = excuse me', 'hoe gaat het = how are you', 'goed = good, fine, well', 'dank je = thank you', '(een) beetje = (a) bit of', 'Engels = English', 'Nederlands = Dutch', 'Geen: negating indefinite nouns (you can think of it as "no" things or "none of" a thing if that helps). Geen replaces the indefinite pronoun in question.', 'Niet: negating a verb, adjective or definite nouns. Niet comes at the end of a sentence or directly after the verb zijn.', 'nee = no', 'niet = not', 'geen = not']

Update: I found a more reliable way with CSS selectors. Please try it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument('headless')  # start chrome without opening window
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver', options=options)
driver.implicitly_wait(15)

listURL = [
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_1",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Basics_2",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Phrases_1",
    "https://duolingo.fandom.com/wiki/Dutch_(NL)_Skill:Negative_1",
]

list_text = []
for url in listURL:
    driver.get(url)
    wait = WebDriverWait(driver, 15)
    # wait for the ads first, then for the lists themselves
    wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id*='google_ads_iframe'] ")))
    wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '.mw-parser-output>ul')))
    # Selenium 3 API; in Selenium 4.3+ use driver.find_elements(By.CSS_SELECTOR, ...)
    elem = driver.find_elements_by_css_selector('.mw-parser-output>ul')
    for each_ul in elem:
        all_li = each_ul.find_elements_by_css_selector("li")
        for li in all_li:
            list_text.append(li.text)

print(list_text)

Update 2: After looking into the cause, I found that the ads take up most of the loading time. So I added wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id*='google_ads_iframe'] "))), which waits until all the ads have loaded.

I also changed the second wait to .mw-parser-output>ul, dropping the trailing li, which I don't think is necessary. You can also try removing the second wait entirely and see if it helps.


After

WebDriverWait(driver, 15).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="mw-content-text"]/div/ul')))

you need to add a short sleep (I guess time.sleep(1) will be enough), and only after that do

elem = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div/ul')

Your problem is caused by a misunderstanding of what visibility_of_all_elements_located does.
It does not wait for every element that will eventually match the locator to become visible; it has no way of knowing how many such elements to expect.
It only checks the elements that match at the moment it polls, so as soon as the matching elements already in the DOM (possibly just one) are visible, it returns them and the program continues.
See the official documentation for more details on these methods.
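Instead of a fixed sleep after the wait, one option is a custom wait condition that only succeeds once the number of matching elements has stopped changing between two polls. WebDriverWait.until() accepts any callable that takes the driver, so a class like the one below could be passed to it. This is only a sketch: element_count_is_stable and FakeDriver are hypothetical names, and the fake driver stands in for a real browser so the logic can be run without Selenium.

```python
class element_count_is_stable:
    """Wait condition: truthy once the match count is unchanged between polls.

    `find` is any callable taking the driver and returning a list of elements,
    e.g. lambda d: d.find_elements(By.CSS_SELECTOR, ".mw-parser-output > ul li").
    """

    def __init__(self, find, min_count=1):
        self.find = find
        self.min_count = min_count
        self.last_count = None

    def __call__(self, driver):
        elements = self.find(driver)
        count = len(elements)
        stable = count >= self.min_count and count == self.last_count
        self.last_count = count
        return elements if stable else False


# Fake driver standing in for a page that is still adding <li> elements.
class FakeDriver:
    def __init__(self):
        self.snapshots = [["a"], ["a", "b", "c"], ["a", "b", "c"]]
        self.polls = 0

    def find_elements(self):
        snapshot = self.snapshots[min(self.polls, len(self.snapshots) - 1)]
        self.polls += 1
        return snapshot


driver = FakeDriver()
condition = element_count_is_stable(lambda d: d.find_elements())
results = [condition(driver) for _ in range(3)]
print(results)  # [False, False, ['a', 'b', 'c']]
```

With a real driver this would be used as WebDriverWait(driver, 15).until(element_count_is_stable(...)). A count that is stable across two polls is only a heuristic: a page that pauses for longer than the poll interval before adding more items would still be cut short.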
