Scrape the absolute URL instead of a relative path in Python

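In this case urljoin helps you (urllib.parse.urljoin in Python 3, urlparse.urljoin in Python 2). You should modify your code like this:

import bs4 as bs
import urllib.request
from urllib.parse import urljoin  # Python 3; in Python 2 use: from urlparse import urljoin

web_url = 'https://www.example-page-xl.com'
sauce = urllib.request.urlopen(web_url).read()
soup = bs.BeautifulSoup(sauce, 'lxml')
section = soup.section

# Resolve every link found in the <section> against the page URL
for url in section.find_all('a'):
    print(urljoin(web_url, url.get('href')))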

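Here urljoin handles both absolute and relative paths: a relative href is resolved against web_url, while an href that is already absolute is returned unchanged.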
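To try this behaviour without a network request or third-party packages, here is a minimal sketch that swaps BeautifulSoup for the standard library's html.parser. The LinkCollector class, the page URL, and the HTML snippet are made up for illustration and are not part of the original answer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect every <a href=...> as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors that have no href attribute
            self.links.append(urljoin(self.base_url, href))


# Made-up page URL and markup, for illustration only.
page_url = "https://www.example-page-xl.com/blog/index.html"
html = """
<section>
  <a href="post-1.html">relative</a>
  <a href="/about">root-relative</a>
  <a href="https://other.example.org/x">already absolute</a>
  <a name="no-href">no link</a>
</section>
"""

collector = LinkCollector(page_url)
collector.feed(html)
for link in collector.links:
    print(link)
```

The anchor without an href is skipped explicitly, since a missing attribute would otherwise have to be handled before joining.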

python beautifulsoup mechanize

urllib.parse.urljoin() might help. It joins a base URL with a second URL and handles both relative and absolute paths correctly. Note that this is Python 3 code.

>>> import urllib.parse
>>> base = 'https://www.example-page-xl.com'
>>> urllib.parse.urljoin(base, '/helloworld/index.php')
'https://www.example-page-xl.com/helloworld/index.php'
>>> urllib.parse.urljoin(base, 'https://www.example-page-xl.com/helloworld/index.php')
'https://www.example-page-xl.com/helloworld/index.php'
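One detail worth knowing when the base URL contains a path: whether it ends in a slash changes the result. The sketch below shows a few such cases; the paths and the cdn.example.org host are invented for illustration.

```python
from urllib.parse import urljoin

# No trailing slash: the last segment ("guide") is treated as a file and replaced.
no_slash = urljoin("https://www.example-page-xl.com/docs/guide", "intro.html")

# Trailing slash: "guide/" is treated as a directory and kept.
with_slash = urljoin("https://www.example-page-xl.com/docs/guide/", "intro.html")

# ".." climbs one directory before appending the rest of the path.
parent = urljoin("https://www.example-page-xl.com/docs/guide/", "../img/logo.png")

# A scheme-relative reference keeps the base scheme but switches host.
scheme_rel = urljoin("https://www.example-page-xl.com/docs/", "//cdn.example.org/lib.js")

print(no_slash)    # https://www.example-page-xl.com/docs/intro.html
print(with_slash)  # https://www.example-page-xl.com/docs/guide/intro.html
print(parent)      # https://www.example-page-xl.com/docs/img/logo.png
print(scheme_rel)  # https://cdn.example.org/lib.js
```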


I find the following solution to be the most robust:

import urllib.parse

def base_url(url, with_path=False):
    parsed = urllib.parse.urlparse(url)
    path   = '/'.join(parsed.path.split('/')[:-1]) if with_path else ''
    parsed = parsed._replace(path=path)
    parsed = parsed._replace(params='')
    parsed = parsed._replace(query='')
    parsed = parsed._replace(fragment='')
    return parsed.geturl()
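A short usage sketch may make the two modes of base_url clearer. The function is repeated in condensed form so the block runs on its own; the sample URL is made up, and combining the result with urljoin (including the appended "/") is an assumption about how one might use it, not something the answer states.

```python
import urllib.parse


def base_url(url, with_path=False):
    # Same logic as above, with the four _replace calls merged into one.
    parsed = urllib.parse.urlparse(url)
    path = '/'.join(parsed.path.split('/')[:-1]) if with_path else ''
    parsed = parsed._replace(path=path, params='', query='', fragment='')
    return parsed.geturl()


# Made-up URL, for illustration only.
page = 'https://www.example-page-xl.com/helloworld/index.php?x=1#top'

site_root = base_url(page)                  # scheme and host only
directory = base_url(page, with_path=True)  # scheme, host, and parent directory

print(site_root)  # https://www.example-page-xl.com
print(directory)  # https://www.example-page-xl.com/helloworld

# The directory form has no trailing slash, so add one before joining;
# otherwise urljoin would replace the "helloworld" segment.
sibling = urllib.parse.urljoin(directory + '/', 'page2.php')
print(sibling)    # https://www.example-page-xl.com/helloworld/page2.php
```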

CodeHunter
