Get domain name from given url

java url

If you want to parse a URL, use java.net.URI. java.net.URL has a bunch of problems -- its equals method does a DNS lookup, which means code using it can be vulnerable to denial-of-service attacks when used with untrusted inputs.

"Mr. Gosling -- why did you make url equals suck?" explains one such problem. Just get in the habit of using java.net.URI instead.

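import java.net.URI;
import java.net.URISyntaxException;

public static String getDomainName(String url) throws URISyntaxException {
    URI uri = new URI(url);
    String domain = uri.getHost();
    if (domain == null) return null; // no host, e.g. a relative reference
    return domain.startsWith("www.") ? domain.substring(4) : domain;
}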

should do what you want.

Though it seems to work fine, is there a better approach, or are there edge cases where it could fail?

Your code as written fails for the valid URLs:

httpfoo/bar -- relative URL with a path component that starts with http.
HTTP://example.com/ -- protocol is case-insensitive.
//example.com/ -- protocol relative URL with a host
www/foo -- a relative URL with a path component that starts with www
wwwexample.com -- a domain name that does not start with www. (with the dot) but does start with www

Hierarchical URLs have a complex grammar. If you try to roll your own parser without carefully reading RFC 3986, you will probably get it wrong. Just use the one that's built into the core libraries.
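As a rough check, the cases listed above can be run through java.net.URI. This is only a minimal sketch: the class name UriEdgeCases and the helper hostOf are made up for illustration, and URI.create is used instead of new URI so that a syntax error surfaces as an unchecked IllegalArgumentException. The bare string wwwexample.com is given an http:// prefix here so that it is parsed as a host rather than as a relative path.

```java
import java.net.URI;

public class UriEdgeCases {
    // Hypothetical helper: returns the host, or null when the reference has none.
    static String hostOf(String ref) {
        return URI.create(ref).getHost();
    }

    public static void main(String[] args) {
        String[] refs = {
            "httpfoo/bar",             // relative path that happens to start with "http"
            "HTTP://example.com/",     // upper-case scheme
            "//example.com/",          // protocol-relative reference
            "www/foo",                 // relative path that happens to start with "www"
            "http://wwwexample.com/"   // host starts with "www" but not with "www."
        };
        for (String ref : refs) {
            System.out.println(ref + " -> " + hostOf(ref));
        }
    }
}
```

The two relative references have no authority component, so getHost() returns null for them. Callers need to handle that case before calling startsWith on the result.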

If you really need to deal with messy inputs that java.net.URI rejects, see RFC 3986 Appendix B:

Appendix B. Parsing a URI Reference with a Regular Expression
As the "first-match-wins" algorithm is identical to the "greedy" disambiguation method used by POSIX regular expressions, it is natural and commonplace to use a regular expression for parsing the potential five components of a URI reference.
The following line is the regular expression for breaking-down a well-formed URI reference into its components.
  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
   12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis).
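The Appendix B expression can be used directly with java.util.regex. The following is a sketch under assumptions, not a replacement for java.net.URI: the class name Rfc3986Regex and the method authorityOf are hypothetical, and group 4 is the raw authority, so it still contains any user info or port that was present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rfc3986Regex {
    // The regular expression from RFC 3986 Appendix B, with the backslash escaped for Java.
    private static final Pattern URI_REFERENCE = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Hypothetical helper: returns group 4 (the authority), or null if there is none.
    static String authorityOf(String ref) {
        Matcher m = URI_REFERENCE.matcher(ref);
        return m.matches() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        // The space in the path is the kind of messy input that java.net.URI rejects.
        System.out.println(authorityOf("http://example.com/a b"));
        System.out.println(authorityOf("HTTP://example.com:80/x"));
        System.out.println(authorityOf("www/foo"));
    }
}
```

Because the expression only splits a reference into components, it accepts almost any string and performs no validation. Stripping the port or user info from the authority is left to the caller.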


import java.net.*;
import java.io.*;

public class ParseURL {
  public static void main(String[] args) throws Exception {
    URL aURL = new URL("http://example.com:80/docs/books/tutorial"
                       + "/index.html?name=networking#DOWNLOADING");

    System.out.println("protocol = " + aURL.getProtocol());   // http
    System.out.println("authority = " + aURL.getAuthority()); // example.com:80
    System.out.println("host = " + aURL.getHost());           // example.com
    System.out.println("port = " + aURL.getPort());           // 80
    System.out.println("path = " + aURL.getPath());           // /docs/books/tutorial/index.html
    System.out.println("query = " + aURL.getQuery());         // name=networking
    System.out.println("filename = " + aURL.getFile());       // /docs/books/tutorial/index.html?name=networking
    System.out.println("ref = " + aURL.getRef());             // DOWNLOADING
  }
}
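For comparison, roughly the same breakdown can be obtained from java.net.URI. This is a minimal sketch: the class name ParseURI and the helper components are hypothetical. Note that URI calls the protocol the "scheme" and the ref the "fragment", and it has no direct counterpart to getFile().

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParseURI {
    // Hypothetical helper: collects the URI components into an ordered map.
    static Map<String, String> components(String s) {
        URI u = URI.create(s);
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("scheme", u.getScheme());
        parts.put("authority", u.getAuthority());
        parts.put("host", u.getHost());
        parts.put("port", String.valueOf(u.getPort())); // -1 when no port is given
        parts.put("path", u.getPath());
        parts.put("query", u.getQuery());
        parts.put("fragment", u.getFragment());
        return parts;
    }

    public static void main(String[] args) {
        components("http://example.com:80/docs/books/tutorial"
                   + "/index.html?name=networking#DOWNLOADING")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```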


Here is a short and simple line using InternetDomainName.topPrivateDomain() in Guava: InternetDomainName.from(new URL(url).getHost()).topPrivateDomain().toString()

Given http://www.google.com/blah, that will give you google.com. Or, given http://www.google.co.mx, it will give you google.co.mx.

As Sa Qada commented in another answer on this post, this question has been asked earlier: Extract main domain name from a given url. The best answer to that question is from Satya, who suggests Guava's InternetDomainName.topPrivateDomain()

public boolean isTopPrivateDomain()
Indicates whether this domain name is composed of exactly one subdomain component followed by a public suffix. For example, returns true for google.com and foo.co.uk, but not for www.google.com or co.uk.
Warning: A true result from this method does not imply that the domain is at the highest level which is addressable as a host, as many public suffixes are also addressable hosts. For example, the domain bar.uk.com has a public suffix of uk.com, so it would return true from this method. But uk.com is itself an addressable host.
This method can be used to determine whether a domain is probably the highest level for which cookies may be set, though even that depends on individual browsers' implementations of cookie controls. See RFC 2109 for details.

Putting that together with URL.getHost(), which the original post already contains, gives you:

import com.google.common.net.InternetDomainName;
import java.net.URL;

public class DomainNameMain {
  public static void main(final String... args) throws Exception {
    final String urlString = "http://www.google.com/blah";
    final URL url = new URL(urlString);
    final String host = url.getHost();
    final InternetDomainName name = InternetDomainName.from(host).topPrivateDomain();
    System.out.println(urlString);
    System.out.println(host);
    System.out.println(name);
  }
}
