How to check if two URLs lead to the same path?


Before creating a new entry, you can check whether the destination already exists with:

    const existing = await UrlSchema.findOne({ destination: req.body.destination });
    if (!existing) {
        // create new
    } else {
        // return same
    }

This way, a destination is created only if it does not already exist. You can also remove a trailing slash (/), if present, so that URLs match more reliably.


You've listed four slightly different URLs:

    https://www.google.com
    https://google.com
    https://www.google.com/
    https://google.com/

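Strictly as strings, none of these are identical, though it sounds like you want to treat the / at the end as optional, so that it does not make a different target URL. (For a bare host, RFC 3986 does treat an empty path and / as equivalent; on longer paths, a trailing slash can name a different resource.)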

The www. variants are not guaranteed to be the same host as the non-www. variants. For google.com and www.google.com, they are the same host, but this is not guaranteed to be the case for all possible hosts.

If you want to assume that these four all point to the same host no matter what the domain is, then you just have to normalize the URL before you put it in your database. Then, before assigning a new shortened ID, you search the database for the normalized version of the URL.

In this case, you would remove the www. and remove any trailing slash to create the normalized version of the URL.

    function normalizeUrl(url) {
        // remove "www." if at first part of hostname
        // remove trailing slash
        return url.replace(/\/\/www\./, "//").replace(/\/$/, "");
    }

Once you've normalized the URL, you search for the normalized URL in your database. If you find it, you use the existing shortener for it. If you don't find it, you add the normalized version to your database with a newly generated shortId.
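The lookup-then-insert flow described above can be sketched as follows. This is only an illustration under stated assumptions: the `Map` stands in for the real database, and `findOrCreateShortId` and the `"id" + counter` scheme are hypothetical names and choices, not part of the original code.

```javascript
// In-memory stand-in for the database (assumption: a real app would query
// its own collection, ideally with a unique index on the normalized URL).
const store = new Map(); // normalized URL -> shortId
let nextId = 1;          // hypothetical ID generator, for illustration only

function normalizeUrl(url) {
    // remove "www." at the start of the hostname, then any trailing slash
    return url.replace(/\/\/www\./i, "//").replace(/\/$/, "");
}

// Hypothetical helper: return the existing shortId for a URL, or create one.
function findOrCreateShortId(url) {
    const key = normalizeUrl(url);
    if (!store.has(key)) {
        store.set(key, "id" + nextId++);
    }
    return store.get(key);
}

console.log(findOrCreateShortId("https://www.google.com/")); // id1
console.log(findOrCreateShortId("https://google.com"));      // id1 (same normalized URL)
console.log(findOrCreateShortId("https://google.com/maps")); // id2
```

With a real database, the check and the insert are two separate steps, so two simultaneous requests for the same URL could both miss the lookup; a unique constraint on the normalized URL is one way to guard against that.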

Here's a demo:

    function normalizeUrl(url) {
        // remove "www." if at first part of hostname
        // remove trailing slash
        return url.replace(/\/\/www\./i, "//").replace(/\/$/, "");
    }

    const testUrls = [
        "https://www.google.com",
        "https://www.google.com/",
        "https://google.com",
        "https://google.com/",
    ];

    for (const url of testUrls) {
        console.log(normalizeUrl(url));
    }

FYI, since hostnames in DNS are not case sensitive, you may also want to force the hostname to lower case to normalize it. Path names and query parameters could be case sensitive (sometimes they are and sometimes they are not), so those should be left as they are.

To include the host case sensitivity normalization, you could use this:

    function normalizeUrl(url) {
        // remove "www." if at first part of hostname
        // remove trailing slash
        // lowercase host name
        return url
            .replace(/\/\/www\./i, "//")
            .replace(/\/$/, "")
            .replace(/\/\/([^/]+)/, function (match, p1) {
                // p1 is the host portion; the path keeps its original case
                return "//" + p1.toLowerCase();
            });
    }

    const testUrls = [
        "https://www.google.com",
        "https://www.google.com/",
        "https://google.com",
        "https://google.com/",
        "https://WWW.google.com",
        "https://www.Google.com/",
        "https://GOOGLE.com",
        "https://google.COM/",
        "https://www.Google.com/xxx",     // this should be unique
        "https://google.COM/XXX",         // this should be unique
    ];

    for (const url of testUrls) {
        console.log(normalizeUrl(url));
    }
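As an alternative to the regex approach, the same normalization could be sketched with the built-in `URL` class, which is available globally in Node.js and browsers and lowercases the hostname while parsing. This swaps in a different technique from the code above. The function name `normalizeWithUrlApi` is hypothetical, and dropping the fragment and any credentials is an assumption made for this sketch.

```javascript
// Sketch using the built-in URL parser instead of regular expressions.
function normalizeWithUrlApi(input) {
    const u = new URL(input); // throws a TypeError if the input is not a valid URL
    // The parser has already lowercased the hostname; only "www." is stripped here.
    const host = u.host.replace(/^www\./, "");
    // Drop trailing slashes from the path; the case of the path is left untouched.
    const path = u.pathname.replace(/\/+$/, "");
    // Assumption: the fragment (#...) and any credentials are discarded.
    return `${u.protocol}//${host}${path}${u.search}`;
}

console.log(normalizeWithUrlApi("https://WWW.Google.com/"));        // https://google.com
console.log(normalizeWithUrlApi("https://google.COM/XXX/?q=Test")); // https://google.com/XXX?q=Test
```

Unlike the regex version, this also strips a trailing slash that appears before a query string, and it rejects strings that are not valid URLs instead of storing them.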