About This Tool

🔗 URL Canonicalizer – Normalize Any URL to Its Standard Form

A URL Canonicalizer converts any raw, messy, or malformed URL into its single, authoritative canonical form. Whether you are an SEO professional tracking duplicate pages, a developer normalizing endpoints before storage, or a security researcher analyzing URL-based injection attacks, having a consistent URL representation is essential.

What is URL Canonicalization?

Canonicalization is the process of resolving all equivalent representations of a URL into one agreed-upon form. The same web resource can be accessed via dozens of technically distinct strings — for example:

HTTP://Example.COM:80/a/b/../c?foo=bar
http://example.com/a/c?foo=bar   ← canonical form

Without canonicalization, search engines may index the same page multiple times, analytics tools may report split traffic, and URL comparisons in APIs can fail silently. RFC 3986 §6 describes the normalization rules that URL-handling software can apply to decide whether two URLs are equivalent.

Normalization Rules Applied

The tool applies up to seven independent normalization steps, each tracked individually:

Step | Rule | Example
1 | Lowercase scheme & host | HTTP://EXAMPLE.COM → http://example.com
2 | Remove default ports | http://host:80/ → http://host/
3 | Resolve dot segments | /a/b/../c → /a/c
4 | Decode unreserved chars | %41 → A (A is unreserved)
5 | Normalize percent-encoding hex case | %2f → %2F
6 | Sort / deduplicate query params | ?b=2&a=1&b=3 → ?a=1&b=2
7 | Strip fragment (SEO mode) | /page?id=5#section → /page?id=5
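
As a rough illustration of steps 1 to 3, here is a minimal Python sketch built on the standard library's urllib.parse. It is not the tool's own code (the tool runs in the browser), and the names DEFAULT_PORTS, remove_dot_segments, and canonicalize_basic are hypothetical. It assumes an absolute http or https URL and drops any userinfo.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only http and https default ports are handled in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments in an absolute path (step 3)."""
    segments = path.split("/")
    out = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            if is_last:
                out.append("")  # "/a/." becomes "/a/"
        elif segment == "..":
            if len(out) > 1:  # never pop the leading "" of an absolute path
                out.pop()
            if is_last:
                out.append("")  # "/a/b/.." becomes "/a/"
        else:
            out.append(segment)
    return "/".join(out)


def canonicalize_basic(url: str) -> str:
    """Apply steps 1-3: lowercase scheme/host, drop default port, resolve dots."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: userinfo is discarded here
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    path = remove_dot_segments(parts.path) or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(canonicalize_basic("HTTP://Example.COM:80/a/b/../c?foo=bar"))
# http://example.com/a/c?foo=bar
```

Non-default ports are preserved, so https://host:8443/x comes back unchanged.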

Canonicalization Profiles

Three profiles let you apply the appropriate rule set for your use case:

  • RFC 3986 (Standard): The internet standard. Applies rules 1 through 5 above: lowercasing scheme and host, removing default ports, resolving dot segments, and normalizing percent-encoding. Best for developers building APIs, web scrapers, or caches that compare URLs programmatically.
  • SEO (Search Engine): Extends RFC 3986 by automatically sorting query parameters and stripping the fragment component (since search engines ignore fragments). Ideal for de-duplicating URLs in site audits.
  • Security / Pentest: Fully decodes all percent-encoding, including double-encoded sequences, and resolves path traversal segments. Use this to reveal the true resource path hidden behind obfuscated URLs, such as %2e%2e/admin, which decodes to ../admin.
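
The "double-encoded sequences" handled by the Security profile can be illustrated with a short Python sketch that decodes repeatedly until the string stops changing. This is only an assumption about how such decoding could work, not the tool's implementation; fully_decode and max_rounds are hypothetical names.

```python
from urllib.parse import unquote


def fully_decode(text: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the text is stable or the round cap is hit."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    return text


print(fully_decode("%252e%252e/admin"))  # ../admin
```

Here %252e first decodes to %2e and then to a dot, which is why a single decoding pass would miss the traversal sequence. The round cap keeps pathological inputs from looping for long.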

Relative URL Resolution

Supply a Base URL to resolve relative URLs into absolute canonical form. This mirrors how browsers handle relative links in HTML documents:

Base:     https://example.com/blog/2024/
Relative: ../../about
Result:   https://example.com/about
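
The same resolution can be reproduced in Python with urllib.parse.urljoin, shown here only as an illustration; the tool itself runs in the browser and may differ in edge cases.

```python
from urllib.parse import urljoin

base = "https://example.com/blog/2024/"

# "../../about" climbs two directory levels from /blog/2024/ to the site root.
resolved = urljoin(base, "../../about")
print(resolved)  # https://example.com/about

# A bare relative name is resolved inside the base directory.
sibling = urljoin(base, "post")
print(sibling)  # https://example.com/blog/2024/post
```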

Batch Canonicalization

The Batch tab accepts up to 100 newline-separated URLs, processes each with the selected profile and settings, and presents a table of original → canonical pairs with a count of changes per URL. Export the full results as a CSV file for use in spreadsheets, site-audit reports, or redirect-mapping workflows.
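
A comparable batch workflow can be sketched in Python as follows. Everything here is an assumption made for illustration: the column names, the "changed" flag (a simplification of the per-URL change count), and the stand-in canonicalizer lowercase_scheme_host, which only lowercases the scheme and authority.

```python
import csv
import io
from urllib.parse import urlsplit, urlunsplit

MAX_BATCH = 100  # the cap described in the text


def lowercase_scheme_host(url: str) -> str:
    """Stand-in canonicalizer: lowercases the scheme and the whole authority."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )


def batch_to_csv(text: str, canonicalize=lowercase_scheme_host) -> str:
    """Canonicalize newline-separated URLs and return the results as CSV text."""
    urls = [line.strip() for line in text.splitlines() if line.strip()]
    if len(urls) > MAX_BATCH:
        raise ValueError(f"batch limited to {MAX_BATCH} URLs, got {len(urls)}")
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["original", "canonical", "changed"])  # hypothetical columns
    for url in urls:
        canonical = canonicalize(url)
        writer.writerow([url, canonical, str(canonical != url).lower()])
    return buffer.getvalue()


print(batch_to_csv("HTTP://Example.COM/a\nhttp://example.com/b\n"))
```

Passing a different canonicalize function swaps in a fuller rule set without touching the CSV logic.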

Query Parameter Handling

Query strings are a major source of URL duplication. The tool provides two dedicated controls:

  • Sort Query Params: Alphabetically sorts key-value pairs so that ?z=1&a=2 and ?a=2&z=1 produce the same canonical URL. Automatically enabled in SEO mode.
  • Remove Duplicates: Drops repeated keys, keeping only the first occurrence. Prevents URL explosion from analytics tags that repeat the same parameter.
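
The two controls above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's code: normalize_query is a hypothetical helper, duplicates are removed before sorting, and sorting is by key only.

```python
from urllib.parse import parse_qsl, urlencode


def normalize_query(query: str, sort: bool = True, dedupe: bool = True) -> str:
    """Drop repeated keys (first occurrence wins), then sort pairs by key."""
    pairs = parse_qsl(query, keep_blank_values=True)
    if dedupe:
        seen = set()
        unique = []
        for key, value in pairs:
            if key not in seen:
                seen.add(key)
                unique.append((key, value))
        pairs = unique
    if sort:
        pairs = sorted(pairs, key=lambda pair: pair[0])
    # Caveat: parse_qsl decodes and urlencode re-encodes, so the exact
    # escaping of values (for example spaces as "+") may differ from the input.
    return urlencode(pairs)


print(normalize_query("b=2&a=1&b=3"))  # a=1&b=2
print(normalize_query("z=1&a=2") == normalize_query("a=2&z=1"))  # True
```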

Use Cases

  • SEO audits — Identify duplicate-content URLs before setting canonical tags or 301 redirects
  • API development — Normalize incoming request URLs before database lookups or cache-key generation
  • Security testing — Decode obfuscated URLs to detect path traversal, open redirects, and WAF bypass attempts
  • Web scraping — Deduplicate crawl queues by canonicalizing all discovered URLs before enqueuing
  • Link validation — Normalize user-submitted URLs in forms to prevent duplicates in your link database

How Percent-Encoding Normalization Works

RFC 3986 §2.3 defines a set of unreserved characters (A-Z a-z 0-9 - _ . ~) that should not be percent-encoded. When a URL contains %41 (the encoded form of uppercase A), canonical form requires decoding it back to A. Conversely, characters outside the unreserved set that appear raw in positions where they are not allowed must be percent-encoded. This bidirectional normalization ensures that two URLs that look different but represent the same resource produce an identical canonical string.
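
The decoding half of this process, together with hex-case normalization, can be sketched in Python as follows. It is a simplified illustration with hypothetical names (UNRESERVED, normalize_percent_encoding) and does not attempt the re-encoding of raw disallowed characters.

```python
import re
import string

# Unreserved set from RFC 3986 section 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
PERCENT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(component: str) -> str:
    """Decode escapes of unreserved characters; uppercase the hex of the rest."""

    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # e.g. %41 -> A, %7e -> ~
        return "%" + match.group(1).upper()  # e.g. %2f -> %2F

    return PERCENT_TRIPLET.sub(fix, component)


print(normalize_percent_encoding("/%41/b%2fc/%7euser"))  # /A/b%2Fc/~user
```

Because each escape is rewritten in a single pass, an input such as %2541 stays %2541 rather than being decoded twice.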

Frequently Asked Questions

Is the URL Canonicalizer free?

Yes, URL Canonicalizer is totally free :)

Can I use the URL Canonicalizer offline?

Yes, you can install the web app as a PWA.

Is it safe to use URL Canonicalizer?

Yes. Any data related to URL Canonicalizer is stored only in your browser (if storage is required). You can clear your browser cache to remove all stored data. We do not store any data on a server.

What is URL canonicalization and why does it matter?

URL canonicalization is the process of converting a URL into a single, standard form so that different representations of the same resource produce an identical string. It matters for SEO (preventing duplicate content), security (blocking path-traversal attacks), and API reliability (ensuring consistent endpoint comparisons).

How does the URL Canonicalizer tool work?

Paste a raw or malformed URL, optionally supply a base URL for relative links, choose a canonicalization profile (RFC 3986, SEO, or Security), then click Canonicalize. The tool parses the URL using the WHATWG URL API, applies the selected normalization rules, and instantly shows the canonical URL, a component breakdown, and a diff of every change applied.

What is the difference between RFC 3986, SEO, and Security profiles?

RFC 3986 applies the standard Internet normalization rules: lowercasing scheme and host, removing default ports, resolving dot segments, and normalizing percent-encoding. The SEO profile additionally sorts query parameters and strips the fragment. The Security (Pentest) profile fully decodes all percent-encoding and resolves path traversal sequences to reveal the true resource being accessed.

Does the tool handle relative URLs?

Yes. If you provide a Base URL, relative URLs such as ../../about are resolved against it to produce a fully qualified canonical URL. The base URL must itself be a valid absolute URL.

What normalizations are applied to percent-encoding?

Unreserved characters (A–Z, a–z, 0–9, hyphen, underscore, period, tilde) that have been unnecessarily percent-encoded are decoded. Characters that must be encoded but appear raw are percent-encoded. Hex digits in percent-sequences are normalized to uppercase per RFC 3986 §2.1.

Are there any limitations to URL canonicalization?

The tool processes URLs entirely in your browser using the WHATWG URL API. Non-standard schemes (e.g., custom app:// URIs) receive limited normalization with a warning. Internationalized domain names (IDN) are converted to Punycode only when the IDN toggle is enabled. Batch input is capped at 100 URLs to keep the browser responsive.