🔗 URL Canonicalizer – Normalize Any URL to Its Standard Form
A URL Canonicalizer converts any raw, messy, or malformed URL into its single, authoritative canonical form. Whether you are an SEO professional tracking duplicate pages, a developer normalizing endpoints before storage, or a security researcher analyzing URL-based injection attacks, having a consistent URL representation is essential.
What is URL Canonicalization?
Canonicalization is the process of resolving all equivalent representations of a URL into one agreed-upon form. The same web resource can be accessed via dozens of technically distinct strings — for example:
HTTP://Example.COM:80/a/b/../c?foo=bar
http://example.com/a/c?foo=bar ← canonical form

Without canonicalization, search engines may index the same page multiple times, analytics tools report split traffic, and API comparisons fail silently. RFC 3986 defines the standard normalization rules that conformant URL processors apply.
Normalization Rules Applied
The tool applies up to seven independent normalization steps, each tracked individually:
| Step | Rule | Example |
|---|---|---|
| 1 | Lowercase scheme & host | HTTP://EXAMPLE.COM → http://example.com |
| 2 | Remove default ports | http://host:80/ → http://host/ |
| 3 | Resolve dot segments | /a/b/../c → /a/c |
| 4 | Decode unreserved chars | %41 → A (A is unreserved) |
| 5 | Normalize percent-encoding hex case | %2f → %2F |
| 6 | Sort / deduplicate query params | ?b=2&a=1&b=3 → ?a=1&b=2 |
| 7 | Strip fragment (SEO mode) | /page?id=5#section → /page?id=5 |
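The seven steps above can be sketched in Python with the standard `urllib.parse` module. This is a simplified illustration, not the tool's actual implementation: the function name `canonicalize`, the `seo` flag, and the condensed dot-segment resolver are our own, and user-info in the authority is omitted for brevity.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

DEFAULT_PORTS = {"http": 80, "https": 443}
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                 "abcdefghijklmnopqrstuvwxyz0123456789-._~")

def _dot_segments(path):
    # Step 3: resolve "." and ".." (condensed version of RFC 3986 §5.2.4).
    out = []
    for seg in path.split("/"):
        if seg == ".":
            continue
        elif seg == "..":
            if len(out) > 1:
                out.pop()
        else:
            out.append(seg)
    return "/".join(out)

def _pct(s):
    # Steps 4-5: decode unreserved characters, uppercase remaining hex.
    return re.sub(
        r"%([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16))
        if chr(int(m.group(1), 16)) in UNRESERVED
        else "%" + m.group(1).upper(),
        s,
    )

def canonicalize(url, seo=False):
    p = urlsplit(url)
    scheme = p.scheme.lower()                   # step 1 (urlsplit lowercases it)
    netloc = (p.hostname or "").lower()         # step 1: lowercase host
    if p.port is not None and p.port != DEFAULT_PORTS.get(scheme):
        netloc += f":{p.port}"                  # step 2: keep only non-default ports
    path = _pct(_dot_segments(p.path)) or "/"   # steps 3-5
    query = p.query
    if seo:                                     # steps 6-7 (SEO profile)
        seen, pairs = set(), []
        for k, v in parse_qsl(query, keep_blank_values=True):
            if k not in seen:                   # drop duplicate keys, keep first
                seen.add(k)
                pairs.append((k, v))
        query = urlencode(sorted(pairs))        # then sort alphabetically
    return urlunsplit((scheme, netloc, path, query, "" if seo else p.fragment))
```

Running it on the example from the introduction, `canonicalize("HTTP://Example.COM:80/a/b/../c?foo=bar")` yields `http://example.com/a/c?foo=bar`.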
Canonicalization Profiles
Three profiles let you apply the appropriate rule set for your use case:
- RFC 3986 (Standard) — The internet standard. Applies all seven rules above. Best for developers building APIs, web scrapers, or caches that compare URLs programmatically.
- SEO (Search Engine) — Extends RFC 3986 by automatically sorting query parameters and stripping the fragment component (since search engines ignore fragments). Ideal for de-duplicating URLs in site audits.
- Security / Pentest — Fully decodes all percent-encoding, including double-encoded sequences, and resolves path traversal segments. Use this to reveal the true resource path hidden behind obfuscated URLs, such as %2e%2e/admin → /admin.
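The double-decoding behavior of the Security profile can be approximated by percent-decoding until the string stops changing, then collapsing traversal segments. The names `fully_decode` and `reveal` here are illustrative, not the tool's API:

```python
import posixpath
from urllib.parse import unquote

def fully_decode(s, max_rounds=5):
    # Repeatedly percent-decode until a fixed point, so double-encoded
    # payloads like %252e%252e collapse to ".." (bounded to avoid loops).
    for _ in range(max_rounds):
        decoded = unquote(s)
        if decoded == s:
            break
        s = decoded
    return s

def reveal(path):
    # Decode, then resolve ".." segments to expose the real target path.
    return posixpath.normpath(fully_decode(path))
```

For example, `reveal("/app/%2e%2e/admin")` exposes `/admin`, and the double-encoded `%252e%252e/admin` first decodes to `%2e%2e/admin` and then to `../admin`.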
Relative URL Resolution
Supply a Base URL to resolve relative URLs into absolute canonical form. This mirrors how browsers handle relative links in HTML documents:
Base: https://example.com/blog/2024/
Relative: ../../about
Result: https://example.com/about

Batch Canonicalization
The Batch tab accepts up to 100 newline-separated URLs, processes each with the selected profile and settings, and presents a table of original → canonical pairs with a count of changes per URL. Export the entire results as a CSV file for use in spreadsheets, site-audit reports, or redirect-mapping workflows.
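A batch run like this can be sketched with the standard library. The helper names (`batch_canonicalize`, `to_csv`) are illustrative, and the canonicalization applied here is deliberately minimal (lowercase scheme and host, with relative URLs resolved against an optional base); a real run would apply the full selected profile:

```python
import csv
import io
from urllib.parse import urljoin, urlsplit, urlunsplit

def batch_canonicalize(text, base=None, limit=100):
    # Process up to `limit` newline-separated URLs into (original, canonical)
    # pairs, resolving relative URLs against `base` when one is supplied.
    rows = []
    for raw in text.splitlines()[:limit]:
        raw = raw.strip()
        if not raw:
            continue
        absolute = urljoin(base, raw) if base else raw
        p = urlsplit(absolute)
        canon = urlunsplit(
            (p.scheme.lower(), p.netloc.lower(), p.path or "/", p.query, p.fragment)
        )
        rows.append((raw, canon))
    return rows

def to_csv(rows):
    # Render the original → canonical pairs as CSV for spreadsheets.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["original", "canonical"])
    writer.writerows(rows)
    return buf.getvalue()
```

With `base="https://example.com/blog/2024/"`, an input line of `../../about` produces the pair `("../../about", "https://example.com/about")`, mirroring the relative-resolution example above.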
Query Parameter Handling
Query strings are a major source of URL duplication. The tool provides two dedicated controls:
- Sort Query Params — Alphabetically sorts key-value pairs so ?z=1&a=2 and ?a=2&z=1 produce the same canonical URL. Automatically enabled in SEO mode.
- Remove Duplicates — Drops repeated keys, keeping only the first occurrence. Prevents URL explosion from analytics tags that repeat the same parameter.
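Both controls can be illustrated with `urllib.parse` (the function name `normalize_query` is hypothetical):

```python
from urllib.parse import parse_qsl, urlencode

def normalize_query(query, sort=True, dedupe=True):
    pairs = parse_qsl(query, keep_blank_values=True)
    if dedupe:
        seen, kept = set(), []
        for k, v in pairs:
            if k not in seen:        # keep only the first occurrence of each key
                seen.add(k)
                kept.append((k, v))
        pairs = kept
    if sort:
        pairs = sorted(pairs)        # alphabetical by key, then value
    return urlencode(pairs)
```

With both options on, `?z=1&a=2` and `?a=2&z=1` both normalize to `a=2&z=1`, and `?b=2&a=1&b=3` collapses to `a=1&b=2`.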
Use Cases
- SEO audits — Identify duplicate-content URLs before setting canonical tags or 301 redirects
- API development — Normalize incoming request URLs before database lookups or cache-key generation
- Security testing — Decode obfuscated URLs to detect path traversal, open redirects, and WAF bypass attempts
- Web scraping — Deduplicate crawl queues by canonicalizing all discovered URLs before enqueuing
- Link validation — Normalize user-submitted URLs in forms to prevent duplicates in your link database
How Percent-Encoding Normalization Works
RFC 3986 §2.3 defines a set of unreserved characters — A-Z a-z 0-9 - _ . ~ — that should not remain percent-encoded: when a URL contains %41 (the encoded form of uppercase A), canonical form requires decoding it back to A. Conversely, characters outside the unreserved set that appear raw in positions where they are not allowed must be re-encoded. This bidirectional normalization ensures that two URLs that look different but represent the same resource produce an identical canonical string.
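A minimal sketch of the decode-or-uppercase half of this rule (the name `normalize_pct` is illustrative, and re-encoding of raw disallowed characters is omitted):

```python
import re

# The unreserved set from RFC 3986 §2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                 "abcdefghijklmnopqrstuvwxyz0123456789-._~")

def normalize_pct(s):
    def repl(m):
        ch = chr(int(m.group(1), 16))
        # Decode unreserved characters; uppercase the hex digits of the rest.
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, s)
```

So `%41` decodes to `A`, `%7e` decodes to `~`, but `%2f` (an encoded `/`, which is reserved) is preserved with normalized hex case as `%2F`.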