How would you design a URL Shortener (e.g., Bit.ly)?

Designing a URL shortener (like Bit.ly) means balancing scalability, performance, and reliability. Let's break it down step by step.

1. Requirements

Functional Requirements :

* Shorten a long URL and generate a unique short URL
* Redirect users when they visit the short URL
* Track analytics (clicks, location, browser, etc.)
* Support custom short URLs

Non-Functional Requirements :

* High availability and low latency
* Scalability to handle millions of requests
* Security to prevent abuse (e.g., spamming, phishing)

2. High-Level Design

a) API Endpoints :

| Endpoint | Functionality |
|---|---|
| `POST /shorten` | Takes a long URL and returns a short URL |
| `GET /{shortUrl}` | Redirects to the original URL |
| `GET /stats/{shortUrl}` | Retrieves analytics for the short URL |
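
To make the three endpoints concrete, here is a minimal in-memory sketch of the operations behind them. It is not a web server: the class name `ShortenerService`, the `https://short.example` base URL, the `u<N>` placeholder IDs, and the response field names are all assumptions made for illustration.

```python
import itertools


class ShortenerService:
    """In-memory stand-in for the three endpoints (all names are illustrative)."""

    def __init__(self, base_url="https://short.example"):
        self.base_url = base_url
        self.urls = {}    # short_id -> long_url
        self.clicks = {}  # short_id -> number of redirects served
        self._ids = itertools.count(1)

    def shorten(self, long_url, custom_id=None):
        """POST /shorten: store the mapping and return the short URL."""
        if custom_id is not None:
            if custom_id in self.urls:
                raise ValueError(f"{custom_id!r} is already taken")
            short_id = custom_id
        else:
            # Placeholder ID scheme; section 3 discusses real strategies.
            short_id = f"u{next(self._ids)}"
            while short_id in self.urls:  # skip IDs claimed as custom aliases
                short_id = f"u{next(self._ids)}"
        self.urls[short_id] = long_url
        self.clicks[short_id] = 0
        return {"short_id": short_id, "short_url": f"{self.base_url}/{short_id}"}

    def resolve(self, short_id):
        """GET /{shortUrl}: return the long URL to redirect to, or None."""
        long_url = self.urls.get(short_id)
        if long_url is not None:
            self.clicks[short_id] += 1
        return long_url

    def stats(self, short_id):
        """GET /stats/{shortUrl}: return basic analytics, or None if unknown."""
        if short_id not in self.urls:
            return None
        return {
            "short_id": short_id,
            "long_url": self.urls[short_id],
            "clicks": self.clicks[short_id],
        }
```

Custom short URLs share the same namespace as generated ones, which is why `shorten` checks for an existing entry in both branches.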

b) Database Design :

Two main options :

* SQL (MySQL, PostgreSQL): Good for ACID compliance and analytics
* NoSQL (Cassandra, DynamoDB, Redis): Good for high read/write throughput

A simple SQL table might look like:

| short_id (PK) | long_url | created_at | expiration_date | click_count |
|---|---|---|---|---|
| abc123 | https://example.com/some-long-url | 2025-02-06 | NULL | 1200 |

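Because short_id is the primary key, it is already indexed, so redirect lookups are fast. Add secondary indexes only for other access patterns, such as purging rows by expiration_date.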
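
As a rough illustration of this table, the sketch below creates it in an in-memory SQLite database and runs the two queries the redirect path needs. SQLite, the table name `urls`, and the helper names `lookup` and `record_click` are assumptions chosen so the example runs with the standard library alone; the same schema carries over to MySQL or PostgreSQL with minor type changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE urls (
        short_id        TEXT PRIMARY KEY,  -- primary key implies a unique index
        long_url        TEXT NOT NULL,
        created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        expiration_date TEXT,              -- NULL means the link never expires
        click_count     INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.execute(
    "INSERT INTO urls (short_id, long_url) VALUES (?, ?)",
    ("abc123", "https://example.com/some-long-url"),
)


def lookup(conn, short_id):
    """Return the long URL for a live (non-expired) short_id, or None."""
    row = conn.execute(
        "SELECT long_url FROM urls "
        "WHERE short_id = ? "
        "AND (expiration_date IS NULL OR expiration_date > CURRENT_TIMESTAMP)",
        (short_id,),
    ).fetchone()
    return row[0] if row else None


def record_click(conn, short_id):
    """Increment the click counter atomically inside the database."""
    conn.execute(
        "UPDATE urls SET click_count = click_count + 1 WHERE short_id = ?",
        (short_id,),
    )
```

Both queries use parameter placeholders rather than string formatting, since short_id comes straight from the request path.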

3. URL Shortening Strategy

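a) ID Generation Options :

* Hashing (MD5/SHA-256) → Hash the long URL and keep only the first few characters, because the full digest is too long. Truncated hashes can collide, so every insert needs a collision check.
* Base62 Encoding (0-9, a-z, A-Z) → Not an ID source by itself; it turns a hash or numeric ID into a short string (e.g., abc123).
* Counter-Based (Auto-increment ID + Base62) → Guarantees uniqueness as long as the counter never repeats a value.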
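
For comparison with the counter-based option, the hashing option could look like the sketch below. The function names, the 7-character length, and the salt-and-retry collision strategy are assumptions for illustration; `existing` is a plain dict standing in for the database.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols


def to_base62(num):
    """Encode a non-negative integer using the 62-symbol alphabet."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, rem = divmod(num, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def hash_short_id(long_url, existing, length=7):
    """Derive a short ID from SHA-256, retrying with a salt on collision.

    `existing` maps short_id -> long_url and is updated in place.
    """
    attempt = 0
    while True:
        salted = long_url if attempt == 0 else f"{long_url}#{attempt}"
        digest = hashlib.sha256(salted.encode("utf-8")).digest()
        num = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
        candidate = to_base62(num)[:length].rjust(length, ALPHABET[0])
        owner = existing.get(candidate)
        if owner is None or owner == long_url:
            existing[candidate] = long_url
            return candidate
        attempt += 1  # another URL owns this ID; salt the input and retry
```

The retry loop is the cost of hashing: the lookup against `existing` is what actually prevents two URLs from sharing an ID, not the hash function.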

Preferred Approach: Counter + Base62 Encoding

import string

CHARACTERS = string.ascii_letters + string.digits  # a-z, A-Z, 0-9
BASE = len(CHARACTERS)

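def encode(num):
    """Encodes a non-negative integer ID to Base62."""
    if num == 0:
        return CHARACTERS[0]  # without this, 0 would encode to an empty string
    short_url = []
    while num > 0:
        short_url.append(CHARACTERS[num % BASE])  # least significant digit first
        num //= BASE
    return ''.join(reversed(short_url))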

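Because every ID comes from a unique counter value, two URLs can never receive the same short code. The trade-off is that sequential IDs are easy to enumerate.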
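
Going the other way is sometimes useful, for example to recover the numeric row ID from a short code. Below is a sketch of the inverse; `decode` and `INDEX` are hypothetical names, and `encode` is repeated (with a guard for zero) so the block runs on its own.

```python
import string

CHARACTERS = string.ascii_letters + string.digits  # a-z, A-Z, 0-9
BASE = len(CHARACTERS)  # 62
INDEX = {ch: i for i, ch in enumerate(CHARACTERS)}  # symbol -> digit value


def encode(num):
    """Encode a non-negative integer to Base62."""
    if num == 0:
        return CHARACTERS[0]
    out = []
    while num > 0:
        out.append(CHARACTERS[num % BASE])
        num //= BASE
    return "".join(reversed(out))


def decode(short_id):
    """Decode a Base62 string back to the integer it was built from."""
    num = 0
    for ch in short_id:
        num = num * BASE + INDEX[ch]  # KeyError for symbols outside the alphabet
    return num
```

With this alphabet, 7 characters cover 62^7 = 3,521,614,606,208 distinct IDs, roughly 3.5 trillion.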

4. Redirection Mechanism

When a user visits a short URL:

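* Extract short_id from the request path.
* Look up long_url (cache first, then the database); return 404 if it does not exist or has expired.
* Respond with a redirect to long_url. A 301 (permanent) lets browsers cache the redirect, which cuts server load but hides repeat visits from analytics; use a 302 (temporary) if every click must be counted.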
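
The steps above can be sketched as a single function that maps a request path to a status code and headers. `handle_redirect` and its `permanent` flag are assumed names, and `store` is a dict standing in for the cache and database. The status code is left as a parameter because browsers may cache a 301, so repeat visits might never reach the server, which matters when clicks are being counted.

```python
from http import HTTPStatus


def handle_redirect(path, store, permanent=False):
    """Resolve a request path such as '/abc123' to (status, headers)."""
    short_id = path.lstrip("/")
    long_url = store.get(short_id)
    if long_url is None:
        return HTTPStatus.NOT_FOUND, {}
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    return status, {"Location": long_url}
```

For example, `handle_redirect("/abc123", {"abc123": "https://example.com/some-long-url"})` yields a 302 with a `Location` header pointing at the long URL.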

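Example Nginx configuration (it forwards /s/<short_id> to a redirect.php handler; point the rewrite at whatever backend performs the lookup):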

location /s/ {
    rewrite ^/s/(.*)$ /redirect.php?short_id=$1 last;
}

5. Scaling the System

a) Read Optimization :

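* Cache short-to-long URL mappings in Redis or Memcached so that most redirects never touch the database.
* Let a Content Delivery Network (CDN) cache the redirect responses for the most frequently accessed short URLs, so they are served from the edge.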
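
The caching idea can be illustrated with the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. In this sketch a small in-process LRU cache stands in for Redis or Memcached and a dict stands in for the database; `LRUCache` and `resolve` are assumed names.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache (stand-in for Redis/Memcached)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


def resolve(short_id, cache, db):
    """Cache-aside lookup: try the cache, then the database, then fill the cache."""
    long_url = cache.get(short_id)
    if long_url is None:
        long_url = db.get(short_id)
        if long_url is not None:
            cache.put(short_id, long_url)
    return long_url
```

Because redirects are read far more often than links are created, even a small cache of popular IDs can absorb most of the lookup traffic.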

b) Write Optimization :

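* Use a distributed database (Cassandra, DynamoDB) for high write throughput.
* Move non-critical work, such as recording click analytics, off the request path by publishing events to a message queue (Kafka, RabbitMQ) and processing them asynchronously.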
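
The asynchronous idea can be shown in miniature with a thread and an in-process queue. Here `queue.Queue` stands in for Kafka or RabbitMQ and a `Counter` stands in for the analytics store; the function names are assumptions. The redirect handler only enqueues an event, and a background worker does the counting.

```python
import queue
import threading
from collections import Counter


def record_click(events, short_id):
    """Called on the redirect path: enqueue the event and return immediately."""
    events.put(short_id)


def run_worker(events, counts):
    """Consume click events until a None sentinel arrives."""
    while True:
        short_id = events.get()
        try:
            if short_id is None:
                return
            counts[short_id] += 1  # in production: a batched database write
        finally:
            events.task_done()


def start_worker(events, counts):
    """Start the consumer in a background thread and return the thread."""
    thread = threading.Thread(target=run_worker, args=(events, counts), daemon=True)
    thread.start()
    return thread
```

A typical use is to create `events = queue.Queue()` and `counts = Counter()`, call `start_worker(events, counts)` once at startup, and call `record_click(events, short_id)` from each redirect. Putting `None` on the queue shuts the worker down.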

c) Load Balancing :

* Deploy multiple API servers behind a load balancer (NGINX, AWS ALB).
* Use rate limiting to prevent abuse.
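
One common way to implement rate limiting is a token bucket, sketched below for a single client. The class name, parameters, and injectable `clock` are assumptions; a real deployment would typically keep one bucket per API key or IP address in a shared store such as Redis.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so tests can control time
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True and spend one token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A request that gets `False` would normally be answered with HTTP 429 (Too Many Requests).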

6. Security Considerations

Preventing Abuse :

* Rate limit API calls to prevent spam.
* Block malicious URLs using a blocklist.
* Use CAPTCHA for anonymous users.
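
The URL check could be sketched as follows. The domain entries and the names `BLOCKED_DOMAINS` and `is_allowed` are hypothetical; a production system would more likely consult a maintained threat feed than a hard-coded set.

```python
from urllib.parse import urlsplit

# Hypothetical entries for illustration only.
BLOCKED_DOMAINS = {"malware.example", "phish.example"}


def is_allowed(long_url, blocked=BLOCKED_DOMAINS):
    """Accept only http(s) URLs whose host is not on the blocklist."""
    try:
        parts = urlsplit(long_url)
    except ValueError:  # e.g. a malformed IPv6 host
        return False
    if parts.scheme not in ("http", "https"):
        return False  # rejects javascript:, data:, ftp:, and similar schemes
    host = (parts.hostname or "").lower()
    if not host:
        return False
    labels = host.split(".")
    # Check the host and each parent domain, so subdomains are blocked too.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocked:
            return False
    return True
```

Running this check at shorten time keeps known-bad destinations out of the database.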

Data Protection :

* Encrypt stored URLs at rest to protect user privacy.
* Secure the API with OAuth and JWT authentication.

7. Tech Stack

| Component | Technology |
|---|---|
| Backend | Python (Flask, FastAPI) / Node.js (Express) |
| Database | PostgreSQL / DynamoDB / Redis |
| Caching | Redis / Memcached |
| Load Balancer | Nginx / AWS ALB |
| Message Queue | Kafka / RabbitMQ |
| CDN | Cloudflare / AWS CloudFront |