Hash maps (also called dictionaries in Python) are one of the most powerful data structures for building fast lookups. They store key-value pairs and allow you to retrieve any value in constant time O(1) given its key. This makes them perfect for URL shorteners, caches, and any system that needs quick data retrieval.
Base encoding systems convert numbers from one base to another. You're familiar with base 10 (decimal: 0-9) and probably base 2 (binary: 0-1). Base62 uses 62 characters: digits 0-9, lowercase letters a-z, and uppercase letters A-Z. This gives us a compact way to represent large numbers as short strings.
Why base62 for URL shorteners? It creates human-readable codes that are URL-safe (no special characters that need escaping). A 6-character base62 string can represent over 56 billion unique values (62^6), which is plenty for most applications.
The key insight is combining these concepts: use an incrementing counter to ensure uniqueness, convert each counter value to base62 for compact representation, and use hash maps for instant bidirectional lookups between original URLs and short codes.
Real URL shorteners like bit.ly use similar approaches, though they add features like custom domains, analytics, and expiration dates. The core principle remains the same: unique IDs + base encoding + fast lookups.
1# Simple base62 conversion example
2def to_base62(num):
3 chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
4 if num == 0:
5 return "0"
6
7 result = []
8 while num > 0:
9 result.append(chars[num % 62])
10 num //= 62
11
12 return ''.join(reversed(result))
13
14# Demo the conversion
15print(to_base62(123)) # "1Z"
16print(to_base62(3843)) # "ZZ"
17print(to_base62(56800235584)) # "100000" (6 chars)