Run-length encoding (RLE) is one of the simplest forms of data compression. It works by replacing sequences of repeated data with a count and the data value. This technique is particularly effective when data contains many consecutive repeated elements.
The basic idea is to scan through data sequentially and count how many times each element appears consecutively. Instead of storing 'aaaa', we store 'a4' - the character followed by its count. This can dramatically reduce storage space when there are long runs of repeated data.
RLE is commonly used in image compression (especially for simple graphics with large areas of solid color), fax transmission, and as a preprocessing step in more complex compression algorithms. It's also used in some database storage engines to compress columns with many repeated values.
The algorithm follows a simple pattern: track the current character and its count, then when you encounter a different character, output the previous character and its count before starting to track the new character. The tricky part is remembering to handle the final group after your loop ends.
One important consideration is that compression isn't always beneficial. If your data has mostly unique elements, RLE can actually make it larger! For example, 'abcd' would become 'a1b1c1d1e1' - twice as long. Good compression algorithms check if the compressed result is actually smaller before using it.
RLE teaches us about the trade-offs in algorithm design: simplicity versus efficiency, and the importance of measuring results rather than assuming an optimization actually optimizes.
1def count_consecutive_chars(text):
2 """Count consecutive characters in a string"""
3 if not text:
4 return []
5
6 result = []
7 current_char = text[0]
8 count = 1
9
10 for char in text[1:]:
11 if char == current_char:
12 count += 1
13 else:
14 result.append((current_char, count))
15 current_char = char
16 count = 1
17
18 result.append((current_char, count)) # Don't forget the last group!
19 return result
20
21# Example usage
22print(count_consecutive_chars("aaabbcc")) # [('a', 3), ('b', 2), ('c', 2)]