Lossy and Lossless Compression
Compression reduces file sizes to enable faster data transfer over the internet, lower bandwidth consumption, and efficient storage.
There are two main methods:
| Method | What happens | Benefits | Drawbacks |
|---|
| Lossy | Some data is permanently discarded | Greatly reduced file sizes; suitable for media streaming where some quality loss is acceptable | Irreversible loss of data quality; not suitable for text or data requiring exact integrity |
| Lossless | All original data is preserved; nothing is lost | Maintains original data integrity; best for text and data that must be exact | Larger file sizes than lossy; requires high bandwidth when streaming |
Recommending a Compression Type
• Choose
lossless when data integrity is imperative — e.g. medical images, text documents, archival storage, brand logos
• Choose
lossy when small file size or fast transfer is the priority and minor quality loss is acceptable — e.g. social media images, video streaming, music streaming
Exam tip: The right choice always depends on the specific scenario requirements.
Worked example — brand logo vs social media image [4 marks]:•
High-resolution logo → Lossless compression: preserves pixel-perfect quality essential for professional branding, printing, and high-definition media
•
Social media campaign image → Lossy compression: significantly reduces file size for faster uploading and loading; the minor quality loss is acceptable given the fast-paced, temporary nature of social media
Run Length Encoding and Dictionary Coding
Run Length Encoding (RLE)
RLE condenses consecutive identical elements into a single value with a count.
• Text example:
AAAABBBCCDAA →
4A3B2C1D2A (four A's, three B's, two C's, one D, two A's)
• Image example: a row with 5 red pixels then 3 blue pixels →
5R3B• Used in
bitmap images to compress sequences of the same colour
• Most effective when data contains
long runs of the same valueWorked example — decompress 3C3M4C:Answer:
CCCMMMCCCCWorked example — compress CCCCOLLLCCCCCMOCCCCC:Answer:
4C1O3L5C1M1O5CRLE pseudocode — find longest consecutive run of 'C' in a string:Function longest(s: String) -> Integer:
max_count = 0
current_count = 0
For each char in s:
If char equals 'C':
Increment current_count by 1
If current_count > max_count:
Set max_count to current_count
Else:
Set current_count to 0
Return max_count
Dictionary Coding
Dictionary coding replaces recurring sequences or words with shorter unique codes, using a lookup dictionary.
1. Scan the data for recurring sequences
2. Build a dictionary mapping each sequence to a short code
3. Replace all occurrences of the sequences with their codes
4. Store the dictionary alongside the compressed data (needed for decompression)
Example:Original: *"QuickSort is faster than BubbleSort but MergeSort is more stable than QuickSort and BubbleSort."* (95 characters)
Dictionary:
QuickSort → Q,
BubbleSort → B,
MergeSort → MCompressed: *"Q is faster than B but M is more stable than Q and B."* (53 characters)
• More versatile than RLE — works on any repeated sequence, not just consecutive identical characters
• May require more computational resources than RLE
Exam tip:• Use
RLE when data has lots of repetition of consecutive identical values (e.g. image rows, repeated characters)
• Use
dictionary coding when data contains frequently repeated words or sequences that may not be consecutive
Symmetric and Asymmetric Encryption
Encryption converts readable data (plaintext) into an unreadable format (ciphertext) to protect it from unauthorised access. Encryption uses
keys — specialised algorithms to scramble and unscramble data.
Symmetric Encryption
• A
single shared key is used to both encrypt and decrypt the data
• The sender encrypts using the key; the receiver decrypts using the
same key•
Faster — efficient for encrypting large amounts of data
•
Key distribution problem — the key must be securely shared between sender and receiver; if intercepted, all messages are compromised
• Suitable for: encrypting large files, databases, backups where the same person/system encrypts and decrypts
Asymmetric Encryption
• Uses a
pair of mathematically linked keys: a
public key (for encryption) and a
private key (for decryption)
• The receiver generates both keys; their
public key is shared openly• Anyone can encrypt data with the public key, but
only the private key (kept secret by the receiver) can decrypt it
•
Slower than symmetric encryption
•
Solves the key distribution problem — sharing the public key does not compromise security
• Suitable for: passwords, bank details, secure email, government communications
Choosing an Encryption Type
| Encryption Type | Suitable For | Reason |
|---|
| Symmetric | Large files, databases, bulk data | Fast and efficient; ideal when the same person/system encrypts and decrypts |
| Asymmetric | Confidential communications, passwords, banking | Highly secure key exchange; solves the key distribution problem |
Worked example — 9-mark discussion on encryption and society:Encryption protects people from data misuse in today's digital society, underpinning e-commerce, secure communications, and personal data protection.
Asymmetric encryption uses a public key for encryption and a private key for decryption. Because the public key can be shared openly without risk, it solves the critical key distribution problem that symmetric encryption faces. It forms the backbone of secure online transactions and enables digital signatures.
Symmetric encryption uses a single key for both operations. It is faster and more efficient for handling large volumes of data, but requires a secure method of sharing the key — if intercepted, all encrypted data is at risk. Asymmetric encryption is slower but eliminates this vulnerability.
Symmetric encryption is best for encrypting large files or databases within a closed, secure environment. Asymmetric encryption is preferred for high-security scenarios such as online banking or secure email where key distribution cannot be assumed to be safe.
Both forms of encryption have societal trade-offs: they enable legitimate privacy and secure commerce, but also allow nefarious actors to communicate securely. The choice between them should weigh speed requirements against security needs.
Hashing
Hashing converts any input data into a
fixed-size string of characters called a
digest using a hashing algorithm.
Key properties:
• The
same input always produces the same hash (consistency)
• Even a
tiny change in input produces a radically different hash (sensitivity)
• Hashing is
one-way — it cannot be reversed to the original data
Common Hashing Algorithms
| Algorithm | Notes |
|---|
| MD5 | Widely used but now considered weak; vulnerable to collision attacks |
| SHA-1 | Previously used in SSL certificates; now considered weak |
| SHA-256 | Part of the SHA-2 family; commonly used for cryptographic applications and data integrity; considered secure for most purposes |
| SHA-3 | The most recent SHA family member; designed for higher security levels |
Encryption vs Hashing
| Feature | Encryption | Hashing |
|---|
| Purpose | Secure data for transmission or storage | Data verification and quick retrieval |
| Reversibility | Can be decrypted to the original data | Cannot be reversed — one-way function |
| Keys | Uses keys for encryption and decryption | No keys involved |
| Speed | Generally slower for strong methods | Generally faster |
| Use cases | Secure communications, file storage | Password storage, data integrity checks |
| Output length | Variable — may be same as or longer than input | Always fixed length |
| Change in output | Small input change → significantly different output | Small input change → significantly different output |
Hashing for Password Storage
1. When a user registers, their password is
hashed using a hashing algorithm
2. The
hash digest is stored in the database — never the plaintext password
3. When the user logs in, the entered password is
hashed again4. The new hash is
compared against the stored hash — if they match, access is granted
5. If the database is breached, attackers cannot recover original passwords from the digests
Hashing for Efficient Data Retrieval
A
hash table uses hash digests as index values for fast data lookup:
• Hash digests have a
fixed length, making comparisons computationally faster than comparing variable-length strings (e.g. email addresses)
• A good hash function distributes data uniformly across the table, enabling efficient retrieval
• Example: a Users table indexed by hash digest allows the system to locate any user's record faster than searching by email address
Hashing for Data Integrity
When data is transferred over a network, it can be corrupted or interfered with. Hashing allows verification:
• Hash the data before sending; hash it again after receiving
• If both hashes match, the data arrived intact
• Comparing two fixed-size hashes is computationally less intensive than comparing entire files
Worked example — hash value for www.foo.co.uk (steps: strip to domain, uppercase, sum ASCII):1. Remove up to first dot →
foo.co.uk2. Remove from next dot →
foo3. Uppercase →
FOO4. Sum ASCII values: F(70) + O(79) + O(79) =
228Worked example — write the hash function in pseudocode [5 marks]:function hash(siteName)
firstDot = locate(siteName, ".")
remainingString = siteName[firstDot+1:]
secondDot = locate(remainingString, ".")
requiredString = upper(remainingString[:secondDot])
sum = 0
for char in requiredString
sum += asc(char)
endfor
return sum
endfunction