← Dashboard 1.3.1 Compression, encryption and hashing A-Level Computer Science

Lossy and Lossless Compression


Compression reduces file sizes to enable faster data transfer over the internet, lower bandwidth consumption, and efficient storage.

There are two main methods:

MethodWhat happensBenefitsDrawbacks
LossySome data is permanently discardedGreatly reduced file sizes; suitable for media streaming where some quality loss is acceptableIrreversible loss of data quality; not suitable for text or data requiring exact integrity
LosslessAll original data is preserved; nothing is lostMaintains original data integrity; best for text and data that must be exactLarger file sizes than lossy; requires high bandwidth when streaming

Recommending a Compression Type

• Choose lossless when data integrity is imperative — e.g. medical images, text documents, archival storage, brand logos
• Choose lossy when small file size or fast transfer is the priority and minor quality loss is acceptable — e.g. social media images, video streaming, music streaming

Exam tip: The right choice always depends on the specific scenario requirements.

Worked example — brand logo vs social media image [4 marks]:
High-resolution logo → Lossless compression: preserves pixel-perfect quality essential for professional branding, printing, and high-definition media
Social media campaign image → Lossy compression: significantly reduces file size for faster uploading and loading; the minor quality loss is acceptable given the fast-paced, temporary nature of social media

Run Length Encoding and Dictionary Coding

Run Length Encoding (RLE)

RLE condenses consecutive identical elements into a single value with a count.

• Text example: AAAABBBCCDAA4A3B2C1D2A (four A's, three B's, two C's, one D, two A's)
• Image example: a row with 5 red pixels then 3 blue pixels → 5R3B
• Used in bitmap images to compress sequences of the same colour
• Most effective when data contains long runs of the same value

Worked example — decompress 3C3M4C:
Answer: CCCMMMCCCC

Worked example — compress CCCCOLLLCCCCCMOCCCCC:
Answer: 4C1O3L5C1M1O5C

RLE pseudocode — find longest consecutive run of 'C' in a string:
Function longest(s: String) -> Integer:
  max_count = 0
  current_count = 0
  For each char in s:
    If char equals 'C':
      Increment current_count by 1
      If current_count > max_count:
        Set max_count to current_count
    Else:
      Set current_count to 0
  Return max_count

Dictionary Coding

Dictionary coding replaces recurring sequences or words with shorter unique codes, using a lookup dictionary.

1. Scan the data for recurring sequences
2. Build a dictionary mapping each sequence to a short code
3. Replace all occurrences of the sequences with their codes
4. Store the dictionary alongside the compressed data (needed for decompression)

Example:
Original: *"QuickSort is faster than BubbleSort but MergeSort is more stable than QuickSort and BubbleSort."* (95 characters)

Dictionary: QuickSort → Q, BubbleSort → B, MergeSort → M

Compressed: *"Q is faster than B but M is more stable than Q and B."* (53 characters)

• More versatile than RLE — works on any repeated sequence, not just consecutive identical characters
• May require more computational resources than RLE

Exam tip:
• Use RLE when data has lots of repetition of consecutive identical values (e.g. image rows, repeated characters)
• Use dictionary coding when data contains frequently repeated words or sequences that may not be consecutive

Symmetric and Asymmetric Encryption


Encryption converts readable data (plaintext) into an unreadable format (ciphertext) to protect it from unauthorised access. Encryption uses keys — specialised algorithms to scramble and unscramble data.
Symmetric Encryption

• A single shared key is used to both encrypt and decrypt the data
• The sender encrypts using the key; the receiver decrypts using the same key
Faster — efficient for encrypting large amounts of data
Key distribution problem — the key must be securely shared between sender and receiver; if intercepted, all messages are compromised
• Suitable for: encrypting large files, databases, backups where the same person/system encrypts and decrypts
Asymmetric Encryption

• Uses a pair of mathematically linked keys: a public key (for encryption) and a private key (for decryption)
• The receiver generates both keys; their public key is shared openly
• Anyone can encrypt data with the public key, but only the private key (kept secret by the receiver) can decrypt it
Slower than symmetric encryption
Solves the key distribution problem — sharing the public key does not compromise security
• Suitable for: passwords, bank details, secure email, government communications
Choosing an Encryption Type

Encryption TypeSuitable ForReason
SymmetricLarge files, databases, bulk dataFast and efficient; ideal when the same person/system encrypts and decrypts
AsymmetricConfidential communications, passwords, bankingHighly secure key exchange; solves the key distribution problem


Worked example — 9-mark discussion on encryption and society:

Encryption protects people from data misuse in today's digital society, underpinning e-commerce, secure communications, and personal data protection.

Asymmetric encryption uses a public key for encryption and a private key for decryption. Because the public key can be shared openly without risk, it solves the critical key distribution problem that symmetric encryption faces. It forms the backbone of secure online transactions and enables digital signatures.

Symmetric encryption uses a single key for both operations. It is faster and more efficient for handling large volumes of data, but requires a secure method of sharing the key — if intercepted, all encrypted data is at risk. Asymmetric encryption is slower but eliminates this vulnerability.

Symmetric encryption is best for encrypting large files or databases within a closed, secure environment. Asymmetric encryption is preferred for high-security scenarios such as online banking or secure email where key distribution cannot be assumed to be safe.

Both forms of encryption have societal trade-offs: they enable legitimate privacy and secure commerce, but also allow nefarious actors to communicate securely. The choice between them should weigh speed requirements against security needs.

Hashing


Hashing converts any input data into a fixed-size string of characters called a digest using a hashing algorithm.

Key properties:
• The same input always produces the same hash (consistency)
• Even a tiny change in input produces a radically different hash (sensitivity)
• Hashing is one-way — it cannot be reversed to the original data
Common Hashing Algorithms

AlgorithmNotes
MD5Widely used but now considered weak; vulnerable to collision attacks
SHA-1Previously used in SSL certificates; now considered weak
SHA-256Part of the SHA-2 family; commonly used for cryptographic applications and data integrity; considered secure for most purposes
SHA-3The most recent SHA family member; designed for higher security levels

Encryption vs Hashing

FeatureEncryptionHashing
PurposeSecure data for transmission or storageData verification and quick retrieval
ReversibilityCan be decrypted to the original dataCannot be reversed — one-way function
KeysUses keys for encryption and decryptionNo keys involved
SpeedGenerally slower for strong methodsGenerally faster
Use casesSecure communications, file storagePassword storage, data integrity checks
Output lengthVariable — may be same as or longer than inputAlways fixed length
Change in outputSmall input change → significantly different outputSmall input change → significantly different output

Hashing for Password Storage

1. When a user registers, their password is hashed using a hashing algorithm
2. The hash digest is stored in the database — never the plaintext password
3. When the user logs in, the entered password is hashed again
4. The new hash is compared against the stored hash — if they match, access is granted
5. If the database is breached, attackers cannot recover original passwords from the digests
Hashing for Efficient Data Retrieval

A hash table uses hash digests as index values for fast data lookup:
• Hash digests have a fixed length, making comparisons computationally faster than comparing variable-length strings (e.g. email addresses)
• A good hash function distributes data uniformly across the table, enabling efficient retrieval
• Example: a Users table indexed by hash digest allows the system to locate any user's record faster than searching by email address
Hashing for Data Integrity

When data is transferred over a network, it can be corrupted or interfered with. Hashing allows verification:
• Hash the data before sending; hash it again after receiving
• If both hashes match, the data arrived intact
• Comparing two fixed-size hashes is computationally less intensive than comparing entire files

Worked example — hash value for www.foo.co.uk (steps: strip to domain, uppercase, sum ASCII):
1. Remove up to first dot → foo.co.uk
2. Remove from next dot → foo
3. Uppercase → FOO
4. Sum ASCII values: F(70) + O(79) + O(79) = 228

Worked example — write the hash function in pseudocode [5 marks]:
function hash(siteName)
  firstDot = locate(siteName, ".")
  remainingString = siteName[firstDot+1:]

  secondDot = locate(remainingString, ".")
  requiredString = upper(remainingString[:secondDot])

  sum = 0
  for char in requiredString
    sum += asc(char)
  endfor

  return sum
endfunction