RAID doesn’t work!

Now that we have the clickbaity title out of the way, let’s talk about data integrity. Specifically, disk data integrity on Linux.

RAID, or, as it is less well known, Redundant Array of Independent Disks, is a way to make stored data more resilient against disk failure. Unfortunately, it does not help against silent data corruption, which studies from CERN found to be present at rates in the 10^{-7} range; other studies have also shown non-negligible rates of data corruption.

You may say: OK, I understand that fixing it doesn’t work for RAID 1 with just two copies, or for RAID 5, as you don’t know which copy is correct – any one of them could be – but surely the system is more clever with RAID 6 or with three disks in RAID 1! Again, unfortunately, it isn’t so. Careful reading of the md(4) man page reveals this fragment:

If check was used, then no action is taken to handle the mismatch, it is simply recorded. If repair was used, then a mismatch will be repaired in the same way that resync repairs arrays. For RAID5/RAID6 new parity blocks are written. For RAID1/RAID10, all but one block are overwritten with the content of that one block.

In other words, RAID depends on the disks telling the truth: if a disk can’t read data, it needs to return an I/O error, not garbage. But as we’ve established, this isn’t always the way disks behave.

Now you may say: but I use disk encryption! Surely the encryption will detect this data modification and prevent use of damaged or changed data! Again, this is not a property of AES in either XTS or CBC mode – the standard modes of encryption for disk drives – as those are so-called malleable encryption modes. There is no way to detect ciphertext modification with them in the general case.

This was one of the main reasons behind Btrfs and ZFS: checksumming all data and metadata makes it possible to detect such incorrect blocks (so that, at the very least, the corruption is detected and doesn’t reach the application) and, with the addition of the built-in RAID levels, also to correct them.

What if you don’t want to (or, in the case of ZFS, likely can’t) use either of them? Until recently, there was not much you could do. The introduction of the dm-integrity target has changed that, though.

Using dm-integrity in LUKS

The dm-integrity target is best integrated with LUKS disk encryption.

To enable it, the device needs to be formatted as a LUKS2 device and an integrity mechanism needs to be specified:

cryptsetup luksFormat --type luks2 --integrity hmac-sha256 \
--sector-size 4096 /dev/example/ciphertext

(Other options include --integrity hmac-sha512 and --cipher chacha20-random --integrity poly1305. Smaller tags will be discussed below)

which can then be opened as a regular LUKS device:

cryptsetup open --type luks /dev/example/ciphertext plaintext

This will create a /dev/mapper/plaintext device that is encrypted and integrity protected, as well as a /dev/mapper/plaintext_dif device that provides storage for the authentication tags.

Note that the integrity device will report (none) as the integrity mechanism:

integritysetup status plaintext_dif
/dev/mapper/plaintext_dif is active and is in use.
  type:    INTEGRITY
  tag size: 32
  integrity: (none)
  device:  /dev/mapper/example-ciphertext
  sector size:  4096 bytes
  interleave sectors: 32768
  size:    2056456 sectors
  mode:    read/write
  journal size: 8380416 bytes
  journal watermark: 50%
  journal commit time: 10000 ms

This is expected, as LUKS passes the encryption tags from a higher level and dm-integrity is only used to store them. This can be verified with cryptsetup:

cryptsetup status /dev/mapper/plaintext
/dev/mapper/plaintext is active.
  type:    LUKS2
  cipher:  aes-xts-plain64
  keysize: 512 bits
  key location: keyring
  integrity: hmac(sha256)
  integrity keysize: 256 bits
  device:  /dev/mapper/example-ciphertext
  sector size:  4096
  offset:  0 sectors
  size:    2056456 sectors
  mode:    read/write

The device can be removed (preserving the data, but making it inaccessible without providing the password again) using cryptsetup:

cryptsetup close /dev/mapper/plaintext

Testing

To test whether the verification works correctly, let’s first check that the whole device is readable:

dd if=/dev/mapper/plaintext of=/dev/null bs=$((4096*256)) \
status=progress
988807168 bytes (989 MB, 943 MiB) copied, 6 s, 165 MB/s
1004+1 records in
1004+1 records out
1052905472 bytes (1.1 GB, 1004 MiB) copied, 6.28939 s, 167 MB/s

Now, let’s close the device, check that a block in the middle looks random, and then overwrite it:

cryptsetup close /dev/mapper/plaintext
dd if=/dev/example/ciphertext bs=4096 skip=$((512*1024*1024/4096)) \
count=1 status=none | hexdump -C | head
00000000  70 a1 1d f7 da ae 04 d2  d5 f1 ed 6e ba 96 81 7a  |p..........n...z|
00000010  90 c9 7c e7 01 95 2b 12  ed fc 46 fb 0c d7 24 dd  |..|...+...F...$.|
00000020  48 a2 17 7a 17 9f 26 d8  ef ca 97 74 6e 56 2b 55  |H..z..&....tnV+U|
00000030  59 60 6c 72 e1 5d 14 b3  00 f9 70 e8 f3 31 5e 6f  |Y`lr.]....p..1^o|
00000040  c7 98 c8 e0 e0 f6 52 d3  36 07 34 93 59 42 98 12  |......R.6.4.YB..|
00000050  a8 44 f4 fa 13 94 d6 30  5d 88 ee 79 4c 99 7a a8  |.D.....0]..yL.z.|
00000060  cd 35 87 52 07 66 74 68  9e 61 2e 26 c3 74 67 91  |.5.R.fth.a.&.tg.|
00000070  33 57 21 61 44 b4 2e 31  a6 61 90 3f 04 d9 5e f3  |3W!aD..1.a.?..^.|
00000080  46 dc 2c c5 cb 50 1a b4  3a b5 4d 4d ee d3 0f fd  |F.,..P..:.MM....|
00000090  be 6c 5f 3a b6 f9 b3 f3  21 ac 6b cf dd f0 2e 3b  |.l_:....!.k....;|

Yep, looks random (and will look different for every newly formatted LUKS volume).

dd if=/dev/zero of=/dev/example/ciphertext bs=4096 \
seek=$((512*1024*1024/4096)) count=1
dd if=/dev/example/ciphertext bs=4096 skip=$((512*1024*1024/4096)) \
count=1 status=none | hexdump -C | head
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

Not any more.

What happens when we try to read it now?

cryptsetup open --type luks /dev/example/ciphertext plaintext
dd if=/dev/mapper/plaintext of=/dev/null bs=$((4096*256)) \
status=progress
464519168 bytes (465 MB, 443 MiB) copied, 2 s, 232 MB/s
dd: error reading '/dev/mapper/example': Input/output error
496+1 records in
496+1 records out
520097792 bytes (520 MB, 496 MiB) copied, 2.43931 s, 213 MB/s

Exactly as expected: after reading about 0.5 GiB of data, we get an I/O error.
(Rewriting the sector will cause the checksum to be recalculated and will clear the error.)

Repeating the experiment without --integrity is left as an exercise for the reader.

Standalone dm-integrity setup

By default, integritysetup will use crc32, which is quite fast and small (requiring just 4 bytes per block). This gives a probability of random corruption going undetected of about 2^{-32} (as the value of the CRC and the corrupted data will effectively be independent). Please remember that this is on top of the silent corruption of the hard drive; i.e. if the HDD has a probability of returning malformed data of 10^{-7}, then the probability of malformed data reaching the upper layer is about 10^{-7} \cdot 2^{-32} \approx 2^{-55} \approx 10^{-16}. There is a hidden assumption in this, though – that the malformed data returned by the disk is uniformly distributed. I don’t know whether that is typical and was unable to find more information on this topic; if it is not uniformly distributed, the failure rate for crc32 may be higher. More research is necessary in this area.
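As a quick sanity check on that arithmetic, the combined rate can be computed directly in Python (a minimal sketch; the 10^{-7} disk error rate is the assumption from the paragraph above, not a measured value):

import math

# Assumed figures: a 10^-7 silent-error rate for the disk and a 2^-32 chance
# that a random corruption still passes the crc32 check.
disk_error_rate = 1e-7
crc32_miss_rate = 2.0 ** -32

combined = disk_error_rate * crc32_miss_rate
print("combined rate: %.3e (about 2^%.1f)" % (combined, math.log2(combined)))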

In case that probability is unsatisfactory, it’s possible to use any of the hashes supported by the kernel and listed in the /proc/crypto file, with sha1, sha256, hmac-sha1 and hmac-sha256 being the more interesting ones.

An example configuration would look like this:

integritysetup format --progress-frequency 5 --integrity sha1 \
--tag-size 20 --sector-size 4096 /dev/example/raw-1
Formatted with tag size 20, internal integrity sha1.
Wiping device to initialize integrity checksum.
You can interrupt this by pressing CTRL+c (rest of not wiped device will contain invalid checksum).
Progress:   7.5%, ETA 01:04,   76 MiB written, speed  14.5 MiB/s
Progress:  17.2%, ETA 00:49,  173 MiB written, speed  16.8 MiB/s
Progress:  25.5%, ETA 00:45,  257 MiB written, speed  16.6 MiB/s
Progress:  32.5%, ETA 00:42,  328 MiB written, speed  16.0 MiB/s
Progress:  42.1%, ETA 00:35,  424 MiB written, speed  16.5 MiB/s
Progress:  51.2%, ETA 00:29,  516 MiB written, speed  16.8 MiB/s
Progress:  58.5%, ETA 00:25,  590 MiB written, speed  16.4 MiB/s
Progress:  68.6%, ETA 00:18,  692 MiB written, speed  16.9 MiB/s
Progress:  77.3%, ETA 00:13,  779 MiB written, speed  17.0 MiB/s
Progress:  84.3%, ETA 00:09,  850 MiB written, speed  16.6 MiB/s
Progress:  93.9%, ETA 00:03,  947 MiB written, speed  16.9 MiB/s
Finished, time 00:59.485, 1008 MiB written, speed  16.9 MiB/s

A new device is created with the open subcommand:

integritysetup open --integrity-no-journal --integrity sha1 \
/dev/example/raw-1 integr-1
integritysetup status integr-1
/dev/mapper/integr-1 is active.
  type:    INTEGRITY
  tag size: 20
  integrity: sha1
  device:  /dev/mapper/example-raw--1
  sector size:  4096 bytes
  interleave sectors: 32768
  size:    2064688 sectors
  mode:    read/write
  journal: not active

(note that the integrity line now reports sha1 instead of (none))

If we want to cryptographically verify the integrity of the data, though, we will need to use an HMAC.
Setting it up is fairly similar, but requires a key file (note: this key needs to remain secret for the construction to be cryptographically secure).

dd if=/dev/urandom of=hmac-key bs=1 count=20 status=none
integritysetup format --progress-frequency 5 --integrity hmac-sha1 \
--tag-size 20 --integrity-key-size 20 --integrity-key-file hmac-key \
--no-wipe --sector-size 4096 /dev/example/raw-1
integritysetup open --integrity-no-journal --integrity hmac-sha1 \
--integrity-key-size 20 --integrity-key-file hmac-key \
/dev/example/raw-1 integr-1
dd if=/dev/zero of=/dev/mapper/integr-1 bs=$((4096*32768))
integritysetup status integr-1
Formatted with tag size 20, internal integrity hmac-sha1.
dd: error writing '/dev/mapper/integr-1': No space left on device
8+0 records in
7+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 9.45404 s, 112 MB/s
/dev/mapper/integr-1 is active.
  type:    INTEGRITY
  tag size: 20
  integrity: hmac(sha1)
  device:  /dev/mapper/example-raw--1
  sector size:  4096 bytes
  interleave sectors: 32768
  size:    2064688 sectors
  mode:    read/write
  journal: not active

(this time I’ve used --no-wipe as dd from /dev/zero is much faster)

Combined mdadm and dm-integrity

Now that we know how to set up dm-integrity devices, and that they work as advertised, let’s see whether they will indeed prevent silent data corruption and provide automatic recovery when combined with the Linux RAID infrastructure.

For this setup I’ll be using files to make it easier to reproduce the results (integritysetup configures loop devices automatically).

RAID 5

Setup

First let’s initialise the dm-integrity targets:

SIZE=$((1024*1024*1024))
COUNT=6
for i in $(seq $COUNT); do
  truncate -s $SIZE "raw-$i"
  integritysetup format --integrity sha1 --tag-size 16 \
  --sector-size 4096 --no-wipe "./raw-$i"
  integritysetup open --integrity-no-journal --integrity \
  sha1 "./raw-$i" "integr-$i"
  dd if=/dev/zero "of=/dev/mapper/integr-$i" bs=$((4096*512)) || :
done
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-1': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 5.119 s, 207 MB/s
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-2': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 6.37586 s, 166 MB/s
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-3': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 6.18465 s, 171 MB/s
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-4': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 6.32175 s, 167 MB/s
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-5': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 5.94098 s, 178 MB/s
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-6': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 6.73871 s, 157 MB/s

Then set up the RAID-5 device and wait for initialisation.

mdadm --create /dev/md/robust -n$COUNT --level=5 \
$(seq --format "/dev/mapper/integr-%.0f" $COUNT)
mdadm --wait /dev/md/robust && echo ready
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/robust started.
ready

Single failure

To make sure that we are not reading from cache, stop the array and close the integrity device, then overwrite half of one of the drives:

mdadm --stop /dev/md/robust
integritysetup close integr-1
tr '\000' '\377' < /dev/zero | dd of=raw-1 bs=4096 \
seek=$((SIZE/4096/2)) count=$((SIZE/4096/2-256)) \
conv=notrunc status=progress
mdadm: stopped /dev/md/robust
415428608 bytes (415 MB, 396 MiB) copied, 1 s, 413 MB/s
131008+0 records in
131008+0 records out
536608768 bytes (537 MB, 512 MiB) copied, 1.49256 s, 360 MB/s

(As we're not encrypting the data, the zeros really are stored on disk as zeros, so to corrupt them we need to write something else, 0xff in this case. Also, we’re skipping the last 1 MiB of data, as that’s where the RAID superblock lives, and recovery of it is a completely different kettle of fish.)

Restart the array:

integritysetup open --integrity-no-journal --integrity \
sha1 "./raw-1" "integr-1"
mdadm --assemble /dev/md/robust $(seq --format \
"/dev/mapper/integr-%.0f" $COUNT)
mdadm: /dev/md/robust has been started with 6 drives.

Verify that all data is readable and that it has expected values (all zero):

hexdump -C /dev/md/robust
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
13ab00000

Additionally, you can verify that the errors were reported to the MD layer.
First, run ls -l /dev/md/robust to find the number of the md device, then use it in the following command in place of 127:

grep . /sys/block/md127/md/dev-*/errors
/sys/block/md127/md/dev-dm-14/errors:867024
/sys/block/md127/md/dev-dm-15/errors:0
/sys/block/md127/md/dev-dm-16/errors:0
/sys/block/md127/md/dev-dm-17/errors:0
/sys/block/md127/md/dev-dm-18/errors:0
/sys/block/md127/md/dev-dm-19/errors:0

Scrub the volume to fix all bad blocks:

mdadm --action=repair /dev/md/robust
mdadm --wait /dev/md/robust && echo ready
ready

And verify that all the incorrect blocks were rewritten/corrected:

dd if=/dev/mapper/integr-1 bs=4096 of=/dev/null status=progress
864800768 bytes (865 MB, 825 MiB) copied, 4 s, 216 MB/s
258086+0 records in
258086+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 4.68922 s, 225 MB/s

Double failure

Let’s see what happens when two drives return I/O errors for the same area. First, let’s introduce the errors:

mdadm --stop /dev/md/robust
integritysetup close integr-1
integritysetup close integr-2
tr '\000' '\377' < /dev/zero | dd of=raw-1 bs=4096 \
seek=$((SIZE/4096/2)) count=$((100*1025*1024/4096)) \
conv=notrunc status=none
tr '\000' '\377' < /dev/zero | dd of=raw-2 bs=4096 \
seek=$((SIZE/4096/2)) count=$((100*1025*1024/4096)) \
conv=notrunc status=none
mdadm: stopped /dev/md/robust

Then restart the array

integritysetup open --integrity-no-journal --integrity \
sha1 "./raw-1" "integr-1"
integritysetup open --integrity-no-journal --integrity \
sha1 "./raw-2" "integr-2"
mdadm --assemble /dev/md/robust $(seq --format \
"/dev/mapper/integr-%.0f" $COUNT)
mdadm: /dev/md/robust has been started with 6 drives.

Let’s verify:

hexdump -C /dev/md/robust
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
hexdump: /dev/md/robust: Input/output error
9c026000

As expected, we’ve received an I/O error. Additionally, if we look at the status of the array in /proc/mdstat, we’ll see that it has become degraded:

Personalities : [raid10] [raid6] [raid5] [raid4]
md127 : active raid5 dm-14[0](F) dm-19[6] dm-18[4] dm-17[3] dm-16[2] dm-15[1]
      5155840 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [_UUUUU]

This is most likely because the kernel will retry reading the sectors, but once it receives a set number of read errors it cannot correct (see max_read_errors in the md sysfs directory), it will kick the disk out of the array. That is probably suboptimal for this setup – if the kernel knew the errors came from software, not hardware, it would also know that kicking the disk out of the RAID won’t change anything.
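For reference, the current threshold can be inspected through sysfs (a small sketch; replace md127 with the number of your array, as found earlier with ls -l /dev/md/robust):

# Print the number of read errors md tolerates per device before failing it;
# the md127 name is an example and depends on your system.
with open("/sys/block/md127/md/max_read_errors") as f:
    print("max_read_errors:", f.read().strip())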

Clean-up

mdadm --stop /dev/md/robust
for i in $(seq $COUNT); do
  integritysetup close "integr-$i"
done

RAID 6

Let’s now try with RAID 6, that is, with double redundancy.

Set-up:

for i in $(seq $COUNT); do
  truncate -s $SIZE "raw-$i"
  integritysetup format --integrity sha1 --tag-size 16 \
  --sector-size 4096 --no-wipe "./raw-$i"
  integritysetup open --integrity-no-journal --integrity \
  sha1 "./raw-$i" "integr-$i"
  dd if=/dev/zero "of=/dev/mapper/integr-$i" bs=$((4096*512)) || :
done
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-1': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 4.8998 s, 216 MB/s
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-2': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 7.21848 s, 146 MB/s
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-3': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 7.85458 s, 135 MB/s
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-4': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 7.48937 s, 141 MB/s
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-5': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 7.30369 s, 145 MB/s
Formatted with tag size 16, internal integrity sha1.
dd: error writing '/dev/mapper/integr-6': No space left on device
505+0 records in
504+0 records out
1057120256 bytes (1.1 GB, 1008 MiB) copied, 7.05441 s, 150 MB/s

RAID initialization:

mdadm --create /dev/md/robust -n$COUNT --level=6 \
$(seq --format "/dev/mapper/integr-%.0f" $COUNT)
mdadm --wait /dev/md/robust && echo ready
ready

Single failure

mdadm --stop /dev/md/robust
integritysetup close integr-1
tr '\000' '\377' < /dev/zero | dd of=raw-1 bs=4096 \
seek=$((SIZE/4096/2)) count=$((SIZE/4096/2-256)) \
conv=notrunc status=progress
mdadm: stopped /dev/md/robust
415428608 bytes (415 MB, 396 MiB) copied, 1 s, 413 MB/s
131008+0 records in
131008+0 records out
536608768 bytes (537 MB, 512 MiB) copied, 1.49256 s, 360 MB/s

Restart the array:

integritysetup open --integrity-no-journal --integrity \
sha1 "./raw-1" "integr-1"
mdadm --assemble /dev/md/robust $(seq --format \
"/dev/mapper/integr-%.0f" $COUNT)
mdadm: /dev/md/robust has been started with 6 drives.

Verify that all data is readable and that it has expected values (all zero):

hexdump -C /dev/md/robust
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
fbc00000

Good. And let’s check the status of the array:

Personalities : [raid10] [raid6] [raid5] [raid4]
md127 : active raid6 dm-19[5] dm-18[4] dm-17[3] dm-16[2] dm-15[1] dm-14[0](F)
      4124672 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [_UUUUU]

Not good; it looks like the raid6 personality handles I/O errors differently than the raid5 one, and even though the failures are correctable, the array ends up degraded.

Double failure

Faults introduction:

mdadm --stop /dev/md/robust
integritysetup close integr-1
integritysetup close integr-2
tr '\000' '\377' < /dev/zero | dd of=raw-1 bs=4096 \
seek=$((SIZE/4096/2)) count=$((100*1025*1024/4096)) \
conv=notrunc status=none
tr '\000' '\377' < /dev/zero | dd of=raw-2 bs=4096 \
seek=$((SIZE/4096/2)) count=$((100*1025*1024/4096)) \
conv=notrunc status=none
mdadm: stopped /dev/md/robust

Restart:

integritysetup open --integrity-no-journal --integrity \
sha1 "./raw-1" "integr-1"
integritysetup open --integrity-no-journal --integrity \
sha1 "./raw-2" "integr-2"
mdadm --assemble /dev/md/robust $(seq --format \
"/dev/mapper/integr-%.0f" $COUNT)
mdadm: /dev/md/robust has been started with 6 drives.

Verify:

hexdump -C /dev/md/robust
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
fbc00000

So far so good.

Personalities : [raid10] [raid6] [raid5] [raid4]
md127 : active raid6 dm-14[0](F) dm-19[5] dm-18[4] dm-17[3] dm-16[2] dm-15[1](F)
      4124672 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/4] [__UUUU]

Not so good: both disks were marked as faulty in the array…

Clean-up

mdadm --stop /dev/md/robust
for i in $(seq $COUNT); do
  integritysetup close "integr-$i"
  rm "raw-$i"
done

Summary

While the functionality necessary to detect and correct silent data corruption is available in the Linux kernel, the implementation will likely need a few tweaks so that it doesn’t exacerbate situations where the hardware is physically failing rather than just returning garbage. Passing additional metadata about the I/O errors from the dm-integrity layer to the md layer could be a potential solution.

Also, this mechanism comes into play only when the hard drive is technically already failing, so at least you will know about the failure – and, really, it is a failing disk that is getting kicked out 🙂

Learn more

Data integrity protection with cryptsetup tools – a presentation by Milan Brož at FOSDEM.

Note: Above tests were performed using 4.16.6-1-ARCH Linux kernel, mdadm 4.0-1 and cryptsetup 2.0.2-1 from ArchLinux.


Safe primes in Diffie-Hellman

The Diffie-Hellman key agreement protocol uses modular exponentiation and calls for the use of special prime numbers. If you have ever wondered why, I’ll try to explain.

Diffie-Hellman key agreement

The “classical” Diffie-Hellman key exchange, also known as Finite Field Diffie-Hellman, uses one type of operation — modular exponentiation — and two secrets for two communicating peers to arrive at a single shared secret.

The protocol requires a prime number and a so-called “generator” number to be known to both peers. This is usually achieved either by the server sending those values to the client (e.g. in TLS before v1.3 or in some SSH key exchange types) or by distributing them ahead of time to the peers (e.g. in IPsec and TLS 1.3).

When both peers know which parameters to use, each generates a random number, performs modular exponentiation using this random number, the group generator and the prime, and sends the result to the other party as a “key share”. The other party takes this key share and uses it as the base for another modular exponentiation with its own random number. The result of that second operation is the agreed-upon shared secret.

If we define g as the generator, p as the prime, S_r as the server-selected random, S_k as the server key share, C_r as the client-selected random, C_k as the client key share, and SK as the agreed-upon secret, the server performs the following operations:
S_k = g^{S_r} \mod p
SK = C_k^{S_r} \mod p

The client performs the following operations:
C_k = g^{C_r} \mod p
SK = S_k^{C_r} \mod p

Because (g^{C_r})^{S_r} = (g^{S_r})^{C_r} = SK \mod p both parties agree on the same SK.
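To make this concrete, here is a toy run of the exchange in Python with deliberately tiny, insecure parameters (a sketch only; the prime 23 and generator 5 come from the tables later in this article, while real deployments use primes of thousands of bits):

import secrets

p = 23   # toy safe prime: (23 - 1) / 2 = 11 is also prime
g = 5    # generator of the full group modulo 23

s_r = secrets.randbelow(p - 2) + 1   # server's secret random
c_r = secrets.randbelow(p - 2) + 1   # client's secret random

s_k = pow(g, s_r, p)   # server key share, sent to the client
c_k = pow(g, c_r, p)   # client key share, sent to the server

# each side combines its own secret with the other side's key share
assert pow(c_k, s_r, p) == pow(s_k, c_r, p)
print("shared secret:", pow(c_k, s_r, p))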

Unfortunately, both peers need to operate on a value provided by the other party (not necessarily trusted or authenticated) and on their secret value at the same time. This calls for the prime number used to have some special properties.

Modular exponentiation

The basic operation we’ll be dealing with is modular exponentiation. The simple way to explain it is that we take a number and raise it to some power, then we take that result and divide it by a third number; the remainder of that division is our result.

For 2^10 mod 12, the calculation goes as follows. First, exponentiation:

2^10 = 1024

Then division:

1024 = 85*12 + 4

So the result is 4.
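In Python this is exactly what the three-argument pow() computes:

# 2**10 is 1024, and 1024 = 85*12 + 4, so the result is 4
print(pow(2, 10, 12))   # prints 4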

One of the interesting properties of modular exponentiation is that it is cyclic: if we take a base number and start raising it to higher and higher powers, we will keep getting the same numbers in the same order:

$ python powmod.py -g 3 -m 14 -e 28
3^0  mod 14 = 1
3^1  mod 14 = 3
3^2  mod 14 = 9
3^3  mod 14 = 13
3^4  mod 14 = 11
3^5  mod 14 = 5
3^6  mod 14 = 1
3^7  mod 14 = 3
3^8  mod 14 = 9
3^9  mod 14 = 13
3^10 mod 14 = 11
3^11 mod 14 = 5
3^12 mod 14 = 1
3^13 mod 14 = 3
3^14 mod 14 = 9
3^15 mod 14 = 13
3^16 mod 14 = 11
3^17 mod 14 = 5
3^18 mod 14 = 1
3^19 mod 14 = 3
3^20 mod 14 = 9
3^21 mod 14 = 13
3^22 mod 14 = 11
3^23 mod 14 = 5
3^24 mod 14 = 1
3^25 mod 14 = 3
3^26 mod 14 = 9
3^27 mod 14 = 13

This comes from the fact that in modular arithmetic, for addition, subtraction, multiplication and exponentiation, the order in which the modulo operations are applied does not matter; (a + b) mod c is equal to (a mod c + b mod c) mod c. Thus, if we want to calculate 3^17 mod 14, we can write it down as ((3^6 mod 14) * (3^6 mod 14) * (3^5 mod 14)) mod 14, which reduces to 1 * 1 * 3^5 mod 14 = 5.
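A quick check of that reduction:

# the direct calculation and the decomposed one give the same result
assert pow(3, 17, 14) == (pow(3, 6, 14) * pow(3, 6, 14) * pow(3, 5, 14)) % 14
print(pow(3, 17, 14))   # prints 5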

The inverse of modular exponentiation is the discrete logarithm, in which, for a given base and modulus, we look for the exponent that results in a given number:

g^x mod m = n

where g, m and n are given and we’re looking for x.

The fact that there are no fast algorithms for calculating discrete logarithms is one of the reasons we can use modular exponentiation as the basis of the Diffie-Hellman algorithm.
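The only generic approach at this scale is to try exponents one by one, which quickly becomes infeasible as the modulus grows (a toy sketch reusing the g = 3, m = 14 example from above):

def discrete_log(g, n, m):
    """Brute-force search for the smallest x with g**x mod m == n."""
    for x in range(m):
        if pow(g, x, m) == n:
            return x
    return None

print(discrete_log(3, 5, 14))   # prints 5, because 3^5 mod 14 = 5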

Cyclic groups

Let’s see what happens if we start calculating results of modular exponentiation for 14 with different bases:

$ python groups.py -m 14
Groups modulo 14
g:  0, [1, 0]
g:  1, [1]
g:  2, [1, 2, 4, 8]
g:  3, [1, 3, 9, 13, 11, 5]
g:  4, [1, 4, 2, 8]
g:  5, [1, 5, 11, 13, 9, 3]
g:  6, [1, 6, 8]
g:  7, [1, 7]
g:  8, [1, 8]
g:  9, [1, 9, 11]
g: 10, [1, 10, 2, 6, 4, 12, 8]
g: 11, [1, 11, 9]
g: 12, [1, 12, 4, 6, 2, 10, 8]
g: 13, [1, 13]

None of the bases generate all of the numbers smaller than the integer we reduce modulo. In other words, there is no generator (in the number-theoretic sense) that generates the whole group.

To find such numbers, we need to start looking at prime numbers.

Cyclic groups modulo prime

Let’s see what happens for 13:

$ python groups.py -m 13
Groups modulo 13
g:  0, [1, 0]
g:  1, [1]
g:  2, [1, 2, 4, 8, 3, 6, 12, 11, 9, 5, 10, 7]
g:  3, [1, 3, 9]
g:  4, [1, 4, 3, 12, 9, 10]
g:  5, [1, 5, 12, 8]
g:  6, [1, 6, 10, 8, 9, 2, 12, 7, 3, 5, 4, 11]
g:  7, [1, 7, 10, 5, 9, 11, 12, 6, 3, 8, 4, 2]
g:  8, [1, 8, 12, 5]
g:  9, [1, 9, 3]
g: 10, [1, 10, 9, 12, 3, 4]
g: 11, [1, 11, 4, 5, 3, 7, 12, 2, 9, 8, 10, 6]
g: 12, [1, 12]

The obvious result is that we now have 4 generators — 2, 6, 7 and 11 generate the whole group.

But there is another result hiding here. Let’s look at the results for 19, this time with the sizes of those groups shown:

$ python groups-ann.py -m 19
Groups modulo 19
g:   0, len:   2, [1, 0]
g:   1, len:   1, [1]
g:   2, len:  18, [1, 2, 4, 8, 16, 13, 7, 14, 9, 18, 17, 15, 11, 3, 6, 12, 5, 10]
g:   3, len:  18, [1, 3, 9, 8, 5, 15, 7, 2, 6, 18, 16, 10, 11, 14, 4, 12, 17, 13]
g:   4, len:   9, [1, 4, 16, 7, 9, 17, 11, 6, 5]
g:   5, len:   9, [1, 5, 6, 11, 17, 9, 7, 16, 4]
g:   6, len:   9, [1, 6, 17, 7, 4, 5, 11, 9, 16]
g:   7, len:   3, [1, 7, 11]
g:   8, len:   6, [1, 8, 7, 18, 11, 12]
g:   9, len:   9, [1, 9, 5, 7, 6, 16, 11, 4, 17]
g:  10, len:  18, [1, 10, 5, 12, 6, 3, 11, 15, 17, 18, 9, 14, 7, 13, 16, 8, 4, 2]
g:  11, len:   3, [1, 11, 7]
g:  12, len:   6, [1, 12, 11, 18, 7, 8]
g:  13, len:  18, [1, 13, 17, 12, 4, 14, 11, 10, 16, 18, 6, 2, 7, 15, 5, 8, 9, 3]
g:  14, len:  18, [1, 14, 6, 8, 17, 10, 7, 3, 4, 18, 5, 13, 11, 2, 9, 12, 16, 15]
g:  15, len:  18, [1, 15, 16, 12, 9, 2, 11, 13, 5, 18, 4, 3, 7, 10, 17, 8, 6, 14]
g:  16, len:   9, [1, 16, 9, 11, 5, 4, 7, 17, 6]
g:  17, len:   9, [1, 17, 4, 11, 16, 6, 7, 5, 9]
g:  18, len:   2, [1, 18]

Note that all the sizes of those groups are factors of 18 – that is p-1.

The third observation we can draw from those results is that, for any base, the elements it generates will themselves generate subgroups that are at most as large as the subgroup of the original base.

With 19, if we take the generator 8, the size of its subgroup is 6, while the sizes of the subgroups generated by 7, 18, 11 and 12 are respectively 3, 2, 3 and 6.

Thus, not only is the subgroup much smaller than the full group, it is also impossible to “escape” from it.

Safe primes

We saw that for primes, some bases are better than others (look into finite fields to learn more).

As we noticed, the sizes of all the groups are factors of the prime less one (see Fermat’s little theorem for a proof of this). Of course, with the exception of 2, all primes are odd numbers, so p-1 will always be divisible by two; it will be a composite number. But q = (p-1)/2 doesn’t have to be composite. Indeed, primes for which q is also prime are called safe primes.
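The definition is easy to check directly (a sketch using trial division, which is fine for toy numbers; real tools use probabilistic primality tests):

def is_prime(n):
    """Trial-division primality test; only suitable for small numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def is_safe_prime(p):
    return is_prime(p) and is_prime((p - 1) // 2)

print([p for p in range(3, 60) if is_safe_prime(p)])   # [5, 7, 11, 23, 47, 59]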

Let’s see what happens if we calculate groups of such a prime:

$ python groups-ann.py -m 23
Group sizes modulo 23
g:   0, len:   2, [1, 0]
g:   1, len:   1, [1]
g:   2, len:  11, [1, 2, 4, 8, 16, 9, 18, 13, 3, 6, 12]
g:   3, len:  11, [1, 3, 9, 4, 12, 13, 16, 2, 6, 18, 8]
g:   4, len:  11, [1, 4, 16, 18, 3, 12, 2, 8, 9, 13, 6]
g:   5, len:  22, [1, 5, 2, 10, 4, 20, 8, 17, 16, 11, 9, 22, 18, 21, 13, 19, 3, 15, 6, 7, 12, 14]
g:   6, len:  11, [1, 6, 13, 9, 8, 2, 12, 3, 18, 16, 4]
g:   7, len:  22, [1, 7, 3, 21, 9, 17, 4, 5, 12, 15, 13, 22, 16, 20, 2, 14, 6, 19, 18, 11, 8, 10]
g:   8, len:  11, [1, 8, 18, 6, 2, 16, 13, 12, 4, 9, 3]
g:   9, len:  11, [1, 9, 12, 16, 6, 8, 3, 4, 13, 2, 18]
g:  10, len:  22, [1, 10, 8, 11, 18, 19, 6, 14, 2, 20, 16, 22, 13, 15, 12, 5, 4, 17, 9, 21, 3, 7]
g:  11, len:  22, [1, 11, 6, 20, 13, 5, 9, 7, 8, 19, 2, 22, 12, 17, 3, 10, 18, 14, 16, 15, 4, 21]
g:  12, len:  11, [1, 12, 6, 3, 13, 18, 9, 16, 8, 4, 2]
g:  13, len:  11, [1, 13, 8, 12, 18, 4, 6, 9, 2, 3, 16]
g:  14, len:  22, [1, 14, 12, 7, 6, 15, 3, 19, 13, 21, 18, 22, 9, 11, 16, 17, 8, 20, 4, 10, 2, 5]
g:  15, len:  22, [1, 15, 18, 17, 2, 7, 13, 11, 4, 14, 3, 22, 8, 5, 6, 21, 16, 10, 12, 19, 9, 20]
g:  16, len:  11, [1, 16, 3, 2, 9, 6, 4, 18, 12, 8, 13]
g:  17, len:  22, [1, 17, 13, 14, 8, 21, 12, 20, 18, 7, 4, 22, 6, 10, 9, 15, 2, 11, 3, 5, 16, 19]
g:  18, len:  11, [1, 18, 2, 13, 4, 3, 8, 6, 16, 12, 9]
g:  19, len:  22, [1, 19, 16, 5, 3, 11, 2, 15, 9, 10, 6, 22, 4, 7, 18, 20, 12, 21, 8, 14, 13, 17]
g:  20, len:  22, [1, 20, 9, 19, 12, 10, 16, 21, 6, 5, 8, 22, 3, 14, 4, 11, 13, 7, 2, 17, 18, 15]
g:  21, len:  22, [1, 21, 4, 15, 16, 14, 18, 10, 3, 17, 12, 22, 2, 19, 8, 7, 9, 5, 13, 20, 6, 11]
g:  22, len:   2, [1, 22]

The groups look very different from the ones we saw previously: with the exception of bases 0, 1 and p-1, all groups are relatively large – 11 or 22 elements.

One interesting observation we can make about bases whose group has order 2q is that even exponents produce elements of the subgroup of order q, while odd exponents produce elements outside of it. Thus we can say that using a generator whose group has order 2q will leak the least significant bit of the exponent.

That’s why protecting against small subgroup attacks with safe primes is so easy: it requires comparing the peer’s key share against just 3 numbers (0, 1 and p-1). It’s also the reason why it’s impossible to “backdoor” the parameters (prime and generator), as every generator is a good generator.
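That check can be written down in a few lines (a sketch; y is the peer’s key share and p the safe prime in use):

def valid_key_share(y, p):
    """Reject key shares that would confine the shared secret to a tiny subgroup."""
    return 0 < y < p and y not in (1, p - 1)

print(valid_key_share(22, 23))   # False: 22 = p-1 generates a subgroup of size 2
print(valid_key_share(5, 23))    # True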

Testing for SLOTH

Researchers at INRIA have published a new attack against TLS they called SLOTH. More details about it can be found at http://sloth-attack.org.

The problematic part is that many frameworks (that is, GnuTLS, OpenSSL and NSS), even if they don’t advertise support for MD5 hashes, would in fact accept messages signed with this obsolete and insecure hash.

Thus, to properly test whether a server is vulnerable to this attack, we need a client that misbehaves.

To make writing such test cases easy, I have been working on tlsfuzzer. The just-released version of it was extended to be able to test servers for vulnerability to the SLOTH attack (to be more precise, just the client impersonation attack – the most severe of the described ones).

Client impersonation attack

To test a server’s vulnerability to the client impersonation attack, you will need the TLS server, a client certificate and key trusted by the server, and Python (any version since 2.6 or 3.2 will do). The full procedure for testing a server is as follows.

Certificates:

For testing we will need a certificate trusted by the server; in this case we will cheat a little and tell the server to trust the certificate directly.

Client certificate:

openssl req -x509 -newkey rsa -keyout localuser.key \
-out localuser.crt -nodes -batch -subj /CN=Local\ User

Server certificate:

openssl req -x509 -newkey rsa -keyout localhost.key \
-out localhost.crt -nodes -batch -subj /CN=localhost

Server setup

The test client expects an HTTPS server on localhost, port 4433, that requests client certificates:

openssl s_server -key localhost.key -cert localhost.crt -verify 1 \
-www -tls1_2 -CAfile localuser.crt

Reproducer setup

The reproducer has a few dependencies on the system.

First, you will need the Python pip command. In case your distribution doesn’t provide it, download it from https://bootstrap.pypa.io/get-pip.py and run it using python:

python get-pip.py

After that, install dependencies of tlsfuzzer:

pip install --pre tlslite-ng

Note: the installation may print an error, “error: invalid command ‘bdist_wheel'”; it can be ignored, as it doesn’t break the installation of the package. In case you want to fix it anyway, upgrade the setuptools package installed on your system by running:

pip install --upgrade setuptools

Finally download the reproducer itself:

git clone https://github.com/tomato42/tlsfuzzer.git

Running reproducer

Once we have all pieces in place, we can run the reproducer as follows:

cd tlsfuzzer
PYTHONPATH=. python scripts/test-certificate-verify.py -k /tmp/localuser.key -c /tmp/localuser.crt

(if you generated user certificates in /tmp directory)

Results

If the execution finished with

MD5 CertificateVerify test version 4                              
MD5 forced...
OK
Sanity check...
OK
Test end
successful: 2
failed: 0

then the server is not vulnerable.

In case the “MD5 forced” test failed but the “Sanity check” resulted in “OK”, the server is vulnerable.

An example failure may look like this:

MD5 CertificateVerify (CVE-2015-7575 aka SLOTH) test version 4
MD5 forced...
Error encountered while processing node <tlsfuzzer.expect.ExpectClose object at 0xe3a410> with last message being: <tlslite.messages.Message object at 0xe3a8d0>
Error while processing
Traceback (most recent call last):
  File "scripts/test-certificate-verify.py", line 140, in main
    runner.run()
  File "/root/tlsfuzzer/tlsfuzzer/runner.py", line 139, in run
    msg.write()))
AssertionError: Unexpected message from peer: ChangeCipherSpec()

Sanity check...
OK
Test end
successful: 1
failed: 1

(if the error was caused by Unexpected message from peer: ChangeCipherSpec, as shown above, it means that the server is definitely vulnerable)

In case the Sanity check failed, that may mean one of a few things:

  • the server is not listening on localhost on port 4433
  • the server does not support TLS v1.2 protocol, in that case it is not vulnerable (note: this is NOT a good workaround)
  • the server does not support TLS_RSA_WITH_AES_128_CBC_SHA cipher (AES128-SHA in OpenSSL naming system)
  • the server did not ask for certificate on first connection attempt

The cryptopocalypse is near(er)!

That’s at least what NIST, CNSS and NSA think.

The primary reason for deploying cryptographic systems is to protect secrets. When the system carries information with a very long life (like locations of nuclear silos or evidence of marital infidelity), you need to stop using it well before it is broken. That means the usable life of a crypto-system is shorter than the time it remains unbroken.

Suite B is a set of cryptographic algorithms in very specific configurations that was originally published in 2005. Implementations certified by NIST in the FIPS program were allowed for the protection of SECRET and TOP SECRET information, depending on the specific key sizes used. In practice, SECRET was equivalent to a 128 bit level of security, i.e. SHA-256 signatures, AES-128 and the P-256 curve, while TOP SECRET required a 192 bit level of security, with SHA-384 signatures, the P-384 curve and AES-256 for encryption.

They now claim that quantum computers are much closer than we think (a less-than-10-year time frame) and, as such, that the keys used for the protection of secret information need to be increased in the short term (significantly, in the case of ECC), and that research into quantum-resistant algorithms is now a priority.

New recommendations

That means we get a new set of recommendations.

To summarise:

If you’re using TLS or IPsec with Pre-Shared Keys (PSK) with AES-256 encryption, you’ll most likely be fine.

If you were planning a deployment of ECC in the near future, you should instead just increase the key sizes of existing RSA and DH systems and prepare for deployment of quantum-resistant crypto in the near future.

For RSA and finite-field DH (a new addition to Suite B, but very old cryptosystems in their own right) the recommended minimum is 3072 bit parameters. That is not particularly surprising, as that is the ENISA as well as the NIST recommendation for a 128 bit level of security.

What is a bit surprising is that they have changed the minimum hash size from 256 to 384 bit.

For ECC systems, the P-256 curve was downgraded to being secure enough only to protect unclassified information, so it was put in the same category as 2048 bit RSA or DH. The minimum is now the P-384 curve.

So now the table with equivalent systems looks like this:

LoS       RSA key size   DH key size   ECC key size   Hash      AES key size
112 bit   2048 bit       2048 bit      256 bit        SHA-256   128 bit
128 bit   3072 bit       3072 bit      384 bit        SHA-384   256 bit

What does that mean?

Most commercial systems don’t need to perform key rotation and reconfiguration just yet, as the vast majority of them (nearly 90%) still use just 2048 bit RSA for authentication. What that does mean is that the recent migration to ECC (like ECDHE key exchange and ECDSA certificates) didn’t bring an increase in security, just in the speed of the key exchange. So if you’re an admin, you don’t need to do much, at least not until other groups of people do their part.

Software vendors need to make their software actually negotiate the curve used for the ECDHE key exchange. A situation in which 86% of servers that can do ECDHE can do it only with P-256 is… unhealthy. The strongest mutually supported algorithms should be negotiated automatically and by default. That means stronger signatures on ECDHE and DHE key exchanges, bigger curves selected for ECDHE and bigger parameters selected for DHE (at least as soon as draft-ietf-tls-negotiated-ff-dhe-10 becomes a standard).

Finally, we need quantum-computing-resistant cryptography. It would also be quite nice if we didn’t have to wait 15 or even 10 years before it reaches 74% of the web server market because of patent litigation fears.

More nails to RC4 coffin

Last week Christina Garman, Kenneth G. Paterson and Thyla van der Merwe published new attacks on RC4 in a paper titled Attacks Only Get Better: Password Recovery Attacks Against RC4 in TLS. In it they outline an attack which recovers user passwords in IMAP and HTTP Basic authentication using about 2^26 ciphertexts. Previous attacks required about 2^34 ciphertexts.

The other attack, published yesterday at the Black Hat conference, is the Bar Mitzvah attack, which requires about 2^29 ciphertexts.

While connections to relatively few servers (~6% of the Alexa top 1 million TLS-enabled sites) will end up negotiating an RC4 cipher, the 75% market share of RC4 support in general is not reassuring.

RC4 prohibited

After nearly half a year of work, the Internet Engineering Task Force (IETF) Request for Comments (RFC) 7465 has been published.

What it does, in a nutshell, is disallow the use of any kind of RC4 ciphersuites, in effect making all servers and clients that use them non-compliant with the standard.

Halloween special

Since Halloween is the time for scary stories, let me share with you some preliminary results from the new cipherscan.

I’ve extended the tool to check different kinds of TLS ClientHello intolerance (TLS version, size of the ClientHello, and so on), and the results are scary indeed.

Over 3% of servers, just in the Alexa top 2000 sites, are not RFC compliant (they can’t process a big ClientHello, a ClientHello with extensions they don’t understand, a ClientHello with a high protocol version such as TLSv1.2, or one with cipher suites they don’t know).

Two servers in this set are strictly TLSv1.2 intolerant; that means they use TLS implementations that can’t handle the new protocol version 6 years after its release!

Those are the servers which force web browsers to use the fallback mechanism or limit their cipher suite set.

47% have SSLv3 still enabled, making them possibly vulnerable to POODLE (if they support ciphers besides RC4 in SSLv3).

Statistics

The new section added to statistics is “Required fallbacks”. It reports what kind of client connection attempts are problematic to servers the scan was still able to connect to.

The set of “big” hellos (big-SSLv3, big-TLSv1.0, big-TLSv1.1, big-TLSv1.2) are “Christmas tree packet”-like connection attempts, where every option, extension and cipher suite supported by the OpenSSL tool was enabled.

In case the big-TLSv1.2 test was unsuccessful, the tool runs additional tests: “no-npn-alpn-status”, in which it queries the server again with a TLSv1.2 packet but drops the NPN, ALPN and certificate status request (OCSP staple) extensions; and “small-TLSv1.2” and “small-TLSv1.0”, which are reduced-size packets that still include SNI, elliptic curves (albeit limited to the P-256, P-384 and P-521 curves) and signature algorithms, but offer only a limited set of ciphers, modelled after the Firefox ClientHello. All of those results come from connections to servers that were tolerant of at least one of the “big-” hellos.

Then there are “inconclusive” results, which show servers the tool was able to connect to only using at least one of the “no-” or “small-” fallbacks, but not using a “big-” hello. Such servers are not included in the general stats, though these counts do exclude servers which didn’t provide valid certificates.

SSL/TLS survey of 1693 websites from Alexa's top 2 thousand
Stats only from connections that did provide valid certificates
(or anonymous DH from servers that do also have valid certificate installed)

Supported Protocols       Count     Percent
-------------------------+---------+-------
SSL2                      44        2.5989
SSL2 Only                 4         0.2363
SSL3                      801       47.3125
SSL3 Only                 26        1.5357
SSL3 or TLS1 Only         295       17.4247
TLS1                      1663      98.228
TLS1 Only                 74        4.3709
TLS1.1                    1265      74.7194
TLS1.2                    1325      78.2634
TLS1.2, 1.0 but not 1.1   89        5.2569

Required fallbacks                       Count     Percent
----------------------------------------+---------+-------
big-SSLv3 Only tolerant                  22        1.2995
big-SSLv3 intolerant                     892       52.6875
big-SSLv3 tolerant                       801       47.3125
big-TLSv1.0 intolerant                   26        1.5357
big-TLSv1.0 or big-SSLv3 Only tolerant   2         0.1181
big-TLSv1.0 tolerant                     1667      98.4643
big-TLSv1.1 intolerant                   29        1.7129
big-TLSv1.1 tolerant                     1664      98.2871
big-TLSv1.2 intolerant                   49        2.8943
big-TLSv1.2 tolerant                     1644      97.1057
inconclusive small-TLSv1.0               7         0.4135
inconclusive small-TLSv1.2               7         0.4135
no-npn-alpn-status intolerant            49        2.8943
small-TLSv1.0 tolerant                   49        2.8943
small-TLSv1.2 intolerant                 2         0.1181
small-TLSv1.2 tolerant                   47        2.7761

RSA and ECDSA performance

Update 2017-07-03: nginx does support a hybrid configuration with RSA and ECDSA certificates for a single virtual host.

As servers negotiate a TLS connection, a few things need to happen. Among them, a master key needs to be negotiated to secure the connection, and the client needs to be able to verify that the server it connected to is the one it intended to connect to. This happens by virtue of key exchange: either RSA, finite field Diffie-Hellman (DH) or Elliptic Curve Diffie-Hellman (ECDH). The server itself can be identified using either RSA or Elliptic Curve Digital Signature Algorithm (ECDSA) based certificates. All of them have their strong and weak sides, so let’s quickly go through them.

RSA key exchange

This way of deriving the key securing the connection is the simplest of those mentioned. It is supported by virtually all the clients and servers out there. Its strong point is relatively high performance at small key sizes. Unfortunately, it doesn’t provide Forward Secrecy (PFS), which means that in case of a private key leak, all previous communication can be decrypted.

Side note: while the acronym PFS stands for “Perfect Forward Secrecy”, the term “Perfect” comes from cryptography where it has a slightly different meaning than in vernacular. To limit confusion I’ll be calling it just “Forward Secrecy”.

Finite field Diffie-Hellman key exchange

Forward Secrecy is the prime feature of the ephemeral version of Diffie-Hellman. Its main drawback is high computational cost. Additionally, not all clients support it with RSA authentication, all versions of Internet Explorer being the prime example. While there exists a non-ephemeral version of DH, it doesn’t provide PFS nor does it have widespread support, so we will ignore it for now.

Elliptic Curve Diffie-Hellman key exchange

ECDH is the new kid on the block, which means that it is supported only by relatively new clients. Its strong points are low computational cost and much smaller key sizes for the same security levels. Similarly to DH, there exist ephemeral and non-ephemeral versions of it; the latter has limited support in clients and does not provide PFS.

RSA authentication

It’s the venerable and widely supported cryptosystem, supported by virtually anything that can speak TLS or SSL, and with us since SSLv2. Unfortunately, it’s starting to show its age: performance at key sizes providing a 128 bit level of security is rather low, and certificates are starting to get rather large (over 1 KiB in size for 3072 bit RSA).

ECDSA authentication

The Elliptic Curve Digital Signature Algorithm, just like ECDH, is a new cryptosystem. This causes it to suffer from the same problem: no support in old clients. It does use much smaller key sizes for the same security margins and is less computationally intensive than RSA.

Benchmarking

While the key exchanges and authentication mechanisms are relatively separate, in TLS the ciphersuites limit the available options. In fact, for the ECDSA ciphers, only ECDH key exchange is available. Or, to spell it out, I could test only the following configurations:

RSA key exchange - RSA authentication
DHE key exchange - RSA authentication
ECDHE key exchange - RSA authentication
ECDHE key exchange - ECDSA authentication

The tested key sizes were: 1024 bit RSA as the recently obsoleted but commonly used size, 2048 bit RSA as the current standard key size, 3072 bit as the recommended key size for systems that have to remain secure for the foreseeable future, 4096 bit as the minimal size that matches existing CA key sizes and is secure for the foreseeable future (as there are virtually no commercial CAs with 3072 bit keys), and finally 256 bit ECDSA as the recommended key size for secure systems for the foreseeable future.

The tests were done on an Atom D525 @ 1.8 GHz with 4 GiB of RAM (as an example of the lowest performance on relatively modern hardware) and used httpd-2.4.10-1.fc20.x86_64 and mod_ssl-2.4.10-1.fc20.x86_64 with openssl-1.0.1e-39.fc20.x86_64. Interestingly, while this CPU has SSSE3, it doesn’t have AES-NI or SSE4.1; this makes AES-128-GCM faster than AES-128-CBC (50.7 MiB/s vs 27.8 MiB/s). Every query from the client downloaded 4 KiB of data. The graphs show the maximum performance while serving concurrent users (usually around 8-10 at the same time).

So, let’s compare the performance of the RSA-authenticated ciphers.

[Figure: Performance of different RSA key sizes]

We can clearly see that using non-PFS key exchanges brings huge performance improvements at legacy key sizes. RSA key exchange at 1024 bit is actually over 3.4 times faster than DHE key exchange. At these small sizes, using 256 bit ECDHE also doesn’t show much of a performance advantage over DHE; it is just 11% faster (note, though, that it is also comparable to 3072 bit DHE in security level).

Going to 2048 bit RSA, the performance advantage of not using PFS ciphers quickly shrinks. Here, using RSA key exchange instead of 1024 bit DHE gives about 78% more performance. This configuration, while common because of the lack of support for larger DHE parameter sizes in older Apache servers, doesn’t really provide higher security against a targeted attack than the use of 1024 bit RSA. Use of ECDHE also starts looking much more interesting, at only a 40% penalty compared to RSA key exchange. Using DHE key exchange with matching parameter sizes gives performance that is nearly 7 times slower than pure RSA.

ECDHE shows its real potential at the 3072 bit RSA key size, where, while providing PFS with a matching level of security, it requires just 26% more computing power, again outperforming 1024 bit DHE. Ephemeral DH with a matching parameter size is truly abysmal, at roughly 8 times slower than pure RSA.

At the 4096 bit level, where the 256 bit ECDHE is the weaker link, pure RSA key exchange is just 18% more power hungry.

So, how does it compare to ECDSA authentication?

[Figure: ECDSA vs RSA authenticated connections]

If we compare against the currently acceptable 2048 bit RSA key exchange, it turns out that RSA is about 3% faster than the combination of ECDHE key exchange and ECDSA authentication (both using a 256 bit curve). But if the required security level reaches 128 bits, or PFS is required, ECDSA with ECDHE is much faster.

ECDHE and ECDSA with 256 bit curves are 2.7 times faster than 2048 bit RSA with 256 bit ECDHE, and 3.4 times faster than 3072 bit RSA alone!

Apache

In other words, if you’re running Apache web servers you should consider using two certificates for the same site. This will allow the server to negotiate RSA cipher suites with legacy clients while reducing the load caused by modern clients.

The configuration is also relatively simple; you just need to specify the certificate and key files twice:

SSLCertificateFile /etc/pki/tls/certs/example.com-RSA.crt
SSLCertificateKeyFile /etc/pki/tls/private/example.com-RSA.key
SSLCertificateFile /etc/pki/tls/certs/example.com-ECDSA.crt
SSLCertificateKeyFile /etc/pki/tls/private/example.com-ECDSA.key

and make sure that your SSLCipherSuite doesn’t disable ECDSA authenticated ciphersuites (just check if this command outputs anything: openssl ciphers -v <cipher string from apache> | grep ECDHE-ECDSA).

nginx

For nginx the configuration is very similar, though you will need to run the relatively new 1.11.0 version or later (see CHANGES).

ssl_certificate /etc/pki/tls/certs/example.com-RSA.crt;
ssl_certificate_key /etc/pki/tls/private/example.com-RSA.key;
ssl_certificate /etc/pki/tls/certs/example.com-ECDSA.crt;
ssl_certificate_key /etc/pki/tls/private/example.com-ECDSA.key;

Similarly, you need to make sure that the cipher string used doesn’t disable ECDSA authenticated ciphersuites.