Block ciphers modes with forward secrecy for Cryptocat/OTR

Off-the-Record Messaging (OTR) is a protocol that allows people to chat in realtime securely and privately.  One of the most interesting features I’ve found in OTR is the ability to provide Perfect Forward Secrecy (PFS). Simply stated, an encryption system provides forward secrecy if even if suddenly all private keys in existence are compromised, an attacker cannot read past messages sent. To achieve PFS, OTR creates Diffie-Hellman ephemeral keys and derive transient private keys, which are disposed after some time. Nevertheless, as is explained in the section 4.2 of the paper, sometimes keys are kept in memory for long times before disposal if the remote user does not reply to the messages sent.

I can think of many scenarios where this is a drawback:TOPSECRET

  1.  Alice sends many messages in a row, but Bob does not reply.
  2. Alice wants to send a big file (say 10 Mbytes) to Bob using OTR.
  3. Alice wants to send realtime audio/video chunks recorded with his microphone/camera over OTR.
  4.  Alice is downloading a file from a site and at the same time she is sending it to Bob using OTR.

In all these cases she wants that, at any time, if her computer is compromised, the data already sent is protected unconditionally.
Also in the last two cases, going through Diffie-Hellman for every block transmitted may imply a very high overhead, and a reduction in the throughput because of the round trip time  (RTT) required to exchange D-H messages.

My proposal it to change the block chaining mode in the protocol, which is currently AES-CTR, to a special counter mode that uses a one-way hash chain instead of an incremental counter. Also the IV is derived from the shared secret, instead of being public. Because such special modes do not exists in literature (at least as far I could review), I decided to create these new kind of modes myself.

0. Hash Chain Prefix  (HCP)

Let SS be the shared secret, let P[i] be a plaintext block and C[i] the corresponding ciphertext block, and Hash() is a cryptographically secure hash function. Normally the OTR protocols does:

EK=Hash(SS) (encryption key)
MK=Hash(EK) (authentication/MAC key)

My first idea was to change this initialization sequence to:

IVK = Hash(SS | intialization-vector)
EK=Hash(SS | encryption)
MK=Hash(SS | authentication)

And destroy SS afterwards.

We define the sequence:

BUFFER[0]=IVK (the buffer is 256 or 512 bits wide)
BUFFER[i] =Hash(BUFFER[i-1]) (in the implementation, BUFFER[i] overrides the memory positions of BUFFER[i-1])

Encryption: C[i] = AES(EK,BUFFER[i]) XOR P[i]

BUFFER[i] is truncated to 128 bits (the AES block size). After each block i of AES-CTR is generated, the BUFFER is recycled and the previous counter is lost forever.
If an attacker manages to break in the computer at a certain time j, then it’s computationally impossible for the attacker to recover the previous i<j counters (since SS was destroyed and BUFFER[j] cannot be used to recover BUFFER[i]).
The overhead of a Hash evaluation per block is unimportant for a protocol such as Cryptocat, and is much less than a full D-H key exchange per block sent. HCP is approximately 50% as fast as the normal AES-CTR mode.

Counters and Cycles

If you’re worried about short cycles in the hash chain, then you can use a 512-bit digest hash function or take the hash of the previous buffer and append a sequential 128-counter before rehashing such as :

BUFFER[i] =Hash(BUFFER[i-1] || counter[i])

counter[i] = counter[i-1]+1

Still another possibility is to also append the counter after the hash (an take a prefix of the message digest slightly shorter):

BUFFER[i] =Hash(BUFFER[i-1] || counter1[i])

Encryption: C[i] = AES(EK,TRUNC(b,BUFFER[i]) || counter2[i] ) XOR P[i]

counter1[i] = counter1[i-1]+1

counter2[i] = counter2[i-1]+1

Where TRUNC(b,x) is a function that truncates the input argument x to b bits.

For a 128-bit block cipher, we can use 96 bits for the hash (b=96) digest prefix, 32 bits for the unhashed counter2 and 64-bits for the hashed counter1. The advantage of this method is that, as long as we don´t encrypt more than 2^32 blocks,  we can guarantee the security of the encryption operation even if the hash function does not behave as a random function.

Problem: Decryption Oracle

This proposal has a drawback. Suppose an attacker breaks in a computer that is  communicating with OTR. The attacker can try to guess a plaintext and use it to check if the encryption is correct.

Attacker knows: EK, BUFFER[i], the last sent ciphertext block C[i]
Attacker guess: last block is G.

C[i] = AES(EK, BUFFER[i-1]) XOR G

The attacker can compute X = Decrypt(EK,G XOR C[i]).

The attacker can then try to check if Hash(X || Y )=BUFFER[i], where Y is the remaining part of the hash that was not included in X.

If the hash digest length is 512 bits, and the block size is 128 bits then the attack seems still difficult to execute. The only ways I know to attack this construction is by brute-force or try to execute a meet-in-the-middle attack at half of the hash function rounds. Nevertheless the existence of this hash prefix pre-image attack may still be considered a weakness of the scheme.

For example  a 160 hash digest would be totally breakable and a 256 bit hash digest would be at least problematic.

And if the attacker succeeds in guessing G[i] and recovering BUFFER[i], where G[i] is a plaintext in the middle of the communication, then he can decrypt all the following blocks.

To solve this problem I propose four solutions that should be provably secure in the Random Oracle Model, and a two additional solutions whose security I cannot assure. The ROM means that we assume that the hash function used look like a random function, which is a stronger assumption than the standard requirements of pre-image and second-preimage resistance for cryptographic hash functions.

1. Hash Chain Re-Key (HCRK)

Instead of using AES-CTR, re-key AES in each block with BUFFER[i] as key and the plaintext as all zeros. The encryption becomes:
C[i] = AES(BUFFER[i],0) XOR P[i]

Obviously this is not the best way to use a cipher, since key-schedule takes much more time than encryption (more than 2 times in AES). There are ciphers that have fast key schedules, like Threefish, but this is not very common.
But there is also the problem of related key attacks. Most ciphers are not well protected against this attack.

Finally, this construction is prone to a multi-target attack on the key:  when you encrypt a constant file of n blocks, a brute-force attacker can recover the key with 2^k/n operations where k is the size of the key.

2. Hash Chain Hashed (HCH)

This is similar to the original method but the counter is taken to be a hash of each hash chain block. A special string is pre-appended to differentiate between the hash of the chain and the hash that goes to the counter.

BUFFER[i] =Hash(BUFFER[i-1])

CTR[i] = Hash( “CTR” || BUFFER[i])

C[i] = AES(CTR[i],0) XOR P[i]

This is approximately 33% as fast as the normal AES-CTR mode.

Another similar construction is to use two different constants K1 and K2 to differentiate between the hash chain and the counter  hash. It also has the advantage that is does not expand the size of the message to digest.

BUFFER[i] =Hash(K1 XOR BUFFER[i-1])

CTR[i] = Hash( K2 XOR BUFFER[i])

3.  Hash Chain Double Encryption (HCDE)

Use two hash-chain counters (keystreams) simultaneously to encrypt the plaintext:

Let SS be the initial shared secret.

IVK1 = Hash(SS | intialization-vector1)
IVK2 = Hash(SS | intialization-vector2)
EK=Hash(SS | encryption)
MK=Hash(SS | authentication)

And SS is destroyed.

Then we define the sequence:

BUFFER1[i] =Hash(BUFFER1[i-1])

BUFFER2[i] =Hash(BUFFER2[i-1])

Encryption: C[i] = AES(EK,BUFFER1[i]) XOR AES(EK,BUFFER2[i]) XOR P[i]

Now if the attacker knows C[i], G (guess), BUFFER1[i], BUFFER2[i] and EK, he cannot test if G is the correct previous plaintext.

This is almost 25% as fast as the normal AES-CTR mode.

4. Hash Chains Combined into Keystream (HCCK)

(this construction was previously named CCSK in this post, but it was changed because there is already a construction called CCSK!)

IVK1 = Hash(SS | intialization-vector1)
IVK2 = Hash(SS | intialization-vector2)
EK=Hash(SS | encryption)
MK=Hash(SS | authentication)
Then we define the sequence:

BUFFER1[i] =Hash(BUFFER1[i-1])

BUFFER2[i] =Hash(BUFFER2[i-1])

Encryption: C[i] = AES(EK,BUFFER1[i] XOR BUFFER2[i]) XOR P[i]
Now the performance is approximately 33% of the original AES-CTR. The advantage of HCCK over  HCH is that HCCK hashes can be completely computer in parallel by two threads, while HCH cannot do that.

The advantage of HCDE over HCCK is that it is much easier to prove the security of the mode, since it’s mostly a double encryption of two ciphers in a AES-CTR-like mode.

5. Hash Chain for Two Identical Keystreams (HCTIK)

IVK = Hash(SS | intialization-vector)
EK=Hash(SS | encryption)
MK=Hash(SS | authentication)

K1,K2 are fixed constants.

Then we define the sequence:

BUFFER[i] =Hash(BUFFER[i-1])

Encryption: C[i] = AES(EK,BUFFER[i] XOR K1) XOR AES(EK,BUFFER[i] XOR K2) XOR P[i]
The security relies on the assumption that AES is not homomorphic with respect to XOR. The performance is approximately 33% of the original AES-CTR.

6. Hash Chain for Two Different Keystreams (HCTDK)

IVK = Hash(SS | intialization-vector)
EK1=Hash(SS | encryption1)

EK2=Hash(SS | encryption2)
MK=Hash(SS | authentication)

K1,K2 are fixed constants.

Then we define the sequence:

BUFFER[i] =Hash(BUFFER[i-1])

Encryption: C[i] = AES(EK1,BUFFER[i] XOR K1) XOR AES(EK2,BUFFER[i] XOR K2) XOR P[i]
This mode is similar to the previous one, but each encryption uses its own encryption key. The performance is approximately 33% of the original AES-CTR. The only reason HCTIK should be preferred over HCTDK is if the mode is going to be implemented in an ASIC or FPGA and the memory space to store the expanded key schedule is not enough to store both schedules.

The security of HCTIK and HCTDK rely on the fact that encryption of a random block  should be indistinguishable from a random  block while the security of HCCK relies on the fact that the hash digest of random message should indistinguishable from a random digest. And that’s why I prefer HCCK over HCTIK/HCTDK.

Last, if you cannot implement SHA-512 as the undeling hash function of the HC modes, and you only have an AES cipher, it’s possible to construct a 256-bit hash function using AES, as described in the paper AES-Hash.


If one wants to go one step further, then we can also change the OTR protocol to provide deniability during long message transmission, without redoing the D-H key exchange:


MK[i] = Key-derivation-function( MK , i )

For example:

MK[i] = HMAC( MK || i )
Then Alice send each block i authenticating with the key MK[i] and Bob acknowledges the correct reception (by MAC verification), then MK[i] is published, without any loss in privacy.

Mixing Counters with Hash Chains

Generally in protocols such as OTR, you don’t meed to provide forward secrecy for each 128-bit block, but for larger data chunks, such as messages or transactions. This data boundary usually agrees with the authentication boundary, because messages must be kept in memory without processing until the authentication is verified. Also, even if you wish to provide forward secrecy and deniability at the block level over a TCP connection on the Internet, a session layer acknowledge would be required before publishing each MAC key used, so the communication would be very slow. A 10 Kbyte artificial HC/MAC boundary would solve this problem. Between these message boundaries, a simple counter as in AES-CTR can be used, starting from the last hash link of the chain. This removes almost all the overhead of the HC modes described.

Edit: The first comment I received tells about a protocol DUKPT that supposedly does something similar to the HC modes. This protocol is not free (you have to buy the complex ANSI standard)  and seems to be quite old, tied specifically to the DES encryption cipher. But the main difference seems to be that in DUKPT the shared secret or “Base Derivation Key” (BDK) is not destroyed after each transaction. That means that if the BDK is compromised, past secrecy is lost, and that’s what we try to avoid with HC modes.  With HC modes, even if all the private keys in the world are revealed, the past communications cannot be decrypted.

Edit2: As described in my follow up post, SCIMP (the protocol to chat used in Silent Circle software) has a one-way key derivation for each message, similar to the Hash Chain Prefix  (HCP) mode.


I have proposed seven new block cipher modes with forward secrecy:  HCP, HCRK, HCH, HCDE, HCCK, HCTIK and HCTDK. Also I’ve proposed how to provide deniability without executing more Diffie-Hellman key exchanges, with few protocol modifications.

My proposal is to create a new version of OTR that uses a single D-H key-exchange at the beginning and afterwards it uses HCCK. If we use this protocol to any message (being short or long) we achieve a much faster protocol with the same security/privacy guarantees.

  1. #1 by none on February 24, 2013 - 9:27 am

    Look at DUKPT which has been used in payment processing for decades. It supplies forward secrecy by iterating a hash function once for every transaction, more or less. The two sides start out with shared state and stay in sync with each other.

    • #2 by SDLerner on February 24, 2013 - 2:09 pm

      It seems to me that DUKPT has not perfect forward secrecy. If the “Base Derivation Key” (BDK) is compromised, all past communications can be decrypted.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: