Linux: Offline Processing of Encrypted Volumes


This article explains how to process encrypted volumes offline, i.e. without using the Linux kernel's disk encryption subsystem called dm-crypt and without mounting the file system that is stored on an encrypted volume.

The term processing means either

The reader may be interested in the above if

I assume that the reader is familiar with the concept of disk encryption as well as with the theory behind the implementation found in the Linux kernel.

Below I show how to process volumes that are encrypted with AES-256 in the CBC mode using ESSIV and key derivation made with SHA-256, i.e. volumes that can be configured on Linux with the following command:

sudo cryptsetup --hash sha256 -c aes-cbc-essiv:sha256 -s 256 create \
    devmapper-name block-device-node

The complete shell script that demonstrates re-encryption of a volume can be found here.

Preparation of Key Material

Passphrases

The user is expected to know the passphrase of the source volume. This passphrase is read into the variable spwd on line 73. The user is expected to supply the passphrase of the destination volume. This passphrase is read into the variable dpwd on line 84.

73: spwd="${REPLY}"

84: dpwd="${REPLY}"

Decryption and Encryption Keys

A key is derived from the corresponding passphrase using the SHA-256 algorithm by the function keyhex found on lines 11–14. Since OpenSSL's enc(1) expects keys in the form of a sequence of hexadecimal digits, the derived keys are stored in this format. The decryption key of the source volume is stored in the variable skeyhex on line 75 and the encryption key of the destination volume is stored in the variable dkeyhex on line 86:

 11: keyhex ()
 12: {
 13:     echo -n "${1}" | sha256sum | cut -f1 '-d '
 14: }

 75: skeyhex="$(keyhex "${spwd}")"

 86: dkeyhex="$(keyhex "${dpwd}")"
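
The derivation can also be tried in isolation. What follows is a minimal sketch, assuming a POSIX shell and sha256sum(1) from GNU coreutils. The name keyhex_safe is hypothetical and is not part of the demonstration script. It uses printf instead of echo -n because the behaviour of echo differs between shells when the argument begins with a dash or contains backslashes.

```sh
# Hypothetical variant of keyhex; not part of the demonstration script.
# Prints the SHA-256 digest of the passphrase as 64 hexadecimal digits.
keyhex_safe ()
{
    # printf '%s' writes the argument verbatim, without a trailing newline.
    printf '%s' "${1}" | sha256sum | cut -d ' ' -f 1
}

keyhex_safe "abc"
# prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

The value printed for "abc" is the well-known SHA-256 test vector, which makes it a convenient sanity check before working with a real passphrase.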

Values of ESSIV Salt

ESSIV requires a salt which is derived from the corresponding key using the SHA-256 algorithm by the function salt found on lines 25–28. Since sha256sum(1) expects binary data, the key is converted into an intermediate representation by the helper function key found on lines 18–21. The intermediate representation of the decryption key of the source volume is stored in the variable skey on line 77. The ESSIV salt of the source volume is stored in the variable ssalt on line 79. The intermediate representation of the encryption key of the destination volume is stored in the variable dkey on line 89. The ESSIV salt of the destination volume is stored in the variable dsalt on line 91.

 18: key ()
 19: {
 20:     echo -n "${1}" | sed 's/\([[:xdigit:]][[:xdigit:]]\)/\\x\1/g'
 21: }

 25: salt ()
 26: {
 27:     echo -ne "${1}" | sha256sum | cut -f1 '-d '
 28: }

 77: skey="$(key "${skeyhex}")"

 79: ssalt="$(salt "${skey}")"

 89: dkey="$(key "${dkeyhex}")"

 91: dsalt="$(salt "${dkey}")"
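
The same two steps (hexadecimal digits to bytes, then SHA-256 of the bytes) can be sketched without relying on the \x escapes of echo -e, which not every shell supports. The names hex2bin and salt_from_hex are hypothetical and are not part of the demonstration script; the sketch assumes a POSIX shell and sha256sum(1).

```sh
# Hypothetical helpers; not part of the demonstration script.

# Writes the bytes encoded by an even-length string of hexadecimal digits.
hex2bin ()
{
    hex="${1}"
    [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
    while [ -n "${hex}" ]; do
        pair="${hex%"${hex#??}"}"   # first two digits
        hex="${hex#??}"             # remaining digits
        # Emit the byte through a three-digit octal escape, which the
        # printf of every POSIX shell understands.
        printf "\\$(printf '%03o' "0x${pair}")"
    done
}

# Prints the ESSIV salt (SHA-256 of the binary key) as hexadecimal digits.
salt_from_hex ()
{
    hex2bin "${1}" | sha256sum | cut -d ' ' -f 1
}

salt_from_hex "616263"
# prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

The digits 616263 encode the bytes "abc", so the output equals the SHA-256 digest of that string. This confirms that the bytes are hashed, not the digits.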

Disk Access

Input

The source volume is accessed sequentially, in 512-byte blocks. The total number of blocks in the source volume is stored in the variable blocks on line 95. The number of the current block is stored in the variable i that is initialised on line 101. Reading stops when the number of the current block reaches the total number of blocks in the source volume; the condition is checked on line 102. The source volume is read on line 112. The number of the current block is incremented on line 117.

 95: blocks=$(($(stat -c "%s" "${src}") / 512))

101: i=0
102: while [[ ${i} -lt ${blocks} ]]; do
 
112:     dd if=${src} bs=512 count=1 skip=${i} 2>/dev/null

117:     i=$((${i} + 1))

Note that the size of the source volume in bytes is expected to be a multiple of 512; the demonstration script does not enforce this condition.
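
A check of this kind could be added before the loop. The following is a minimal sketch; the name is_sector_aligned is hypothetical and is not part of the demonstration script. It assumes that the size has been obtained as on line 95, i.e. that the source is a regular image file for which stat(1) reports the size in bytes.

```sh
# Hypothetical helper; not part of the demonstration script.
# Succeeds if the given size in bytes is a positive multiple of 512.
is_sector_aligned ()
{
    [ "${1}" -gt 0 ] && [ $(( ${1} % 512 )) -eq 0 ]
}

# Possible use (the variable src is assumed to be set as in the script):
#   size="$(stat -c '%s' "${src}")"
#   is_sector_aligned "${size}" || { echo "bad volume size" >&2; exit 1; }
```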

Output

Data is appended to the destination volume on line 116 using the output redirection facility of the shell.
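
Appending produces a correct image only if the destination is empty when the loop starts and the blocks arrive strictly in order. An alternative, shown here as a sketch and not as what the demonstration script does, is to write each block at its own offset with dd(1). The name write_block is hypothetical.

```sh
# Hypothetical helper; not part of the demonstration script.
# Writes standard input to block number ${2} of the file ${1}.
# conv=notrunc keeps the rest of the file intact.
write_block ()
{
    dd of="${1}" bs=512 seek="${2}" conv=notrunc 2>/dev/null
}

# Possible use in place of the redirection on line 116:
#   ... | openssl enc ... | write_block "${dst}" "${i}"
```

With explicit offsets, an interrupted run can be repeated for a range of blocks without corrupting the blocks that have already been written.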

Data Processing

Computation of ESSIV IV

The value of ESSIV IV depends on the logical number of the current block represented as a 64-bit, little-endian unsigned integer. The conversion from the host representation of integers is performed by the function secno found on lines 33–46. Please note that the function depends on the representation of integers in the shell. The demonstration script does not attempt to verify correctness of arithmetic operations.

 33: secno ()
 34: {
 35:     local sec="${1}"
 36: 
 37:     echo "$(printf "%02x%02x%02x%02x%02x%02x%02x%02x" \
 38:         $((${sec} & 0xff)) \
 39:         $(((${sec} >> 8) & 0xff)) \
 40:         $(((${sec} >> 16) & 0xff)) \
 41:         $(((${sec} >> 24) & 0xff)) \
 42:         $(((${sec} >> 32) & 0xff)) \
 43:         $(((${sec} >> 40) & 0xff)) \
 44:         $(((${sec} >> 48) & 0xff)) \
 45:         $(((${sec} >> 56) & 0xff)))"
 46: }
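
To illustrate the byte order, here is a compact sketch that produces the same little-endian encoding with a loop. The name secno_le is hypothetical and is not part of the demonstration script; like secno, it assumes that the shell performs arithmetic on integers of at least 64 bits.

```sh
# Hypothetical loop-based equivalent of secno; not part of the
# demonstration script. Prints the block number as 16 hexadecimal digits,
# least significant byte first.
secno_le ()
{
    n="${1}"
    out=""
    k=0
    while [ "${k}" -lt 8 ]; do
        # Extract byte number k, counting from the least significant one.
        out="${out}$(printf '%02x' $(( (${n} >> (${k} * 8)) & 0xff )))"
        k=$(( ${k} + 1 ))
    done
    printf '%s\n' "${out}"
}

secno_le 258
# prints 0201000000000000
```

The number 258 is 0x0102, so the least significant byte 02 comes first, followed by 01 and six zero bytes.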

The value of ESSIV IV is computed by the function iv found on lines 52–63. The logical number of the current block, padded with zeroes to the size of an AES block (16 bytes), is encrypted using AES-256 in the CBC mode with the corresponding salt as the key and a zero initialisation vector. The current value of ESSIV IV for the source volume is stored in the variable siv on line 104. The current value of ESSIV IV for the destination volume is stored in the variable div on line 106.

 52: iv ()
 53: {
 54:     local sector="${1}"
 55:     local salt="${2}"
 56: 
 57:     echo -ne "${sector}0000000000000000" \
 58:         | sed 's/\([[:xdigit:]][[:xdigit:]]\)/\\\\x\1/g' \
 59:         | xargs printf \
 60:         | openssl enc -aes-256-cbc -nopad -nosalt -K "${salt}" -iv 0 \
 61:         | hexdump -v -e '/1 "%02x"' \
 62:         | cut -b1-32
 63: }

104: siv="$(iv "$(secno "${i}")" "${ssalt}")"

106: div="$(iv "$(secno "${i}")" "${dsalt}")"
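
Since the input is exactly one AES block and the initialisation vector is zero, the XOR step of the CBC mode leaves the plaintext unchanged, so the result equals a plain ECB encryption of the same block. The sketch below computes the IV both ways so that the two can be compared. All function names are hypothetical and are not part of the demonstration script; the sketch assumes a POSIX shell, openssl(1) and od(1).

```sh
# Hypothetical helpers; not part of the demonstration script.

# Writes the bytes encoded by an even-length string of hexadecimal digits.
hex2bin ()
{
    hex="${1}"
    [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
    while [ -n "${hex}" ]; do
        pair="${hex%"${hex#??}"}"
        hex="${hex#??}"
        printf "\\$(printf '%03o' "0x${pair}")"
    done
}

# Both functions print the ESSIV IV as 32 hexadecimal digits.
# ${1}: block number as 16 hexadecimal digits, little-endian
# ${2}: ESSIV salt as 64 hexadecimal digits
essiv_iv_cbc ()
{
    hex2bin "${1}0000000000000000" \
        | openssl enc -aes-256-cbc -nopad -nosalt -K "${2}" \
            -iv 00000000000000000000000000000000 \
        | od -A n -v -t x1 | tr -d ' \n'
}

essiv_iv_ecb ()
{
    hex2bin "${1}0000000000000000" \
        | openssl enc -aes-256-ecb -nopad -nosalt -K "${2}" \
        | od -A n -v -t x1 | tr -d ' \n'
}
```

For any block number and salt, the two functions are expected to print the same 32 digits. A mismatch would indicate a problem in the conversion of the input and not in the cipher.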

Decryption

Decryption of the current block of the source volume is performed on lines 113–114.

113: 	| openssl enc -d -aes-256-cbc -nopad -nosalt \
114:	    -K "${skeyhex}" -iv "${siv}" \

Encryption

Encryption of the current block of the destination volume is performed on lines 115–116.

115: 	| openssl enc -aes-256-cbc -nopad -nosalt \
116: 	    -K "${dkeyhex}" -iv "${div}" >> "${dst}"

Performance

The demonstration script relies solely on standard system utilities and starts several processes for every 512-byte block. I wrote it to be understood, not to be fast; it is probably the slowest possible implementation of re-encryption.

A fast implementation should distribute re-encryption of data blocks among the available cores of the CPU, since separate data blocks can be processed independently. If a GPU is available, re-encryption can be parallelised to a greater degree, which may improve performance considerably.
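
As an illustration of how the work could be divided, the sketch below computes the range of blocks for each of several workers. It shows only the partitioning arithmetic and is not a parallel implementation; the name chunk_range is hypothetical and is not part of the demonstration script.

```sh
# Hypothetical helper; not part of the demonstration script.
# Prints "first count" for worker ${3} (numbered from 0) when ${1} blocks
# are divided among ${2} workers as evenly as possible.
chunk_range ()
{
    total="${1}"
    workers="${2}"
    index="${3}"
    base=$(( ${total} / ${workers} ))
    extra=$(( ${total} % ${workers} ))   # the first ${extra} workers get one more
    if [ "${index}" -lt "${extra}" ]; then
        echo "$(( ${index} * (${base} + 1) )) $(( ${base} + 1 ))"
    else
        echo "$(( ${extra} * (${base} + 1) + (${index} - ${extra}) * ${base} )) ${base}"
    fi
}

chunk_range 10 3 0   # prints 0 4
chunk_range 10 3 1   # prints 4 3
chunk_range 10 3 2   # prints 7 3
```

Each worker would then run the per-block loop over its own range. Since appending requires sequential order, a worker would have to write every block at its own offset, for example with the seek operand of dd(1).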

Vadim Penzin, September 1st, 2015


I hereby place this article along with the accompanying source code into the public domain.
You are welcome to contact me by writing to dmcrypt at this domain.
I publish this information in the hope that it will be useful, but without ANY WARRANTY.
You are responsible for any and all consequences that may arise as the result of using this information.