BFA7 Blowfish Advanced 7 File Format Specification and Technical Reference (c)1996 AtmuteSoft File Format ~~~~~~~~~~~ When you encrypt a file with Blowfish Advanced 7 (BFA7) its original data will be put in a kind of digital enveloppe. This enveloppe guarantees save data recovering, secure password error and data corruption detection and at last an authentication check. Another design criteria was serialization, that means that a file's bytes always will be written one after the other, thus no seek operation will be necessary at any time. This enables BFA7 to work with backup utilities, especially streamer tapes, and avoids a lot of i/o overhead because the read/write heads of harddisks mustn't be moved too much. But the most important argument was to develop a format that can be extended in future versions easily but also be read by older versions if the new added informations aren't important for an accurate decryption. BFA7's successor is already under development and will become a native Windows(tm) 95 program. Normally the movement from a 16bit to a 32bit operation system (just think about the long filenames) is a major undertaking and often ends in a necessary redesign of file formats. But not to Blowfish Advanced 7. It has already been prepared for the future. A file encrypted BFA7 will rawly look like that : ============== | loader | ============== | header | ============== ----- | infoblock | / \ | (filename) | | ============== | | data chunk | encrypted (*) | data chunk | | | ... | | ============== | | tailer | \ / ============== ----- Before I start to explain this scheme please remember that all data structures are declared in the Borland Pascal programming language for the 16bit DOS operating system with no alignment between the structure members. And you'll often hear about keys and passwords. Both are the same, there's no difference between them, although a password is mostly a real word or sentence and a key the generally description for the corresponding ASCII bytes or the bytes entered in numercial/ mouse/antitap mode The first unit in a cryptfile is the so called "loader", in Pascal its structure looks like the following : TCryptfileLoader = record LatestVersion : Word; Signature : Longint; HeaderSize : Byte; end; The loader is, as you can see, only 7 bytes long, so it can be read out very quickly. This allows BFA7 to detect unencrypted or unsupported files (especially in viewing mode) fast and easily. 'LatestVersion' contains the version number of the program necessary to decrypt this file. Future version may have new features which won't be handled correctly with BFA7. The minor version number is stored in the low, the major version number in the high byte. 'Signature' is a 4byte ID which always contains the hexdecimal constant $75191114 (yes, it's a birtday). If BFA7 detects this value the file will be identified as a cryptfile, otherwise it'll output the "(not encrypted)" message. 'HeaderSize' contains the size of the following header structure. This allows future version to store additional data in the header. BFA7 will read them - and ignore them. Now you see why the loader is called "loader". After the loader has been interpreted correctly BFA7 will load TCrypfileLoader.HeaderSize bytes and treat them as the header in the following structure : TCryptfileHeader = record Version : Word; TailerSize : Byte; FileInfChecksum : Word; FileInfSize : Byte; CBC_IV_Left : Longint; CBC_IV_Right : Longint; end; The header contains all data necessary to start the decryption process. Please remember that all bytes following the header are already encrypted (*). 'Version' stores the version of the program which has encrypted the actual file. Therefore future version might have the chance to fix problems or speed up the decryption process. 'TailerSize' contains the length of the cryptfile's tailer. More about the tailer will follow later. 'FileInfChecksum' contains a CRC16 checksum of the following file information block (FIB). As already mentioned, the FIB will be encrypted. This gives BFA7 the possibility to detect wrong passwords very fast. If a wrong password is given, the FIB won't be decrypted correctly, the CRC16 check will fail and BFA7 will break with the "password/algorithms failed" message. Please watch that the CRC16 is only a "fingerprint" of the FIB. If the calculated CRC16 is equal to the one stored in the header it does NOT mean that the password is the right one for 100%, but only with a propability of 1:65536! Let us assume that you have to try out 2^32 passwords because your estimated (binary) key length is 4 bytes. The CRC16 check will show success every 65536th password on average. With 2^32 possibilities at all you will get 2^32/65536 possible passwords. Have fun while trying them all out! Secure passwords should always be longer, so the CRC16 check is not a security lack, but a fine method to detect wrong passwords quickly, even if you want to encrypt your whole file system (do you really have more that 65536 files on your drive?). Even if a wrong key passes the CRC16 check BFA7 will detect it by the global CRC32 check. Just to be mentioned : BFACRACK, the brute force key searching program for registered users is trying to get the passwords by decrypting the FIB and checking the CRC16. This allows you to find passwords you don't remember completely or you may have typed in incorrectly. But BFACRACK is no wizzard! And BFA7 has no trapdoors. 'FileInfSize' is the size of the FIB. Future versions will have larger FIB's, so it's necessary to play the same game as we did it with the sizable header. 'CBC_IV_Left' and 'CBC_IV_Right' contain the so called Cipher Block Chaining Initalisation Vector (CBC IV), a 64bit random value that is splitted into two 32 parts. This encryption start value makes every cryptfile unique, so even if you encrypt the file with the same password you will always get different cryptfiles. Because BFA7 encrypts all data in 64bit blocks the CBC IV must have the same size. All the following blocks, which contain the encrypted data are linked together. And the first element in this chain is simply this vector. You may miss an entry for the algorithm(s) used for encryption. Well, I decide to resign for that option for two reasons : a) decryption of multiple files can be managed much faster b) it's an additional secret information. If someone doesn't know the algorithm(s) you've choosen (provided that you didn't used those definded in the BFA.INI) (s)he has to try out all combinations in addition to all possible passwords. BFA7 now offers 5 different algorithms now, so there are 25 combinations (20 for mixed and 5 for single encryption). The file information block (FIB) is placed direcly after the header and has the following form : TCryptfileInfoBlock = record Attribute : Byte; DateTime : Longint; FileNameLen : Byte; Dummy : array[1..2] of Byte; end; 'Attribute' contains the original attribute of the encrypted file, which will be restored after decryption. 'DateTime' is the original date+time stamp of the encrypted file and also be restored after decryption. 'FileNameLen' stores the length of the filename appended to the FIB. If you forced BFA7 not to store the original filename then the value is always 0. 'Dummy' aligns the size of TCryptfileInfoBlock to the next 64bit (= 8 bytes) border, which you can check easily : size of 'Attribute' 1 byte + size of 'DateTime' 4 bytes + size of 'FileNameLen' 1 byte + size of 'Dummy' 2 bytes = 8 bytes = 64 bits This 8byte border wasn't choosen arbitrary, it depends on how the used algorithms (Blowfish, GOST, TDES and Cobra) work. They always take 8 bytes together in a block (that's why they called block ciphers) and encrypt them. If your data isn't adjustable to an 8byte border you will have to add some dummy bytes. These dummy bytes can be everything, but it's better to choose them via a random generator, as BFA7 does. The FIB is encrypted, as already mentioned. So nobody who doesn't know the right password/key will be able to get the original file's attribute, date+time stamp and the length of the filename. If you decide to store the original filename in the cryptfile then the filename's bytes will be adjusted with random bytes to the next 64bit border and after that be encrypted. E.g. with the filename "test1.txt" BFA7 has to encrypt 9 bytes . First it must add (2*8)-9 = 7 random bytes, then it encrypts these two 8byte blocks and appends them after the FIB. Now it's obvious why we have to store the original filename length into the FIB. The encrypted file data follows right after the FIB and the optional stored filename. For easier handling and because BFA7 offers additional data compression the original data is splitted, encrypted and compressed in so called "chunks". A chunk is a block with a small header and an appended data area. The header is defined by this way : TChunkHeader = record Flags : Byte; ChunkLen : Word; OrigBytes : Word; end; It's not necessary for BFA7 to encrypt a chunk's header, because it doesn't contain any private data (this also saves 3 bytes per chunk since the header musn't be aligned from 4 to 8 bytes). 'Flags' defines a bit mask to describe the header's characteristics. The following bits are valid, all others are undefined and always set to zero : 0 0 0 0 0 x x x | | | | | +--- 1 = last, 0 = normal chunk | +----- 1 = compressed, 0 = uncompress +------- 1 = 24bit, 0 = 16bit chunk The first bit is set if no chunk will follow the actual one, so the decryption routine can prepare itself to read out the waiting tailer. The second bit is set if the chunk contains compressed data, if so the decryption routine has to pass the decrypted data through a decompressor to get the original bytes. Although the third bit is always set to zero in BFA7 it already has been defined. The Windows(tm) 95 version will, thanks to the 32bit technology, be able to handle chunks up to the size of 16 megabytes, so this flag will be set then. Unfortunately 16bit DOS based BFA7 won't be able to decrypt, but will recognize such extended chunks (and refuse to handle them, instead of wondering about following strange data). 'ChunkLen' is a 16bit value containing the size of the following (unadjusted) data. After BFA7 has read out the chunk header this value will be round up to the next 8 byte border, telling the program how much bytes to load and decrypt. 'OrigBytes' contains the number of the original (uncompressed) bytes that were stored in the chunk. If the bytes were not compressed it's easy to see that the nuber in 'ChunkLen' is equal to that in 'OrigBytes'. To make the whole chunking process more clearly let's try an example : your original file is 80001 bytes large. BFA7 gets the maximum number of possible bytes (65536-4096=61440). Let's assume these bytes are compressed to about 45223 bytes, then the put out chunk will look like this : ------------------------------------- header : Flags = 0 0 0 0 0 0 1 0 ChunkLen = 45223 OrigBytes = 61440 ------------------------------------- data : 45223+1 bytes (45224 / 8 = 0) ------------------------------------- Now you've 80001-61440=18561 bytes left. Let's assume that this rest isn't compressible. BFA7 tries to compress, but the compressed result will be larger that the original. It's the last chunk, so the it have look like this now: ------------------------------------- header : Flags = 0 0 0 0 0 0 0 1 ChunkLen = 18561 OrigBytes = 18561 ------------------------------------- data : 18561+7 bytes (18568 / 8 = 0) ------------------------------------- You see that you can't say exactly that a cryptfile of BFA7 is compressed or not. It CAN contain compressed chunks. There's only one exception : if you use the /cf option the program will even stored compressed data that is larger than the original. But BFA7 knows nothing about you and this option until it has read out the chunk header and interpreted it. After the final chunk has been added the cryptfile is closed with the tailer structure : TCryptfileTailer = record OrigFileLen : Longint; FileChecksum : Longint; end; The tailer is also encrypted, but doesn't need any additional bytes to align, because it already fits into a single 64bit element. 'OrigFileLen' contains the number of original bytes. Please remember that cryptfiles are read and written in a linear way. Because of that the original file length can only be stored after all bytes have been read (, compressed) and encrypted. 'FileChecksum' is a CRC32 checksum of the file's original bytes. It's mainly there for detecting data corruption, e.g. bad sectors on floppy disks or undetected transmission errors, but it's also good for the authentication of a file. Because the CRC32 is encrypted nobody can add or remove some data from your file without the knowledge of your password. Algorithms ~~~~~~~~~~ BFA7 offers the user 5 different algorithms. All algorithms are 64bit block chipers and used in CBC (Cipher Block Chaining) mode. Blowfish - The algorithm was designed by Bruce Schneier (schneier@counterpane.com) and gave BFA7 its name. Blowfish is a very fast algorithm, performing excellent on modern 32bit processors. Another advantage is its variable key size up to 448 bits (56 bytes). It was first published in DDJ 4/94. After a year of intensive cryptanalyse it was still unbroken. Blowfish32 - Standard Blowfish encryption works with 16 rounds, that means that every 64bit block will pass the encryption function 16 times. Blowfish is easy to extend to make more or less rounds. Blowfish32 now is just Blowfish with 32 instead of 16 rounds. That means that the data will be twice more irrecognizable as with Blowfish16, but the encryption and decryption will take twice more time, too. Although Blowfish32 might be "overkill" it's a good additional feature. GOST - An algorithm descending from the former Sovjet Union. It's like the russian counterpart to the DES algorithm. Although it has been used for a long time there are no known weaknesses. The only strange part is a short table of fixed data (the so called substitution boxes, s-boxes). These data is exchangable and might have influence on the algorithms security. Now you can speculate that some institutions might have had "better" s-boxes and some "minor" institutions s-boxes easier to crack, but nobody knows. The s-boxes used in BFA7 are choosen randomly so there's no built-in weakness. GOST isn't as fast as Blowfish, but it encrypts data in 32 rounds as standard (however the encryption function is simpler than the one of Blowfish). GOST's maximum key length is 32 bytes. TDES - DES is the standard encryption algorithm, designed by IBM in the middle 70es and very often used all over the world. Although it has been cryptanalysed for more than 20 years now nobody has found a real weakness. The only problem of DES is its short key length : 7 bytes. If you have some very fast computers you can try out all possible passwords within a few hours ("very fast" means machines which will cost about 1 million dollars). There are some DES variants, extending the original algorithm to a new one which has a larger key. The most popular one is triple-DES, where the 64bit data block will be encrypted almost 3 times with DES, using 3 different keys. Originally I didn't want to implement any DES variant, because DES is rather slow (to say nothing of triple-DES), but a lot of people trust this proved algorithm. To speed things up triple-DES was modified in an uncritical way. Before and after the original encryption a 64bit block should be permutated (refering to the DES reference papers). Originally DES was implemented in hardware, not in software, and these permutations are good for loading the block's bytes into the hardware registers. But they don't affect DES's security - so it's the programmers decision to implement the permutation stage or not. BFA7's implementation doesn't use these permutations to obtain more speed. I called this "reduced" algorithm TDES instead of triple-DES, because it's not the original algorithm, of course. The key length is 21 bytes. For the experts : TDES gets a block with 8 bytes (with no MSB conversion), encrypts it with the byte 0..6 of the key, decrypts it with byte 7..13 of the key and encrypts it with byte 17..20 of the key. Cobra - This is a new algorithm, designed by Christian Schneider (100542.2132@compuserve.com). It was published in the newsgroup sci.crypt.research in the middle of April 1996. Normally it's silly to implement an unproved algorithm, even Blowfish demands for more analyse before you can call it really save. But Cobra wasn't designed from the scratch. You can describe it rather as a mutation of Blowfish using some interesting extending techniques. A real difference is its key size : 32 bytes (64bit Cobra) instead of 56 bytes (Blowfish). Cobra was originally designed to be a 128bit block chipher with 24 encryption rounds. Due to it's open architecture it can be reduced or extended for larger or smaller block handling. BFA7 uses Cobra in a 64bit block form with 16 rounds. For more information about Cobra please contact Chris via email. If you combine algorithms (mixed encryption), then BFA7 will work in the following manner : The 64bit block will first be encrypted with the algorithm #1. The encrypted block will then be encrypted a second time with algorithm #2. Decryption will do the whole procedure in reverse : first decrypt it with algorithm #2, then with algorithm #1. So two different algorithms will be melted together. CBC vector updating is done the same way as in single encryption mode. Weak keys ~~~~~~~~~ Weak keys are such passwords for which the encryption algorithm doesn't produce a secure encrypted output. Only Blowfish and TDES have known weak keys, but they will be correctly identified by BFA7. You can test this by giving BFA7 the numerical password 0,0,0,0 for TDES. The program will then pop up a warning message that a weak key was detected and give you the chance to quit the program. Key setup ~~~~~~~~~ As you might already have recognized every algorithm has a different key size. BFA7 allows you to use a password/key variable in its size from 3 to the maximum number of possible characters. If you don't enter a password with the maximum length then it'll be extended by just repeating the bytes. If you choose e.g. "hello" as your password and TDES as the encryption algorithm, the password will be extended to "hellohellohellohellohelloh" (21 characters). Of course there are other possibilities to extend passwords, with hash functions like MD5, for example. The problems with hash functions is that you have to trust them to crunch or expand the original key without additional weaknesses and that most of them put out a fixed key (MD5 e.g. 16 bytes) which has to be expanded by other (sometimes risky) techniques. But the simple repeating method (SRM) used by BFA7 allows a very direct input of passwords/keys, especially in binary mode (numercial input). The only disadvantage of BFA7's SRM key setup is that you have to take care of the password you choose. Let's assume you want to use "bonbon". 6 characters, 308.915.776 (26^6) possibilities, right? Wrong! If you only would have typed "bon" it would have been the same because of the repetition technique. So the real number of possibilities with "bonbon" is 17.576 (26^3) - your password is too weak! An attacker doesn't know how long your password is, of course. But (s)he will try the lower sized passwords first. To avoid insecure encryption BFA7 checks for such password "clones" and will warn you if you've entered something weak like "bonbon". Random generator ~~~~~~~~~~~~~~~~ The random generator BFA7 uses is a pseudo-random-sequence generator using linear feedback shift registers. Although the program needs random bytes only to fill unaligned 8byte (64bit) blocks and for generating the CBC initalisation vectors this method is much saver than the conventional built-in linear congruential generators of the programming languages. Data compression ~~~~~~~~~~~~~~~~ BFA7 uses the LZH (LZ77 with adaptive huffman coding) algorithm to compress data. The original LZ77 algorithm was coded by Haruhiko Okumura, the adaptive Huffman coding by Haruyasu Yoshizaki (developer of the LHA compression program). The translation into a Turbo Pascal unit was done by Douglas Webb. The compression speed is really slow in BFA7. This depends on the pure Pascal implementation. Future version will have an assembly engine to speed up the compression process. -end-