Reverse-Engineering BSA From Scratch
Foreword
I’ve been building Refriender, a little tool for assisting with the reverse-engineering of compressed archive files, primarily for game hacking.
This blog post is a stream-of-consciousness look at how I actually went about reverse-engineering a file format: specifically, the BSA archives used in Skyrim Special Edition, though the format shows up in lots of Bethesda titles. I make mistakes, I make bad assumptions, and I don’t complete the entire reverse-engineering process, though I do cover the majority of it! This format is well-documented (by modders), but I intentionally didn’t look at any of that until after I’d finished. This is my honest, raw approach to analyzing the file I picked.
Process
Pick the smallest BSA file, ‘Skyrim - Textures8.bsa’ from Skyrim Special Edition (md5sum 3ca6cbfcfca7b41f3939352d4cb27717 if you want to follow along! A hexdump of the first 0xf20 bytes is also available here: https://gist.github.com/daeken/f564026303d9a8f5504ebf222a8989e8), and run Refriender on it with -v:
Searching for Deflate blocks
Found 17546 possible starting positions
Removing overlapping blocks
Found 14598 non-overlapping blocks
Searching for Zlib blocks
Found 0 possible starting positions
Removing overlapping blocks
Found 0 non-overlapping blocks
Searching for Gzip blocks
Found 0 possible starting positions
Removing overlapping blocks
Found 0 non-overlapping blocks
Searching for Bzip2 blocks
Found 0 possible starting positions
Removing overlapping blocks
Found 0 non-overlapping blocks
Searching for Lzw blocks
Found 0 possible starting positions
Removing overlapping blocks
Found 0 non-overlapping blocks
Searching for Lz4Raw blocks
Found 0 possible starting positions
Removing overlapping blocks
Found 0 non-overlapping blocks
Searching for Lz4Frame blocks
Found 94 possible starting positions
Removing overlapping blocks
Found 80 non-overlapping blocks
So we have two possibilities: Deflate or Lz4Frame. An LZ4 frame starts with a fixed magic number, while a raw Deflate stream has no header at all, so an Lz4Frame hit is much less likely to be a coincidence; that makes it the most likely algorithm. In fact, if we look at some of the blocks that were found, we see a ton of small Deflate blocks (likely noise, though it’s possible some are real) and the occasional large LZ4 block:
[deflate] 0x77B66 - 0x77B6D (compressed length 0x7, decompressed length 0xB5)
[deflate] 0x794D8 - 0x794ED (compressed length 0x15, decompressed length 0xA2)
[deflate] 0x7990C - 0x7991F (compressed length 0x13, decompressed length 0xF1)
[lz4frame] 0x7A985 - 0x27A985 (compressed length 0x200000, decompressed length 0x200000)
[deflate] 0x7AA09 - 0x7AA17 (compressed length 0xE, decompressed length 0x8C)
[deflate] 0x7AA8C - 0x7AA97 (compressed length 0xB, decompressed length 0x101)
[deflate] 0x7ABDC - 0x7ABE8 (compressed length 0xC, decompressed length 0x80)
[deflate] 0x7B105 - 0x7B11C (compressed length 0x17, decompressed length 0x91)
[deflate] 0x7B1ED - 0x7B203 (compressed length 0x16, decompressed length 0x11A)
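Why does a magic number make such a difference? Every LZ4 frame begins with the four bytes 04 22 4D 18 (the magic 0x184D2204 stored little-endian), so candidate starting points can be found with a plain byte search, whereas a raw Deflate stream could start almost anywhere. Here’s a minimal Python sketch of that idea. It is not Refriender’s implementation, and find_lz4_frame_candidates is just a name I made up for illustration:

```python
import struct
from typing import List

# LZ4 frame magic number 0x184D2204, as it appears on disk (little-endian: 04 22 4D 18).
LZ4_FRAME_MAGIC = struct.pack("<I", 0x184D2204)


def find_lz4_frame_candidates(data: bytes) -> List[int]:
    """Return every offset where the LZ4 frame magic number occurs.

    These are only candidates: the four bytes can also show up by chance,
    so each offset still needs to be confirmed by decompressing from it.
    """
    hits = []
    pos = data.find(LZ4_FRAME_MAGIC)
    while pos != -1:
        hits.append(pos)
        pos = data.find(LZ4_FRAME_MAGIC, pos + 1)  # keep scanning past this hit
    return hits
```

You’d call it on the whole archive read into memory, e.g. find_lz4_frame_candidates(open("Skyrim - Textures8.bsa", "rb").read()). Each hit is only a starting point for a real decompression attempt.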
Given that, let’s narrow in on LZ4. What does Refriender say the blocks actually are? Running with -a lz4frame -i:
[lz4frame] 0xF12 - 0x7A954 (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0x7A985 - 0x24C511 (decompressed length 0x5555E0): Microsoft DirectDraw Surface (DDS): 2048 x 2048, compressed using DXT5
[lz4frame] 0x24C544 - 0x350FFC (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0x35102F - 0x5CFD51 (decompressed length 0x5555E0): Microsoft DirectDraw Surface (DDS): 2048 x 2048, compressed using DXT5
[lz4frame] 0x5CFD7A - 0x5F89A3 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x5F89CD - 0x5F93E6 (decompressed length 0x55E0): Microsoft DirectDraw Surface (DDS): 128 x 128, compressed using DXT5
[lz4frame] 0x5F9410 - 0x6E53FC (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0x6E5424 - 0x79AD74 (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0x79AD9F - 0x7A3F5B (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0x7A3F85 - 0x816824 (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0x81685B - 0x862CC7 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x862CFB - 0x87E911 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x87E944 - 0x87E99F (decompressed length 0x5E0): Microsoft DirectDraw Surface (DDS): 32 x 32, compressed using DXT5
[lz4frame] 0x87E9D6 - 0x8BE9D6 (decompressed length 0x2F000): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x8A0918 - 0x9882CD (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0x988306 - 0xA6EC74 (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0xA6ECAD - 0xB4FC0D (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0xB4FC46 - 0xC38423 (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0xC3845C - 0xD17747 (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0xD1777A - 0xD1DD1C (decompressed length 0x2AB30): Microsoft DirectDraw Surface (DDS): 512 x 256, compressed using DXT5
[lz4frame] 0xD1DD4C - 0xD26E15 (decompressed length 0x2AB30): Microsoft DirectDraw Surface (DDS): 256 x 512, compressed using DXT5
[lz4frame] 0xD26E4A - 0xD427EF (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0xD42828 - 0xDE819A (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0xDE81CC - 0xDFB0BF (decompressed length 0x2AB30): Microsoft DirectDraw Surface (DDS): 256 x 512, compressed using DXT5
[lz4frame] 0xDFB0F6 - 0xE1E9C9 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0xE1EA00 - 0xE41FDA (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0xE42011 - 0xE65A6F (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0xE65AA6 - 0xE89A17 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0xE89A4E - 0xEACE1A (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0xEACE65 - 0x107CC42 (decompressed length 0x5555E0): Microsoft DirectDraw Surface (DDS): 2048 x 2048, compressed using DXT5
[lz4frame] 0x107CC8D - 0x1276129 (decompressed length 0x5555E0): Microsoft DirectDraw Surface (DDS): 2048 x 2048, compressed using DXT5
[lz4frame] 0x1276162 - 0x1276977 (decompressed length 0x15E0): Microsoft DirectDraw Surface (DDS): 64 x 64, compressed using DXT5
[lz4frame] 0x12769B1 - 0x1277215 (decompressed length 0x15E0): Microsoft DirectDraw Surface (DDS): 64 x 64, compressed using DXT5
[lz4frame] 0x127724C - 0x1277D1A (decompressed length 0x15E0): Microsoft DirectDraw Surface (DDS): 64 x 64, compressed using DXT5
[lz4frame] 0x1277D51 - 0x129C135 (decompressed length 0x2AB30): Microsoft DirectDraw Surface (DDS): 512 x 256, compressed using DXT5
[lz4frame] 0x129C16C - 0x12A4442 (decompressed length 0xAB30): Microsoft DirectDraw Surface (DDS): 256 x 128, compressed using DXT5
[lz4frame] 0x12A4477 - 0x12A9D2A (decompressed length 0xAB30): Microsoft DirectDraw Surface (DDS): 256 x 128, compressed using DXT5
[lz4frame] 0x12A9D62 - 0x12B600E (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x12B6043 - 0x12C61D9 (decompressed length 0x155D8): Microsoft DirectDraw Surface (DDS): 512 x 256, compressed using DXT1
[lz4frame] 0x12C6216 - 0x12E69F8 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x12E6A37 - 0x12F8D90 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x12F8DC6 - 0x1300A6B (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x1300A97 - 0x1358136 (decompressed length 0xAAB30): Microsoft DirectDraw Surface (DDS): 512 x 1024, compressed using DXT5
[lz4frame] 0x1358161 - 0x1361ECB (decompressed length 0x2AB30): Microsoft DirectDraw Surface (DDS): 512 x 256, compressed using DXT5
[lz4frame] 0x1361EF6 - 0x1382839 (decompressed length 0x2AB30): Microsoft DirectDraw Surface (DDS): 256 x 512, compressed using DXT5
[lz4frame] 0x1382866 - 0x139CF5F (decompressed length 0x2AB30): Microsoft DirectDraw Surface (DDS): 512 x 256, compressed using DXT5
[lz4frame] 0x139CF8D - 0x1440634 (decompressed length 0xAAB30): Microsoft DirectDraw Surface (DDS): 512 x 1024, compressed using DXT5
[lz4frame] 0x144065D - 0x1451991 (decompressed length 0x155D8): Microsoft DirectDraw Surface (DDS): 256 x 512, compressed using DXT1
[lz4frame] 0x14519BC - 0x147FC18 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x147FC43 - 0x14D3ED3 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x14D3F0B - 0x14DAE08 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x14DAE3B - 0x14EE135 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x14EE169 - 0x151686A (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x15168A3 - 0x1528246 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x152827A - 0x15443B3 (decompressed length 0x2AB30): Microsoft DirectDraw Surface (DDS): 512 x 256, compressed using DXT5
[lz4frame] 0x15443E6 - 0x1555E81 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x1555EB5 - 0x155AF92 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x155AFBF - 0x158C054 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x158C086 - 0x159394D (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x1593980 - 0x15A4CD8 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x15A4D09 - 0x15AC19E (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x15AC1CF - 0x15B335F (decompressed length 0xAB30): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT1
[lz4frame] 0x15B3394 - 0x15C3F49 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x15C3F78 - 0x15C696F (decompressed length 0x55E0): Microsoft DirectDraw Surface (DDS): 128 x 128, compressed using DXT5
[lz4frame] 0x15C69A1 - 0x15CF2D7 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x15CF308 - 0x15D4695 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x15D46CC - 0x15DEBEF (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x15DEC27 - 0x15F80C2 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x15F80F9 - 0x15FEA6E (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x15FEA9F - 0x1612B0F (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x1612B47 - 0x1632E90 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x1632EC4 - 0x163DF09 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x163DF39 - 0x168D52E (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x168D562 - 0x1695381 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x16953B1 - 0x169FF20 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x169FF52 - 0x16B55DB (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x16B560C - 0x16BC7DD (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x16BC811 - 0x16CEBCA (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x16CEBFD - 0x16D66DF (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x16D6710 - 0x16E6F4E (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x16E6F7D - 0x16F6625 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x16F665A - 0x16FC162 (decompressed length 0xAB30): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT1
[lz4frame] 0x16FC198 - 0x170C766 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x170C798 - 0x1720C83 (decompressed length 0x2AB30): Microsoft DirectDraw Surface (DDS): 512 x 256, compressed using DXT5
[lz4frame] 0x1720CB5 - 0x1726BD1 (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x1726BFE - 0x172D56E (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x172D5A0 - 0x173286B (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x1732899 - 0x175A977 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x175A9A6 - 0x1762209 (decompressed length 0xAB30): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT1
[lz4frame] 0x176223B - 0x1770FA8 (decompressed length 0x2AB30): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT1
[lz4frame] 0x1770FDD - 0x177544A (decompressed length 0x155E0): Microsoft DirectDraw Surface (DDS): 256 x 256, compressed using DXT5
[lz4frame] 0x177547E - 0x179C111 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x179C14A - 0x17C296E (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
[lz4frame] 0x17C29A7 - 0x1814306 (decompressed length 0x555E0): Microsoft DirectDraw Surface (DDS): 512 x 512, compressed using DXT5
We have textures! If we only wanted to unpack these, we could throw -a lz4frame -e some_directory in and boom, we’d have all the textures unpacked. But let’s take a step back and figure out more about this file format.
Next, search for pointers to the blocks, assuming a header of 0-32 bytes in front of each one (-a lz4frame -f 0-32):
Block 0xF12 has pointers from: 0x57ADBC
Pointers with offset 0: 1
Finding pointers to 1 bytes before the blocks
Block 0xF12 (- 1 == 0xF11) has pointers from: 0x5AAFD3
Pointers with offset 1: 1
Finding pointers to 2 bytes before the blocks
Pointers with offset 2: 0
Finding pointers to 3 bytes before the blocks
Pointers with offset 3: 0
Finding pointers to 4 bytes before the blocks
Pointers with offset 4: 0
Finding pointers to 5 bytes before the blocks
Block 0x8A0918 (- 5 == 0x8A0913) has pointers from: 0x57C052
Pointers with offset 5: 1
Finding pointers to 6 bytes before the blocks
Pointers with offset 6: 0
Finding pointers to 7 bytes before the blocks
Pointers with offset 7: 0
Finding pointers to 8 bytes before the blocks
Block 0xF12 (- 8 == 0xF0A) has pointers from: 0x5B5DA2
Pointers with offset 8: 1
Finding pointers to 9 bytes before the blocks
Pointers with offset 9: 0
Finding pointers to 10 bytes before the blocks
Block 0xF12 (- 10 == 0xF08) has pointers from: 0x56EFB7, 0x5B945F
Pointers with offset 10: 2
Finding pointers to 11 bytes before the blocks
Pointers with offset 11: 0
Finding pointers to 12 bytes before the blocks
Pointers with offset 12: 0
Finding pointers to 13 bytes before the blocks
Pointers with offset 13: 0
Finding pointers to 14 bytes before the blocks
Pointers with offset 14: 0
Finding pointers to 15 bytes before the blocks
Block 0x35102F (- 15 == 0x351020) has pointers from: 0x59EC15
Pointers with offset 15: 1
Finding pointers to 16 bytes before the blocks
Block 0xF12 (- 16 == 0xF02) has pointers from: 0x57243C
Pointers with offset 16: 1
Finding pointers to 17 bytes before the blocks
Pointers with offset 17: 0
Finding pointers to 18 bytes before the blocks
Block 0xF12 (- 18 == 0xF00) has pointers from: 0x5C49F2
Pointers with offset 18: 1
Finding pointers to 19 bytes before the blocks
Pointers with offset 19: 0
Finding pointers to 20 bytes before the blocks
Pointers with offset 20: 0
Finding pointers to 21 bytes before the blocks
Pointers with offset 21: 0
Finding pointers to 22 bytes before the blocks
Pointers with offset 22: 0
Finding pointers to 23 bytes before the blocks
Pointers with offset 23: 0
Finding pointers to 24 bytes before the blocks
Pointers with offset 24: 0
Finding pointers to 25 bytes before the blocks
Pointers with offset 25: 0
Finding pointers to 26 bytes before the blocks
Pointers with offset 26: 0
Finding pointers to 27 bytes before the blocks
Pointers with offset 27: 0
Finding pointers to 28 bytes before the blocks
Pointers with offset 28: 0
Finding pointers to 29 bytes before the blocks
Pointers with offset 29: 0
Finding pointers to 30 bytes before the blocks
Pointers with offset 30: 0
Finding pointers to 31 bytes before the blocks
Pointers with offset 31: 0
Finding pointers to 32 bytes before the blocks
Pointers with offset 32: 0
Well, that looks like a bust. A few blocks seem to have pointers, but they’re iffy at best and probably just coincidental. What we’d ideally be looking for is a single offset value at which every block has a pointer, which would indicate some kind of directory. Instead, let’s look at this from a different angle.
[lz4frame] 0xF12 - 0x7A954 (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
[lz4frame] 0x7A985 - 0x24C511 (decompressed length 0x5555E0): Microsoft DirectDraw Surface (DDS): 2048 x 2048, compressed using DXT5
[lz4frame] 0x24C544 - 0x350FFC (decompressed length 0x1555E0): Microsoft DirectDraw Surface (DDS): 1024 x 1024, compressed using DXT5
There are 0x31 bytes between the first two blocks and 0x33 between the next two. What are they?
0007a950 xx xx xx xx 2c 74 65 78 74 75 72 65 73 5c 5f 62 |....,textures\_b|
0007a960 79 6f 68 5c 66 75 72 6e 69 74 75 72 65 5c 64 72 |yoh\furniture\dr|
0007a970 61 66 74 69 6e 67 74 61 62 6c 65 30 31 2e 64 64 |aftingtable01.dd|
0007a980 73 e0 55 55 00 |s.UU. |
0024c510 xx 2e 74 65 78 74 75 72 65 73 5c 5f 62 79 6f 68 |..textures\_byoh|
0024c520 5c 66 75 72 6e 69 74 75 72 65 5c 63 72 61 66 74 |\furniture\craft|
0024c530 69 6e 67 74 61 62 6c 65 30 31 5f 6e 2e 64 64 73 |ingtable01_n.dds|
0024c540 e0 55 15 00 |.U.. |
Those are filenames for sure! And the 0x2c/0x2e byte before each one seems to be the length of the filename. Right after the filename in the first example we have e0 55 55 00, which is 0x5555e0 in little endian. Coincidentally, that’s the length of the decompressed block! The same holds true for the second example, so I think we’re in a good place. We now understand how the archive packs each file: a one-byte name length, the name, the decompressed size as a little-endian uint32, and then the LZ4 frame itself.
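To make that layout concrete, here’s a small Python sketch that reads one of these file records. FileRecord and parse_file_record are my own labels, and the layout is only what the two examples above suggest, not a spec:

```python
import struct
from typing import NamedTuple


class FileRecord(NamedTuple):
    name: str
    decompressed_size: int
    frame_offset: int  # absolute offset where the LZ4 frame starts


def parse_file_record(data: bytes, offset: int) -> FileRecord:
    """Read the header that precedes each LZ4 frame.

    Layout as inferred from the two examples (an assumption):
      uint8  name_len
      char   name[name_len]     no trailing NUL in the examples
      uint32 decompressed_size  little-endian
    The LZ4 frame follows immediately.
    """
    name_len = data[offset]
    name_start = offset + 1
    name = data[name_start : name_start + name_len].decode("ascii")
    (size,) = struct.unpack_from("<I", data, name_start + name_len)
    return FileRecord(name, size, name_start + name_len + 4)
```

Run against the real file at offset 0x7A954, this should give textures\_byoh\furniture\draftingtable01.dds, a decompressed size of 0x5555E0, and a frame starting at 0x7A954 + 0x31 = 0x7A985, which lines up with the block list above.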
However, we still don’t know how to actually find these files. We do know that each file has a variable-length header and that the blocks are laid out one after another, so each header starts exactly where the previous block ends. Anything that points at a file should therefore point just past the end of a block, so let’s search for that (-F 0):
Finding pointers to 0 bytes after the blocks
Block 0x24C544-0x350FFC has end pointers from: 0x182
Block 0x5CFD7A-0x5F89A3 has end pointers from: 0x1A2
Block 0x5F89CD-0x5F93E6 has end pointers from: 0x1B2
Block 0x6E5424-0x79AD74 has end pointers from: 0x1D2
Block 0x79AD9F-0x7A3F5B has end pointers from: 0x1F9
Block 0x7A3F85-0x816824 has end pointers from: 0x22B
Block 0x81685B-0x862CC7 has end pointers from: 0x23B
Block 0x862CFB-0x87E911 has end pointers from: 0x24B
Block 0x87E944-0x87E99F has end pointers from: 0x25B
Block 0x8A0918-0x9882CD has end pointers from: 0x27B
Block 0xA6ECAD-0xB4FC0D has end pointers from: 0x29B
Block 0xB4FC46-0xC38423 has end pointers from: 0x2AB
Block 0xD42828-0xDE819A has end pointers from: 0x2FB
Block 0xDE81CC-0xDFB0BF has end pointers from: 0x30B
Block 0xE1EA00-0xE41FDA has end pointers from: 0x32B
Block 0xE42011-0xE65A6F has end pointers from: 0x33B
Block 0xE89A4E-0xEACE1A has end pointers from: 0x38D
Block 0xEACE65-0x107CC42 has end pointers from: 0x39D
Block 0x1277D51-0x129C135 has end pointers from: 0x438
Block 0x129C16C-0x12A4442 has end pointers from: 0x448
Block 0x12A9D62-0x12B600E has end pointers from: 0x468
Block 0x12B6043-0x12C61D9 has end pointers from: 0x478
Block 0x1361EF6-0x1382839 has end pointers from: 0x4F0
Block 0x139CF8D-0x1440634 has end pointers from: 0x510
Block 0x144065D-0x1451991 has end pointers from: 0x53D
Block 0x14519BC-0x147FC18 has end pointers from: 0x54D
Block 0x147FC43-0x14D3ED3 has end pointers from: 0x57A
Block 0x14D3F0B-0x14DAE08 has end pointers from: 0x58A
Block 0x14EE169-0x151686A has end pointers from: 0x5AA
Block 0x15168A3-0x1528246 has end pointers from: 0x5BA
Block 0x152827A-0x15443B3 has end pointers from: 0x5CA
Block 0x15443E6-0x1555E81 has end pointers from: 0x5DA
Block 0x1555EB5-0x155AF92 has end pointers from: 0x5EA
Block 0x1593980-0x15A4CD8 has end pointers from: 0x61A
Block 0x15AC1CF-0x15B335F has end pointers from: 0x63A
Block 0x15B3394-0x15C3F49 has end pointers from: 0x64A
Block 0x15C69A1-0x15CF2D7 has end pointers from: 0x66A
Block 0x15CF308-0x15D4695 has end pointers from: 0x67A
Block 0x15DEC27-0x15F80C2 has end pointers from: 0x69A
Block 0x15FEA9F-0x1612B0F has end pointers from: 0x6BA
Block 0x163DF39-0x168D52E has end pointers from: 0x6EA
Block 0x168D562-0x1695381 has end pointers from: 0x6FA
Block 0x169FF52-0x16B55DB has end pointers from: 0x71A
Block 0x16BC811-0x16CEBCA has end pointers from: 0x73A
Block 0x16D6710-0x16E6F4E has end pointers from: 0x75A
Block 0x16E6F7D-0x16F6625 has end pointers from: 0x76A
Block 0x16F665A-0x16FC162 has end pointers from: 0x77A
Block 0x170C798-0x1720C83 has end pointers from: 0x79A
Block 0x1726BFE-0x172D56E has end pointers from: 0x7BA
Block 0x1732899-0x175A977 has end pointers from: 0x7DA
Block 0x176223B-0x1770FA8 has end pointers from: 0x817
Block 0x177547E-0x179C111 has end pointers from: 0x85B
End pointers with offset 0: 52
That sure as hell looks like some kind of offset table to me! That also explains at least some of the large gap between the beginning of the file and the first compressed block at 0xF12.
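Conceptually, a pointer search like this is just a scan for 32-bit little-endian values that equal some interesting offset (a block start minus a guessed header size, or a block end plus some delta). The sketch below is a deliberately naive, hypothetical version of that idea, not Refriender’s code; find_pointers is my own name:

```python
import struct
from typing import Dict, Iterable, List


def find_pointers(data: bytes, targets: Iterable[int]) -> Dict[int, List[int]]:
    """Map each target value to every offset that holds it as a little-endian uint32.

    No alignment is assumed: the table entries in this file sit at odd
    offsets such as 0x1F9 and 0x22B.
    """
    hits: Dict[int, List[int]] = {t: [] for t in targets}
    for off in range(len(data) - 3):
        (value,) = struct.unpack_from("<I", data, off)
        if value in hits:
            hits[value].append(off)
    return hits
```

To mimic the end-pointer search, you’d pass in every block’s end offset (for example 0x350FFC, which turns up at 0x182 above). A byte-by-byte Python loop is slow on an archive this size, but it’s fine for poking at a hypothesis.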
Let’s look at a small portion of this table.
00000180 27 00 fc 0f 35 00 b1 b0 07 70 bf b5 ed 5e 52 8c |'...5....p...^R.|
00000190 02 00 51 fd 5c 00 e7 df 08 6f 05 85 5a 6e 43 0a |..Q.\....o..ZnC.|
000001a0 00 00 a3 89 5f 00 ee df 08 6f 05 85 5a 6e 16 c0 |...._....o..Zn..|
000001b0 0e 00 e6 93 5f 00 b1 b0 06 6f 84 e8 5b c8 78 59 |...._....o..[.xY|
000001c0 0b 00 fc 53 6e 00 ee df 09 70 c0 64 e3 db e7 91 |...Sn....p.d....|
000001d0 00 00 74 ad 79 00 16 74 65 78 74 75 72 65 73 5c |..t.y..textures\|
000001e0 5f 62 79 6f 68 5c 70 6c 61 6e 74 73 00 b1 b0 0b |_byoh\plants....|
000001f0 68 9b 20 5d dd c9 28 07 00 5b 3f 7a 00 21 74 65 |h. ]..(..[?z.!te|
00000200 78 74 75 72 65 73 5c 5f 62 79 6f 68 5c 63 6c 75 |xtures\_byoh\clu|
00000210 74 74 65 72 5c 72 65 73 6f 75 72 63 65 73 00 ee |tter\resources..|
00000220 df 0d 63 a3 71 bc 59 a3 c4 04 00 24 68 81 00 b1 |..c.q.Y....$h...|
00000230 b0 0a 6f 77 55 d8 7b 4a bc 01 00 c7 2c 86 00 ec |..owU.{J....,...|
00000240 e2 09 68 38 c9 1d 80 8e 00 00 00 11 e9 87 00 b1 |..h8............|
00000250 b0 0d 73 53 fe 27 b0 40 1f 02 00 9f e9 87 00 ee |..sS.'.@........|
00000260 df 0f 63 c0 79 3e b5 ee 79 0e 00 df 08 8a 00 ee |..c.y>..y.......|
00000270 df 0f 63 c1 79 3e b5 a7 69 0e 00 cd 82 98 00 ee |..c.y>..i.......|
00000280 df 0f 63 c2 79 3e b5 99 0f 0e 00 74 ec a6 00 ee |..c.y>.....t....|
00000290 df 0f 63 c3 79 3e b5 16 88 0e 00 0d fc b4 00 ee |..c.y>..........|
000002a0 df 0f 63 c4 79 3e b5 24 f3 0d 00 23 84 c3 00 ee |..c.y>.$...#....|
000002b0 df 09 67 7b 78 85 c4 d5 65 00 00 47 77 d1 00 b1 |..g{x...e..Gw...|
000002c0 b0 06 68 30 7f f6 c4 f9 90 00 00 1c dd d1 00 b1 |..h0............|
000002d0 b0 0b 63 22 c4 2d e7 da b9 01 00 15 6e d2 00 ee |..c".-......n...|
000002e0 df 0f 73 54 a3 5a ed ab 59 0a 00 ef 27 d4 00 ee |..sT.Z..Y...'...|
000002f0 df 08 68 b1 85 bc ef 25 2f 01 00 9a 81 de 00 b1 |..h....%/.......|
00000300 b0 0d 63 bf 4a c3 f3 0a 39 02 00 bf b0 df 00 b2 |..c.J...9.......|
00000310 b0 0d 63 bf 4a c3 f3 11 36 02 00 c9 e9 e1 00 b3 |..c.J...6.......|
00000320 b0 0d 63 bf 4a c3 f3 95 3a 02 00 da 1f e4 00 b4 |..c.J...:.......|
00000330 b0 0d 63 bf 4a c3 f3 a8 3f 02 00 6f 5a e6 00 b5 |..c.J...?..oZ...|
00000340 b0 0d 63 bf 4a c3 f3 03 34 02 00 17 9a e8 00 31 |..c.J...4......1|
00000350 74 65 78 74 75 72 65 73 5c 5f 62 79 6f 68 5c 63 |textures\_byoh\c|
00000360 6c 6f 74 68 65 73 5c 63 68 69 6c 64 72 65 6e 63 |lothes\childrenc|
00000370 6c 6f 74 68 65 73 76 61 72 69 61 6e 74 73 5c 6d |lothesvariants\m|
Rather than a linear list of file entries, it appears that they’re grouped by the directory each file lives in. Let’s narrow in on the blocks referenced from 0x220-0x350, since that appears to be a contiguous list:
Block 0x7A3F85-0x816824 has end pointers from: 0x22B
Block 0x81685B-0x862CC7 has end pointers from: 0x23B
Block 0x862CFB-0x87E911 has end pointers from: 0x24B
Block 0x87E944-0x87E99F has end pointers from: 0x25B
Block 0x8A0918-0x9882CD has end pointers from: 0x27B
Block 0xA6ECAD-0xB4FC0D has end pointers from: 0x29B
Block 0xB4FC46-0xC38423 has end pointers from: 0x2AB
Block 0xD42828-0xDE819A has end pointers from: 0x2FB
Block 0xDE81CC-0xDFB0BF has end pointers from: 0x30B
Block 0xE1EA00-0xE41FDA has end pointers from: 0x32B
Block 0xE42011-0xE65A6F has end pointers from: 0x33B
Each of these entries is 0x10 bytes, since the referencing offsets (0x22B, 0x23B, 0x24B, 0x25B) sit exactly 0x10 apart. Let’s figure out what’s in there, aside from the file pointer we know we’re going to find. To do that, we’re going to look at two entries at once, because we don’t know whether the pointer is at the beginning, end, or middle of an entry.
00000220 xx xx xx xx xx xx xx xx xx xx xx 24 68 81 00 b1 |..c.q.Y....$h...|
00000230 b0 0a 6f 77 55 d8 7b 4a bc 01 00 c7 2c 86 00 ec |..owU.{J....,...|
00000240 e2 09 68 38 c9 1d 80 8e 00 00 00 |..h8....... |
Decoding these (little endian) we get the following values:
0x00816824
0x6f0ab0b1
0x7bd85577
0x0001bc4a
0x00862cc7
0x6809e2ec
0x801dc938
0x0000008e
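If you want to reproduce that decoding, Python’s struct module does it in one call. The bytes below are copied from the hexdump, starting at 0x22B:

```python
import struct

# The 32 bytes starting at 0x22B, copied from the hexdump above.
snippet = bytes.fromhex(
    "24 68 81 00 b1 b0 0a 6f 77 55 d8 7b 4a bc 01 00 "
    "c7 2c 86 00 ec e2 09 68 38 c9 1d 80 8e 00 00 00"
)

# "<8I" = eight unsigned 32-bit integers, little-endian, no padding.
values = struct.unpack("<8I", snippet)
for v in values:
    print(f"0x{v:08x}")
```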
Well, what the hell are those? We know that the first and fifth values are pointers to the ends of blocks, but the rest isn’t so obvious. Let’s look at the blocks that are pointed to.
[lz4frame] 0x79AD9F - 0x7A3F5B (compressed length 0x91BC, decompressed length 0x1555E0)
[lz4frame] 0x7A3F85 - 0x816824 (compressed length 0x7289F, decompressed length 0x1555E0)
[lz4frame] 0x81685B - 0x862CC7 (compressed length 0x4C46C, decompressed length 0x555E0)
[lz4frame] 0x862CFB - 0x87E911 (compressed length 0x1BC16, decompressed length 0x555E0)
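The value that immediately catches my eye is 0x0001bc4a, which is very close to the 0x1BC16 compressed length of the block at 0x862CFB. The two differ by 0x34, right in line with the 0x31- and 0x33-byte headers we saw earlier, so given that we’re dealing with a variable-length header, this seems like a non-coincidence. Does this value tell us the full length of the entry, header included? Well, if we take the difference of 0x87E911 (the end of the block with that length) and 0x862CC7 (the end of the previous one) we get … 0x1bc4a! And given that the value right after that length is the pointer to the end of the previous block, which is where this file’s header starts, we now know these things: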
0x00816824 <-- pointer to file
0x6f0ab0b1
0x7bd85577
0x0001bc4a <-- length of next file
0x00862cc7 <-- pointer to next file
0x6809e2ec
0x801dc938
0x0000008e <-- length of some third file
The other values in here are too varied to be flags (we’d usually see a lot of shared bits) and don’t seem to correlate with the blocks’ lengths or anything else, so my guess is that one of them is a CRC32 or somesuch. Let’s look at the first directory’s entries and see if we can figure out the larger structure.
00000120 77 00 6f 00 9d 0e 00 00 00 00 00 00 19 74 65 78 |w.o..........tex|
00000130 74 75 72 65 73 5c 5f 62 79 6f 68 5c 66 75 72 6e |tures\_byoh\furn|
00000140 69 74 75 72 65 00 b1 b0 0f 63 8e 10 b2 01 73 9a |iture....c....s.|
00000150 07 00 e1 0e 00 00 b1 b0 0f 64 8e 10 b2 01 bd 1b |.........d......|
00000160 1d 00 54 a9 07 00 ee df 11 63 0f 48 09 14 eb 4a |..T......c.H...J|
00000170 10 00 11 c5 24 00 ee df 11 64 0f 48 09 14 55 ed |....$....d.H..U.|
00000180 27 00 fc 0f 35 00 b1 b0 07 70 bf b5 ed 5e 52 8c |'...5....p...^R.|
00000190 02 00 51 fd 5c 00 e7 df 08 6f 05 85 5a 6e 43 0a |..Q.\....o..ZnC.|
000001a0 00 00 a3 89 5f 00 ee df 08 6f 05 85 5a 6e 16 c0 |...._....o..Zn..|
000001b0 0e 00 e6 93 5f 00 b1 b0 06 6f 84 e8 5b c8 78 59 |...._....o..[.xY|
000001c0 0b 00 fc 53 6e 00 ee df 09 70 c0 64 e3 db e7 91 |...Sn....p.d....|
000001d0 00 00 74 ad 79 00 16 74 65 78 74 75 72 65 73 5c |..t.y..textures\|
000001e0 5f 62 79 6f 68 5c 70 6c 61 6e 74 73 00 b1 b0 0b |_byoh\plants....|
Okay, so the byte at 0x12C is 0x19, which is the length of the directory name (though this time it seemingly includes a null byte? Weird that our earlier file records didn’t have one). That means the contents of the directory must start at 0x12D + 0x19 = 0x146. Additionally, the distance from there to the length byte of the next directory (0x16 at 0x1D6) is 0x90, which is nicely divisible by our 0x10 entry size! We’re on the right path. So what are the values in those 0x90 bytes?
0x630fb0b1
0x1b2108e
0x79a73
0xee1
0x640fb0b1
0x1b2108e
0x1d1bbd
0x7a954
0x6311dfee
0x1409480f
0x104aeb
0x24c511
0x6411dfee
0x1409480f
0x27ed55
0x350ffc
0x7007b0b1
0x5eedb5bf
0x28c52
0x5cfd51
0x6f08dfe7
0x6e5a8505
0xa43
0x5f89a3
0x6f08dfee
0x6e5a8505
0xec016
0x5f93e6
0x6f06b0b1
0xc85be884
0xb5978
0x6e53fc
0x7009dfee
0xdbe364c0
0x91e7
0x79ad74
This data is much more ‘patternful’ than the little snippet we looked at before. Looking at the contents, I think it’s fair to say that my CRC32 theory is shot: neighboring entries share most of their bytes (0x630fb0b1 and 0x640fb0b1 differ in a single byte and are both followed by 0x1b2108e), which you wouldn’t expect from a checksum. My guess is now a combination of flags and fields smaller than 32 bits. Anyway, let’s figure out where our pointer and length are in this. The value 0x7a954 sticks out to me, because I know I saw a block near there earlier:
[lz4frame] 0x7A985 - 0x24C511 (compressed length 0x1D1B8C, decompressed length 0x5555E0)
And double-checking, adding the length from that table (0x1d1bbd) to the pointer just after it (0x7a954) gives us 0x24c511, exactly the end of that block. So the directories have a structure that’s like this:
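struct Directory {
    uint8 filenameLen;                    // includes the trailing null byte
    char[filenameLen] filename;           // e.g. "textures\_byoh\furniture"
    DirectoryEntry[unknownLen] entries;   // where the count lives is still unknown
}
struct DirectoryEntry {
    uint32 unknown, unknown2;             // not identified yet
    uint32 blockLen;                      // length of the whole file record, header included
    uint32 blockPointer;                  // offset of the file record (= end of the previous block)
}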
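Here’s that structure as a Python sketch, plus a check that each entry’s pointer plus length lands exactly on the next entry’s pointer. Since we haven’t found where the entry count lives, parse_directory takes it as a parameter (a stand-in for the missing field), and all the names here are mine rather than anything official:

```python
import struct
from typing import List, NamedTuple, Tuple


class DirectoryEntry(NamedTuple):
    unknown: int
    unknown2: int
    block_len: int      # whole file record: name header + size field + LZ4 frame
    block_pointer: int  # absolute offset of the file record (its name-length byte)


def parse_directory(data: bytes, offset: int, entry_count: int) -> Tuple[str, List[DirectoryEntry], int]:
    """Parse one directory as laid out in the struct above.

    entry_count is passed in because we haven't found where (or whether) the
    archive stores it. Returns the directory name, its entries, and the
    offset just past the last entry.
    """
    name_len = data[offset]
    raw_name = data[offset + 1 : offset + 1 + name_len]
    name = raw_name.rstrip(b"\x00").decode("ascii")  # the length byte counts the trailing NUL
    pos = offset + 1 + name_len
    entries = []
    for _ in range(entry_count):
        entries.append(DirectoryEntry(*struct.unpack_from("<4I", data, pos)))
        pos += 16  # four little-endian uint32s per entry
    return name, entries, pos


def non_contiguous(entries: List[DirectoryEntry]) -> List[int]:
    """Pointers whose record does not end exactly where the next record starts."""
    return [
        cur.block_pointer
        for cur, nxt in zip(entries, entries[1:])
        if cur.block_pointer + cur.block_len != nxt.block_pointer
    ]
```

Feeding it the file contents with offset 0x12C and entry_count=9 should yield textures\_byoh\furniture, the nine entries listed above, and an end offset of 0x1D6, where the next directory’s length byte (0x16) sits. non_contiguous should come back empty for that directory, and the last record ends at 0x79AD74 + 0x91E7 = 0x7A3F5B, the end of the block at 0x79AD9F.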
But how does it know how many entries are in each directory? I’ll leave that as homework for you. I hope this gave you some insight into the process I go through to reverse-engineer a file format like this, and if you’re curious to check my work, you can find a reference for the file format here: https://en.uesp.net/wiki/Skyrim_Mod:Archive_File_Format
Happy hacking,
- Sera Tonin Brocious (Daeken)