Corrupted Images with a twist
The problem
I grabbed my old personal backup hard disk to check what was actually on it, since I couldn’t remember (note to self: start labeling!).
While browsing the files I noticed that none of the images worked anymore. No application recognized them as valid JPEG/GIF/whatever. Other files such as text and music worked fine though. The standard image viewer of Cinnamon gave me an additional hint; it said: “Not a JPEG file: starts with 0x03 0x00”
So I took a look at the raw bytes of the file with hexdump -C 1cup.jpg and saw this:
00000000 03 00 00 00 02 00 00 00 ac 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 01 00 04 80 14 00 00 00 30 00 00 00 |............0...|
00000020 00 00 00 00 4c 00 00 00 01 05 00 00 00 00 00 05 |....L...........|
00000030 15 00 00 00 f6 3d 3e a7 87 82 14 d6 fa d4 89 69 |.....=>........i|
00000040 e9 03 00 00 01 05 00 00 00 00 00 05 15 00 00 00 |................|
00000050 f6 3d 3e a7 87 82 14 d6 fa d4 89 69 e9 03 00 00 |.=>........i....|
00000060 02 00 60 00 04 00 00 00 00 00 18 00 ff 01 1f 00 |..`.............|
00000070 01 02 00 00 00 00 00 05 20 00 00 00 20 02 00 00 |........ ... ...|
00000080 00 00 14 00 ff 01 1f 00 01 01 00 00 00 00 00 05 |................|
00000090 12 00 00 00 00 00 14 00 bf 01 13 00 01 01 00 00 |................|
000000a0 00 00 00 05 0b 00 00 00 00 00 18 00 a9 00 12 00 |................|
000000b0 01 02 00 00 00 00 00 05 20 00 00 00 21 02 00 00 |........ ...!...|
000000c0 01 00 00 00 00 00 00 00 50 b6 00 00 00 00 00 00 |........P.......|
000000d0 00 00 00 00 ff d8 ff e0 00 10 4a 46 49 46 00 01 |..........JFIF..|
000000e0 01 01 00 60 00 60 00 00 ff db 00 43 00 06 04 04 |...`.`.....C....|
…
Wow, so the file has some weird header, but it looks like a regular JPEG starting at offset 0x000000d4 (ff d8 ff e0 followed by “JFIF”), according to the JPEG entry in Wikipedia’s list of file signatures.
I am not entirely sure where that header came from. Either some export/backup went wrong or there was some virus on my Windows machine back in the day which added a small something to each file. If you know what’s going on, drop me a message via one of the channels to the left (preferably Twitter).
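The same offset can also be located programmatically instead of reading the hexdump by eye. The following is only a minimal sketch: the helper name find_signature_offset and the synthetic sample bytes are made up for illustration, and it assumes the first ff d8 ff sequence in the file really is the start of the image.

```python
JPEG_SOI = b"\xff\xd8\xff"  # first three bytes of a JPEG file signature


def find_signature_offset(data: bytes, signature: bytes = JPEG_SOI) -> int:
    """Return the index of the first occurrence of signature, or -1 if absent."""
    return data.find(signature)


# Synthetic stand-in for the damaged file: 0xd4 bytes of prefix
# (starting with 03 00), followed by the start of a JFIF header.
sample = b"\x03\x00" + b"\x00" * (0xD4 - 2) + b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"

print(hex(find_signature_offset(sample)))  # 0xd4
```

For a real file, the bytes would come from open(path, 'rb').read() instead of the synthetic sample.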
The solution
In order to see the actual image, I simply wrote the file out again starting at an offset, which gets rid of the prefix.
On Linux this is easy with the command tail -c +213 1cup.jpg > 1cup_recovered.jpg. The 213 is the hexadecimal offset d4 converted to decimal (212) plus 1, because tail -c +N starts its output at the N-th byte and counts from 1 (a classic fencepost, or off-by-one, situation).
Voilà! The image could now be opened and I could finally look at the medieval joke about that weird video we all know (but would rather not know).
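To make the offset arithmetic explicit, here is a small sketch of the same operation in Python. The function names strip_prefix and tail_start_byte are invented for this example; the only facts taken from above are the 0xd4 offset and the tail -c +213 call.

```python
PREFIX_LENGTH = 0xD4  # 212 bytes of unwanted prefix before the real content


def strip_prefix(data: bytes, prefix_length: int = PREFIX_LENGTH) -> bytes:
    """Drop the first prefix_length bytes (Python slices count from 0)."""
    return data[prefix_length:]


def tail_start_byte(prefix_length: int = PREFIX_LENGTH) -> int:
    """Value N for `tail -c +N`: tail counts bytes from 1, so N is one more
    than the number of bytes to skip."""
    return prefix_length + 1


print(PREFIX_LENGTH)      # 212
print(tail_start_byte())  # 213
```

Skipping 212 bytes means the first byte kept is number 213 when counting from 1, which is why the slice uses 212 and tail uses 213.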
Now that I knew the files were still intact but prefixed with some weirdness, I checked several other files to see whether the real content always started at the same offset.
And that was exactly the case! Not only that, but all files were equally affected, MP3s included. The music players just didn’t care about the extra bytes.
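A check like this can also be automated. The sketch below is not what I ran back then; it assumes the prefix always starts with 03 00 and that the payload begins at 0xd4, and the names peek and survey are made up for illustration. It reports, per file, whether the suspicious prefix is present and which bytes sit at the expected payload offset.

```python
from pathlib import Path

SUSPECT_PREFIX = b"\x03\x00"  # first bytes of the unwanted header
PAYLOAD_OFFSET = 0xD4         # where the real content is assumed to start


def peek(path, count: int = 4):
    """Return (has_prefix, first `count` bytes found at PAYLOAD_OFFSET)."""
    with open(path, "rb") as f:
        head = f.read(len(SUSPECT_PREFIX))
        f.seek(PAYLOAD_OFFSET)
        payload_start = f.read(count)
    return head == SUSPECT_PREFIX, payload_start


def survey(root):
    """Map every regular file below root to its peek() result."""
    return {str(p): peek(p) for p in sorted(Path(root).rglob("*")) if p.is_file()}
```

Printing the result of survey('/path/to/backup') makes it easy to spot files where the bytes at the offset do not look like a known signature (for example ff d8 ff for JPEG).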
Since all files carried this header, all I needed was a simple Python script to strip the prefix from every file recursively:
#!/usr/bin/env python3
import logging
import argparse
import os

CORRUPT_HEADER = b'\x03\x00'
ORIGINAl_HEADER_OFFSET = 0xd4


def fix_files(path: str, recursive: bool):
    for node in os.listdir(path):
        rel_path = path + '/' + node
        if recursive and os.path.isdir(rel_path):
            fix_files(rel_path, True)
        if os.path.isfile(rel_path):
            logging.debug('Checking: {}'.format(rel_path))
            # Check if file is affected
            with open(rel_path, 'rb') as f:
                file_header = f.read(len(CORRUPT_HEADER))
                if file_header != CORRUPT_HEADER:
                    # File is not affected or already fixed
                    continue
                # Read file w/o weird header
                logging.debug('Fixing: {}'.format(rel_path))
                f.seek(ORIGINAl_HEADER_OFFSET)
                original_content = f.read()
            # Save fixed content
            with open(rel_path, 'wb') as f:
                f.write(original_content)
            logging.info('Corrected file {}'.format(rel_path))


if __name__ == '__main__':
    argparser = argparse.ArgumentParser()
    argparser.add_argument('path', help='specify the path with files inside to work on')
    argparser.add_argument('-r', '--recursive', action='store_true', help='not only work in `path` but subdirectories as well')
    argparser.add_argument('-v', '--verbose', action='store_true', help='enable debug output')
    args = argparser.parse_args()
    if args.verbose:
        logging.getLogger().setLevel(logging.DEBUG)
    else:
        logging.getLogger().setLevel(logging.INFO)
    fix_files(args.path.rstrip('/'), args.recursive)
The script is very basic, without any error handling or parallelism, but it got the job done (after quite some time).
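If you want to be a bit more careful with irreplaceable files, one option is to write the stripped content to a temporary file first and only then swap it into place. This is just a sketch of that idea, not part of the script above; the function name strip_prefix_safely and its defaults are assumptions made for this example.

```python
import os
import tempfile


def strip_prefix_safely(path, offset=0xD4, expected_prefix=b"\x03\x00"):
    """Rewrite path without its first `offset` bytes.

    Returns True if the file was rewritten, False if it did not start
    with expected_prefix and was therefore left untouched.
    """
    with open(path, "rb") as src:
        if src.read(len(expected_prefix)) != expected_prefix:
            return False
        src.seek(offset)
        payload = src.read()

    # Write next to the original so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

This way an interruption in the middle of a write leaves the original file as it was. One trade-off: the replacement file gets the permissions of the temporary file rather than those of the original, which may or may not matter for a backup disk.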