DonationCoder.com Software > Post New Requests Here

IDEA: Split file when some binary patterns are found?

ConstanceJill:
Hello there .o/

I'm looking for a program which does the following:
- can run on both 32 and 64 bit versions of Windows
- lets the user specify several binary patterns and associated file extensions (either within the program's interface or via a simple editable text configuration file), a source file to "scan", and a destination folder
- looks for the binary patterns in the source file to "cut" it into new files: each output file being a slice of the "source" file, starting where a matching pattern was found. The extension for each slice should be the one the user associated to the corresponding binary pattern.
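As a rough illustration of the slicing behaviour requested above, here is a minimal Python sketch that works on data already loaded in memory. The function name `split_by_signatures` and the pattern-to-extension dictionary are made up for this example and are not taken from deconcat; bytes located before the first match are simply dropped, which is one possible reading of the request.

```python
import re
from typing import Dict, List, Tuple


def split_by_signatures(data: bytes, signatures: Dict[bytes, str]) -> List[Tuple[str, bytes]]:
    """Return (extension, slice) pairs; each slice starts at a signature
    match and runs up to the next match, or to the end of the data."""
    # Longest patterns first, so a longer signature wins when a shorter
    # one happens to be its prefix.
    ordered = sorted(signatures, key=len, reverse=True)
    # One alternation of escaped literals: the data is scanned only once.
    finder = re.compile(b"|".join(re.escape(sig) for sig in ordered))
    matches = list(finder.finditer(data))
    slices = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(data)
        slices.append((signatures[match.group(0)], data[match.start():end]))
    return slices
```

Each returned pair could then be written to the destination folder under a numbered file name with the given extension.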

The main point of the thing is to easily "extract" some files which are actually uncompressed concatenated binary resources such as pictures or sounds, which quite a few video games use, based on their file format headers.

The good news is that it actually already exists as free and open source software (see http://trollz1.free.fr/?p=deconcat and http://trollz1.free.fr/projects/deconcat.c/0.3-r1/_README). However:
- I've had it fail to work unexpectedly several times, so I guess a more stable version would be a great improvement ^^'
- it would also be nice if it didn't have that 4 GB file limitation
- it could be much more user friendly as it's console driven (I guess a GUI would help)

A while ago I also found a program named ripper5, which seems to do a similar job, but it was made for DOS and, as such, can't run natively on 64 bit Windows.


Can someone please either have a look at deconcat to make it work better, or make a new similar program (the choice is yours ^^) ?
I would be very grateful :}

MilesAhead:
Do you have any experience with ripper5?  If it does the job you might see if it will work in DOSBox.  It's a free 16 bit DOS emulator.

It may be limited as to file size though.

ConstanceJill:
Having to use ripper5 from DOSBox isn't exactly what I'd call more user friendly than just using deconcat itself from the command line ^^'
Also, from the small documentation and the few examples in its config file, I don't know if/how one could define a header where the first few bytes can be anything and are then followed by a few bytes with fixed values which help you recognize the file format. In fact it seems I even failed to make it recognize a header made only of static bytes starting from the first.
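To make the "first few bytes can be anything" idea concrete, here is a small Python sketch. The notation is an assumption invented for this example (space-separated hex bytes, with `??` standing for any byte); it is not the syntax used by ripper5 or deconcat. The helper `compile_pattern` turns such a string into a compiled bytes regular expression.

```python
import re


def compile_pattern(spec: str):
    """Compile a spec such as '?? ?? ?? ?? 66 74 79 70' into a bytes regex.

    '??' matches any single byte; every other token is a two-digit hex value.
    """
    parts = []
    for token in spec.split():
        if token == "??":
            parts.append(b".")  # wildcard byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that the wildcard also matches the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)
```

For instance, `compile_pattern("52 49 46 46 ?? ?? ?? ?? 57 41 56 45")` would match a WAV header: the text "RIFF", four bytes holding a size that varies from file to file, then "WAVE".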
Further, you can't start it at unlocked speed or you get a runtime error, though it seems safe to run it first and then increase the cycles as it progresses through the file being scanned: that's kind of a hassle, so I can't really recommend it to just anyone.

I'm not making this request only to have something easier to use for myself, but so that pretty much anyone can play around with resource filled files and easily "extract" them: you'd just have to tell them what they need in the config file for a specific source. But if you first need them to install DOSBox and learn its basic commands, I guess many people would just run away xD

MilesAhead:
Deconcat looks like it harks back to Win98.

If I understand the desired requirements the splitter would have to handle files greater than 4 GB, scan for multiple binary signatures, then on finding one it has to write out the binary stream until it finds another of the multiple signatures.

It seems like a recipe for combinatorial explosion.
Unless I am misunderstanding and the resource file uses a marker for end of blob or whatever.  If the program has to be able to find x number of arbitrary sequences then the optimization of searching for the first byte of any sequence would still have to compare each byte of the file to a table of initial tag bytes.

It seems like it would take a very long time to scan even without writing out the slices.
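For reference, the first-byte table idea described above might look roughly like this in Python. This is only a sketch under simple assumptions: the data is already in memory, every signature is non-empty, and the name `find_signatures` is made up. The per-byte work is one dictionary lookup, with a full comparison only when the first byte matches some signature.

```python
from collections import defaultdict


def find_signatures(data: bytes, signatures):
    """Return (offset, signature) for every place a signature starts."""
    # Group the signatures by their first byte.
    by_first = defaultdict(list)
    for sig in signatures:
        by_first[sig[0]].append(sig)
    hits = []
    for offset, byte in enumerate(data):
        # Most bytes start no signature at all, so this lookup comes back empty.
        for sig in by_first.get(byte, ()):
            if data.startswith(sig, offset):
                hits.append((offset, sig))
                break
    return hits
```

This version does not skip ahead after a hit, so overlapping signatures would all be reported.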

ConstanceJill:
Quote from MilesAhead (April 12, 2014, 01:56 PM):
If I understand the desired requirements the splitter would have to handle files greater than 4 GB, scan for multiple binary signatures, then on finding one it has to write out the binary stream until it finds another of the multiple signatures.
--- End quote ---
That's right. And the possibility to quickly enable/disable searching for some patterns without deleting them from the config file would be a nice feature :p
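One simple way to get that enable/disable behaviour is to let a comment character switch a line off. The sketch below assumes a hypothetical config format of one `extension = hex bytes` entry per line, with `#` at the start of a line disabling it; this is not the format of the real deconcat.conf, and `parse_config` is a made-up name.

```python
def parse_config(text: str):
    """Parse 'ext = hex bytes' lines into a {pattern: extension} dict.

    Blank lines and lines starting with '#' are ignored, so a pattern can
    be disabled by commenting it out rather than deleting it.
    """
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ext, _, hex_bytes = line.partition("=")
        patterns[bytes.fromhex(hex_bytes.strip())] = ext.strip()
    return patterns
```

With a config such as the following, only the PNG and BMP patterns would be active:

```
png = 89 50 4E 47 0D 0A 1A 0A
# wav = 52 49 46 46
bmp = 42 4D
```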

[…]
Quote from MilesAhead (April 12, 2014, 01:56 PM):
It seems like it would take a very long time to scan even without writing out the slices.
--- End quote ---
It does indeed if you do that without optimisation, by reading and writing the files byte by byte (I myself made such a program in Turbo Pascal back in the day, and it was *very* slow... still is if I try to run it in DOSBox).
It's not much of a problem however if you use a buffer. deconcat 0.3-r1 itself does a pretty good job at this: on my current machine, it scans a 195 MB file from the "Thief: Deadly Shadows" game and splits it into 6713 ogg files in only 19 seconds with the default deconcat.conf which you can download on its website.
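A buffered scan along those lines could be sketched in Python as below. This is not how deconcat itself is implemented; the name `scan_stream` and the 1 MiB default chunk size are assumptions for the example. The file is read in chunks, and the last few bytes of each chunk are carried over so that a signature straddling two chunks is still found. Offsets are ordinary Python integers, so nothing here depends on the file being smaller than 4 GB.

```python
import re


def scan_stream(stream, signatures, chunk_size=1 << 20):
    """Yield (absolute_offset, signature) for each match in a binary stream."""
    ordered = sorted(signatures, key=len, reverse=True)
    finder = re.compile(b"|".join(re.escape(sig) for sig in ordered))
    keep = len(ordered[0]) - 1  # bytes that may begin a match cut by a chunk edge
    window = b""
    base = 0    # absolute offset of window[0]
    resume = 0  # absolute offset just past the last reported match
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        window += chunk
        # Matches starting in the last `keep` bytes are deferred to the next
        # round, when the bytes that follow them are available.
        limit = len(window) if at_eof else max(len(window) - keep, 0)
        for match in finder.finditer(window, max(resume - base, 0)):
            if match.start() >= limit:
                break
            yield base + match.start(), match.group(0)
            resume = base + match.end()
        if at_eof:
            break
        window = window[limit:]
        base += limit
```

It could be called on a file opened with `open(path, "rb")`, and the yielded offsets then say where each output slice should start.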
