Topic: IDEA: Split file when some binary patterns are found?

ConstanceJill

  • Supporting Member
  • Joined in 2012
  • Posts: 203
IDEA: Split file when some binary patterns are found?
« on: April 12, 2014, 05:29 AM »
Hello there .o/

I'm looking for a program which does the following:
- can run on both 32-bit and 64-bit versions of Windows
- lets the user specify several binary patterns and associated file extensions (either within the program's interface or via a simple editable text configuration file), a source file to "scan", and a destination folder
- looks for the binary patterns in the source file to "cut" it into new files: each output file is a slice of the source file, starting where a matching pattern was found and ending where the next match begins. The extension of each slice should be the one the user associated with the corresponding binary pattern.
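
The core of the requested behaviour can be sketched in a few lines. This is a minimal in-memory sketch, assuming patterns are plain byte strings mapped to extensions; the function name `split_by_patterns` is hypothetical and not taken from deconcat. It reads the whole file into memory, so it is only a starting point for very large files.

```python
from typing import Dict, List, Tuple


def split_by_patterns(data: bytes, patterns: Dict[bytes, str]) -> List[Tuple[int, str, bytes]]:
    """Return (offset, extension, slice) for every header match in data.

    Each slice runs from one match up to (not including) the next match,
    or to the end of data for the last one. Bytes before the first match
    are not part of any slice.
    """
    # offset -> (pattern, extension) for every position where some pattern starts
    hits: Dict[int, Tuple[bytes, str]] = {}
    for pattern, ext in patterns.items():
        pos = data.find(pattern)
        while pos != -1:
            prev = hits.get(pos)
            # If two patterns start at the same offset, keep the longer (more specific) one.
            if prev is None or len(pattern) > len(prev[0]):
                hits[pos] = (pattern, ext)
            pos = data.find(pattern, pos + 1)

    offsets = sorted(hits)
    slices = []
    for i, start in enumerate(offsets):
        end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
        slices.append((start, hits[start][1], data[start:end]))
    return slices
```

For example, with `{b"OggS": "ogg", b"RIFF": "wav"}` (the Ogg page and RIFF/WAV container signatures), the input `b"junkOggS111OggS22RIFF3"` yields `b"OggS111"` and `b"OggS22"` tagged `ogg`, then `b"RIFF3"` tagged `wav`; the leading `junk` is dropped. Writing each slice to `<destination>/<number>.<extension>` is then a simple loop.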

The main point is to easily "extract" individual files, such as pictures or sounds, from the uncompressed concatenated binary resource files that quite a few video games use, by recognizing their file format headers.

The good news is that it already exists as free and open source software (see http://trollz1.free.fr/?p=deconcat and http://trollz1.free....cat.c/0.3-r1/_README); however:
- I've had it fail unexpectedly several times, so I guess a more stable version would be a great improvement ^^'
- it would also be nice if it didn't have that 4 GB file limitation
- it could be much more user-friendly, as it's console-driven (I guess a GUI would help)

A while ago I also found a program named ripper5 which seems to do a similar job, but it was made for DOS and, as such, can't run natively on 64-bit Windows.

Can someone please either have a look at deconcat to make it work better, or make a new similar program (the choice is yours ^^)?
I would be very grateful :}
« Last Edit: April 19, 2014, 07:47 AM by ConstanceJill »

MilesAhead

  • Supporting Member
  • Joined in 2009
  • Posts: 7,736
Re: Split file when some binary patterns are found ?
« Reply #1 on: April 12, 2014, 07:18 AM »
Do you have any experience with ripper5? If it does the job, you might see whether it works in DOSBox. It's a free DOS emulator, so it can run 16-bit DOS programs on 64-bit Windows.

It may have file size limits, though.

ConstanceJill

Re: Split file when some binary patterns are found ?
« Reply #2 on: April 12, 2014, 08:07 AM »
Having to use ripper5 from DOSBox isn't exactly what I'd call more user-friendly than just using deconcat itself from the command line ^^'
Also, from the short documentation and the few examples in its config file, I don't know if or how one could define a header where the first few bytes can be anything, followed by a few bytes whose values identify the file format. In fact, I couldn't even get it to recognize a header made only of fixed bytes starting from the first one.
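
One way to express "the first few bytes can be anything" is a wildcard byte. The sketch below assumes a hypothetical hex notation where `??` matches any byte; this is not ripper5's or deconcat's actual config syntax. A WAV file is a handy test case: it starts with `RIFF`, then a 4-byte size that varies per file, then `WAVE`.

```python
from typing import List, Optional

Pattern = List[Optional[int]]  # None means "any byte"


def parse_pattern(text: str) -> Pattern:
    """Parse hypothetical hex notation such as '?? ?? 4F 67' into a pattern."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches_at(data: bytes, pos: int, pattern: Pattern) -> bool:
    """True if pattern matches data starting at pos; None entries match anything."""
    if pos + len(pattern) > len(data):
        return False
    return all(p is None or data[pos + i] == p for i, p in enumerate(pattern))


def find_all(data: bytes, pattern: Pattern) -> List[int]:
    """Naive scan returning every offset where the wildcard pattern matches."""
    return [pos for pos in range(len(data) - len(pattern) + 1)
            if matches_at(data, pos, pattern)]
```

Leading wildcards work the same way as ones in the middle: `parse_pattern("?? ?? 4F 67")` simply skips checking the first two bytes at each candidate offset.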
Furthermore, you can't start it at unlocked speed or you get a runtime error. It seems safe to start it first and then increase the cycles while it works through the file being scanned, but that's kind of a hassle, so I can't really recommend it to just anyone.

I'm not making this request only to have something easier to use for myself, but so that pretty much anyone can play around with resource-filled files and easily "extract" their contents: you'd just have to tell them what to put in the config file for a specific source. But if they first need to install DOSBox and learn its basic commands, I guess many people would just run away xD

MilesAhead

  • Supporting Member
  • Joined in 2009
  • **
  • Posts: 7,736
    • View Profile
    • Donate to Member
Re: Split file when some binary patterns are found ?
« Reply #3 on: April 12, 2014, 01:56 PM »
Deconcat looks like it harks back to Win98.

If I understand the desired requirements, the splitter would have to handle files greater than 4 GB, scan for multiple binary signatures, then on finding one it has to write out the binary stream until it finds another of the multiple signatures.

It seems like a recipe for combinatorial explosion.
Unless I am misunderstanding and the resource file uses an end-of-blob marker or something similar. If the program has to be able to find any number of arbitrary sequences, then even the optimization of searching for the first byte of any sequence would still have to compare each byte of the file against a table of initial tag bytes.

It seems like it would take a very long time to scan even without writing out the slices.

ConstanceJill

  • Supporting Member
  • Joined in 2012
  • **
  • Posts: 203
    • View Profile
    • Donate to Member
Re: Split file when some binary patterns are found ?
« Reply #4 on: April 12, 2014, 03:29 PM »
If I understand the desired requirements, the splitter would have to handle files greater than 4 GB, scan for multiple binary signatures, then on finding one it has to write out the binary stream until it finds another of the multiple signatures.
-MilesAhead (April 12, 2014, 01:56 PM)
That's right. And the possibility to quickly enable/disable searching for some patterns without deleting them from the config file would be a nice feature :p
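
A config-file toggle like that could look as follows. This is a minimal sketch of a hypothetical format (not deconcat.conf's real syntax): one rule per line as `extension HEX HEX ...`, `??` for any byte, `#` for comments, and a leading `-` on the extension to disable a rule while keeping it in the file.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    ext: str
    pattern: List[Optional[int]]  # None = wildcard byte
    enabled: bool


def parse_config(text: str) -> List[Rule]:
    """Parse the hypothetical 'ext HEX HEX ...' format, one rule per line.

    '#' starts a comment line; a leading '-' on the extension disables the
    rule without deleting it (remove the '-' to re-enable it).
    """
    rules = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        ext, *tokens = line.split()
        enabled = not ext.startswith("-")
        ext = ext.lstrip("-")
        if not tokens:
            raise ValueError(f"line {lineno}: no pattern bytes after {ext!r}")
        pattern = [None if t == "??" else int(t, 16) for t in tokens]
        rules.append(Rule(ext, pattern, enabled))
    return rules


def active_rules(rules: List[Rule]) -> List[Rule]:
    """Only the rules the scanner should actually search for."""
    return [r for r in rules if r.enabled]
```

With this layout, switching a format off is a one-character edit, and the scanner only ever sees `active_rules(...)`.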

[…]
It seems like it would take a very long time to scan even without writing out the slices.
-MilesAhead (April 12, 2014, 01:56 PM)
It does indeed if you do it without optimisation, reading and writing the files byte after byte (I made such a program myself in Turbo Pascal back in the day, and it was *very* slow... it still is if I run it in DOSBox).
It's not much of a problem, however, if you use a buffer. deconcat 0.3-r1 itself does a pretty good job at this: on my current machine, it scans a 195 MB file from the game "Thief: Deadly Shadows" and splits it into 6713 ogg files in only 19 seconds, using the default deconcat.conf available on its website.
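
Buffered scanning can also remove the 4 GB limit, since offsets never need to fit in 32 bits. The sketch below is an assumption about how one might do it, not how deconcat works: it reads fixed-size chunks, carries the last (longest pattern − 1) bytes into the next buffer so a header straddling two reads is still found, then copies each slice out in chunks. Function names are hypothetical.

```python
import os
from pathlib import Path
from typing import BinaryIO, Dict, Tuple


def find_offsets(src: BinaryIO, patterns: Dict[bytes, str],
                 chunk_size: int = 1 << 20) -> Dict[int, str]:
    """Map absolute offset -> extension for every match, reading src in chunks."""
    keep = max(len(p) for p in patterns) - 1
    tail = b""  # bytes carried over from the previous buffer
    base = 0    # absolute file offset of buf[0]
    hits: Dict[int, Tuple[int, str]] = {}  # offset -> (pattern length, ext)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        for pat, ext in patterns.items():
            pos = buf.find(pat)
            while pos != -1:
                # Matches lying entirely inside the carried-over tail were
                # already reported while scanning the previous buffer.
                if pos + len(pat) > len(tail):
                    prev = hits.get(base + pos)
                    if prev is None or len(pat) > prev[0]:
                        hits[base + pos] = (len(pat), ext)
                pos = buf.find(pat, pos + 1)
        tail = buf[-keep:] if keep else b""
        base += len(buf) - len(tail)
    return {off: ext for off, (_, ext) in hits.items()}


def copy_range(src: BinaryIO, dst: BinaryIO, start: int, end: int, chunk_size: int) -> None:
    """Copy bytes [start, end) from src to dst without loading them all at once."""
    src.seek(start)
    remaining = end - start
    while remaining > 0:
        block = src.read(min(chunk_size, remaining))
        if not block:
            break
        dst.write(block)
        remaining -= len(block)


def split_file(src_path: Path, dest_dir: Path, patterns: Dict[bytes, str],
               chunk_size: int = 1 << 20) -> int:
    """Write one numbered output file per match; return how many were written."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with open(src_path, "rb") as src:
        hits = find_offsets(src, patterns, chunk_size)
        size = src.seek(0, os.SEEK_END)
        offsets = sorted(hits)
        for n, start in enumerate(offsets):
            end = offsets[n + 1] if n + 1 < len(offsets) else size
            with open(dest_dir / f"{n:06d}.{hits[start]}", "wb") as dst:
                copy_range(src, dst, start, end, chunk_size)
    return len(offsets)
```

The 1 MiB default chunk size is an arbitrary choice; the point is that each read costs one system call instead of one per byte.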

MilesAhead

  • Supporting Member
  • Joined in 2009
  • **
  • Posts: 7,736
    • View Profile
    • Donate to Member
Re: Split file when some binary patterns are found ?
« Reply #5 on: April 12, 2014, 05:11 PM »
I wasn't talking about byte-by-byte file I/O. I was talking about the pattern match. Since multiple patterns are allowed, to search for them you would naturally compare each byte against a table of the first bytes of all the patterns. On a match, you then continue the comparison, etc.
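
The first-byte table described here can be sketched as follows (a hypothetical illustration; function names are not from any existing tool, and wildcards in the first position are not handled). Each byte of the file costs one dictionary lookup, and a full comparison only runs for patterns that share that first byte, so the work grows with the number of patterns per starting byte rather than with combinations of patterns.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_first_byte_table(patterns: Dict[bytes, str]) -> Dict[int, List[Tuple[bytes, str]]]:
    """Map each pattern's first byte to the (pattern, extension) pairs starting with it."""
    table: Dict[int, List[Tuple[bytes, str]]] = defaultdict(list)
    for pat, ext in patterns.items():
        table[pat[0]].append((pat, ext))
    # Try longer patterns first so a longer header wins over a shorter prefix.
    for candidates in table.values():
        candidates.sort(key=lambda pe: len(pe[0]), reverse=True)
    return dict(table)


def scan_first_byte(data: bytes, patterns: Dict[bytes, str]) -> List[Tuple[int, str]]:
    """One pass over data: a table lookup per byte, full comparison only on a hit."""
    table = build_first_byte_table(patterns)
    hits = []
    for pos, byte in enumerate(data):
        for pat, ext in table.get(byte, ()):
            if data.startswith(pat, pos):
                hits.append((pos, ext))
                break  # report at most one match per offset
    return hits
```

Established multi-pattern algorithms such as Aho-Corasick go further and match all patterns in a single pass regardless of how many share a prefix.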

From what you say, it sounds more like the resource file is homogeneous. In that case, once you found the first match, you would limit the search to that pattern. Otherwise I don't see how it could be done without big iron.
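
The homogeneous-file shortcut could look like this minimal sketch (the function name is hypothetical): find the earliest match of any pattern, then search only for that one pattern for the rest of the file. It is only correct when the file really holds a single format; in the test data, a later `RIFF` header is never reported, which is the limitation with heterogeneous files.

```python
from typing import Dict, List, Optional, Tuple


def find_homogeneous(data: bytes, patterns: Dict[bytes, str]) -> Optional[Tuple[str, List[int]]]:
    """Find the earliest match of any pattern, then search only for that pattern.

    Any other format appearing later in the file is silently ignored.
    """
    first: Optional[Tuple[int, bytes, str]] = None
    for pat, ext in patterns.items():
        pos = data.find(pat)
        if pos != -1 and (first is None or pos < first[0]):
            first = (pos, pat, ext)
    if first is None:
        return None
    pos, pat, ext = first
    offsets = []
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pat, pos + 1)
    return ext, offsets
```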

ConstanceJill

  • Supporting Member
  • Joined in 2012
  • **
  • Posts: 203
    • View Profile
    • Donate to Member
Re: Split file when some binary patterns are found ?
« Reply #6 on: April 12, 2014, 05:47 PM »
That particular file is homogeneous indeed, but it may not always be the case.
And well, I don't exactly know why it's so fast (took a quick look at the source code, but I don't really know much about C), but since it is, I guess that means that no, looking for several patterns isn't such a big deal after all.

MilesAhead

  • Supporting Member
  • Joined in 2009
  • **
  • Posts: 7,736
    • View Profile
    • Donate to Member
Re: Split file when some binary patterns are found ?
« Reply #7 on: April 12, 2014, 06:17 PM »
That particular file is homogeneous indeed, but it may not always be the case.
And well, I don't exactly know why it's so fast (took a quick look at the source code, but I don't really know much about C), but since it is, I guess that means that no, looking for several patterns isn't such a big deal after all.
-ConstanceJill (April 12, 2014, 05:47 PM)

I don't see how you draw the conclusion concerning the heterogeneous case from the homogeneous benchmark. In any case, I suspect there's a reason the utility has lain dormant.

ConstanceJill

  • Supporting Member
  • Joined in 2012
  • **
  • Posts: 203
    • View Profile
    • Donate to Member
Re: Split file when some binary patterns are found ?
« Reply #8 on: April 13, 2014, 02:25 AM »
I don't see how you draw the conclusion concerning the heterogeneous case from the homogeneous benchmark.
-MilesAhead (April 12, 2014, 06:17 PM)
Who said I had only tested it with that one particular file?
Either way, I don't get where you're trying to go from there. Are you implying that the program would be "cheating" by no longer searching for the other binary patterns once it has found one, or something like that?
You seem to be really convinced that such an operation just cannot be done fast for some reason… I don't really understand why. And even if it has to take a bit of time to be done thoroughly, who cares, as long as it stays within a reasonable time frame?
In addition, if the user can choose to disable the search for some patterns, it should then make it operate faster, right?

MilesAhead

  • Supporting Member
  • Joined in 2009
  • **
  • Posts: 7,736
    • View Profile
    • Donate to Member
Re: Split file when some binary patterns are found ?
« Reply #9 on: April 13, 2014, 07:19 AM »
I think I'll leave this to others, as the utility in question is Win9x-based.

(Edit: Actually I'm in error. It's DOS-based. Even C-based languages are vastly different now.)
« Last Edit: April 13, 2014, 08:05 AM by MilesAhead »