Author Topic: Best Language For Binary Parsing?  (Read 3722 times)

Ehtyar

Best Language For Binary Parsing?
« on: October 02, 2008, 04:31:50 PM »
Hi all.
If you needed to parse some binary data (for example, PE headers), what scripting language would you use and why? I've seen a few parsers in Python (though they've all been too complex for what I require), but I would probably be more comfortable in Perl. Anyway, I just wanted to see if anyone had any opinions before I start investing too much time in it. Any suggestions are welcome.

Thanks for any replies, Ehtyar.

Ehtyar

Re: Best Language For Binary Parsing?
« Reply #1 on: October 11, 2008, 06:57:45 PM »
Well, as one might imagine, a dynamically typed language makes for quite a nightmare in attempting to parse binary data. Usually you've got either a string or an int, and usually when reading from a file, you get a string.
There was a lot of ord(), a lot of pack()/unpack(), a lot of sprintf() and a lot of split()/join(), but I've managed it in Perl in a most inelegant manner which I'm entirely unhappy with. Ironically, where I would have liked a scripting language, a compiled one would have suited far better (when will someone make a C script that is usable?). I was going to try it in Python also, but I'm now painfully aware that Python would be just as difficult a language to write this in as Perl.
To rephrase my original question, is anyone aware of a statically typed scripting language that I could possibly employ in this endeavor?

Thanks, Ehtyar.

mouser

Re: Best Language For Binary Parsing?
« Reply #2 on: October 11, 2008, 07:00:18 PM »
because python is so closely tied to an underlying base of C code and exposes most of it, i would think python could be a good option.

scancode

Re: Best Language For Binary Parsing?
« Reply #3 on: October 11, 2008, 07:02:55 PM »
Euphoria seems well suited for the task. Just load the file into a sequence and hack away :)

Ehtyar

Re: Best Language For Binary Parsing?
« Reply #4 on: October 20, 2008, 03:27:38 AM »
For the uninformed, I spoke to Mouse Man and Scan Man outside of the forum, and unfortunately Python's typing is not quite strong enough, and while Euphoria appears to be strongly typed, it is mostly superficial.
My solution came in the form of C#, believe it or not. When I came across the BinaryReader class I did some further investigation (because who would not want to use that for binary parsing), and eventually found C# scripting engines. The strong typing, coupled with BinaryReader made C# a perfect choice.
Those that know me well will be shocked to pieces at the thought of Ehtyar using .NET....never fear friends, the script interpreters run on Mono ;)

Ehtyar.
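
For comparison with the BinaryReader approach described in this post, sequential typed reads can be sketched in Python with the standard `struct` module. This is only an illustration, not the C# script from the post: the class name `Reader`, its method names, and the sample bytes are all hypothetical.

```python
import io
import struct


class Reader:
    """Hypothetical helper: typed little-endian reads from a binary stream."""

    def __init__(self, stream):
        self.stream = stream

    def _read(self, fmt):
        # Read exactly as many bytes as the format needs, then decode them.
        size = struct.calcsize(fmt)
        chunk = self.stream.read(size)
        if len(chunk) != size:
            raise EOFError("unexpected end of data")
        return struct.unpack(fmt, chunk)[0]

    def read_uint16(self):
        return self._read("<H")

    def read_uint32(self):
        return self._read("<I")


# Made-up data: a 16-bit value followed by a 32-bit value.
reader = Reader(io.BytesIO(b"\x4d\x5a\x80\x00\x00\x00"))
magic = reader.read_uint16()
offset = reader.read_uint32()
print(hex(magic), offset)
```

Each call consumes the next field from the stream and returns an int, so the calling code never handles the raw string of bytes directly.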

CWuestefeld

Re: Best Language For Binary Parsing?
« Reply #5 on: October 20, 2008, 08:04:27 AM »
A bit of a turnaround, as my day-to-day work is in C# (and T-SQL), and I'm just learning python, but I would have looked toward python in preference to C#.

I fail to see what python's dynamic typing has to do with it. The duck typing of python means that a type says more about what you can do with an object than about what it "is". And the fact that something's potentially recognizable doesn't affect the fact that it's being read literally, as a string. But that's not really relevant, since you should be dealing with actual strings anyway. From Programming Python 3rd Edition, section 4.2.1.5:
Quote
In all cases, data transferred between files and your programs is represented as Python strings within scripts, even if it is binary data. This works because Python string objects can always contain character bytes of any value

In C#, the foreach feature is a big step forward from C/C++. But it's still a long way off from python's generators (although this is less true with LINQ).
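
As a small illustration of the quoted point, here is a sketch of reading raw bytes in Python. It assumes Python 3, where binary data comes back as `bytes` rather than `str` (the quoted passage predates that change). The sample bytes are made up, and `io.BytesIO` stands in for a file opened in binary mode.

```python
import io

# Made-up sample: "MZ" followed by a little-endian 16-bit value.
raw = b"MZ\x90\x00"
stream = io.BytesIO(raw)  # stands in for open(path, "rb")

data = stream.read(4)
first_byte = data[0]                         # indexing bytes yields an int
magic = data[:2].decode("ascii")             # slicing yields bytes; decode for text
value = int.from_bytes(data[2:4], "little")  # two bytes as an unsigned integer

print(type(data).__name__, first_byte, magic, hex(value))
```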

Ehtyar

Re: Best Language For Binary Parsing?
« Reply #6 on: October 20, 2008, 02:56:12 PM »
Quote
In all cases, data transferred between files and your programs is represented as Python strings within scripts, even if it is binary data. This works because Python string objects can always contain character bytes of any value
It may *work*, but it's a complete nightmare to parse.
Perl deals with strings the same way. To get an integer of a byte, I had to call ord() to be certain Perl didn't whine about a string. To parse the file, I had to load the entire header into memory in one go, then unpack it to a hash. To get a hex representation of most things, I had to call unpack() again. For compare operations, I often had to resort to regex.
I can't understand what makes anyone think Python (or any dynamically typed language for that matter) would be a better choice here than one that's statically typed, and provides functions to deal with specific types. If someone can show me an example of Python parsing various datatypes read from a file and displaying them in multiple representations, I might be more receptive. But I can say unequivocally, in my experience, C# was a dream compared to Perl.

Ehtyar.
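
The kind of example asked for in the last post might look like the sketch below. It uses Python 3 and the standard `struct` module to parse a PE file's DOS stub and COFF file header, then prints each field in decimal, hex and binary. The helper names `build_sample` and `parse_coff_header` and all the field values are made up for illustration; the image is built in memory so the snippet does not depend on a real file.

```python
import struct

COFF_FORMAT = "<HHIIIHH"  # little-endian COFF file header, 20 bytes
COFF_FIELDS = (
    "Machine", "NumberOfSections", "TimeDateStamp", "PointerToSymbolTable",
    "NumberOfSymbols", "SizeOfOptionalHeader", "Characteristics",
)


def build_sample():
    """Hypothetical helper: build a minimal fake PE image in memory."""
    dos = bytearray(0x40)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # e_lfanew: offset of the PE signature
    coff = struct.pack(COFF_FORMAT, 0x014C, 3, 0x48F0D0A5, 0, 0, 0xE0, 0x0102)
    return bytes(dos) + b"PE\0\0" + coff


def parse_coff_header(image):
    """Return the COFF file header fields as a dict of ints."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    values = struct.unpack_from(COFF_FORMAT, image, e_lfanew + 4)
    return dict(zip(COFF_FIELDS, values))


header = parse_coff_header(build_sample())
for name, value in header.items():
    # The same integer shown in three representations.
    print(f"{name:22} dec={value:<12} hex={value:#010x} bin={value:#b}")
```

For a real file, `build_sample()` would be replaced by reading the file in binary mode, for example `open(path, "rb").read()`. The sketch covers only the first 20 bytes after the PE signature and makes no attempt to handle the optional header or the section table.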