Topic: SeqBox - A file container that can be restored after total loss of FS structures

Mark0

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 652
    • View Profile
    • Mark's home
    • Donate to Member
https://github.com/MarcoPon/SeqBox

A SeqBox container has a block size smaller than or equal to that of a sector, so it can survive any level of fragmentation. Each block has a minimal header that includes a unique file identifier, block sequence number, checksum, and version. Additional, non-critical info/metadata is contained in block 0 (name, file size, crypto-hash, other attributes, etc.).

If disaster strikes, recovery can be performed by simply scanning a volume/image, reading sector-sized slices and checking block signatures and then CRCs to detect valid SBX blocks. The blocks can then be grouped by UID, sorted by sequence number and reassembled to form the original SeqBox containers.
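To make the scan / group / sort / reassemble idea concrete, here is a minimal sketch in Python. It does not use the real SBX layout: the `TOY` signature, the 18-byte header, the CRC-32 checksum and all function names are assumptions made up for illustration (the actual format is described in the SeqBox readme).

```python
import random
import struct
import zlib
from collections import defaultdict

BLOCK_SIZE = 512                     # one block per sector
MAGIC = b"TOY"                       # hypothetical signature, NOT the real SBX one
HEADER = struct.Struct(">3sBI6sI")   # signature, version, CRC-32, UID, sequence number
PAYLOAD_SIZE = BLOCK_SIZE - HEADER.size


def _crc(uid, seq, body):
    # Checksum over everything that follows the CRC field (toy choice).
    return zlib.crc32(uid + struct.pack(">I", seq) + body)


def make_block(uid, seq, payload):
    # Pad the payload so every block is exactly one sector long.
    body = payload.ljust(PAYLOAD_SIZE, b"\x00")
    return HEADER.pack(MAGIC, 1, _crc(uid, seq, body), uid, seq) + body


def encode(uid, data):
    # Split a file into self-describing, sector-sized blocks.
    return [make_block(uid, n, data[i:i + PAYLOAD_SIZE])
            for n, i in enumerate(range(0, len(data), PAYLOAD_SIZE))]


def scan(image):
    # Read sector-sized slices; keep those with a valid signature and CRC.
    found = defaultdict(dict)
    for off in range(0, len(image) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = image[off:off + BLOCK_SIZE]
        magic, _version, crc, uid, seq = HEADER.unpack_from(block)
        if magic != MAGIC:
            continue
        body = block[HEADER.size:]
        if _crc(uid, seq, body) != crc:
            continue
        found[uid][seq] = body       # group by UID, index by sequence number
    return found


def reassemble(blocks):
    # Sort one file's blocks by sequence number and join the payloads.
    return b"".join(blocks[seq] for seq in sorted(blocks))


# Two "files" whose blocks end up shuffled among unrelated sectors.
sectors = encode(b"FILE-A", b"a" * 1000) + encode(b"FILE-B", b"b" * 700)
sectors.append(b"\xff" * BLOCK_SIZE)
random.Random(0).shuffle(sectors)
image = b"".join(sectors)
recovered = {uid: reassemble(blocks) for uid, blocks in scan(image).items()}
```

The recovered payloads still carry the zero padding of the last block; in this toy version the original length has to be known separately, which is the kind of information the post says lives in block 0.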


The tools are written in Python 3.x, so they should work just about anywhere. I have tried them on a Linux server, a Windows 10 PC, a Raspberry Pi and Android.

Mark0

I have updated the readme, adding a sort of guided tour / demo to illustrate how the tools can be used:

https://github.com/M...lob/master/readme.md



Pretty nightmarish! Now on to SBXScan to search for pieces of SBX files around, and SBXReco to get a report of the collected data:



How is that for a fragmented floppy image?  ;D

Mark0

Updated to v1.0.0.
Recovery tests have been done in various configurations:

Tests

SeqBox recoverability has been practically tested with a number of file systems. The procedure involved using a virtual machine to format a small (about 100MB) disk image with a given FS, filling it with a number of small files, then deleting some of them at random to free enough space to copy a series of SBX files. This way every SBX file ends up fragmented into a lot of smaller pieces. Then the image was quick-formatted and wipefs-ed, and the VM was shut down.
After that, from the host OS, recovery of the SBX files was attempted using SBXScan & SBXReco on the disk image.

  • Working: BeFS, BTRFS, EXT2/3/4, FATnn/VFAT/exFAT, AFFS, HFS+, JFS, MINIX FS, NTFS, ProDOS, ReiserFS, XFS, ZFS.
  • Not working: OFS (due to 488-byte blocks)

Being written in Python 3, the SeqBox tools are naturally multi-platform and have been tested successfully on various versions of Windows, on some Linux distros on both x86 and ARM, and on Android (via QPython). No test was done on OS X, but it should work there as well (feedback welcome).

If someone can try it on a Mac (maybe with the new APFS), it will be much appreciated.  :Thmbsup:
« Last Edit: April 05, 2017, 05:03 AM by Mark0 »

IainB

  • Supporting Member
  • Joined in 2008
  • **
  • Posts: 7,540
  • @Slartibartfarst
    • View Profile
    • Read more about this member.
    • Donate to Member
This looked interesting at the opening post.
Is now beginning to look even more interesting ... but I don't believe in magic.

tomos

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 11,959
    • View Profile
    • Donate to Member
hi Mark0,
this does sound interesting for long-term archiving

Possible / hypothetical / ideal use cases

    Last step of a backup - after creating a compressed archive of something, the archive could be SeqBox encoded to increase recovery chances in the event of software/hardware issues that cause logic / file system damage

wondering: is that archive then (also) accessible in the normal manner (if without SeqBox password), or will SeqBox be required to access it?
Tom

Mark0

You could think of an SBX archive as a ZIP archive with only one file inside, stored without compression.
So yes, the file will need to be decoded / extracted with SBXDec to get the original file.
But, just as there is software that can access files inside a ZIP archive without extracting them (like music players that can directly play MP3s stored inside a compressed archive), nothing prevents doing the same with SBX archives.
It would actually be very easy, since the format is very simple and the original file isn't changed in any way, just divided into blocks. As such, even random access is possible without any issues.
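As a rough illustration of why random access is straightforward, the sketch below maps an offset in the original file to a position inside the container. The 512-byte block size, the 16-byte header size, the helper name `locate` and the assumption that file data starts at block 1 (block 0 holding only metadata) are illustrative assumptions, not a statement of the exact SBX layout.

```python
BLOCK_SIZE = 512    # assumed block size (one sector)
HEADER_SIZE = 16    # assumed per-block header size
PAYLOAD_SIZE = BLOCK_SIZE - HEADER_SIZE


def locate(file_offset):
    """Map an offset in the original file to (block number, container offset).

    Assumes blocks are stored in sequence order and that block 0 holds only
    metadata, so file data starts at block 1.
    """
    index, within = divmod(file_offset, PAYLOAD_SIZE)
    seq = index + 1
    return seq, seq * BLOCK_SIZE + HEADER_SIZE + within
```

A reader could seek to the returned container offset and read up to the end of that block's payload, then continue with the next block.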

tomos

thanks for that info :up:
Tom

solaris65

  • Supporting Member
  • Joined in 2012
  • **
  • Posts: 18
    • View Profile
    • Ambient Mechanics
    • Donate to Member
Hi Mark0,

First of all, I just want to second what's been said already about how interesting this is..even though I don't fully understand what it is or its possible applications.  I'm not a "coder" and the only thing I've ever built in my life was a garden wall with my late father many years ago.  My area of expertise is in music production. Once upon a time, I was dabbling with the idea of having a VST (a digital musical instrument plug-in / app) of my own built and was researching possible sources to do just that and somehow found my way here and seem to have stuck around.

Anyway, sorry for going off on a tangent there.  This really does interest me, mainly for its possible use in backing up my compositions and all musical data pertaining to the same..BUT..(and you'll have to excuse my total ignorance here) I don't really understand how I would actually use it in a day-to-day setting..or even how to set it up to begin with.

For the record, my studio system is a Windows 7 Pro 64 bit PC and I understand how to use Zip programs, which I gather this is something akin to..but what I really need is a very simple "walk-through" that would explain..as if to a child..what this is and how I can use it.

Again, my apologies for the ignorance on my part..but this really does sound like something everyone SHOULD be using, if I have understood it correctly..and I just need some clarification and help with it.  Hope you don't mind supplying both when and if you have the time to do so.

Kind regards

Dan (solaris65)

Mark0

but this really does sound like something everyone SHOULD be using

Well, I don't think we are quite there at the moment! :)
Let's say that, aside from experimenting, it could surely be used as an additional backup/security measure.

Being a suite of command line tools it's not super user friendly, but I hope to add additional documentation and examples soon.



solaris65

Thanks for the swift reply, Mark0..much appreciated.  :Thmbsup:

OK..fair enough.  In the meantime, I sincerely wish you all the best in developing this and I obviously hope this project gets to the point where it has a user-friendly UI and easy-to-follow help file system / PDF / Doc / txt file to go with it.  Despite your obvious modesty, I still think this is something everyone could and should be using, as..if I understand it correctly..it would mean a person would never lose a file to drive corruption again.

Anyway..all the best to you and I'll certainly be keeping an eye on how this project is progressing.

All the best

Dan (solaris65)

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • *****
  • Posts: 40,896
    • View Profile
    • Mouser's Software Zone on DonationCoder.com
    • Read more about this member.
    • Donate to Member
Very neat  :Thmbsup:

IainB

...I'm not a "coder" and the only thing I've ever built in my life was a garden wall with my late father many years ago. ...
____________________
Building a garden wall (one that actually stays up) is an exercise and a half in itself. Been there, done that. You were fortunate to have had your father helping you.

Mark0

I have designed another related tool, that can both cover other scenarios and integrate with SeqBox:

https://github.com/MarcoPon/BlockHashLoc

BlockHashLoc

The purpose of BlockHashLoc is to enable the recovery of files after total loss of File System structure, or without even knowing what FS was used in the first place.

The way it can recover a given file is by keeping a (small) parallel BHL file with a list of crypto-hashes of all the blocks (of selectable size) that compose it. So it's then possible to read blocks from a disk image/volume, calculate their hashes, compare them with the saved ones and rebuild the original file.

With adequately sized blocks (512 bytes, 4KB, etc. depending on the media and file system), this lets one recover a file regardless of the FS used, the FS integrity, or the fragmentation level.
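The hash-list approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the BlockHashLoc implementation: the function names, the choice of SHA-256 and the assumption that the file length is an exact multiple of the block size are all mine (a partial final block would need separate handling).

```python
import hashlib

BLOCK_SIZE = 512  # assumed; should match the sector/cluster size of the media


def build_hash_list(data, block_size=BLOCK_SIZE):
    # One hash per block, in file order: this plays the role of the small parallel file.
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def recover(image, hash_list, block_size=BLOCK_SIZE):
    # Map each saved hash to the block position(s) where it belongs.
    wanted = {}
    for index, digest in enumerate(hash_list):
        wanted.setdefault(digest, []).append(index)

    blocks = [None] * len(hash_list)
    # Read block-sized slices of the image and match their hashes.
    for off in range(0, len(image) - block_size + 1, block_size):
        chunk = image[off:off + block_size]
        for index in wanted.get(hashlib.sha256(chunk).digest(), ()):
            blocks[index] = chunk
    return blocks  # entries left as None were not found in the image


# A 4-block "file" scattered in reverse order among unrelated sectors.
original = b"".join(bytes([n]) * BLOCK_SIZE for n in range(1, 5))
hashes = build_hash_list(original)
junk = b"\xee" * BLOCK_SIZE
pieces = [original[i:i + BLOCK_SIZE] for i in range(0, len(original), BLOCK_SIZE)]
image = junk + pieces[3] + junk + pieces[1] + pieces[0] + junk + pieces[2]
rebuilt = recover(image, hashes)
```

Because only hashes are compared, nothing in the scan depends on which file system the image once held.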

This project is related to SeqBox. The main differences are:

  • SeqBox creates a stand-alone file container with the above listed recovery characteristics.
  • BHL achieves the same effect with a (small) parallel file that can be stored separately (on other media, or in the cloud), or alongside the original as a SeqBox file (so that it can be recovered too, as the first step), so it can be used to add a degree of recoverability to existing files.