Messages - widgewunner

61
f0dder is correct. This sounds like a job for Regex! If you can post a few examples of the html files (after changing the actual names/emails of course), I'm sure we can help you out.

I'm a regex addict and live for this sort of thing! (sick I know!) :)

62
General Software Discussion / Re: Mass checksum checker
« on: February 14, 2010, 08:48 PM »
I agree with everything you just said. Thanks for the input. When I said Git was small, I was certainly wrong with regard to the tool itself. I guess I was thinking in terms of Git's impact on the tree you are putting under revision control. Where CVS and SVN obtrusively place folders in every directory in a tree, Git only needs one. And the size of the repository is small - yes, Git does file compression into what it calls "pack" files.

And I have to admit that you are all absolutely correct when you say Git is not really appropriate for the specific task of file verification. Especially for binary files. I guess my recent infatuation with this tool has got me wanting to sing its praises, and it seemed to me if someone was asking about verifying a bunch of files, they may also be wanting to track changes as well - in which case Git may be something worth looking into.

FYI - the msysgit installation is sort of like cygwin. It includes a mini-unix environment which includes the following command line tools (which explains its size):
basename, bash, bzip2, cat, chmod, cmp, cp, curl, cut, date, diff, du,
env, expr, false, find, gawk, git, git-*, gpg, gpgkeys_curl, gpgkeys_finger,
gpgkeys_hkp, gpgkeys_ldap, gpgsplit, gpgv, grep, gzip, head, id, kill,
less, ln, ls, md5sum, mkdir, msmtp, mv, openssl, patch, perl, ps, rm, rmdir,
rxvt, scp, sed, sh, sleep, sort, split, ssh, ssh-add, ssh-agent, ssh-keygen,
ssh-keyscan, tail, tar, tclsh, tclsh85, tee, touch, tr, true, uname, uniq,
vim, wc, wish, wish85, xargs, CA, tclConfig and tkConfig.

Sorry for the distraction. Back to your regularly scheduled programming...

63
General Software Discussion / Re: Mass checksum checker
« on: February 14, 2010, 12:25 PM »
As you know,  I am also a big fan of Git.
When you gain a little proficiency with it, this is a great resource to the power of Git.
I agree. (In fact, I just received my hard copy of the book last week.) And Scott Chacon's Gitcasts are also very good. I have found both the online and written documentation to be nothing short of excellent. (My first book on Git was O'Reilly's Version Control with Git by Jon Loeliger - also highly recommended.)

... git is powerful and interesting, but suggesting it as a way to get file hashes? That's kinda like using a frying pan to drive in nails :) ...
LOL! Yes, you have a point here. (Although a frying pan does a pretty good job of it!)

... (oh, and while the implementation model might be elegant, git as a whole definitely isn't - ugh!).
Can you elaborate? My understanding is that when Git first came out it was quite difficult to use, and the documentation was lousy, but it has since matured and those days are gone. I am new to Git and have had nothing but a good experience with it so far. It is small, lightning fast and non-obtrusive. It compresses your data down to a bare minimum. And contrary to what some might believe, it is actually very easy to use. Once installed, setting up a repository to track a directory tree consists of four commands:
cd branch/to/be/followed        # change to the directory you want to be managed
git init                        # initialize a new repository
git add .                       # recursively add all files and folders
git commit -m "initial commit"  # commit the tree to the repository

Just repeat the last two commands any time you want to add a new version to the repository. Yes, there are many more powerful and complex commands that git can perform, but these are completely unnecessary for the purpose described here. There is also a GUI, but I can't comment on that as I am a command line kind of guy for this kind of stuff.

It not only provides you with a SHA1 hash of every version of every file in your tree (and thus guarantees the integrity of each and every one), it has very powerful ways for you to inspect the changes that have been made to the files over time. It also has commands for copying entire repositories to other drives/servers, which provides a very effective backup methodology.
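
For anyone curious where that per-file SHA1 comes from, it can be reproduced outside of Git. Here is a minimal Python sketch, assuming the standard blob format (a header of the form "blob <size>" plus a NUL byte, followed by the raw file bytes). The function name git_blob_sha1 is made up for illustration and is not part of Git:

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob holding these bytes."""
    # Git hashes a small header ("blob", the byte count, a NUL) plus the content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Hash of an empty file, as Git would store it.
    print(git_blob_sha1(b""))
```

The result should match what `git hash-object <file>` prints for the same bytes, provided no line-ending conversion is in play.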

However, I am not at all sure how well it would handle terabytes of data!?

I think Git is a little unsuitable, it keeps a copy of the whole file in one revision. It's good for distributed code, but not for file verifying.
I would disagree. Accurate file verification is one of the founding premises of Git. Yes, it stores entire files, but it is very efficient. In Linus's talk, he mentions that the repository containing the entire history of the Linux sources (from 2005-2007) was only half the size of one checked out version of the source tree itself!

p.s. You guys did go check out hashtab didn't you? It is definitely a "must-have"!

Cheers!

64
... Think bigger, though: systemwide standards. I can use Ctrl+Arrowkeys to jump at word boundaries in most edit controls (whether single- or multiline), I can use shift+navigation to select, ctrl+backspace/del to delete to the respective word endings, there's home/end/pgup/pgdn, et cetera. ONE set of (reasonable) keybindings that are simple to remember and work across pretty much every application on the system.
Exactly! Learn once - use everywhere.

And this tangent is certainly NOT off-topic. The standardization of the keystroke bindings for Windows is one of the reasons I keep sticking with it (and the text editors which are keyboard intensive). These keystrokes have become indelibly burned into my brain. Even Mac has adopted the CUT, COPY, PASTE hotkey conventions. IMHO this is a *very* good thing.

As an aside - EditPad Pro has a one-click selection for WORDSTAR key bindings. Can you say CTRL+K? ;)

65
General Software Discussion / Re: Mass checksum checker
« on: February 14, 2010, 01:30 AM »
Multiple files:
If your data is located in one branch on a file system, you can use Git, the version control software used to manage the Linux kernel. It uses SHA1 hashes to identify every version of every file in the tree as well as the entire tree (and every version of the tree). The design is actually quite simple and secure - the entire history of the tree of files is rendered down to one single SHA1 hash. If any version of any file is modified (or the disk is corrupted in any way), this SHA1 changes. Thus, if the SHA1 has not changed, you can be sure that all the versions of every file are intact. (This web page does a pretty good job of explaining how this works.) Setting up Git is quite simple using a command line interface. It's free and open source. The preferred Windows version is available at Google code: msysgit.
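
To make the "one single hash for the whole tree" idea a bit more concrete, here is a rough Python sketch of the principle. It is a simplified illustration only and does NOT use Git's real tree or commit object formats; the function name tree_digest is hypothetical. It covers one snapshot of a directory (Git additionally chains each commit to its parents, which is how the history gets folded in too):

```python
import hashlib
import os
import tempfile


def tree_digest(root):
    """Fold a directory tree into one SHA-1 (simplified; NOT Git's object format)."""
    h = hashlib.sha1()
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            child = tree_digest(path)  # digest of the whole subtree
        else:
            with open(path, "rb") as f:
                child = hashlib.sha1(f.read()).hexdigest()
        # Each entry contributes its name and the digest of what it contains.
        h.update(os.fsencode(name) + b"\0" + child.encode("ascii") + b"\n")
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "sub"))
        target = os.path.join(d, "sub", "a.txt")
        with open(target, "wb") as f:
            f.write(b"one")
        before = tree_digest(d)
        with open(target, "wb") as f:
            f.write(b"two")
        # A change deep in the tree alters the top-level digest.
        print(before != tree_digest(d))
```

Because every directory's digest depends on the digests of everything beneath it, a single flipped byte anywhere bubbles up to the top.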

I've only recently gotten into using source control software but have become a fan of Git's elegant design and useful functionality. It was this lecture by Linus Torvalds on YouTube that turned me on to the beauty of Git. If you go to the Git documentation home page (http://git-scm.com/documentation) there is a link to this and other videos describing Git. I've been using it heavily for a couple months now and have had no trouble whatsoever. It is a very cool tool!

Single file:
One of the first apps I install when setting up a new box is: hashtab. On Windows, this little beauty simply adds a new File Hashes tab to the file's property sheet. Once installed, just right-click on any file and select Properties to see the various hashes of the file, like so:
http://i29.photobucket.com/albums/c253/ridge-runner/SCREENSHOTS/HashTab.png

Just copy any hash code text into the Hash Comparison text box and it gives a green check mark indicating which hash type has matched (or a red X if none matched). You can select to display a variety of different hash algorithms - here is the settings page which shows the supported hash types:
http://i29.photobucket.com/albums/c253/ridge-runner/SCREENSHOTS/HashTabSettings.png
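
If you want the same kind of check from a script, the idea is easy to sketch in Python. This is not how hashtab itself is implemented, just the same compare-against-a-pasted-hash idea using three common hash types (CRC32, MD5 and SHA-1) that I picked for illustration. All function names here are made up:

```python
import hashlib
import zlib


def file_chunks(path, chunk_size=1 << 20):
    """Yield a file's bytes in chunks so large files need not fit in memory."""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            yield chunk


def compute_hashes(chunks):
    """Return CRC32, MD5 and SHA-1 (uppercase hex) for an iterable of byte chunks."""
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
        md5.update(chunk)
        sha1.update(chunk)
    return {
        "CRC32": format(crc & 0xFFFFFFFF, "08X"),
        "MD5": md5.hexdigest().upper(),
        "SHA-1": sha1.hexdigest().upper(),
    }


def match_hash(pasted, hashes):
    """Return the name of the hash type that equals the pasted text, or None."""
    wanted = pasted.strip().upper()
    for name, value in hashes.items():
        if value == wanted:
            return name
    return None
```

Usage would look something like `match_hash(pasted_text, compute_hashes(file_chunks(path)))`, where `pasted_text` and `path` are whatever hash string and file you want to check. A returned name is the "green check mark" case, and None is the "red X".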

Hope this helps! :)
