... (oh, and while the implementation model might be elegant, git as a whole definitely isn't - ugh!).
Can you elaborate? My understanding is that when Git first came out it was quite difficult to use, and the documentation was lousy, but it has since matured and those days are gone.
First, let me state that I think the git model seems pretty solid overall: the way repository information is stored in the .git folder (and the way it's structured), the way server communication is done for remote repositories, et cetera. I haven't looked into each and every detail (e.g. I don't know whether file blobs are stored directly or compressed (can't see why they would be)), but I understand the idea of storing blobs and referring to just about everything through their SHA-1 hash values. (I wonder why SHA-256 wasn't chosen, considering some known SHA-1 defects, but it's not too big a deal - the focus isn't to guard against attackers but to avoid collisions under normal use.)
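For the curious, the content addressing is simple enough to reproduce yourself; a minimal sketch of how a blob's id is derived (the "blob <size>\0" header format is documented in the git internals docs, the rest is plain hashing):

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    # Git hashes "blob <size>\0<content>", not the raw file bytes,
    # so the object id pins down the type and length as well as the data.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Should match `echo 'hello' | git hash-object --stdin`
print(git_blob_sha1(b"hello\n"))
```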
My gripes are more around the end-user tools. One thing is that the Windows port is still a bit rough (blame the *u*x people for not writing properly modular and portable code); this is something I can live with - but gee, even after creating hardlinks for all the git-blablabla.exe in libexec/git-core, the msysgit install still takes >120 MB of disk space... Subversion is 7.4 MB. Of course msysgit comes with a lot more than svn, but I shouldn't need all that extra. And I don't currently have time to check what is absolutely necessary and what's just icing on the cake; considering that git was originally a bunch of scripts, and the unix tradition of piecing small things together with duct tape, I don't feel like playing around with it right now.
The more important thing is how you use the tools. Sure, for single-developer, local-only, no-branch usage it's pretty much a no-brainer, and most of the terminology matches traditional C-VCS. There are some subtle differences here and there that you have to be aware of, though - like what HEAD means. IMHO it would have been better to use new terminology for some things - like "stage" instead of "add" (having "add" overloaded to handle both add-new-file and add-to-staging-area is bad). "Checkout" for switching branches doesn't seem like the smartest choice of word to me, either. And not tracking renames (instead depending on the client tool to discover them, probably through matching SHA-1 values?) also seems like a bit of a mistake. Relatively minor points, but things that can trip you up.
Where things can get hairy is when you collaborate with other people, especially when juggling branches and "history rewriting". Git has some very powerful features that let you mess things up big time - which by itself might not be a problem, but with the various overloads the commands have, and with history rewriting (i.e., commit --amend) seemingly being a pretty common operation, you really have to be careful. Some of this isn't much of an issue if you're a single developer (although you can inadvertently destroy branch history, which can be bad), but you need to be really careful with rebasing once you work with other people - the Pro Git book has a good example of why.
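The reason rewriting is so contagious follows directly from the content addressing: a commit's id covers its message, its tree, and its parent ids, so amending anything produces a brand-new id and strands everything built on top of the old one. A toy illustration (with deliberately simplified commit bodies - real ones also carry parent/author/committer lines - and a hypothetical tree id):

```python
import hashlib

def git_object_sha1(kind: bytes, body: bytes) -> str:
    # Every git object is hashed as "<type> <size>\0<body>"
    return hashlib.sha1(b"%s %d\x00%s" % (kind, len(body), body)).hexdigest()

tree = "d8329fc1cc938780ffdd9f94e0d364e0ea74f579"  # hypothetical tree id

# Simplified commit bodies; real commits also record parents and authorship
before = f"tree {tree}\n\ninitial message\n".encode()
after = f"tree {tree}\n\namended message\n".encode()

# Same tree, different message -> a brand-new commit id; any commit that
# recorded the old id as its parent now points into abandoned history.
print(git_object_sha1(b"commit", before))
print(git_object_sha1(b"commit", after))
```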
All that said, I'm considering moving my own VCS to git. D-VCSs clearly have advantages over C-VCSes, and while the git-windows tools have rough edges, you have to be careful, and you can do some insane things, it's fast - and I believe the underlying technology has gotten an important bunch of things right. I'll probably check out some of the other D-VCS tools before deciding, like Bazaar and Mercurial.
I am new to Git and have had nothing but a good experience with it so far. It is small, lightning fast and non-obtrusive.
Wouldn't say it's small (at least not msysgit), but fast indeed, and I like that it has a single .git folder instead of a per-subfolder .svn (that's the un-obtrusive part for me).
It compresses your data down to a bare minimum.
Does it actually compress anything (apart from server communication), or are you just referring to only storing each blob once, identified by its SHA-1 hash value? It's also my understanding that files are stored in their entirety, whereas other systems store patchsets. This makes checkouts extremely fast, but if you have huge files and only change a few lines, it does take up more disk space (usually not a big problem: most sane people don't store server logs under VCS and keep source code files small... 50 committed edits to the main Notepad++ .cpp file would be ~16 MB, though).
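One way to check the compression question yourself, assuming a local repository to poke at (the object path below is hypothetical): if loose objects under .git/objects are zlib-deflated, inflating one should expose the type/size header:

```python
import zlib
from pathlib import Path

# Loose objects live at .git/objects/<2 hex chars>/<38 hex chars>;
# if they're zlib-deflated, inflating one should expose the header.
obj = Path(".git/objects/ce/013625030ba8dba906f756967f9e9ca394464a")  # hypothetical id
data = zlib.decompress(obj.read_bytes())
header, _, body = data.partition(b"\x00")
print(header, body)  # e.g. b'blob 6' b'hello\n'
```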
However, I am not at all sure how well it would handle terabytes of data!?
Better than other VCSes... but it's not suitable for just computing checksums.
I think Git is a little unsuitable for this: it keeps a copy of the whole file in each revision. It's good for distributing code, but not for file verification.
I would disagree. Accurate file verification is one of the founding premises of Git. Yes, it stores entire files, but it is very efficient. In Linus's talk, he mentions that the repository containing the entire history of the Linux sources (from 2005-2007) was only half the size of one checked-out version of the source tree itself!
The point is that to "compute hashes" with git, you'll be putting files under version control. You don't necessarily want to do that for a bunch of, say, ISO images. Heck, I'd even say it's very likely you don't want to do this. First, you don't need the files under VCS; second, you don't want the extra duplicate it creates (remember, every file will live in the .git object stash as well as in a checked-out copy).
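If plain verification is all that's needed, a few lines of scripting do the job with no duplication at all; a minimal sketch (the isos/ directory is hypothetical, and you could swap in sha256 just as easily):

```python
import hashlib
from pathlib import Path

def sha1sum(path: Path, bufsize: int = 1 << 20) -> str:
    # Stream in chunks so multi-gigabyte images never sit in RAM
    h = hashlib.sha1()
    with path.open("rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

# Emit a manifest instead of committing the images anywhere
for iso in sorted(Path("isos").glob("*.iso")):  # hypothetical directory
    print(f"{sha1sum(iso)}  {iso.name}")
```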
Anyway, a lot of this discussion should probably be split out into a topic about git, since it's drifted quite far from the topic of file checksumming.