General Software Discussion / Re: Deduplication, encryption, security and... Dropbox
« Last post by Armando on April 15, 2011, 10:12 AM »
Thanks Cloq. More stuff to consider...



As Ashkan Soltani was able to test in just a few minutes, it is possible to determine whether any given file is already stored by one or more Dropbox users, simply by observing the amount of data transferred between your own computer and Dropbox's servers. If the file isn't already stored by Dropbox, the entire file will be uploaded. If Dropbox has the file already, just a few KB of communication will occur.
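For the curious, here's a minimal sketch of how this kind of client-side deduplication check could work (the endpoint and file name are made up; Dropbox's actual protocol is not public):

$ hash=$(sha256sum holiday-video.avi | cut -d' ' -f1)    # hash is computed locally first
$ curl -s https://storage.example.com/have-block/$hash   # hypothetical "do you already have this?" query
yes

Only if the answer is "no" does the whole file get uploaded; a "yes" costs just a few KB. Watching which of the two happens tells you whether someone, somewhere, has already stored that exact file.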
XML Marker hasn't been updated in quite a while (2004).-sajman99 (April 07, 2011, 06:39 PM)
Project status:
It has been more than 6 years(!) since the last release, but development has not stopped. XML Marker 2.0 should be out soon and will include the following features:
* Read XML files in Unicode using UTF-8 and UTF-16 encodings.
* Edit JSON files.
* Tree will not collapse when editing the text.
* Bookmarks
* Path selector
* Faster and uses less memory
(I'm optimistic...)
But I'm lacking time these days, so here's an article offering an interesting point of view.
Which reminds me that I need to back up to my off-site hard drive...
LyX is effectively perfect for that type of stuff. I follow this thread with great interest, but lack of time never got me further than actually using the software.
So keep up the good work!-Shades (March 16, 2011, 01:05 PM)
I do have a question though: I assume it is possible to search for a particular change by a particular user using any of the GUI tools?
(the lack of that option is something I absolutely hate about WinCVS / CVS (which I am forced to work with).)-Shades (March 16, 2011, 01:05 PM)
Nice that you're continuing with this - appreciated-f0dder (March 15, 2011, 03:02 AM)
I still don't have a definitive favorite wrt. GUIs. TortoiseGit, gitk and git-gui all feel a bit rough around the edges. I'm not sure exactly what it is, probably the overall combination of several smaller flaws ("mid 90's linux GUI look", progress not always updated before the command is done (related to calling git.exe and parsing textual output?), et cetera).-f0dder (March 15, 2011, 03:02 AM)
PS: "The sluggishness is probably" -> ends mid-sentence.-f0dder (March 15, 2011, 03:02 AM)

(Not that useful, but still nice to have — I did notice though that it can slow down folder navigation to a crawl… I solved that by killing/restarting the icon cache process)
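For reference, on Windows that usually amounts to force-killing the cache process and letting Explorer respawn it on demand the next time overlays are needed (TortoiseSVN's process is named TSVNCache.exe; TortoiseGit's is TGitCache.exe):

> taskkill /F /IM TSVNCache.exe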

Then I found these instructions (thanks Mr. David P. Caldwell) and I finally got it to work.

Features:
• Support for operations: Commit, Revert, Resolve, Compare, Update, Tags, Push, Pull, Incoming, Outgoing, Merge, Clone, Bundle, Archive, Annotate (Blame), View File, Grep, Bookmarks, Rebase
• File history dialog with support for file differencing
• Support for tracking file renames in the file history dialog
• Support for external file comparison tools
• Revision graph log, similar to the hgk, hgview and TortoiseHg changelog windows
• Support for merging revisions
• Support for rebasing revisions
• Support for inline file differences in most of the package's windows
• Support for bookmarks




Armando: nice post - DoCoCoins coming your way. More, please!
On speed: first, you have to do some manual cleanup of Git repositories every once in a while, to do garbage collection and (re)pack the repository - this helps wrt. speed and disk space consumption. This can of course be automated, but it's something you have to automate, Git doesn't do it for you.-f0dder (March 08, 2011, 03:53 AM)

"Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance."
Some git commands may automatically run git gc; see the --auto flag below for details.
Occasionally, Git automatically runs a command called “auto gc”. Most of the time, this command does nothing. However, if there are too many loose objects (objects not in a packfile) or too many packfiles, Git launches a full-fledged git gc command. The gc stands for garbage collect, and the command does a number of things: it gathers up all the loose objects and places them in packfiles, it consolidates packfiles into one big packfile, and it removes objects that aren’t reachable from any commit and are a few months old.
You can run auto gc manually as follows:
$ git gc --auto
Again, this generally does nothing. You must have around 7,000 loose objects or more than 50 packfiles for Git to fire up a real gc command. You can modify these limits with the gc.auto and gc.autopacklimit config settings, respectively.
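In other words, if you want to tune those thresholds yourself (the values below are, as far as I know, the current defaults):

$ git config gc.auto 6700          # loose objects allowed before auto gc kicks in
$ git config gc.autopacklimit 50   # packfiles allowed before auto gc repacks
$ git gc --auto                    # still a no-op unless one of the thresholds is exceeded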
builtin/merge.c: const char *argv_gc_auto[] = { "gc", "--auto", NULL };
builtin/receive-pack.c: "gc", "--auto", "--quiet", NULL,
git-am.sh: git gc --auto
git-rebase--interactive.sh: git gc --auto &&
git-svn.perl: command_noisy('gc', '--auto');
From git grep -- --auto on git.git, those results looked interesting. The notable one is builtin/merge.c, meaning that the ever-so-common git pull should trigger a git gc --auto.
Additionally, unless your 'non-technical' staff is doing rather 'advanced' stuff (at which point they wouldn't be 'non-technical' anymore), I don't see why they would ever need to run git gc manually instead of just letting git gc --auto handle everything.
If you need the fastest system and don't mind occasional performance glitches and volatile repository sizes, git is the way to go. Note that its speed drops to half the speed of Mercurial (and less for big repositories, since the time for the garbage collection rises linearly) if Git is forced to avoid frequently growing and shrinking repositories by running the garbage collection every ten commits. Also note that git gc --auto allowed the repository to grow to more than 5 times the minimum size, and that garbage collection created 100% load on 3 of my 4 cores, which would slow down all other processes on a server, too (otherwise gc would likely have been slower by a factor of 3)...
If you need reliable performance and space requirements, Mercurial is the better choice, especially when called directly via its API. Also for small repositories with up to about 200 commits, it is faster than git even without garbage collection.
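And if you really wanted to force a full gc every ten commits, as in that benchmark, a throwaway post-commit hook would do it. A sketch (save as .git/hooks/post-commit and chmod +x it):

#!/bin/sh
# run a full garbage collection on every 10th commit
count=$(git rev-list --count HEAD)
if [ $((count % 10)) -eq 0 ]; then
    git gc --quiet
fi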

Also, there's definitely been a lot of progress in Git over the years. Considering that more and more has been going from shell scripts -> native C code, it's probably not fair to look at a 2-year-old benchmark-f0dder (March 08, 2011, 03:53 AM)
On Tortoise... I've been considering moving away from it. Yes, the icon overlays in explorer are kinda nice, but that's about it - I feel that a dedicated app would probably provide more efficient workflows. Also, having cache processes for t-svn, t-git, t-hg seems a bit much, and I often run into problems with those processes keeping folders locked when they shouldn't.
Dunno which app(s), though - I wasn't all that impressed when I checked out SmartGit, but can't remember exactly why; guess I'll give it another chance. I think my 'meh' was partially caused by the program being implemented in Java, and (worse) not having an option to use the system JDK but installing its own separate copy.-f0dder (March 08, 2011, 03:53 AM)


Each git command is like a Swiss Army knife. For example, git checkout can switch the working directory to a new branch, update file contents to that of a previous revision, and even create a new branch. It’s an efficient way to work once you learn all the arguments and how they interact with each other.
Mercurial is like a well-equipped kitchen — it has a lot of tools that each do one simple, well-defined thing, and do it well. To switch the working directory you use hg update. To update file contents to what they were at a previous revision you use hg revert. To create a new branch you use hg branch. This means there are more commands to learn, but each command is much simpler and more specific to a single conceptual task.
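To make the contrast concrete, the rough command equivalents look like this (branch and revision names are just examples):

$ git checkout somebranch         # switch the working directory to a branch
$ git checkout HEAD~3 -- file.c   # restore a file's contents from an older revision
$ git checkout -b newbranch       # create a new branch and switch to it

$ hg update somebranch            # switch the working directory
$ hg revert -r 42 file.c          # restore a file's contents from revision 42
$ hg branch newbranch             # create a new named branch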
1 As a benchmark, Git and Mercurial repositories were seeded with approximately 1500 files totaling 35 MB of data. The servers were running in Chicago and the clients in Mountain View (51 ms ping time). The operation of cloning the remote repository (similar to an initial checkout in traditional version control systems) averaged 8.1 seconds for Mercurial and 178 seconds for Git (22 times slower). A single file in the repository was then changed 50 times and the clients pulled the updates. In this case, Mercurial took 1.5 seconds and Git required 18 seconds (12 times slower). When the Git protocol was used instead of HTTP, Git's performance was similar to Mercurial's (8.7 seconds for cloning, 2.8 seconds for the pull).
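The difference in that last test is purely the transport; the clone URL decides which protocol is used (example.com is a placeholder, of course):

$ git clone http://example.com/project.git   # over HTTP: the slow case in that test
$ git clone git://example.com/project.git    # native git protocol (port 9418): the fast case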

I worry this kind of thing would lead us to over-complication. Let's try to keep this as simple as we possibly can while trying for most of the functionality we think is important. I don't think we have so much software that we need to get crazy with letting people filter and sort entries, etc. A good compromise might simply be a simple tagging system, and a way to present a page of software with that tag -- then we could create arbitrary collections using tags.-mouser (March 06, 2011, 02:30 PM)