I
1. Just today, the community has received news of its loss, two days ago, of a prominent data-availability activist, Aaron Swartz. What is interesting here is that the criminal prosecution authorities seem to have been much more motivated to treat him as a big-time criminal than even his alleged victim, MIT. (Edit: Well, that doesn't seem to be entirely true: JStor is said not to have insisted on his being prosecuted, but M.I.T. wanted him to be made to "pay" - so without them, he'd probably be alive. And it's ironic that M.I.T., itself a VICTIM of this "resell them their own academic papers, at outrageous prices" scheme, made itself a prosecutor of Aaron, instead of saying, we're not happy about what he tried to do, but he tried it for all of us. M.I.T. as another Fifth Column representative, see below.) So there is cloud for paid content, and getting access to it without paying, big style, then perhaps even re-uploading it, gets you 20 or 30 years in jail if the prosecutors have their way, and that's where it goes (cf. Albert Gonzalez).
(Edit: First he made available about 20 per cent of an "antecedent rulings" database that is absolutely needed for preparing lawsuits in a common-law legal system based on previous, similar cases' rulings (they charge 8 cents a page, which doesn't seem that much, but expect to download (and pay for) perhaps 3,000 pages in order to get the 20 or 30 that'll be of relevance). Then he tried to download academic journal articles from JStor, the irony here being that the academic world is paid by the general public (and not by academic publishers or JStor) for writing these articles, and then pays high prices for access to them (via the universities, so again it's the general public that pays at the end of the day: university staff and overall costs are a public service on the Continent anyway, and in the U.K. and the U.S. it's the students who finance all this, and who then charge 500 bucks an hour in order to recoup this investment afterwards). The prosecutor asked for 35 years of imprisonment, so Swartz would even have had to be called "lucky" had the sentence stayed under 25 years. (From a competitor of JStor, I was just "offered" a 3-page askSam review from 1992 or so, as a .pdf, for 25 euro plus VAT, if I remember well...))
(Edit - Sideline: There is not only the immoral aspect of making the general public pay a second time for material it already owns, morally and from having financed it to begin with; there is also a very ironic accessibility problem that becomes more and more virulent: Whilst in their reading rooms, universities made academic journals available not only to their staff and their students, but also to (mildly) paying academics from the outside, today's electronic-only papers are, instead of now being ubiquitously accessible, in most universities not even available to such third parties anymore, or at least no bits of those texts can be copied and pasted by them. So in 2013, non-university academics sit before screens and are lucky if they can scribble down the most needed excerpts from the screen, by hand. The electronic "revolution" thus makes more and more people long for the Seventies' university revolution, the advent of photocopiers - which for new material, in most cases, isn't available anymore: Thanks to the greed of traders like JStor et al., we're back to handwriting, or there is no access at all, or else it's 40 bucks for 3 pages.)
2. On the other hand, there's cloud-as-storage-repository, for individuals as for corporations. Now this is not my personal assertion, but common sense here in Europe, i.e. the (mainstream) press here regularly publishes articles agreeing that the U.S. NSA (Edit: here and afterwards, it's NSA and not NAS, of course) takes its regular (and, by U.S. law, legal) look into any cloud material stored anywhere on U.S. servers; hence the press's warning that European corporations should at least choose European servers for their data - whilst of course most such offerings come from the U.S. (Edit: And yes, you could consider developers of cloud sw and / or storage as a sort of Fifth Column, i.e. people that get us to give away our data, into the hands of the enemy, who should be the common enemy.)
3. Then there is encryption, of course (cf. 4), but the experts / journalists agree that most encryption does not constitute any prob for the NSA - very high-level encryption probably would, but is not regularly used for cloud applications - so they assume that most data finally gets to the NSA in readable form. There are reports - or is it speculation? - that the NSA provides big U.S. companies with data coming from European corporations, in order to help them save cost on research and development. And it seems that even corporations which take care to apply rather good encryption to their data-in-files don't apply the same security standards to their e-mails, so in the end there's a lot of data available to the NSA. (Just some days ago, there was another big article on this in Der Spiegel, Europe's biggest news magazine, but that was only another one in a long succession of such articles.) (Edit: This time, it's the European Parliament (!) that warns:
http://www.spiegel.d...ie-usa-a-876789.html - of course, it's debatable whether anybody should then trust European authorities more, but it's undebatable that U.S. law / jurisdiction grants patents to the first who comes and brings the money in order to patent almost anything, independently of any previous existence of the - stolen - ideas behind that patent, i.e. even if you can prove you've been using something for years, the patent goes to the idea-stealing corporation that offers the money to the patent office, and henceforward you'll pay for further use of your own ideas and procedures, cf. the Edit to number 1 here - this for the people who might eagerly assume that "who's got nothing to hide shouldn't worry".)
4. It goes without saying that those who say, if you use such cloud services, at least use European servers, get asked, first, what about European secret services then doing similar scraping, perhaps even for non-European countries (meaning, from GB etc. straight to the U.S. again), and second, in some European countries it's now ILLEGAL to encrypt data, and that is a wonderful world for such secret services: Either they get your data in full, or they even criminalize you or the responsible staff in your corporation. (Edit: France's legislation seems to have been somewhat relaxed instead of being further tightened as had been intended by 2011. Cf.
http://rechten.uvt.n...ryptolaw/cls2.htm#fr )
5. Then, there are accessibility probs, attenuated by multi-storage measures, and the prob of the provider closing down the storage, by going bankrupt or by plain commercial evil: It seems there are people out there who definitely lost data with some Apple cloud services. (Other Apple users seem to have lost bits of the songs they bought from Apple, i.e. Apple, after the sale, seems to censor unwanted wording within such songs - I cannot say for sure, but I read some magazine articles about such proceedings of theirs - of course, this has only picturesque value in comparison with "real data", hence the parentheses, but it seems to show that "they" believe themselves to be the masters of your data, big-style AND for the little, irrelevant things - it seems to indicate their philosophy.)
(Edit: Another irony here: Our own data is generally deemed worthless, both by "them" and by some users (a fellow here, just weeks ago: "It's the junk collecting hobby of personal data."), whilst anything you want or need access to (even a 20-year-old article on AS), deemed "their data", is considered pure gold, 3 pages for 40 bucks - so not only do they sell, instead of just making available to the general public its own property, but on top of that, those prices are incredibly inflated.
But here's a real gem. Some of you will have heard of the late French film <i>auteur</i>, Eric Rohmer, perhaps in connection with his most prominent film, <i>Pauline at the Beach</i>. In 1987, he made an episodic film, <i>4 aventures de Reinette et Mirabelle</i>, from which I recommend the fourth and last episode to you; on YT it is in three parts, in atrocious quality but with English subtitles, just look for "Eric Rohmer Selling the Painting": It's a masterpiece of French comedy, and do not miss the very last line! (For people unwilling to see even some minutes of any French film: you'd have learned here the perfect relativity of the term "value" - it's all about who's the owner of the object in question at any given moment.) If you like the actor, you might want to meet him again in the 1990 masterpiece, <i>La discrète</i>. (There's a <i>Washington Post</i> review in case you might want to countercheck my opinion first. And yes, of course there's some remembrance of the very first part of Kierkegaard's Enten-Eller to be found here.)...)
II
6. There is the collaboration argument, and there is the access-your-data-from-everywhere argument, without juggling usb sticks, external harddisks and applics like GoodSync Portable; and - I'm trying to be objective - there is a data-loss problem with those, and thus a what-degree-of-encryption-is-needed prob too: Your notebook can be lost or stolen, and the same goes for these external storage devices. But let's assume the "finder" / thief here will not be the NSA but, in most cases, not even your competitor, just some anonymous person who dumps your data as soon as it's not immediately accessible; i.e. here, except for special cases, even rudimentary encryption will do.
7. I understand both arguments under 6, and I acknowledge that cloud services offer much better solutions for both tasks than you can obtain without them. On the other hand, have a look at Mindjet (ex-MindManager): It seems to me that even within a traditional workgroup, i.e. collaborators physically present in the same office, perhaps in the same room, collaboration is mandatorily done by cloud services and can't be done by the local workgroup means / cables alone - if this is correct (I'm not certain here), it is overly ridiculous or worse, highly manipulative on the part of the supplier.
8. Whenever traditional desktop applications "go cloud", they tend to lose much of their previous functionality in the process (Evercrap isn't but ONE such, very prominent example, there are hundreds), together with arguments like "we like to keep it simple" and similar idiotic excuses; and even when there's a highly professional developer, as in this case with Neville, it seems that the programming effort for the cloud functionality at least heavily slows down any traditional "enhancement" programming, or even the transposition of the functionality there has been - of course, how much transposition is needed depends on the how-much-cloud-it-will-be part of where that particular sw is heading. As a general rule, though, users of traditional sw going cloud lose a lot of functionality and / or have to wait for years for their sw to recoup afterwards from this more or less complete stalling of non-cloud functionality. (Hence the "originality" of Adobe's CS "cloud" philosophy, where the complete desktop functionality is preserved, the program continuing to work desktop-based, with only the subscription functionality (or some additional collaborative features too, hopefully?) laid off cloudwise.)
III
9. Surfulater is one of the two widely known "site-dumping" specialist applics out there, together with the German WebResearch, the latter being reputed to be "better" in the sense that it's even more faithful to the original for many pages being stored, i.e. "difficult" pages are rendered better, and in the sense that it's quicker (and quicker perhaps especially with such "difficult" pages), whilst the former is reputed to be more user-friendly in the everyday handling of the program - sorting, accessing, searching, whatever. I don't have real experience (i.e. beyond short trials) with either program, so the term here is "reputation", not "the facts are...". It seems to be common knowledge, though, that both progs do this web page dumping much better than even the heavyweights of the traditional pim world, like Ultra Recall, MyInfo, MyBase, etc.
10. Whilst very few people use WebResearch as a pim (but there are some), many people use Surfulater as a general pim - and even more people complain about regular pims not being as good at web page dumps as these two specialists are. For Surfulater, there's been that extensive discussion on the developer's site that has been mentioned above, and it seems it's especially those people who use the prog as a general pim who are very affected by their pim threatening to go cloud, more or less, since it's them who would be most affected by the losing-functionality-by-going-cloud phenomenon described in number 8. Neville seems to reassure them that data will be available locally, and via the cloud, which is very ok. But then, even Surfulater as it is today is missing much functionality that would be needed to make it a real competitor within the regular pim range, and you'll be safe in betting on these missing features not being added at high speed any time soon: the second number-8 phenomenon (= stalling, if not losing).
11. So my personal opinion on Surfulater and WebResearch is: why not have a traditional pim, with most web content in streamlined form, i.e. no more systematic dumping of web pages into your pim or into these two specialist tools, but selecting the relevant text, together with the url and the download date / time "stamp", and pasting these into your pim as plain text which you then format according to your needs, meaning that right after pasting, you bold those passages that motivated you to download the text to begin with. This way, instead of having your pim collect, over the years, an incredible amount of mostly crap data, you constitute for yourself a valid repository of neat data really relevant to your tasks. Meaning, you do a first focusing / condensing of the data right on import.
12. If your spontaneous reaction to this suggestion is, "but I don't have time to do this", ask yourself whether you've been collecting mostly crap so far: If you don't have 45 or 70 sec. for bolding those passages that make the data relevant to you (the pasting of all this together should take some 3 sec. with an AHK macro if really needed, or better, via an internal function of your pim, which could even present you with a pre-filled entry dialog to properly name your new item), you probably shouldn't dump this content into your pim to begin with. Btw, AHK allows for dumping even pictures (photos, graphics) from the web site to your pim, 1-key-only (i.e. your mouse cursor should be anywhere in the picture, and then it'd be one key, e.g. a key combination assigned to a mouse key), and eventually your pim should do the same. Of course, you should dump as few such pictures into your pim as is absolutely necessary, but technically (and I know that in special cases this would be almost necessary, but in special cases only), it's possible to have another AHK macro for this, and your pim, too, could easily implement such functionality.
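To make that "3 sec. with an AHK macro" claim a bit more concrete, here's a minimal sketch of what such a clip-with-source-and-stamp helper could look like in AHK v1 - a sketch only, assuming a Ctrl+Shift+S hotkey, a browser where Ctrl+L focuses the address bar, and manual pasting into your pim afterwards; nothing here is shipped by any of the progs mentioned:
<pre>
; Sketch only: copy the current selection, append the source url and a date / time stamp,
; and leave everything on the clipboard as plain text, ready to be pasted into your pim.
; The hotkey (Ctrl+Shift+S) and the Ctrl+L address-bar shortcut are assumptions.
^+s::
    Clipboard := ""
    Send, ^c                          ; copy the selected text
    ClipWait, 1
    selection := Clipboard
    Send, ^l                          ; focus the browser's address bar
    Sleep, 150
    Send, ^c                          ; copy the url
    ClipWait, 1
    url := Clipboard
    Send, {Esc}                       ; leave the address bar again
    FormatTime, stamp,, yyyy-MM-dd HH:mm
    Clipboard := selection . "`n`n" . url . " - " . stamp
    ; now switch to your pim, paste (Ctrl+V), and bold the relevant passages by hand
return
</pre>
You'd then do the 45 or 70 sec. of bolding by hand, as described above; the mechanical part really is a few seconds.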
13. This resets these two specialists to their specialist role: Dumping complete web pages in those rare cases where this might be necessary, e.g. for mathematicians and the like who regularly download web pages with lots of formulas, i.e. text with multiple pictures spread all over it - but then, there are fewer and fewer such web pages today, since for such content most of them now link to pdfs instead, and of course, your pim should be able to index pdfs you link to from within it (i.e. it should not force you to "embed" / "import" them to this end). Also, there should be a function (if absent from your pim, then by AHK) that downloads the pdf and then links to it from within your pim, 1-key style, i.e. sparing you the effort of first downloading the pdf and then doing the linking / indexing within your pim, which is not only an unnecessary step but also will create "orphaned" pdfs that are not referenced anywhere within your pim.
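Again just a hedged sketch of that "download the pdf, then hand your pim the link" idea, using AHK v1 commands (UrlDownloadToFile, SplitPath); the hotkey, the D:\pdf-archive folder and the clipboard-based hand-over to the pim are my assumptions:
<pre>
; Sketch only: grab the url from the browser, download the pdf into a local archive folder,
; and leave a ready-to-paste reference (local path + source url + date) on the clipboard.
^+p::
    Clipboard := ""
    Send, ^l                          ; focus the browser's address bar
    Sleep, 150
    Send, ^c                          ; copy the url
    ClipWait, 1
    url := Clipboard
    Send, {Esc}
    SplitPath, url, name              ; use the last part of the url as the file name
    target := "D:\pdf-archive\" . name
    UrlDownloadToFile, %url%, %target%
    if ErrorLevel
    {
        MsgBox, Download failed for %url%
        return
    }
    FormatTime, stamp,, yyyy-MM-dd
    Clipboard := target . "`n" . url . " (" . stamp . ")"
    ; paste this into your pim, which should then index the linked pdf by itself
return
</pre>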
IV
14. This means we need better pims / personal and small-workgroup IMS / information management systems, but not in the sense of "do better web page import", rather in the sense of "enhance the overall IM functionality, incl. PM and incl. half-automated web page CONTENT import (incl. pdf processing)". Please note here that while a "better" import of web-pages-as-a-whole is an endless tilting at windmills that blocks the programming capacity of any pim developer to an incredible degree - every one of them, and worse today than it ever did - such half-automation of content dumping / processing is extremely simple to implement on the technical level. This way, pim developers wouldn't be blocked anymore by such never-ending demands (never-ending almost independently of their respective efforts to fulfill them) to reproduce imported web pages better (and to import them quicker), but could resume their initial task, which is to conceive and code the very best IMS possible.
15. Thus, all these concerns about Surfulater "going cloud", and then, how much, are of the highest academic interest, but of academic interest only: In your everyday life, you should a) adopt a better pim than Surfulater is (as a pim), and then enhance, by AHK, core data import (and processing) there, and b) ask for the integration of such features into the pim in question, in order to make this core data processing smoother. Surfulater and WebResearch have their place in some special workflows, but in those only, and for most users it's certainly not a good idea to build up web page collections, be it within their pim, or be it presumably better-shaped collections within specialized web page dumpers like Surfulater and WebResearch, whose role should be confined to special needs.
V
16. And of course, with all these (not propaganda-only, but then, valid) arguments about cloud for collaboration and easy access (point 6), there's always the aspect that "going cloud" (be it a little bit, be it straightforwardly) enables the developer to introduce, and enforce, the subscription scheme he yearns so much for, much better than this would ever be possible for any desktop application, whether it offers some synch feature on top of being desktop or not. Have a look at yesterday's and today's Ultra Recall offer on bits: Even for normal updates, they now have to go the bits way, since there's not enough new functionality even in a major upgrade, so loyal, long-term users of that prog mostly refuse to update at half price (UR Prof 99 bucks, update 49 bucks, of which, after the payment processor's share, about 48.50 bucks should go to the developer), and so some of them at least "upgrade" via bits, meaning a prof. version at a starting price of 39 bucks, of which 19.50 go to bits, 50 cents or so to the payment processor, leaving about 19.00 bucks for the developer (Edit: prices corrected). It's evident that with proper development of the core functionality (and without having to cope with constant complaints about the web page dump quality of his prog), the developer could easily get 50 bucks for major upgrades of his application, instead of dumping them for 19 bucks: And that'd mean much more money for the developer, hence much better development quality, meaning more sales / returns, hence even better development...
17. As you see here, it's easy to get into a downward spiral, but it's also easy to create a quality spiral upwards: It's all just a question of properly conceiving your policy. And on the users' side, a rethinking of traditional web page dumps is needed imo. It then becomes a false prob to muse about how to integrate a specialist dumper into your workflow: rethink your workflow instead. Three integral dumps a year don't ask for integration, but for a link to an .mht.
18. And yes, I get the irony in downloading only to then upload again, but then, I see that this is for preserving current states, since the dumped page may change its content or even go offline. But this aspect could, in most conceivable cases, prevail over the policy of "just dump the content of real interest, make a first selection of what you'll really need here" for legal reasons only, and in those special cases, neither Surfulater nor WebResearch is the tool you'd need.
VI
19. As said, the question of "how Neville will do it", i.e. the distribution between desktop (data, data processing) and cloud (data, data processing again), and all the interaction needed and / or provided, will be of high academic interest, since he's out to do something special and particular. But then, there's a real, big prob: We get non-standardization here, again. Remember DOS printer drivers, just one everyday example of the annoyances of non-standardization? Remember those claims, for this prog and that, "it's HP xyz compliant"? Then came Windows, an incredible relief after all these pains. On the other hand, this buried some fine DOS progs, since soon there were no more drivers for then-current printers, among other probs; just one example is Framework, a sw masterpiece by Robert Carr et al. (the irony here being that Carr is in cloud services today).
20. Now, with the introduction of proprietary cloud functionality, different for each of the many applications going cloud today, we're served pre-Windows incompatibility chaos again, instead of being provided with more and more integration of our different applications (and in a traditional work group, you at least had common directories for linked and referenced-to shared files). That's "good" (short-term only, of course) for the respective developers, in view of the subscription advantage for them (point 16), but it's very bad for the speed of setting in place really integrated workflows, for all of us; i.e. instead of soon being provided with a much better framework for our multiplied and necessarily more and more interwoven tasks (incl. collaboration and access-from-everywhere, but not particular to this applic or that), for which, from a technical pov, "the time is ripe", we have to cope with increasing fragmentation into whatever particular offerings developers in search of a steady flow of income (cf. the counter-example in point 16, and then cf. Mindjet) deem "beneficial" for us.
All the more reason to discard any such proprietary "solution" from your workflow whenever it's not necessarily an integral part of it. Don't let them make you use 5 "collaborative" applications in parallel just because there are 5 developers in need of your subscription fee.