DonationCoder.com Software > Clipboard Help+Spell
Feature request: Web clipping, permanent note keeping
mouser:
As an aside: What I've said before is that I am always open to creating a *new* notetaking/pim tool, if someone could convince me that there is a real need for one that is highly specialized and focused on solving a narrow particular need/approach/workflow/personality. There are some really good general purpose note tools out there that I have no interest in trying to compete with. But if someone could come up with a streamlined focused idea for a lightweight notetaker with a specific point of view I'd bite.
IainB:
@rjbull:
I don't know SQL, and don't really understand tags or CHS's virtual folders, so I hope that the more useful features can be accessed without writing code... I'd like to have at least simple Boolean logic.
As for OCR, that sounds to me a step too far. That is, a lot of effort required to build something that wouldn't be used all that often. But as CHS now stores images, and can accept external tools, is there any way of adding an OCR program as an external tool?
-rjbull (August 24, 2011, 02:57 PM)
--- End quote ---
SQL: I don't know SQL either, but am learning it. The UI (User Interface) should probably be more friendly and insulate the user from having to use SQL unless he/she wants to use it (I would, for example). That's kinda how CHS has the SQL implemented in Virtual Folders at the moment, I guess, but the insulation and the UI-friendliness bits could probably be improved upon.
OCR: Actually, OCR is arguably well overdue and a step in the right direction, rather than "a step too far". That's why I gave the examples above on the subject. Quite a few applications seem to be using OCR to scan captured images (e.g., photos, fax, scanned document images, PDF-based images) to make them text-searchable (where text is detected in the image), rather than transcribe the OCR to text as data. I gather that there is some good public domain software about that can be used to do this, so you don't have to reinvent the wheel - for example, for CHS.
Examples of applications that do this "OCR for text-searchable" are:
* Evernote.
* Qiqqa (what a brilliant reference management tool that is!).
* OneNote (a pretty good PIM).
* Google docs (can also do OCR-to-text, if you ask for it).
Some applications also make the text in the image copyable to clipboard, but do not actually produce a text output file. This may be as good as full OCR-to-text for most users' purposes.
IainB:
@mouser:
As an aside: What I've said before is that I am always open to creating a *new* notetaking/pim tool, if someone could convince me that there is a real need for one that is highly specialized and focused on solving a narrow particular need/approach/workflow/personality. There are some really good general purpose note tools out there that I have no interest in trying to compete with. But if someone could come up with a streamlined focused idea for a lightweight notetaker with a specific point of view I'd bite.
-mouser (August 24, 2011, 11:24 PM)
--- End quote ---
Yes, there are indeed "...some really good general purpose note tools out there" and I would not recommend that anyone try to compete with them, simply because most of them are off down the same GP ("general purpose") track and with a moronic design approach that has them stuck in a "fixed tree structure" paradigm and a self-defeating focus on superficial "features" and "look and feel". Info Select v10, for example, has just implemented "The Ribbon" interface, but apparently not because any users wanted it or even asked for it! Oh no, it was the Chief Designer who seems to have thought it would be "a good idea". So the ribbon was implemented (mandatory) and several valid requests to consider their genuine and long overdue requirements for functionality - from users (such as myself, for example) - remained largely ignored. So I and others have voted with our feet.
The major requirements for a PIM would typically include:
* That information/data can easily be captured from anywhere in its database, and referenced. (Not forgetting email, which is data.)
* That the database be of as non-proprietary a structure as possible, to avoid dependency.
* That the creation of meta-data be automated in a rational and standardised fashion wherever possible.
* That the information/data need not necessarily always be stored in the database, or may need to be secondary copies of the data, if other applications may need to reference the same data or search/index it, or if it is inefficient to store that data because of bloat.
* That the information/data can be stored in a highly flexible structure for categorisation and identification, such that the structure can be restructured/re-arranged or the information/data be re-categorised or resorted to meet a new requirement for a different presentation or view of that information.
* That any restructuring, rearrangement or re-categorisation be as automated as possible, thus minimising/avoiding the need to make manual updates or changes to the data or the meta-data, except those necessary to update the data for currency/correctness.
The points I would make, in response to your "aside", are:
(a) that in CHS, you effectively already have a *new* notetaking/PIM tool. You could either branch it into the new tool, or keep it integral with CHS. I would recommend the latter, until you are driven to make them separate (see rationale below).
(b) if, as you say, you need someone to convince you that there is a real need for one that is highly specialized and focused on solving a narrow particular need/approach/workflow/personality, then you have me and a small army of people with like/similar needs. This is not an imaginary army, though it might be difficult to estimate its actual numbers, but you will find them in forums on the Internet, writing disconsolate reviews of these GP PIMs and often decrying the GP PIM rubbish and bemoaning the obsolescence of Lotus Agenda, for example.
The important differences are that Lotus Agenda (by design) and CHS (apparently by accident?!) are lightweight RDBMS tools. I would respectfully suggest that you may be so close to CHS that you could be unable to appreciate the full power/potential of what has been done in CHS already.
Capturing data/content (text/images): This is the rationale for my suggesting (above) that you should keep the clipping tool and the PIM together (consolidated) until driven to make them separate.
Capturing data/content efficiently and effectively is of critical importance - a mandatory "must have" user requirement for a PIM.
However, the GP PIMs are mostly hopeless at efficiently and effectively capturing data/content in a usable form. Typically, you have to go into the blasted PIM, press the insert button or do something specific to prepare the PIM to receive external input in the PIM's proprietary format, then go to the external source to select/capture the input. And you still can't do it properly. This is why I use the Firefox Add-in "Scrapbook" to capture web pages - there's no better tool. I have to put up with the fact that my database is in disparate islands - the Scrapbook files are one such island. I can search that using Scrapbook functionality (slow) or Desktop search (I used to prefer Google desktop, but Win7 search is pretty good now).
Info Select was a bit different:
* It has a "clipper" tool that sits in the Systray with a yellow lightning bolt icon. You select your text/material to copy in (say) your browser, then go to the Systray to click on the lightning bolt, which turns red whilst it is busy doing the copy, and then turns green when it has succeeded. About 98% of the time, it does not work. Mostly it stays red and you have to manually copy/paste the material. Good idea in the design, but a failure in execution on the implementation. So nobody uses it now.
* It also could read and store web pages using its internal browser. This was based on the IE engine, but was broken by new versions of IE, so it is of no use at all now.
* It had an ability to integrate with and process email. The functionality for this brilliant design idea was poorly implemented and has been broken by newer technology on the input side, so it is of no use at all now. (I think email functionality may have been removed altogether after v8.)
One major lesson there is to have the PIM application control the clipping and input and the conformance to clip format and other input standards. (Get the point?)
OneNote 2007 was a bit different:
* It has a "Clip to OneNote" button that appeared in most of your applications. You just selected the area (text/image) to copy, pressed the button, and an image clip went to OneNote. This was pretty efficient, effective and useful. However, the "Clip to OneNote" functionality was broken by the Win7 x64 OS, and to make it work now you have to use a really kludgey workaround devised by a clever OneNote MS engineer. I don't use it much now. What a pity.
One major lesson there is to have the PIM application control the clipping and input and the conformance to clip format and other input standards. (Sorry to labour the point.)
rjbull:
I don't want to learn SQL, so like the idea of having a nice UI for the main functions, with bare-naked SQL available for those skilled in the art.
I don't see OCR as a must-have. When I used to have to extract data from scanned-image PDFs of patents, I was content to use a separate application to do the job. The more so, as OCR is often imperfect, so it was better to put the converted text into an editor/word processor for spell-checking and reformatting.
I'm not keen on tags/keywords. You have to know what you need in advance, and be consistent in applying them. I'd rather have really good retrieval from title + body text of the clips, which is why I like to see Boolean searching. If you must have keywords, then yes, it would be nice to be able to apply them to multiple clips at a time. Also to have a CintaNotes-like feature where you had some kind of drop-down or auto-completion.
I don't expect very close integration with an e-mail client. There are too many clients to service them all. I can either include information through the clipboard, or by exporting from TheBat! and importing or clipping the resulting text file.
It should be possible to store everything within the database, even if only a copy, to make the database portable and easier to back up.
I just want CHS to become a really good storage and retrieval database for information that passes through the clipboard, with an accent on Web clips, as well as a transient clips tool.
IainB:
I don't want to learn SQL, so like the idea of having a nice UI for the main functions, with bare-naked SQL available for those skilled in the art.
-rjbull (August 25, 2011, 08:29 AM)
--- End quote ---
Sorry, if you thought I was suggesting that anything in this requirement of yours should not be met. I apologise for not explaining myself very well above. I was suggesting pretty much exactly what you say you do/don't want here. No disagreement at all, though I would not use a tautology to describe SQL.
I don't see OCR as a must-have. When I used to have to extract data from scanned-image PDFs of patents, I was content to use a separate application to do the job. The more so, as OCR is often imperfect, so it was better to put the converted text into an editor/word processor for spell-checking and reformatting.
-rjbull (August 25, 2011, 08:29 AM)
--- End quote ---
Sorry again if I did not explain myself very well above. It's not that I disagree with what you say, it's just that I think we have to strive to move forwards in the use of technology, to use it more effectively and efficiently, and to overcome the constraints of that technology. History has shown that we can do this. That's how we got men on the moon, for example. In our OCR case, I can better explain if I make a comparison: OCR is to data gathering/extraction what push-button dialling was to the telephone. I feel sure that some people may have felt that the push-buttons were an annoying but passing fad and didn't work terribly well, but would we be advantaged nowadays by retaining the circular phone dial? The answer is self-evident - "No". Though I have to admit that I dislike push-buttons on cellphones, because my finger-ends are too large and spatula-like for the smaller buttons, I would not recommend returning to the dials.
I'm not keen on tags/keywords. You have to know what you need in advance, and be consistent in applying them. I'd rather have really good retrieval from title + body text of the clips, which is why I like to see Boolean searching. If you must have keywords, then yes, it would be nice to be able to apply them to multiple clips at a time. Also to have a CintaNotes-like feature where you had some kind of drop-down or auto-completion.
-rjbull (August 25, 2011, 08:29 AM)
--- End quote ---
Sorry again if I did not explain myself very well above. I share your lack of keenness for tags/keywords, and for pretty much exactly the same reasons as you give. Which was why being able to turn a group or "favourite" into a VF (Virtual Folder) in CHS - by enabling the use of some SQL - blew me away. That's exactly what you need to enable Boolean searching, see? It's what you want, and it was exactly what I had been used to using in Lotus Agenda. VF = Virtual Tag. As a worked example, I decided that I wanted to filter out all those records (clips in CHS) that contained a reference to one of the main religions. So I used my budding arcane knowledge of SQL to write an SQL filtering statement like this:
((Lower(ClipText) LIKE '%islam%') OR (Lower(ClipText) LIKE '%muslim%') OR (Lower(ClipText) LIKE '%roman catholic%') OR (Lower(ClipText) LIKE '%christian%') OR (Lower(ClipText) LIKE '%anglican%'))
A user-friendly UI would turn that into a VF with the name "Religion" with something like this: ("Islam" OR "Muslim" OR "Roman Catholic" OR "Christian" OR "Anglican")
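A minimal sketch of how a friendlier front end might expand such a list of terms into the SQL filter shown above. It uses Python's standard sqlite3 module with an in-memory database. The `clips` table, the `or_filter` helper and the sample rows are hypothetical; only the `ClipText` field name comes from the post, and CHS's actual schema may well differ.

```python
import sqlite3

def or_filter(terms, column="ClipText"):
    """Build a WHERE fragment that matches any of the terms, ignoring case.

    `column` must be a trusted identifier; the search terms themselves are
    passed as bound parameters rather than pasted into the SQL text.
    """
    clause = " OR ".join(f"(Lower({column}) LIKE ?)" for _ in terms)
    params = [f"%{term.lower()}%" for term in terms]
    return f"({clause})", params

# Hypothetical stand-in for the clip database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clips (id INTEGER PRIMARY KEY, ClipText TEXT)")
conn.executemany(
    "INSERT INTO clips (ClipText) VALUES (?)",
    [
        ("Notes on the Anglican synod",),
        ("Recipe for bread",),
        ("History of ISLAM in Spain",),
        ("Roman roads of Britain",),
    ],
)

# What the user would type into a friendly "Religion" virtual folder.
terms = ["Islam", "Muslim", "Roman Catholic", "Christian", "Anglican"]
where, params = or_filter(terms)

rows = conn.execute(
    f"SELECT id, ClipText FROM clips WHERE {where} ORDER BY id", params
).fetchall()
for clip_id, text in rows:
    print(clip_id, text)
```

Only the first and third sample clips match: "Roman roads of Britain" does not, because the term is the whole phrase "Roman Catholic".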
For obvious reasons, I want to be able to perform these Boolean searches on date and time fields. I would also want to have Condition-->Action, where Boolean Conditions can be used to result in an Action: e.g. IF ("Islam" OR "Muslim" OR "Roman Catholic" OR "Christian" OR "Anglican") THEN Keyword = "Religion". Though you don't really need to do that in this case, because you can simply check to see if a record is a member of a VF/group named "Religion", forcing or setting a Keyword or label like "Religion" has the advantages that you can automate the labelling of many records at once (saves time over manually setting each record to that Keyword) and that once you have done that you can then disable the SQL that did it, thus ensuring that your population of "Religion" records is fixed at that point, unless you directly manually assign other records to that Keyword at a later stage. This has a lot to do with "flexibility" in creating your meta-data - you can meet your data management needs very precisely.
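The Condition-->Action idea can be sketched in the same way: run the Boolean condition once as an UPDATE that sets a keyword, after which the labelled population stays fixed even when new matching clips arrive. This is an illustration only, again using Python's sqlite3 module. The `clips` table, the `Created` and `Keyword` columns, the `apply_rule` helper and the shortened condition are all assumptions made for the example, not CHS's real schema or behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: Created holds ISO-8601 text, which sorts chronologically.
conn.execute(
    "CREATE TABLE clips ("
    "id INTEGER PRIMARY KEY, ClipText TEXT, Created TEXT, Keyword TEXT)"
)
conn.executemany(
    "INSERT INTO clips (ClipText, Created) VALUES (?, ?)",
    [
        ("Anglican synod report", "2011-08-20 09:15:00"),
        ("Bread recipe", "2011-08-21 18:00:00"),
        ("Article on Islam in Spain", "2011-08-25 07:30:00"),
    ],
)

# Shortened version of the Boolean condition from the post.
CONDITION = "(Lower(ClipText) LIKE '%islam%') OR (Lower(ClipText) LIKE '%anglican%')"

def apply_rule(conn, condition, keyword):
    """IF condition THEN Keyword = keyword; returns how many clips were labelled."""
    cur = conn.execute(f"UPDATE clips SET Keyword = ? WHERE {condition}", (keyword,))
    return cur.rowcount

tagged = apply_rule(conn, CONDITION, "Religion")

# A clip added after the rule ran matches the condition but carries no keyword.
conn.execute(
    "INSERT INTO clips (ClipText, Created) VALUES (?, ?)",
    ("Islamic art exhibition", "2011-08-26 12:00:00"),
)

# Fixed population: only the clips labelled when the rule was applied.
members = [r[0] for r in conn.execute(
    "SELECT id FROM clips WHERE Keyword = 'Religion' ORDER BY id")]

# Live virtual folder: re-evaluates the condition, so it picks up the new clip.
live = [r[0] for r in conn.execute(
    f"SELECT id FROM clips WHERE {CONDITION} ORDER BY id")]

# Boolean search combining the keyword with a date/time field.
recent = [r[0] for r in conn.execute(
    "SELECT id FROM clips WHERE Keyword = 'Religion' "
    "AND Created >= '2011-08-24 00:00:00' ORDER BY id")]

print(tagged, members, live, recent)
```

The difference between `members` and `live` is the point made in the post: a keyword set by a one-off rule freezes the population, while a virtual folder keeps re-evaluating its condition.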
I don't expect very close integration with an e-mail client. There are too many clients to service them all. I can either include information through the clipboard, or by exporting from TheBat! and importing or clipping the resulting text file.
-rjbull (August 25, 2011, 08:29 AM)
--- End quote ---
Sorry again if I did not explain myself very well above. The requirement is to use emails (some, not all - only those that you feel would be useful) as records in your database. Importing sent/received emails to the database from an email client or from a web-based email service would suffice.
I learned from my experience of Info Select: Version 7 was integrated with Outlook, which was of no use to me as I did not use Outlook. Version 8 started to abandon Outlook integration, and incorporated its own rather good email client. I used that email client to manage most of my email, though I tended to continue to use Pegasus - the email client I preferred at that time - for managing Listserver groups. Info Select 8 enabled you to convert emails (which were in text or html format) to plain unformatted text, at the press of a button - a very handy feature.
It should be possible to store everything within the database, even if only a copy, to make the database portable and easier to back up.
-rjbull (August 25, 2011, 08:29 AM)
--- End quote ---
Whenever I read that such-and-such "should" be the case, it usually means that what I have read is an arbitrary, unsubstantiated opinion (unless there is some special law or rule that supports it). The case would seem to be no different here.
I would recommend that we leave the design of the database to the engineers who are building the thing - or, as my mother used to say to me, "Don't teach your grandmother to suck eggs." The engineers are usually far from stupid, and are likely to be able to fully appreciate the need for portability and backup and a few other things, some of which we might not even have thought about, as well.
In the case of CHS, we seem to have the text records held within the database, and the image clips as .PNG files held in a defined folder outside of the database. There will be good theoretical and/or pragmatic reasons for this.
I just want CHS to become a really good storage and retrieval database for information that passes through the clipboard, with an accent on Web clips, as well as a transient clips tool.
-rjbull (August 25, 2011, 08:29 AM)
--- End quote ---
So do I, except that CHS looks to me as though it is already a pretty good database to hold clipped data/images, enabling increasingly flexible and sophisticated filtering/tagging, sorting and retrieval - through the use of SQL (that enables Boolean search filters). CHS now retains the source URL of clipped data, and I think @mouser is probably contemplating how best to deal with partial/whole web clips without reinventing the wheel.
Flagging records as "permanent" or "transient" could be done by enabling some of the functionality described above - as I put it (above), "This has a lot to do with "flexibility" in creating your meta-data - you can meet your data management needs very precisely."