Author Topic: Software idea for speed reading/skimming text by color-coding important keywords  (Read 11271 times)

David.P

  • Supporting Member
  • Joined in 2008
  • **
  • Posts: 175
  • Ergonomics Junkie
    • View Profile
    • Donate to Member
Hi forum,

I read a LOT on the Internet every day -- like blogs, news, forums etc.

Very often however, I'd simply like to get the basic idea of the contents e.g. of a (long) blog or news text, but can't be bothered to read my way through the entire length of it in the first place.

Therefore I wish there was a software (or online service) that could do the following:

  • 1)   Rank and list the most frequently used keywords in any (selected) amount of text.
  • 2)   Remove small and common words from the ranking/list, and only keep like a dozen important keywords from the text.
  • 3)   Finally, apply different text highlighter colors to the respective occurrences of those text keywords.
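
A minimal Python sketch of steps 1) and 2), offered only as an illustration. The stop-word list is a tiny hypothetical sample (a real one would be far longer), and the function name `top_keywords` is invented here:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "for", "on", "with", "as", "this", "be", "are", "was",
}

def top_keywords(text, limit=12):
    """Return up to `limit` (word, count) pairs, most frequent first."""
    # Lowercase the text and pull out words (allowing one inner apostrophe).
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Count every word that is not on the stop-word list.
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(limit)
```

Calling `top_keywords(article_text)` on a long article would give roughly the "dozen important keywords" described above, although how good they are depends heavily on the stop-word list.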


Almost all of this already can be done semi-manually by using Wordcounter, Firefox and Googlebar Lite.

Wordcounter does 1) and 2), and Googlebar Lite does 3).

Here's an example of what an article looks like with the highlights applied.

Although that might be a bit too much color (maybe 5 to 7 colors and respective keywords would be good for a start), I am quite convinced that this would be a great way of quickly ploughing through the masses of text that one comes across during a typical day, where it would be nice to get a basic idea of its contents first -- before maybe going through some of it in more depth.

Well anyway, it would be GREAT if this idea could be made into a Firefox extension (or something) that automatically applies steps 1) through 3) to any amount of text that for example has been selected on a web page....
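
Step 3) could be sketched in Python as follows, assuming the selected text is plain text and the output is HTML. The palette and the function name `highlight` are hypothetical, and a real browser extension would work on the page's DOM rather than on a string:

```python
import html
import re

# Hypothetical palette; any CSS colour names would do.
COLORS = ["yellow", "lightgreen", "lightblue", "pink", "orange"]

def highlight(text, keywords):
    """Wrap each whole-word keyword occurrence in a coloured <span>."""
    if not keywords:
        return html.escape(text)
    # One colour per keyword, cycling through the palette if needed.
    color_of = {kw.lower(): COLORS[i % len(COLORS)] for i, kw in enumerate(keywords)}
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in color_of) + r")\b",
        re.IGNORECASE,
    )
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches.
        parts.append(html.escape(text[last:match.start()]))
        color = color_of[match.group(0).lower()]
        parts.append(
            f'<span style="background:{color}">{html.escape(match.group(0))}</span>'
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Matching is case-insensitive and limited to whole words, so "cat" does not light up inside "category".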

Thanks heaps already for comments or further ideas that anyone might have regarding this idea,

Cheers David.P

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • *****
  • Posts: 36,406
    • View Profile
    • Mouser's Software Zone on DonationCoder.com
    • Read more about this member.
    • Donate to Member
seems like a great idea to me.. and i can see it being just as interesting to let users define their own custom word sets that list certain words that should be shown in certain colors.. would be great for people who are specialists looking for certain things.  nice if you could switch wordsets depending on your task.

David.P

Thanks mouser, for your approval as well as your idea about custom word sets for specialist research and the like.

Awww I'm actually already dreaming about a toolbar button in Firefox that simply applies keyword highlighting automatically to every web page that I visit...! This would be such an INCREDIBLE time saver for me...

Anyone....? As for me, I'll donate in the double-digit range for a working solution....

Probably, for a robust solution and in order to automatically identify important keywords (and leave out most general, common words) it would be good to cross-check the pages' keywords against a general language word frequency list, and only highlight those frequent keywords on the page that are LESS frequent on a general scale (and therefore potentially MORE important for the currently displayed text).
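
One way to sketch this cross-check in Python: compare each word's rate on the page with its rate in general language, and keep the words that are frequent locally but rare globally. The general-language figures below are made up for illustration, and `DEFAULT_PER_MILLION`, `min_count` and the function name are assumptions, not part of any existing tool:

```python
import re
from collections import Counter

# Hypothetical general-language frequencies (occurrences per million words).
# These numbers are invented; a real list would come from a large corpus.
GENERAL_PER_MILLION = {"the": 60000.0, "and": 27000.0, "people": 1200.0, "time": 1500.0}
DEFAULT_PER_MILLION = 1.0  # assumed rate for words missing from the list

def distinctive_keywords(text, limit=7, min_count=2):
    """Rank locally frequent words by how over-represented they are on the page."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    if total == 0:
        return []
    scores = {}
    for word, n in Counter(words).items():
        if n < min_count:
            continue  # only consider words that are frequent on this page
        local_rate = n / total * 1_000_000
        general_rate = GENERAL_PER_MILLION.get(word, DEFAULT_PER_MILLION)
        scores[word] = local_rate / general_rate
    return sorted(scores, key=scores.get, reverse=True)[:limit]
```

With a reasonable frequency list, common words such as "the" sink to the bottom without needing a separate stop-word list.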

Cheers David.P

mouser

I think the easiest thing would be to separate the task of *AUTOMATICALLY* determining what words to highlight using frequency analysis, from the task of actually highlighting/coloring words based on manually configured wordlists.

Then a separate tool can be used to create wordlists based on frequency counting, etc.  Lots of tools for that, and people can share their lists, etc.  I don't see too much value trying to build this functionality into the addon itself.  Just have it let the user create and switch between WordColorSets that specify which words should be shown in which colors.
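
A rough Python sketch of what switchable WordColorSets might look like as data. The set names, words and colours are all hypothetical examples, and `color_for` and `annotate` are names invented for this illustration:

```python
# Hypothetical word-colour sets keyed by task; every entry is illustrative.
WORD_COLOR_SETS = {
    "security": {"exploit": "red", "patch": "green"},
    "finance": {"revenue": "gold", "loss": "red"},
}

def color_for(word, active_set):
    """Return the colour for `word` in the active set, or None if unlisted."""
    return WORD_COLOR_SETS.get(active_set, {}).get(word.lower())

def annotate(text, active_set):
    """Pair each whitespace-separated token with its colour (or None)."""
    return [
        (token, color_for(token.strip(".,;:!?"), active_set))
        for token in text.split()
    ]
```

Switching tasks is then just a matter of passing a different set name, and shared lists could be plain files that load into the same dictionary shape.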

I think it's a great idea, and would make a terrific NANY 2011 project for someone looking for an idea and willing to make a FireFox plugin.

David.P

Mouser, thank you! Maybe this thread should be moved to the NANY section then?

For me personally, I think it would be important to have frequency counting and word list creation done automatically somehow (could be done by a separate tool of course or by an online service etc., you name it). This is because I think there would be not many cases where a "static" or "custom" word list would be the same even for only as little as two different web pages, blog lines or the like.

Cheers David.P

mouser

Quote
This is because I think there would be not many cases where a "static" or "custom" word list would be the same even for only as little as two different web pages, blog lines or the like.

ok so i see reading back now that this part of your idea is actually more interesting/novel than i gave it credit for initially.  you are saying that, for any given page, it would be interesting to see highlighted words with some frequency pattern local to that page.  You suggested that high frequency (but globally uncommon) words might be most useful to highlight though i can see the opposite being the case..  Either way it is an interesting aspect of the idea (though i'm still a bit doubtful about it being more useful than coloring based on custom tweaked manual word sets).

David.P

Quote
you are saying that, for any given page, it would be interesting to see highlighted words with some frequency pattern local to that page.

Exactly. Actually what I'd like this tool to do is doing some form of "pre-reading" in any amount of text for me, and highlighting - as intelligently as possible - the most important keywords (i.e. the words that show best what that certain text is about in the first place).

Therefore, the algorithm for identifying the unique keyword set of any given text would be of crucial importance in my view.

Cheers David.P

barney

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 1,282
    • View Profile
    • Donate to Member
Take a look at this Wikipedia tool. (Note that it works on more than Wikipedia.)  I think it falls short of your requirements, but it has worked middlin' well for me when perusing really long documents.

David.P

Yeah -- thanks for the information.

This leads me to the search term "Automatic Text Summarizing" where obviously quite a lot of research has already been done.

However I think the algorithm for keyword extraction would not even have to be that complicated.

Anyway, now that the function of that possible future automatic keyword color highlighting tool is laid out, I already miss that function on every blog or news site that I visit, really hard!

Cheers David.P

barney

Quote
However I think the algorithm for keyword extraction would not even have to be that complicated.

I'm a bit dubious on that one.  If it's a client-supplied list, no problem.  But if it's based on frequency, that's a horse of a different color.  The critical keyword(s) for a given text might not be as high in frequency as many other words.

For instance, you're reading an article on The Indigent Population of Certain Polynesian Islands.  Indigent would certainly be a keyword, but might never be mentioned save in the title.  References within the article might be native, poor, disadvantaged and the like.  So you'd need a pretty strong thesaurus algorithm to catch the proper keywords, ones relevant to the thrust of the article, for an appropriate summary.

It's fairly easy to condense an article, not so easy to have the condensed version contain the meat of the original article.  We had a team working on that for some documentation several years ago.  Five members, as I recall, and they had difficulty reaching consensus on distillations even when they were involved in open discussion of the material to be condensed.  I'd hate to have to try to write that program, alone or in a dev team.  'Twould be an interpretational nightmare, methinks.

Unlike Web page keywords, frequency in an article is not necessarily indicative of importance or relevance.  I suppose, if the title were true to the purpose of the article, you could extract keywords from it, then do a thesaurus lookup for relevant words, but even that would be a nasty job, since many thesaurus entries would not be meaningful to the article's purpose.
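
To make the title-plus-thesaurus idea concrete, here is a small Python sketch. The thesaurus is a hypothetical three-word stub and the stop-word set is hand-picked for this one title; a real thesaurus would return many irrelevant entries, which is exactly the difficulty described above:

```python
import re

# Hypothetical miniature thesaurus; a real tool would need a proper lexical resource.
THESAURUS = {"indigent": {"poor", "disadvantaged", "needy"}}
# Hand-picked stop words for this example only.
STOP = {"the", "of", "certain", "a", "an", "and", "in"}

def expand_title_keywords(title):
    """Take the non-trivial title words and add their thesaurus entries."""
    words = [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOP]
    expanded = set(words)
    for word in words:
        expanded |= THESAURUS.get(word, set())
    return expanded

def count_hits(body, keywords):
    """Count how many words in the body belong to the expanded keyword set."""
    return sum(1 for w in re.findall(r"[a-z]+", body.lower()) if w in keywords)
```

For the example title, `expand_title_keywords` adds "poor", "disadvantaged" and "needy" to the title words, so body text that never says "indigent" can still be matched.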

David.P

True, however for a start I'd be perfectly happy with a simple highlighting of the most frequent words of any text, minus a list of common words, like Wordcounter.com already does reasonably well; see the result in my original post above.

The next step then could be some more sophisticated algorithms like matching with linguistic word lists, and more.

Anyway, the first step should be doable with very simple tools, even if it is simply most frequent words minus words that are shorter than "n" characters.
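
That simplest variant fits in a few lines of Python. This is only a sketch: the defaults (`min_length=5`, `limit=7`) and the function name are arbitrary choices made here, not anything agreed in this thread:

```python
import re
from collections import Counter

def frequent_long_words(text, min_length=5, limit=7):
    """Most frequent words, ignoring any word shorter than `min_length`."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    return [word for word, _ in counts.most_common(limit)]
```

The length cut-off is a crude stand-in for a stop-word list: it drops "the" and "and", but it also drops short meaningful words and keeps long common ones such as "however".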

Most importantly, this way somebody should be able to pick this idea up and maybe start hacking together a simple Firefox extension, for a start.

Cheers David.P

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,220
  • Tell me something you don't know...
    • View Profile
    • Renegade Minds
    • Donate to Member
I really like the idea a lot. It's really the kind of thing that's right up my alley.

There are a few things I would note about it though.

* Keyword frequency is essential
* Thesaurus lookup is also essential, especially for well-written works
* A grammar engine would also be needed though. The above are not enough

A grammar engine (like a spell checker) would let you work with your frequencies and synonyms to combine them for longer highlighted sections. The effect is that you could then highlight CONCEPTS better, rather than just keywords.

For example:

A)
A grammar engine (like a spell checker) would let you work with your frequencies and synonyms to combine them for longer highlighted sections. The effect is that you could then highlight CONCEPTS better, rather than just keywords.

B)
A grammar engine (like a spell checker) would let you work with your frequencies and synonyms to combine them for longer highlighted sections. The effect is that you could then highlight CONCEPTS better, rather than just keywords.

The second there is easier to read as it groups a unit of thought rather than a unit of language.

The key though is being able to locate a grammar engine that you can use in the software, as everything else is relatively easy in comparison. There's no way that anyone could develop a grammar engine for software like this, as the scope is simply far too large.
Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker

Renegade

Here's one example I found: http://www.wintertre.../wgrammar/index.html

They're kind of hard to find, and quite expensive, which I fully expected.

David.P

All agreed, but this will be the icing. Let's start with the cake.

Cheers David.P

David.P

Mission Accomplished
« Reply #14 on: January 11, 2011, 10:05:54 AM »
Hi forum,

proudly announcing that the first version of the tool discussed above now is ready and working.

Please read on here (sorry only Google Translation for the time being):

Reading 3.0: Tool for professional readers and news junkies - exclusive premiere




Cheers David.P
« Last Edit: April 05, 2011, 08:08:16 AM by David.P, Reason: screenshot update »

mouser

wow that looks great.  :up:

David.P

Thanks mouser :Thmbsup: Please also note that much of the (programming) credits and kudos go to "my" team mentioned at the bottom of the page linked above.

Cheers David.P

David.P

Here's the latest version. This one also works in Opera, thanks to a fix by QuHno.

Regards David.P
« Last Edit: April 07, 2011, 01:16:28 AM by David.P, Reason: typo »