Software idea for speed reading/skimming text by color-coding important keywords

mouser:
This is because I think there would be not many cases where a "static" or "custom" word list would be the same even for only as little as two different web pages, blog lines or the like.
--- End quote ---

ok so i see, reading back now, that this part of your idea is actually more interesting/novel than i gave it credit for initially. you are saying that, for any given page, it would be interesting to see highlighted words with some frequency pattern local to that page. You suggested that high-frequency (but globally uncommon) words might be most useful to highlight, though i can see the opposite being the case. Either way it is an interesting aspect of the idea (though i'm still a bit doubtful about it being more useful than coloring based on custom-tweaked manual word sets).
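
A rough sketch of the "locally frequent but globally uncommon" idea, in Python. Everything here is an assumption for illustration: the tiny `COMMON_WORDS` set stands in for a real list of globally common words, and `local_keywords` is a hypothetical helper name, not part of any existing tool.

```python
import re
from collections import Counter

# Hypothetical stand-in for a list of globally common words; a real tool
# would need a far larger list or corpus-wide frequency data.
COMMON_WORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "it", "that",
    "for", "on", "with", "as", "are", "be", "this", "by",
}

def local_keywords(text, top_n=5, common=COMMON_WORDS):
    """Return the top_n most frequent words in text that are not globally common."""
    words = re.findall(r"[a-z']+", text.lower())
    # Count only words that are neither common nor very short.
    counts = Counter(w for w in words if w not in common and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

page = "The reef is a coral reef. Coral grows on the reef."
print(local_keywords(page, top_n=2))
```

Flipping the filter (keeping only rare words on the page) would give the "opposite" variant mentioned above; which one reads better is an open question.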

David.P:
you are saying that, for any given page, it would be interesting to see highlighted words with some frequency pattern local to that page.-mouser (November 08, 2010, 09:41 AM)
--- End quote ---

Exactly. Actually, what I'd like this tool to do is some form of "pre-reading" of any amount of text for me, highlighting, as intelligently as possible, the most important keywords (i.e. the words that best show what that text is about in the first place).

Therefore, the algorithm for identifying the unique keyword set of any given text would be of crucial importance in my view.

Cheers David.P
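
For the highlighting step itself, a minimal sketch in Python, assuming the keyword set has already been chosen somehow and that the input is plain text (no HTML escaping is done). The function name `highlight` and the `color` parameter are hypothetical.

```python
import re

def highlight(text, keywords, color="red"):
    """Wrap each whole-word, case-insensitive keyword match in a colored <span>."""
    if not keywords:
        return text
    # Longest keywords first so a longer keyword wins over a shorter prefix.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: f'<span style="color:{color}">{m.group(0)}</span>', text
    )

print(highlight("Coral reefs need coral.", ["coral"]))
```

Matching on word boundaries means "reefs" is not highlighted for the keyword "reef"; handling plurals and other word forms would need stemming, which this sketch leaves out.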

barney:
Take a look at this Wikipedia tool. (Note that it works on more than Wikipedia.)  I think it falls short of your requirements, but it has worked middlin' well for me when perusing really long documents.

David.P:
Yeah -- thanks for the information.

This leads me to the search term "Automatic Text Summarizing" where obviously quite a lot of research has already been done.

However I think the algorithm for keyword extraction would not even have to be that complicated.

Anyway, now that the function of that possible future automatic keyword color highlighting tool is laid out, I already sorely miss it on every blog or news site that I visit!

Cheers David.P

barney:
However I think the algorithm for keyword extraction would not even have to be that complicated.
--- End quote ---

I'm a bit dubious on that one.  If it's a client-supplied list, no problem.  But if it's based on frequency, that's a horse of a different color.  The critical keyword(s) for a given text might not be as high in frequency as many other words.

For instance, you're reading an article on The Indigent Population of Certain Polynesian Islands.  Indigent would certainly be a keyword, but might never be mentioned save in the title.  References within the article might be native, poor, disadvantaged and the like.  So you'd need a pretty strong thesaurus algorithm to catch the proper keywords, ones relevant to the thrust of the article, for an appropriate summary.

It's fairly easy to condense an article, not so easy to have the condensed version contain the meat of the original article.  We had a team working on that for some documentation several years ago.  Five members, as I recall, and they had difficulty reaching consensus on distillations even when they were involved in open discussion of the material to be condensed.  I'd hate to have to try to write that program, alone or in a dev team.  'Twould be an interpretational nightmare, methinks.

Unlike Web page keywords, frequency in an article is not necessarily indicative of importance or relevance.  I suppose, if the title were true to the purpose of the article, you could extract keywords from it, then do a thesaurus lookup for relevant words, but even that would be a nasty job, since many thesaurus entries would not be meaningful to the article's purpose.
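
A small Python sketch of the title-plus-thesaurus approach described above, using the Polynesian example. The `THESAURUS` and `STOPWORDS` tables are hand-made assumptions for illustration only; a real thesaurus would return many entries that are irrelevant to the article, which is exactly the difficulty raised here.

```python
import re

# Hypothetical, hand-made tables for illustration only.
THESAURUS = {"indigent": {"poor", "disadvantaged", "needy"}}
STOPWORDS = {"the", "of", "a", "an", "and", "in", "on", "certain"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def title_keywords(title, body, thesaurus=THESAURUS, stopwords=STOPWORDS):
    """Map each title keyword to the body words matching it or a listed synonym."""
    body_words = set(tokenize(body))
    result = {}
    for word in tokenize(title):
        if word in stopwords:
            continue
        candidates = {word} | thesaurus.get(word, set())
        result[word] = sorted(candidates & body_words)
    return result

title = "The Indigent Population of Certain Polynesian Islands"
body = "Many poor and disadvantaged families live on the islands."
print(title_keywords(title, body))
```

In this toy case "indigent" never occurs in the body, yet its listed synonyms do, so a pure frequency count would miss it while the title-driven lookup still finds related words to highlight.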
