Author Topic: need duplicate word scanner (Read 9660 times)

bit · « **on:** October 18, 2014, 04:27 PM »

In Microsoft Word 2003, which I'm not terribly familiar with, is there any way -or app- to scan documents for duplicate words?
I need a way to discover whether any sentence buried deep in the document uses the same descriptive word twice in an obviously redundant manner.
There would also be a need to exclude common words like 'the', 'and', 'of', and so on, and to choose if the search is case sensitive or not.
Or is there any other program, for instance Thunderbird Compose or whatever, which the document could be copied & pasted into to accomplish the same goal?
Any help would be most gratefully appreciated.

dr_andus · « **Reply #1 on:** October 18, 2014, 05:12 PM »

Have you looked at SmartEdit? There used to be a free version I think, but maybe the trial version will do the job, too.

bit · « **Reply #2 on:** October 18, 2014, 07:42 PM »

Have you looked at SmartEdit? There used to be a free version I think, but maybe the trial version will do the job, too.
-dr_andus (October 18, 2014, 05:12 PM)

^Way cool. Tnx!

mouser · « **Reply #3 on:** October 18, 2014, 08:53 PM »

it is a cool idea.

bit · « **Reply #4 on:** October 19, 2014, 01:58 PM »

Have you looked at SmartEdit? There used to be a free version I think, but maybe the trial version will do the job, too.
-dr_andus (October 18, 2014, 05:12 PM)

Yes, SmartEdit is really quite awesome.
I got the basic non-MS Word model to work with MS Word 2003.
However, I don't seem to be able to get it to do what I originally wanted, which is to find unnecessarily duplicated words -such as adverbs- in the same sentence.
It just doesn't seem to have any way to do that.

dr_andus · « **Reply #5 on:** October 19, 2014, 03:42 PM »

However, I don't seem to be able to get it to do what I originally wanted, which is to find unnecessarily duplicated words -such as adverbs- in the same sentence.
It just doesn't seem to have any way to do that.
-bit (October 19, 2014, 01:58 PM)

It should be possible to do it, but with a bit of manual work involved. First, select "compile adverb usage list", "compile repeated phrases list", and "compile repeated words list" (and uncheck all others). Then hit the "run checks" button in the toolbar. Then click on the "results" tab and select, one by one, "open adverb usage list", "open repeated phrases list", and "open repeated words list". Depending on the size of your text document, it should be manageable to pick out repeated words and check whether they are in the same sentence by going through the results manually.

mouser · « **Reply #6 on:** October 19, 2014, 04:05 PM »

i like the idea of a program that can analyze a corpus and then identify when you have too many of the same "rare" words in a sentence, or too many "ultrarare" words in a document/paragraph.
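
A rough sketch of that idea in Python, for illustration only. It assumes a plain-text reference corpus, treats a word as "rare" when its relative frequency in that corpus falls below a cutoff, and uses a naive sentence splitter. The function names and the `rare_below` and `max_per_sentence` parameters are made up for this example and do not come from any existing tool.

```python
import re
from collections import Counter

# Words are runs of letters, optionally with one internal apostrophe (don't).
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def word_frequencies(corpus_text):
    """Relative frequency of each lowercased word in a reference corpus."""
    counts = Counter(w.lower() for w in WORD.findall(corpus_text))
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}

def overused_rare_words(text, freqs, rare_below=0.001, max_per_sentence=1):
    """Return (sentence, word, count) for rare words repeated in one sentence.

    A word is "rare" when its corpus frequency is below rare_below; words
    that never appear in the corpus have frequency 0, so they count as rare.
    """
    hits = []
    # Naive split: whitespace that follows '.', '!' or '?'.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        counts = Counter(w.lower() for w in WORD.findall(sentence))
        for word, n in counts.items():
            if n > max_per_sentence and freqs.get(word, 0.0) < rare_below:
                hits.append((sentence.strip(), word, n))
    return hits
```

Common words such as 'the' drop out on their own here because they are frequent in the corpus, so no separate exclusion list is needed. The "ultrarare words per document" variant would use the same frequency table with a lower cutoff and count over the whole text instead of per sentence.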

rjbull · « **Reply #7 on:** October 19, 2014, 04:23 PM »

Of related interest, TextSTAT - Simple Text Analysis Tool:

Concordance software for Windows, GNU/Linux and MacOS

TextSTAT is a simple programme for the analysis of texts. It reads plain text files (in different encodings) and HTML files (directly from the internet) and it produces word frequency lists and concordances from these files. This version includes a web-spider which reads as many pages as you want from a particular website and puts them in a TextSTAT-corpus. The new news-reader, too, puts news messages in a TextSTAT-readable corpus file.
TextSTAT reads MS Word and OpenOffice files. No conversion needed, just add the files to your corpus...
In TextSTAT you can use regular expression which provides you with powerful search possibilities. The programme is multilingual. Because it uses Unicode internally, TextSTAT can cope with many different languages and file encodings.

Freeware. Found via Mark Wieczorek's "My Favorite Smallware".

bit · « **Reply #8 on:** October 19, 2014, 09:37 PM »

i like the idea of a program that can analyze a corpus and then identify when you have too many of the same "rare" words in a sentence, or too many "ultrarare" words in a document/paragraph.
-mouser (October 19, 2014, 04:05 PM)

It would be sort of a 'duplicate wild card' search.
Something like the old DOS 'star-dot-star' [*.*] that would define 'any' duplicates between period dots, for a whole document, without having to name particular words.
Also to be able to exclude simple common words like 'a', 'and', 'the', and so on.

It really seems quite simple, and Google already does it:
-'x' = 'any dictionary word' (the old DOS 'wild card' or star-dot-star *.*),
-find any 'x' that occurs 2 or more times within one sentence, that is, between one sentence-ending mark (period, exclamation point, or question mark) and the next,
-exclude list: a, an, and, the...
A few extra rules might need to be written to cover every way a sentence can begin or end, as with quotes, and so on.
Well anyway, it's always fun to 'dream' or have a 'wish list'.
Forget about that new fad, the 'bucket list'; all I want when I get that far, is a dreamy-looking 'wish list' to fasten my eyes on.
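
For what it's worth, the wish list above can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions, not how SmartEdit or any other tool works: sentences are taken to end at '.', '!' or '?' (optionally followed by a closing quote or bracket), and `COMMON_WORDS`, `repeated_words`, and the parameter names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical exclusion list; extend it to taste.
COMMON_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}

# Split after '.', '!' or '?', allowing closing quotes/brackets before the gap.
SENTENCE_END = re.compile(r"(?<=[.!?])[\"')\]]*\s+")
# Words are runs of letters, optionally with one internal apostrophe (don't).
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def repeated_words(text, exclude=COMMON_WORDS, case_sensitive=False, min_count=2):
    """Return (sentence, {word: count}) for each sentence with repeated words."""
    results = []
    for sentence in SENTENCE_END.split(text):
        words = WORD.findall(sentence)
        if not case_sensitive:
            words = [w.lower() for w in words]
        # The exclusion list is always matched case-insensitively.
        counts = Counter(w for w in words if w.lower() not in exclude)
        repeats = {w: n for w, n in counts.items() if n >= min_count}
        if repeats:
            results.append((sentence.strip(), repeats))
    return results
```

Calling `repeated_words(text)` on pasted document text returns only the sentences in which some non-excluded word appears at least twice. Abbreviations such as "Mr." or "e.g." will split sentences too early, which is the kind of extra rule mentioned above.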

bit · « **Reply #9 on:** October 20, 2014, 09:24 PM »

SmartEdit's 'Repeated Word Settings' can be changed to 'Display when word occurs at least [2 to 100] times'.
I set it to [2] times.
For example, with the word 'again', it should only pop up any sentence -or fragment thereof- which shows [2] instances of 'again' in it.
And it has a results exclusion list of 'a, an, and, the' and so on that is quite comprehensive enough.
However, the 'Repeated Word List' does not seem to work properly according to its own parameters; it simply is not finding what it purports to find.
A search for all instances of the word 'again' pops up sentences which only use 'again' [1] time, not [2] times or more.
Perhaps I am somehow not using the software properly, but that's as far as I have managed to get with it yet.

I just reset it to find only sentences or fragments of sentences which contain [3] or more instances, and it is still finding single instances of words.

Well...I think...it means [2] to [100] times in 'entire document', not 'individual sentences or fragments of sentences'.
That's too bad; if so, the 'Repeated Word Search' feature is not truly useful to me.
If I'm right, all I can do is hope SmartEdit will get an upgrade soon to fix that deficiency.

It does have a 'Repeated Phrase List', but that misses the ball too, by not singling out duplicates of [2] or more instances of individual words in sentences.

While I'm at it, I might as well wish for the Moon, and hope SmartEdit will one day include a feature to search for sentences containing [2] different specific words in the same sentence, such as 'again' + 'stenography'.
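
That particular wish is easy to sketch outside SmartEdit. The Python below is only an illustration under simple assumptions: the document is available as plain text, sentences end at '.', '!' or '?', and the function name `sentences_with_all` is made up for this example.

```python
import re

# Words are runs of letters, optionally with one internal apostrophe (don't).
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def sentences_with_all(text, required_words, case_sensitive=False):
    """Return the sentences that contain every word in required_words."""
    norm = (lambda w: w) if case_sensitive else str.lower
    required = {norm(w) for w in required_words}
    matches = []
    # Naive split: whitespace that follows '.', '!' or '?'.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        present = {norm(w) for w in WORD.findall(sentence)}
        if required <= present:  # every required word occurs in this sentence
            matches.append(sentence.strip())
    return matches
```

For example, `sentences_with_all(text, ["again", "stenography"])` keeps only the sentences that use both words and ignores sentences that use just one of them.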

I would still highly recommend SmartEdit to whoever asks; it is feature-rich in some very nice ways.

IainB · « **Reply #10 on:** October 21, 2014, 02:00 AM »

A few years back, I was assigned to work on a huge documentation project that was using Word 2003 as the main documentation tool. I read a book called "Taking Word for Windows to the Edge" (or something like that), and learned lots of good stuff from it. One thing it taught me to do was to switch ON all spelling, grammar-checking and proofing functionality in the settings. This greatly assisted in automated checking of all written text - including thoroughly parsing the grammar and checking for repetitive use of words. For example, if you wrote something repetitive but properly punctuated such as (say) "Very, very good but the rest of it was very, very bad and very, very smelly." it would not object to any of it, but it would spot any duplicated "very" that was without the necessary punctuation to make it grammatically correct.
However, it was not smart enough to check for bad use of English - for example, by pointing out that repetitive use of a phrase such as "Very, very something" was potential redundancy.
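
The behaviour described above (flagging "very very" but accepting "very, very") can be imitated with a regular expression. The Python sketch below is only an imitation of that behaviour, not Word's actual implementation, and the name `doubled_words` is made up for this example.

```python
import re

# A word, then whitespace only (no punctuation), then the same word again.
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def doubled_words(text):
    """Return (word, offset) for each immediately repeated word."""
    return [(m.group(1), m.start()) for m in DOUBLED.finditer(text)]
```

Because the pattern allows only whitespace between the two copies, a comma-separated "very, very" passes unflagged, which matches the limitation noted above.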

It could also sometimes spot the use of jargon and would suggest alternative terms.

I am currently using Word 2013, and it still has all this functionality.

I did once briefly trial an old software package called Grammatik (per Wikipedia) that went some way beyond MS Word's limits:

Grammatik was the first grammar checking program developed for home computer systems. Aspen Software of Albuquerque, NM, released the earliest version of this diction and style checker for personal computers, in 1981. Grammatik was first available for a Radio Shack - TRS-80, and soon had versions for CP/M and the IBM PC. Reference Software of San Francisco, CA, acquired Grammatik in 1985. Development of Grammatik continued, and it became an actual grammar checker that could detect writing errors beyond simple style checking.

Subsequent versions were released for the DOS, Windows, Macintosh and Unix platforms. Grammatik was ultimately acquired by Corel and is integrated in the WordPerfect word processor.

I don't know, because I haven't tried it, but SmartEdit looks like it goes some way towards doing the same kind of thing, using a different approach. It's clearly aimed at parsing/improving writing, anyway.

bit · « **Reply #11 on:** October 22, 2014, 08:33 AM »

Like the hungry fish said, "No corpus is truly complete until you finny shit."

bit · « **Reply #12 on:** October 24, 2014, 02:09 PM »

I'll go back to my fav review method; text-to-speech playback.
Works like a dream.

dr_andus · « **Reply #13 on:** October 30, 2014, 08:45 AM »

Another possibility: Hemingway

Hemingway highlights long, complex sentences and common errors; if you see a yellow highlight, shorten the sentence or split it. If you see a red highlight, your sentence is so dense and complicated that your readers will get lost trying to follow its meandering, splitting logic — try editing this sentence to remove the red.

Adverbs are helpfully shown in blue. Get rid of them and pick verbs with force instead.

You can utilize a shorter word in place of a purple one. Mouse over it for hints.

Phrases in green have been marked to show passive voice.

dr_andus · « **Reply #14 on:** November 03, 2014, 05:28 AM »

Have you looked at SmartEdit? There used to be a free version I think, but maybe the trial version will do the job, too.
-dr_andus (October 18, 2014, 05:12 PM)

SmartEdit for Word - Word Processing Software - is 60% off for PC today.

dr_andus · « **Reply #15 on:** November 03, 2014, 10:50 AM »

Have you looked at SmartEdit? There used to be a free version I think, but maybe the trial version will do the job, too.
-dr_andus (October 18, 2014, 05:12 PM)

Turns out the free version is still available: SmartEdit Lite
