October 21, 2014, 10:12:35 AM

Author Topic: need duplicate word scanner  (Read 463 times)
bit
« on: October 18, 2014, 04:27:29 PM »

In Microsoft Word 2003, which I'm not terribly familiar with, is there any way -or app- to scan documents for duplicate words?
I need a way to discover any sentence, buried deep in the document, that uses the same descriptive word twice in an obviously redundant manner.
There would also be a need to exclude common words like 'the', 'and', 'of', and so on, and to choose if the search is case sensitive or not.
Or, if there is any other program, for instance Thunderbird Compose or whatever, which the document could be copied & pasted into, to accomplish the same goal.
Any help would be most gratefully appreciated.

dr_andus
« Reply #1 on: October 18, 2014, 05:12:19 PM »

Have you looked at SmartEdit? There used to be a free version I think, but maybe the trial version will do the job, too.

bit
« Reply #2 on: October 18, 2014, 07:42:02 PM »

Quote from: dr_andus
Have you looked at SmartEdit? There used to be a free version I think, but maybe the trial version will do the job, too.

^Way cool. Tnx! :)

mouser
« Reply #3 on: October 18, 2014, 08:53:31 PM »

it is a cool idea.

bit
« Reply #4 on: October 19, 2014, 01:58:54 PM »

Quote from: dr_andus
Have you looked at SmartEdit? There used to be a free version I think, but maybe the trial version will do the job, too.

Yes, SmartEdit is really quite awesome.
I got the basic non-MS Word model to work with MS Word 2003.
However, I don't seem to be able to get it to do what I originally wanted, which is to find unnecessarily duplicated words -such as adverbs- in the same sentence.
It just doesn't seem to have any way to do that.

dr_andus
« Reply #5 on: October 19, 2014, 03:42:47 PM »

Quote from: bit
However, I don't seem to be able to get it to do what I originally wanted, which is to find unnecessarily duplicated words -such as adverbs- in the same sentence.
It just doesn't seem to have any way to do that.

It should be possible to do it, but with a bit of manual work involved. First, select the "compile adverb usage list", "compile repeated phrases list", and "compile repeated words list" options (and uncheck all others). Then hit the "run checks" button in the toolbar. Then click on the "results" tab and select, one by one, "open adverb usage list", "open repeated phrases list", and "open repeated words list". Depending on the size of your text document, it should be manageable to pick out repeated words and check whether they are in the same sentence, just by going through the results manually.

mouser
« Reply #6 on: October 19, 2014, 04:05:31 PM »

i like the idea of a program that can analyze a corpus and then identify when you have too many of the same "rare" words in a sentence, or too many "ultrarare" words in a document/paragraph.
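
As a rough illustration of that idea (not an existing tool), here is a minimal Python sketch. It assumes "rare" simply means "appears at most `rare_max` times in the reference corpus"; the function names and both thresholds are made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its words (internal apostrophes allowed)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def rare_repeats(corpus_text, document, rare_max=2, limit=2):
    """Return (sentence, words) pairs for sentences in `document` where a word
    that is rare in `corpus_text` (at most `rare_max` occurrences) is used
    `limit` or more times. Both thresholds are illustrative assumptions."""
    corpus_counts = Counter(tokenize(corpus_text))
    flagged = []
    # Naive sentence split: '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        counts = Counter(tokenize(sentence))
        hits = sorted(
            word for word, n in counts.items()
            if n >= limit and corpus_counts[word] <= rare_max
        )
        if hits:
            flagged.append((sentence, hits))
    return flagged


corpus = "The cat sat on the mat. The dog sat on the log. The moon was luminous."
doc = "The luminous lamp cast a luminous glow. The cat and the dog sat."
for sentence, words in rare_repeats(corpus, doc):
    print(words, "->", sentence)
```

Common words such as 'the' drop out on their own here because they are frequent in the corpus, so no separate exclusion list is needed in this variant.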

rjbull
« Reply #7 on: October 19, 2014, 04:23:46 PM »

Of related interest, TextSTAT - Simple Text Analysis Tool:
Quote
Concordance software for Windows, GNU/Linux and MacOS

TextSTAT is a simple programme for the analysis of texts. It reads plain text files (in different encodings) and HTML files (directly from the internet) and it produces word frequency lists and concordances from these files. This version includes a web-spider which reads as many pages as you want from a particular website and puts them in a TextSTAT-corpus. The new news-reader, too, puts news messages in a TextSTAT-readable corpus file.
TextSTAT reads MS Word and OpenOffice files. No conversion needed, just add the files to your corpus...
In TextSTAT you can use regular expression which provides you with powerful search possibilities. The programme is multilingual. Because it uses Unicode internally, TextSTAT can cope with many different languages and file encodings.
Freeware.  Found via Mark Wieczorek's "My Favorite Smallware".
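
For anyone unfamiliar with the two outputs mentioned in that description, a word frequency list and a concordance can be sketched in a few lines of Python. This is only an illustration of the concepts, not how TextSTAT works internally; the function names and the `width` parameter are invented for the example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence of
    `keyword`, with up to `width` words of context on each side."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


sample = "Try again. If it fails, try again and again."
print(word_frequencies(sample).most_common(2))
for left, word, right in concordance(sample, "again", width=2):
    print(f"{left:>12} [{word}] {right}")
```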

bit
« Reply #8 on: October 19, 2014, 09:37:41 PM »

Quote from: mouser
i like the idea of a program that can analyze a corpus and then identify when you have too many of the same "rare" words in a sentence, or too many "ultrarare" words in a document/paragraph.

It would be sort of a 'duplicate wild card' search.
Something like the old DOS 'star-dot-star' [*.*] that would define 'any' duplicates between period dots, for a whole document, without having to name particular words.
Also to be able to exclude simple common words like 'a', 'and', 'the', and so on.

It really seems quite simple, and Google already does it;
-'x' = 'any dictionary word' (the original DOS 'wild card' or star-dot-star *.*),
-find 'x', where 'x' repeats 2x within the borders of a single sentence, i.e. between one '.' (period dot/exclamation point/question mark) and the next,
-exclude list; a, an, and, the...
A few variables might need to be written, to include every way a sentence can begin or end, as with quotes, and so on.
Well anyway, it's always fun to 'dream' or have a 'wish list'.
Forget about that new fad, the 'bucket list'; all I want when I get that far, is a dreamy-looking 'wish list' to fasten my eyes on. :)
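
The wish list above (any word repeated within one sentence, an exclusion list, optional case sensitivity) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: sentences end at '.', '!' or '?' optionally followed by a closing quote or bracket, the exclusion list is a short made-up sample, and the function names are hypothetical. The document would first have to be saved or pasted as plain text.

```python
import re
from collections import Counter

# Sample exclusion list (an assumption); extend it as needed.
COMMON_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}

# Sentence boundary: '.', '!' or '?', optional closing quote/bracket, then whitespace.
SENTENCE_END = re.compile(r"""(?<=[.!?])["')]*\s+""")


def split_sentences(text):
    """Naively split text into sentences; abbreviations like 'e.g.' will confuse it."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def repeated_words(text, min_count=2, case_sensitive=False, exclude=COMMON_WORDS):
    """Return (sentence, {word: count}) for each sentence in which a
    non-excluded word occurs at least `min_count` times."""
    results = []
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)
        if not case_sensitive:
            words = [w.lower() for w in words]
        counts = Counter(w for w in words if w.lower() not in exclude)
        repeats = {w: n for w, n in counts.items() if n >= min_count}
        if repeats:
            results.append((sentence, repeats))
    return results


text = ("She quickly ran and quickly hid. The cat sat on the mat. "
        "Again and again he tried.")
for sentence, repeats in repeated_words(text):
    print(repeats, "->", sentence)
```

With `case_sensitive=True`, 'Again' and 'again' count as different words, so the last sentence is no longer reported.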
« Last Edit: October 20, 2014, 12:34:05 PM by bit »

bit
« Reply #9 on: October 20, 2014, 09:24:12 PM »

SmartEdit's 'Repeated Word Settings' can be changed to 'Display when word occurs at least [2 to 100] times'.
I set it to [2] times.
For example, with the word 'again', it should only pop up any sentence -or fragment thereof- which shows [2] instances of 'again' in it.
And it has a results exclusion list of 'a, an, and, the' and so on that is quite comprehensive.
However, the 'Repeated Word List' does not seem to work properly according to its own parameters; it simply is not finding what it purports to find.
A search for all instances of the word 'again' pops up sentences which only use 'again' [1] time, not [2] times or more.
Perhaps I am somehow not using the software properly, but that's as far as I have managed to get with it yet.

I just reset it to find only sentences or fragments of sentences which contain [3] or more instances, and it is still finding single instances of words.

Well...I think...it means [2] to [100] times in 'entire document', not 'individual sentences or fragments of sentences'.
That's too bad; if so, the 'Repeated Word Search' feature is not truly useful to me.
If I'm right, all I can do is hope SmartEdit will get an upgrade soon to fix that deficiency.

It does have a 'Repeated Phrase List', but that misses the ball too, by not singling out duplicates of [2] or more instances of individual words in sentences.

While I'm at it, I might as well wish for the Moon, and hope SmartEdit will one day include a feature to search for sentences containing [2] different specific words in the same sentence, such as 'again' + 'stenography'.
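
Outside SmartEdit, that kind of two-word search is easy to approximate on plain text. The Python sketch below is only an illustration with a made-up function name; it assumes sentences end at '.', '!' or '?' followed by whitespace, and it matches whole words only.

```python
import re


def sentences_with_all(text, words, case_sensitive=False):
    """Return the sentences that contain every word in `words` as a whole word."""
    flags = 0 if case_sensitive else re.IGNORECASE
    patterns = [re.compile(r"\b" + re.escape(w) + r"\b", flags) for w in words]
    # Naive sentence split: '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if all(p.search(s) for p in patterns)]


sample = "He took up stenography again. Again it rained. Stenography is hard."
print(sentences_with_all(sample, ["again", "stenography"]))
```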

I would still highly recommend SmartEdit to whoever asks; it is feature-rich in some very nice ways.
« Last Edit: October 20, 2014, 09:58:59 PM by bit »

IainB
« Reply #10 on: Today at 02:00:59 AM »

A few years back, I was assigned to work on a huge documentation project that was using Word 2003 as the main documentation tool. I read a book called "Taking Word for Windows to the Edge" (or something like that), and learned lots of good stuff from it. One thing it taught me to do was to switch ON all spelling, grammar-checking and proofing functionality in the settings. This greatly assisted in automated checking of all written text - including thoroughly parsing the grammar and checking for repetitive use of words. For example, if you wrote something repetitive but properly punctuated such as (say) "Very, very good but the rest of it was very, very bad and very, very smelly." it would not object to any of it, but it would spot any duplicated "very" that was without the necessary punctuation to make it grammatically correct.
However, it was not smart enough to check for bad use of English - for example, by pointing out that repetitive use of a phrase such as "Very, very something" was potential redundancy.
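
The behaviour described above (a doubled word is flagged only when nothing but whitespace separates the two copies) can be imitated with a regular expression. This is a rough Python sketch of the idea, not a description of how Word's checker is implemented; `doubled_words` is a made-up name.

```python
import re

# A word, then whitespace only, then the same word again (case-insensitive).
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def doubled_words(text):
    """Return (word, position) for each immediately repeated word."""
    return [(m.group(1), m.start()) for m in DOUBLED.finditer(text)]


sample = "Very, very good but the rest was very very bad."
for word, pos in doubled_words(sample):
    print(f"doubled '{word}' at offset {pos}: {sample[pos:pos + 2 * len(word) + 1]!r}")
```

"Very, very" passes because of the comma, while "very very" is reported.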

It could also sometimes spot the use of jargon and would suggest alternative terms.

I am currently using Word 2013, and it still has all this functionality.

I did once briefly trial an old software package called Grammatik (per Wikipedia) that went some way beyond MS Word's limits:
Quote
Grammatik was the first grammar checking program developed for home computer systems. Aspen Software of Albuquerque, NM, released the earliest version of this diction and style checker for personal computers, in 1981. Grammatik was first available for a Radio Shack - TRS-80, and soon had versions for CP/M and the IBM PC. Reference Software of San Francisco, CA, acquired Grammatik in 1985. Development of Grammatik continued, and it became an actual grammar checker that could detect writing errors beyond simple style checking.

Subsequent versions were released for the DOS, Windows, Macintosh and Unix platforms. Grammatik was ultimately acquired by Corel and is integrated in the WordPerfect word processor.

I don't know, because I haven't tried it, but SmartEdit looks like it goes some way towards doing the same kind of thing, using a different approach. It's clearly aimed at parsing/improving writing, anyway.
« Last Edit: Today at 02:08:56 AM by IainB »