need duplicate word scanner

dr_andus:
However, I don't seem to be able to get it to do what I originally wanted, which is to find unnecessarily duplicated words -such as adverbs- in the same sentence.
It just doesn't seem to have any way to do that.
-bit (October 19, 2014, 01:58 PM)
--- End quote ---

It should be possible to do it, but with a bit of manual work involved. First, select "compile adverb usage list", "compile repeated phrases list", and "compile repeated words list" (and uncheck all others). Then hit the "run checks" button in the toolbar. Then click on the "results" tab and select, one by one, "open adverb usage list", "open repeated phrases list", and "open repeated words list". Unless your text document is very large, it should be manageable to pick out the repeated words and check whether they fall in the same sentence, just by going through the results manually.

mouser:
i like the idea of a program that can analyze a corpus and then identify when you have too many of the same "rare" words in a sentence, or too many "ultrarare" words in a document/paragraph.
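
A rough Python sketch of how that idea might work, purely as an illustration: count words in a reference corpus, treat a word as "rare" if it appears at most `rare_max` times there, and flag any rare word used `limit` or more times in one sentence. The function names and both thresholds are made up for this example, and the word pattern only handles plain ASCII letters.

```python
import re
from collections import Counter

# Assumption: words are runs of ASCII letters, optionally with apostrophes.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def word_counts(text):
    """Count lowercase word occurrences in a text."""
    return Counter(WORD.findall(text.lower()))


def overused_rare_words(sentence, corpus_counts, rare_max=1, limit=2):
    """Return words that are rare in the corpus (count <= rare_max)
    but occur at least `limit` times in the sentence."""
    local = word_counts(sentence)
    return sorted(
        word
        for word, n in local.items()
        if n >= limit and corpus_counts[word] <= rare_max
    )


corpus = word_counts("the cat sat on the mat and the dog sat on the log")
print(overused_rare_words("The stenography of the stenography class", corpus))
```

Here 'the' is repeated but common in the corpus, so only the rare repeated word is reported. A real tool would want a much larger corpus and relative frequencies rather than raw counts.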

rjbull:
Of related interest, TextSTAT - Simple Text Analysis Tool:
Concordance software for Windows, GNU/Linux and MacOS

TextSTAT is a simple programme for the analysis of texts. It reads plain text files (in different encodings) and HTML files (directly from the internet) and it produces word frequency lists and concordances from these files. This version includes a web-spider which reads as many pages as you want from a particular website and puts them in a TextSTAT-corpus. The new news-reader, too, puts news messages in a TextSTAT-readable corpus file.
TextSTAT reads MS Word and OpenOffice files. No conversion needed, just add the files to your corpus...
In TextSTAT you can use regular expressions, which provide powerful search possibilities. The programme is multilingual. Because it uses Unicode internally, TextSTAT can cope with many different languages and file encodings.
--- End quote ---
Freeware.  Found via Mark Wieczorek's "My Favorite Smallware".

bit:
i like the idea of a program that can analyze a corpus and then identify when you have too many of the same "rare" words in a sentence, or too many "ultrarare" words in a document/paragraph.
-mouser (October 19, 2014, 04:05 PM)
--- End quote ---
It would be sort of a 'duplicate wild card' search.
Something like the old DOS 'star-dot-star' [*.*] that would define 'any' duplicates between period dots, for a whole document, without having to name particular words.
Also to be able to exclude simple common words like 'a', 'and', 'the', and so on.

It really seems quite simple, and Google already does it:
-'x' = 'any dictionary word' (like the original DOS 'wild card', star-dot-star *.*),
-find 'x' where 'x' occurs 2 or more times within the borders of one sentence, i.e. between one sentence-ending mark (period, exclamation point, or question mark) and the next,
-exclude list: a, an, and, the...
A few variables might need to be written, to include every way a sentence can begin or end, as with quotes, and so on.
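
The wish list above could be sketched in Python roughly like this. It is only a minimal illustration under some assumptions: the sentence splitter is naive (it breaks after '.', '!' or '?', optionally followed by a closing quote or bracket, so abbreviations like 'Dr.' will split wrongly), the exclusion list is a short made-up sample, and all names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical exclusion list; extend as needed.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "it", "is"}

# Naive sentence boundary: '.', '!' or '?', then optional closing
# quotes/brackets, then whitespace. The closing quote is discarded.
SENTENCE_END = re.compile(r"(?<=[.!?])[\"')\]]*\s+")

# Assumption: words are runs of ASCII letters, optionally with apostrophes.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def repeated_words_by_sentence(text, min_count=2, stop_words=STOP_WORDS):
    """Return (sentence, {word: count}) for each sentence in which some
    non-excluded word occurs at least `min_count` times."""
    results = []
    for sentence in SENTENCE_END.split(text):
        sentence = sentence.strip()
        if not sentence:
            continue
        counts = Counter(
            w for w in WORD.findall(sentence.lower()) if w not in stop_words
        )
        repeated = {w: n for w, n in counts.items() if n >= min_count}
        if repeated:
            results.append((sentence, repeated))
    return results


sample = 'He tried again and again. She left again. "Stop, stop!" he said.'
for sentence, words in repeated_words_by_sentence(sample):
    print(sentence, words)
```

The middle sentence is skipped because 'again' appears in it only once; the count is per sentence, not per document.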
Well anyway, it's always fun to 'dream' or have a 'wish list'.
Forget about that new fad, the 'bucket list'; all I want when I get that far, is a dreamy-looking 'wish list' to fasten my eyes on. :)

bit:
SmartEdit's 'Repeated Word Settings' can be changed to 'Display when word occurs at least [2 to 100] times'.
I set it to [2] times.
For example, with the word 'again', it should only pop up any sentence -or fragment thereof- which shows [2] instances of 'again' in it.
And it has a results exclusion list of 'a, an, and, the' and so on that is quite comprehensive.
However, the 'Repeated Word List' does not seem to work properly according to its own parameters; it simply is not finding what it purports to find.
A search for all instances of the word 'again' pops up sentences which only use 'again' [1] time, not [2] times or more.
Perhaps I am somehow not using the software properly, but that's as far as I have managed to get with it yet.

I just reset it to find only sentences or fragments of sentences which contain [3] or more instances, and it is still finding single instances of words.

Well...I think...it means [2] to [100] times in 'entire document', not 'individual sentences or fragments of sentences'.
That's too bad; if so, the 'Repeated Word Search' feature is not truly useful to me.
If I'm right, all I can do is hope SmartEdit will get an upgrade soon to fix that deficiency.

It does have a 'Repeated Phrase List', but that misses the ball too, by not singling out duplicates of [2] or more instances of individual words in sentences.

While I'm at it, I might as well wish for the Moon, and hope SmartEdit will one day include a feature to search for sentences containing [2] different specific words in the same sentence, such as 'again' + 'stenography'.
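
For what it's worth, that last wish is easy to sketch outside SmartEdit. The Python below is only an illustration, not anything SmartEdit offers: it splits text into sentences with a naive rule (break after '.', '!' or '?' plus whitespace) and keeps the sentences that contain every word in a given list. The function name is made up for the example.

```python
import re

# Naive sentence boundary: '.', '!' or '?', then optional closing
# quotes/brackets, then whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])[\"')\]]*\s+")

# Assumption: words are runs of ASCII letters, optionally with apostrophes.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def sentences_with_all(text, required):
    """Return the sentences that contain every word in `required`
    (case-insensitive, whole words only)."""
    required = {w.lower() for w in required}
    hits = []
    for sentence in SENTENCE_END.split(text):
        words = set(WORD.findall(sentence.lower()))
        if required <= words:
            hits.append(sentence.strip())
    return hits


sample = (
    "I practiced stenography again. Again it rained. "
    "Stenography is hard, again and again."
)
for sentence in sentences_with_all(sample, ["again", "stenography"]):
    print(sentence)
```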

I would still highly recommend SmartEdit to whoever asks; it is feature-rich in some very nice ways.
