Author Topic: IDEA: super word counter  (Read 7232 times)

Joe Hone

  • Supporting Member
  • Joined in 2012
  • Posts: 86
IDEA: super word counter
« on: February 17, 2013, 10:17 AM »
I write for a living and when you get 10,000 words into a subject it is pretty easy for things to start repeating (phrases and sentence starts), and when you are trying to write convincingly, loaded words and phrases tend to repeat (“clearly” “without question”).  These things lower the quality of the writing and turn off readers – even subconsciously, we don’t like bad writing.

How about a super word counter, that in addition to counting the standard pages/words/paragraphs/characters/lines, will also generate lists for repeated words, repeated phrases, adverbs (words that tend to end with “ly” – a big no-no for writers) and sentences that start with the same words? It would need some simple filtering so “and” “the” and similar words can be omitted from the search field. There are probably other lists it could compile, but these are the tools I can use most.
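
As a rough illustration of two of the lists described above (repeated words with stop-word filtering, and words ending in "ly"), here is a minimal Python sketch. The stop-word list, the function names and the sample text are all assumptions made for the example; it is not taken from any existing tool.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real tool would let the user edit it.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}

# A word is a run of letters, optionally followed by an apostrophe part ("don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)?")

def tokenize(text):
    """Return the words of the text in lower case."""
    return [w.lower() for w in WORD_RE.findall(text)]

def repeated_words(text, min_count=2):
    """Return (word, count) pairs for non-stop words used at least min_count times."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    return [(w, n) for w, n in counts.most_common() if n >= min_count]

def ly_words(text):
    """Count words ending in 'ly'. This is only a heuristic: 'family' matches too."""
    return Counter(w for w in tokenize(text) if w.endswith("ly") and len(w) > 3)

sample = "Clearly the plan works. The plan is clearly sound, and it is truly simple."
print(repeated_words(sample))
print(ly_words(sample).most_common())
```

Plain text is assumed as input, so a Word or LibreOffice document would first have to be exported or converted.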

To be compatible with different document formats, it would probably work best if it were a standalone program that imported the document for the searches, but it would only generate lists – the user would then look for the words and phrases and make corrections in the original document. This could also be a useful plug-in for a word processor or office suite, but I've never actually tried to use a plug-in with Word, WordPerfect or LibreOffice, the programs I use most, and I don't even know if they would utilize a plug-in.

After writing the above, but before posting, I did a web search and found a more comprehensive filtering program than what I've described - SmartEdit by Bad Wolf Software. Pretty pricey at $49. I didn't find anything else out there with these basic search parameters. There are programs like Word Counter for Mac by Supermagnus Software, but it is more a word counter than a phrase and sentence start tool. And I'm not on a Mac.

Any thoughts?

Ath

  • Supporting Member
  • Joined in 2006
  • Posts: 3,612
Re: IDEA: super word counter
« Reply #1 on: February 17, 2013, 10:34 AM »
What software do you use for your day to day writing?

Renegade

  • Charter Member
  • Joined in 2005
  • Posts: 13,288
  • Tell me something you don't know...
Re: IDEA: super word counter
« Reply #2 on: February 17, 2013, 11:00 AM »
You *could* look at some CAT tools to see if they work. There are some free ones out there, though I forget the names. All of the people I work with use Trados.

But, translation memory could help out there. They have tools for dealing with batch jobs, etc.

Check here for some:

http://en.wikipedia....me_notable_CAT_tools

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,199
Re: IDEA: super word counter
« Reply #3 on: February 17, 2013, 11:33 AM »
Not sure it's quite what you want, but you might find TextSTAT - Simple Text Analysis Tool useful:
Concordance software for Windows, GNU/Linux and MacOS

TextSTAT is a simple programme for the analysis of texts. It reads plain text files (in different encodings) and HTML files (directly from the internet) and it produces word frequency lists and concordances from these files. This version includes a web-spider which reads as many pages as you want from a particular website and puts them in a TextSTAT-corpus. The new news-reader, too, puts news messages in a TextSTAT-readable corpus file.
TextSTAT reads MS Word and OpenOffice files. No conversion needed, just add the files to your corpus...
In TextSTAT you can use regular expression which provides you with powerful search possibilities. The programme is multilingual. Because it uses Unicode internally, TextSTAT can cope with many different languages and file encodings.
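
For readers unfamiliar with the term, a concordance lists each occurrence of a word together with its surrounding context. The following is a minimal Python sketch of that idea, assuming plain-text input; the function name and the `width` parameter are made up for the example, and it says nothing about how TextSTAT works internally.

```python
import re

def concordance(text, keyword, width=30):
    """Return each whole-word occurrence of keyword (case-insensitive)
    with up to `width` characters of context on either side."""
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

sample = "The rule is axiomatic. Nobody doubts the axiomatic rule."
for line in concordance(sample, "axiomatic", width=15):
    print(line)
```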

Joe Hone

Re: IDEA: super word counter
« Reply #4 on: February 17, 2013, 12:47 PM »
Most of the time I'm in Word because it is standard in the industry I write for, but occasionally WP and LibreOffice, usually if I'm sent a document written in those programs. If possible, I move the new doc to Word because it simplifies things down the road. OT: I'm waiting for the day when Word loses its dominance for word processing...

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,540
  • @Slartibartfarst
Re: IDEA: super word counter
« Reply #5 on: February 18, 2013, 07:13 AM »
If you are using MS-Word, then you can probably find almost everything you need to do in the Options menus. I find it tremendously useful - e.g., for Grammar & Style checking, and for flagging repetition, clichés and wordiness.
Image example below is from Word 2013, but I think it is the same for Word 2007:

[Image: Word 2013 Options 01 - Grammar + Style.jpg]

Word count is catered for in the Status Bar at the bottom:

[Image: Word 2013 Options 02 - word count.jpg]

If you want to check for repetition of certain phrases and words throughout a document, the Search or Search/Replace automates it for you to some extent.

However, the best check is to get someone - e.g., a peer who is good at proofreading and who knows your subject reasonably well - to review the document and comment on it. I was trained to always do this before issuing a report to a client. Once you get over the initial ego-hit when the peer review turns up so many faults, you find it a great timesaver and it helps you to understand and correct for the sorts of habitual, unconscious mistakes that the exercise shows that you tend to make the most.
This is learning from our mistakes as spotted by others, because often we cannot see our own mistakes until they are pointed out to us. Not easy for the arrogant, or those with sensitive egos!    ;)

Joe Hone

Re: IDEA: super word counter
« Reply #6 on: February 18, 2013, 07:52 AM »
I'm familiar with the Options menu in Word and it doesn't do what I need. First, I'm running a pre-2007 version of Word at work, so most of the 2013 options are not available. Even so, I have 2010 on my laptop and have tried some of the functions you mention - for instance, the "flag repeated words" function just tells you if you typed the same word word twice, like that. It doesn't pull repeated words from the body of the text. Similarly, there is no way to work through 25,000 words and have it check sentence starts, or 4-5 word phrases, or tell you how many times you used "axiomatic." Yes, you can search for axiomatic manually, but you first have to remember that you might have overused it and then search for it specifically - and the whole point of this tool is that you overused a word without realizing it and the program checks for it for you.
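
The two checks mentioned above that Word lacks (repeated 4-5 word phrases and repeated sentence starts) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: plain-text input, a naive sentence splitter that will stumble on abbreviations such as "e.g.", and hypothetical function names.

```python
import re
from collections import Counter

def words(text):
    """Lower-case words; apostrophes are kept inside words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def repeated_phrases(text, n=4, min_count=2):
    """Return runs of n consecutive words that occur at least min_count times.
    Note: phrases may span sentence boundaries in this simple version."""
    toks = words(text)
    grams = Counter(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return [(p, c) for p, c in grams.most_common() if c >= min_count]

def sentence_starts(text, n=2, min_count=2):
    """Return the first n words of sentences whose opening is used min_count+ times."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    starts = Counter()
    for s in sentences:
        toks = words(s)
        if len(toks) >= n:
            starts[" ".join(toks[:n])] += 1
    return [(p, c) for p, c in starts.most_common() if c >= min_count]

sample = ("It is clear that the rule applies. "
          "It is clear that the exception fails. The court agreed.")
print(repeated_phrases(sample, n=4))
print(sentence_starts(sample, n=2))
```

Running it with `n=4` and then `n=5` would cover the "4-5 word phrases" case; overlapping phrases are reported separately, so some manual reading of the list is still needed.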

I appreciate your words about editing - my work gets edited. But many times editors are just professional readers and I've found that one editor's fresh bread is another's stale dinner roll, and one editor's preference for the written word may not jibe with their colleague's in the next office. Either way, I still want to send out my work proofed to my satisfaction.

I checked out the CAT tools and TextSTAT. I don't see a CAT program that categorizes words like I need, and TextSTAT is quite the program but it doesn't appear to do specific searches within fields. But thanks for the suggestions!

IainB

Re: IDEA: super word counter
« Reply #7 on: February 18, 2013, 03:01 PM »
@Joe Hone: I think I understand your needs a bit better now. Ultimately, I suspect that you may need to take an evaluation path (a "suck-it-and-see" approach) to find something that meets your needs, whether or not you have defined them yet. The evaluation would probably not only help you to better define your needs and compare those needs against available solutions, but also help you to discover some new needs that might become "mandatory" or "highly desirable" criteria.

I googled "comparison review of automated text proofreading software", and came up with a slew of relevant references. There seem to be a lot of online and PC-based grammar-checking tools on offer, but I have no experience of them to be able to suggest any.
There was an interesting article referred to: Proofreading test: my wife vs. Grammarly vs. Ginger vs. After The Deadline vs. Microsoft Word 2010, where a simple (and useful) comparison test is made and the results given:
  • Grammarly - score: 3/8
  • Ginger - score: 3/8
  • After The Deadline - score: 3/8
  • Microsoft Word 2010 - score: 4/8
  • a skilled human proof-reader - score: 8/8

Here is the Proofreading test summary:
What we take away from this five-way match up is that you can’t beat the eye of a human proofreader. Digital tools can be useful as spellcheckers, grammar fixers and synonym suggesters. In some cases, they can help you improve your basic writing skills and steer you away from embarrassing copy-editing errors as you create content.

But there’s more to proofreading than hunting for typos and making sure you haven’t written ‘your’ when the sentence structure calls for ‘you’re’.

What digital tools like Grammarly, Ginger and After The Deadline CAN’T do is check that web links point to the right pages, that names are spelled correctly or that facts and figures are accurate. So while they might claim to ‘proofread’ text, they actually don’t. You’d be better off doing it yourself or getting somebody else to check copy for you.

Hope this helps or is of use.

Joe Hone

Re: IDEA: super word counter
« Reply #8 on: February 19, 2013, 10:47 AM »
I still think we’re talking about different functions. A program cannot replace a human editor for textual substance, including grammar, but what I’m after is a program that checks text for word content, not another glorified spell checker. After replicating your “comparison review of automated text proofreading software” (thanks, by the way), I’m still only coming up with programs (Grammarly, Ginger, MS Word spell check) that check spelling, grammar and for repeated repeated words, but not for how many times a word or word combination is used, or for repeated phrases, including sentence starts, etc. When I’m writing an instructional manual, policy manual, teaching guide or legal treatise and I’m 45,000 words into it, it is a lot of work to go back and check for these things by reading – a search program would be fast and simple. And, I suspect, quite revealing.

IainB

Re: IDEA: super word counter
« Reply #9 on: February 19, 2013, 03:11 PM »
Your requirements:
  • How about a super word counter, that in addition to counting the standard pages/words/paragraphs/characters/lines, will also generate lists for repeated words, repeated phrases, adverbs (words that tend to end with “ly” – a big no-no for writers) and sentences that start with the same words? It would need some simple filtering so “and” “the” and similar words can be omitted from the search field. There are probably other lists it could compile, but these are the tools I can use most.

  • ...Either way, I still want to send out my work proofed to my satisfaction...

  • I still think we’re talking about different functions. A program cannot replace a human editor for textual substance, including grammar, but what I’m after is a program that checks text for word content, not another glorified spell checker....

Though syntax-checking and grammatical/structural parsing could be useful, you seem to need a program that either has a learning capability or lets you add words, phrases and so on to a catalogue for it to parse for. It might be better still if the program could identify which patterns predominate in a document, so that you could add them to the catalogue, and could perhaps even identify different authors' idiosyncrasies (e.g., writing style, certain words/phrases, or speaking in the first person). I am merely speculating here, though.
What you want to do can probably be described as a process, and if it can (and I suspect it can), then you will probably be able to automate a large part of it.
Since there is generally little that is really "new" in user requirements for applications, it is likely that someone has been there before us.
I therefore googled:
program to count words and phrases in a document
- and came up with some interesting results, some of which:

Before I performed the search, I hadn't realised the field was so advanced and richly-populated. Having spent a lot of time, some years back, on large documentation projects, I find it very interesting.
I am not sure whether you will have already examined things like the above, but if you haven't, then it might be worthwhile doing so.
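
The catalogue idea mentioned earlier in this post (a user-maintained list of words and phrases for the program to parse for) could look something like the following minimal Python sketch. The catalogue contents and the function name are hypothetical, chosen only to match the examples given in this thread.

```python
import re

# Hypothetical catalogue of loaded words and phrases; the user would maintain this list.
CATALOGUE = ["clearly", "without question", "axiomatic"]

def catalogue_hits(text, catalogue=CATALOGUE):
    """Return {phrase: count} for each catalogued phrase,
    matched as whole words and ignoring case."""
    hits = {}
    for phrase in catalogue:
        # Allow any whitespace (including line breaks) between the words of a phrase.
        body = r"\s+".join(re.escape(w) for w in phrase.split())
        pattern = r"\b" + body + r"\b"
        hits[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return hits

sample = "Clearly this is axiomatic. Without question, it is clearly axiomatic."
print(catalogue_hits(sample))
```

A fixed list like this only finds what the writer already suspects; pairing it with a frequency list of the whole document would surface candidates to add to the catalogue.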