I’ve been editing quite a bit this weekend, which has been a lot of fun, although it has also been hard work. Fortunately, some of the tools that I’ve developed (the VBScripts, for instance) can show me that I’m making objective progress. Whether that translates into subjective progress remains to be seen, but I believe it does (in other words, I think that an objectively better text will translate into a subjectively better experience for the reader).
Just to give an example, one of the big problems facing new writers is that they tend to be repetitious. We can catch a lot of these problems ourselves, but sometimes it takes extra readers to find them. That’s why, for instance, in the dedication I wrote for Public Transit, I included the phrase “Thanks, I suddenly realized, to…” I had used the words “suddenly” and “realized” as well as the combo “suddenly realized” a lot in that novel without knowing it, but the readers I gave it to for critiques spotted this flaw instantly.
Let’s make this concrete. Suppose that I wrote the following paragraph:
The wide receiver ran to his starting position. On the snap, he ran down field as fast as he could. Once in the open, he turned back, saw the ball already in the air. He ran to where it would fall, scooped it in his arms, and ran for the end zone.
When I run that paragraph through my stats program, it tells me:
There were 52 total words.
There were 35 different words used.
There were 27 words used only once.
So what this tells me is that in the above paragraph, there were 35 unique words that made up a total of 52 words. That means about 2/3 of the paragraph was unique. More importantly, of the 35 different words used, only 27 of them were used only once. That means that there were 8 words used more than one time.
But here’s where the problem of repetition comes in. See, there were 52 total words, with 27 words used only once (meaning we haven’t accounted for the other 25 words). There were 35 distinct words, meaning that only 8 words accounted for those missing 25 words! Thus, 48% of this single paragraph was made up of just 8 repeated words, a number equal to only 15% of the paragraph’s length.
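For anyone who wants to try this at home, here is a minimal Python sketch of the same kind of count. It is not the VBScript mentioned above, and the name word_stats is made up for illustration. It assumes that words are compared without regard to case and that punctuation is ignored, which is what the numbers above imply.

```python
import re
from collections import Counter

PARAGRAPH = (
    "The wide receiver ran to his starting position. On the snap, he ran "
    "down field as fast as he could. Once in the open, he turned back, saw "
    "the ball already in the air. He ran to where it would fall, scooped it "
    "in his arms, and ran for the end zone."
)

def word_stats(text):
    """Return (total words, distinct words, words used exactly once)."""
    # Assumption: a word is a case-insensitive run of letters, with an
    # optional internal apostrophe so contractions stay in one piece.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    counts = Counter(words)
    once = sum(1 for n in counts.values() if n == 1)
    return len(words), len(counts), once

total, distinct, once = word_stats(PARAGRAPH)
repeated_words = distinct - once   # distinct words used more than once
repeated_uses = total - once       # how many of the total words they cover

print(f"There were {total} total words.")
print(f"There were {distinct} different words used.")
print(f"There were {once} words used only once.")
print(f"{repeated_words} repeated words account for {repeated_uses} "
      f"of the {total} words ({repeated_uses / total:.0%}).")
```

The last line is just the subtraction worked through in the text: 35 - 27 repeated words covering 52 - 27 of the total.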
Because the paragraph is so short, we can look at all of the words that appeared more than once:
6 the
4 he
4 ran
3 in
2 as
2 his
2 it
2 to
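A list like the one above can be reproduced with a few lines of Python. Again, this is only a sketch under the same assumptions (case ignored, punctuation dropped), and repeated_words is a hypothetical helper name rather than anything from my own scripts.

```python
import re
from collections import Counter

PARAGRAPH = (
    "The wide receiver ran to his starting position. On the snap, he ran "
    "down field as fast as he could. Once in the open, he turned back, saw "
    "the ball already in the air. He ran to where it would fall, scooped it "
    "in his arms, and ran for the end zone."
)

def repeated_words(text):
    """Return (count, word) pairs for every word used more than once."""
    # Assumption: case-insensitive runs of letters count as words.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    counts = Counter(words)
    pairs = [(n, w) for w, n in counts.items() if n > 1]
    # Most frequent first; ties are broken alphabetically.
    return sorted(pairs, key=lambda p: (-p[0], p[1]))

for n, word in repeated_words(PARAGRAPH):
    print(n, word)
```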
It’s not a surprise that “the” is high on the list, since “the” is a function word. In this context, so are “he”, “in”, “as”, “his”, “it”, and “to.” Those aren’t a big deal.
But what about “ran”? “Ran” is a verb, not a function word. What happens if we substitute synonyms for the word “ran” in the paragraph?
The wide receiver trotted to his starting position. On the snap, he raced down field as fast as he could. Once in the open, he turned back, saw the ball already in the air. He ran to where it would fall, scooped it in his arms, and sprinted for the end zone.
Now when I run my stats program, I get the following result:
There were 52 total words.
There were 38 different words used.
There were 31 words used only once.
The numbers have improved. We added three more unique words, and because we only used the word “ran” once, we’ve added four to the number of words used only once. Furthermore, now the only words that occur more than one time are all function words.
Now, 73% of the text is made up of different words (up from 67%), and 60% of the text is made up of words used only once (up from 52%). Finally, 40% of the text comes from 7 repeated words, a number equal to 13% of the paragraph’s length (whereas before, 48% came from 8 repeated words, or 15%).
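The before-and-after percentages can be checked the same way. This is a hedged Python sketch, not my actual stats program; the helper names word_stats and ratios are invented for the example, and it assumes words are counted without regard to case or punctuation.

```python
import re
from collections import Counter

BEFORE = (
    "The wide receiver ran to his starting position. On the snap, he ran "
    "down field as fast as he could. Once in the open, he turned back, saw "
    "the ball already in the air. He ran to where it would fall, scooped it "
    "in his arms, and ran for the end zone."
)
AFTER = (
    "The wide receiver trotted to his starting position. On the snap, he "
    "raced down field as fast as he could. Once in the open, he turned back, "
    "saw the ball already in the air. He ran to where it would fall, scooped "
    "it in his arms, and sprinted for the end zone."
)

def word_stats(text):
    """Return (total words, distinct words, words used exactly once)."""
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    counts = Counter(words)
    return len(words), len(counts), sum(1 for n in counts.values() if n == 1)

def ratios(text):
    """Return the distinct, used-once, and repeated shares of the text."""
    total, distinct, once = word_stats(text)
    return distinct / total, once / total, (total - once) / total

for label, text in (("before", BEFORE), ("after", AFTER)):
    d, o, r = ratios(text)
    print(f"{label}: {d:.0%} distinct, {o:.0%} used once, {r:.0%} repeated")
```

All three shares use the total word count as the denominator, which is how the percentages in the text were worked out.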
Those numbers show that the second paragraph is objectively superior (assuming, of course, the rules of syntax and grammar are followed). That doesn’t mean it’s subjectively better, though, because subjectivity relies on personal taste. However, I would say that, generally speaking, objective improvements indicate subjective improvements too.
So why do I bring that up? Well, let’s just look at the numbers for my first chapter of The 13th Prime before I did my edits this weekend:
There were 3684 total words.
There were 1001 different words used.
There were 595 words used only once.
Compared to after my final edit this weekend:
There were 3694 total words.
There were 1053 different words used.
There were 668 words used only once.
You can see that I added 10 words to the total length. However, I added 52 “different words” and 73 “words used only once”!
Now, with longer texts, the ratios won’t apply the same way as they did when I examined the single paragraph above. Still, I do look at how often a word appears in the document, and for a sample of this size I don’t want to see any non-function word appearing more than around 20 times. That gives those words a cap of about 1/200 (that is, I want each non-function word to appear no more than once every 200 words).
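A check along those lines might look like the Python sketch below. Everything in it is an assumption made for illustration: the function name overused_words, the cap_every parameter, and especially the tiny FUNCTION_WORDS list, which a real tool would replace with a much longer one. The sample text is synthetic, built only to exercise the function, and is not from any chapter.

```python
import itertools
import re
from collections import Counter

# Hypothetical, deliberately short function-word list (an assumption for
# this sketch); a real list would be much longer.
FUNCTION_WORDS = {"a", "an", "and", "as", "for", "he", "his", "in", "it",
                  "of", "on", "the", "to"}

def overused_words(text, cap_every=200, function_words=FUNCTION_WORDS):
    """Return (count, word) pairs for non-function words that appear more
    than once per `cap_every` words of text, most frequent first."""
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    limit = len(words) / cap_every
    counts = Counter(w for w in words if w not in function_words)
    flagged = [(n, w) for w, n in counts.items() if n > limit]
    return sorted(flagged, key=lambda p: (-p[0], p[1]))

# Synthetic 400-word sample, made up purely to exercise the function:
# 387 one-off filler words, "the" ten times, and "ran" three times.
filler = ["".join(p) for p in itertools.product("bcdfg", repeat=4)][:387]
sample = " ".join(filler + ["the"] * 10 + ["ran"] * 3)

# The cap for this sample is 400 / 200 = 2 uses per word.
print(overused_words(sample))
```

For a 3694-word chapter the same rule gives a limit of about 18 uses, which lines up with the ceiling of around 20 mentioned above.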
Of course, there are ALWAYS exceptions to this, so if I read something and artistically feel it’s better to have a repetition, then I’ll include it. But in those instances, I’d better have a darn good reason to do so!







