[This Article appeared in the American Scientist (Nov-Dec
1990), Volume 78, 550-558. Retyped and posted with
permission.]
The Science of Scientific Writing
If the reader is to grasp what the writer means,
the writer must understand what the reader needs
George D. Gopen and Judith
A. Swan*
*George D. Gopenis associate professor of English
and Director of Writing Programs at Duke University.
He holds a Ph.D. in English from Harvard University
and a J.D. from Harvard Law School. Judith A. Swan
teaches scientific writing at Princeton University.
Her Ph.D., which is in biochemistry, was earned at
the Massachusetts Institute of Technology. Address
for Gopen: 307 Allen Building, Duke University, Durham,
NC 27706
Science is often hard to read. Most people assume
that its difficulties are born out of necessity, out
of the extreme complexity of scientific concepts,
data and analysis. We argue here that complexity of
thought need not lead to impenetrability of expression;
we demonstrate a number of rhetorical principles that
can produce clarity in communication without oversimplifying
scientific issues. The results are substantive, not
merely cosmetic: Improving the quality of writing
actually improves the quality of thought.
The fundamental purpose of scientific discourse is
not the mere presentation of information and thought,
but rather its actual communication. It does not matter
how pleased an author might be to have converted all
the right data into sentences and paragraphs; it matters
only whether a large majority of the reading audience
accurately perceives what the author had in mind.
Therefore, in order to understand how best to improve
writing, we would do well to understand better how
readers go about reading. Such an understanding has
recently become available through work done in the
fields of rhetoric, linguistics and cognitive psychology.
It has helped to produce a methodology based on the
concept of reader expectations.
Writing with the Reader in Mind: Expectation and
Context
Readers do not simply read; they interpret. Any piece
of prose, no matter how short, may "mean" in 10 (or
more) different ways to 10 different readers. This
methodology of reader expectations is founded on the
recognition that readers make many of their most important
interpretive decisions about the substance of prose
based on clues they receive from its structure.
This interplay between substance and structure can
be demonstrated by something as basic as a simple
table. Let us say that in tracking the temperature
of a liquid over a period of time, an investigator
takes measurements every three minutes and records
a list of temperatures. Those data could be presented
by a number of written structures. Here are two possibilities:
t(time)=15', T(temperature)=32º, t=0', T=25º;
t=6', T=29º; t=3', T=27º; t=12', T=32º; t=9';
T=31º
time (min) temperature(ºC)
0 25
3 27
6 29
9 31
12 32
15 32
Precisely the same information appears in both formats,
yet most readers find the second easier to interpret.
It may be that the very familiarity of the tabular
structure makes it easier to use. But, more significantly,
the structure of the second table provides the reader
with an easily perceived context (time) in which the
significant piece of information (temperature) can
be interpreted. The contextual material appears on
the left in a pattern that produces an expectation
of regularity; the interesting results appear on the
right in a less obvious pattern, the discovery of
which is the point of the table.
If the two sides of this simple table are reversed,
it becomes much harder to read.
temperature(ºC) time(min)
25 0
27 3
29 6
31 9
32 12
32 15
Since we read from left to right, we prefer the context
on the left, where it can more effectively familiarize
the reader. We prefer the new, important information
on the right, since its job is to intrigue the reader.
Information is interpreted more easily and more uniformly
if it is placed where most readers expect to find
it. These needs and expectations of readers affect
the interpretation not only of tables and illustrations
but also of prose itself. Readers have relatively
fixed expectations about where in the structure of
prose they will encounter particular items of its
substance. If writers can become consciously aware
of these locations, they can better control the degrees
of recognition and emphasis a reader will give to
the various pieces of information being presented.
Good writers are intuitively aware of these expectations;
that is why their prose has what we call "shape."
This underlying concept of reader expectation is
perhaps most immediately evident at the level of the
largest units of discourse. (A unit of discourse is
defined as anything with a beginning and an end: a
clause, a sentence, a section, an article, etc.) A
research article, for example, is generally divided
into recognizable sections, sometimes labeled Introduction,
Experimental Methods, Results and Discussion. When
the sections are confused--when too much experimental
detail is found in the Results section, or when discussion
and results intermingle--readers are often equally
confused. In smaller units of discourse the functional
divisions are not so explicitly labeled, but readers
have definite expectations all the same, and they
search for certain information in particular places.
If these structural expectations are continually violated,
readers are forced to divert energy from understanding
the content of a passage to unraveling its structure.
As the complexity of the context increases moderately,
the possibility of misinterpretation or noninterpretation
increases dramatically.
We present here some results of applying this methodology
to research reports in the scientific literature.
We have taken several passages from research articles
(either published or accepted for publication) and
have suggested ways of rewriting them by applying
principles derived from the study of reader expectations.
We have not sought to transform the passages into
"plain English" for the use of the general public;
we have neither decreased the jargon nor diluted the
science. We have striven not for simplification but
for clarification.
Reader Expectations for the Structure of Prose
Here is our first example of scientific prose, in
its original form: The smallest of the URF's
(URFA6L), a 207-nucleotide (nt) reading frame overlapping
out of phase the NH2-terminal portion of
the adenosinetriphosphatase (ATPase) subunit 6 gene
has been identified as the animal equivalent of the
recently discovered yeast H+-ATPase subunit
8 gene. The functional significance of the other URF's
has been, on the contrary, elusive. Recently, however,
immunoprecipitation experiments with antibodies to purified,
rotenone-sensitive NADH-ubiquinone oxido-reductase [hereafter
referred to as respiratory chain NADH dehydrogenase
or complex I] from bovine heart, as well as enzyme fractionation
studies, have indicated that six human URF's (that is,
URF1, URF2, URF3, URF4, URF4L, and URF5, hereafter referred
to as ND1, ND2, ND3, ND4, ND4L, and ND5) encode subunits
of complex I. This is a large complex that also contains
many subunits synthesized in the cytoplasm.*
[*The full paragraph includes one more
sentence: "Support for such functional identification
of the URF products has come from the finding that
the purified rotenone-sensitive NADH dehydrogenase
from Neurospora crassa contains several subunits
synthesized within the mitochondria, and from the
observation that the stopper mutant of Neurospora
crassa, whose mtDNA lacks two genes homologous
to URF2 and URF3, has no functional complex I." We
have omitted this sentence both because the passage
is long enough as is and because it raises no additional
structural issues.]
Ask any ten people why this paragraph is hard to
read, and nine are sure to mention the technical vocabulary;
several will also suggest that it requires specialized
background knowledge. Those problems turn out to be
only a small part of the difficulty. Here is the passage
again, with the difficult words temporarily lifted:
R> The smallest of the URF's, and [A], has been
identified as a [B] subunit 8 gene. The functional
significance of the other URF's has been, on the contrary,
elusive. Recently, however, [C] experiments, as well
as [D] studies, have indicated that six human URF's
[1-6] encode subunits of Complex I. This is a large
complex that also contains many subunits synthesized
in the cytoplasm.
It may now be easier to survive the journey through
the prose, but the passage is still difficult. Any
number of questions present themselves: What has the
first sentence of the passage to do with the last
sentence? Does the third sentence contradict what
we have been told in the second sentence? Is the functional
significance of URF's still "elusive"? Will this passage
lead us to further discussion about URF's, or about
Complex I, or both?
Information is interpreted more
easily and more uniformly if it is placed where most
readers expect to find it.
Knowing a little about the subject matter does
not clear up all the confusion. The intended audience
of this passage would probably possess at least two
items of essential technical information: first, "URF"
stands for "Uninterrupted Reading Frame," which describes
a segment of DNA organized in such a way that it could
encode a protein, although no such protein product
has yet been identified; second, both APTase and NADH
oxido-reductase are enzyme complexes central to energy
metabolism. Although this information may provide
some sense of comfort, it does little to answer the
interpretive questions that need answering. It seems
the reader is hindered by more than just the scientific
jargon.
To get at the problem, we need to articulate something
about how readers go about reading. We proceed to
the first of several reader expectations.
Subject-Verb Separation
Look again at the first sentence of the passage cited
above. It is relatively long, 42 words; but that turns
out not to be the main cause of its burdensome complexity.
Long sentences need not be difficult to read; they
are only difficult to write. We have seen sentences
of over 100 words that flow easily and persuasively
toward their clearly demarcated destination. Those
well-wrought serpents all had something in common:
Their structure presented information to readers in
the order the readers needed and expected it.
Beginning with the exciting
material and ending with a lack of luster often leaves
us disappointed and destroys our sense of momentum.
The first sentence of our example passage does
just the opposite: it burdens and obstructs the reader,
because of an all-too-common structural defect. Note
that the grammatical subject ("the smallest") is separated
from its verb ("has been identified") by 23 words,
more than half the sentence. Readers expect a grammatical
subject to be followed immediately by the verb. Anything
of length that intervenes between subject and verb
is read as an interruption, and therefore as something
of lesser importance.
The reader's expectation stems from a pressing need
for syntactic resolution, fulfilled only by the arrival
of the verb. Without the verb, we do not know what
the subject is doing, or what the sentence is all
about. As a result, the reader focuses attention on
the arrival of the verb and resists recognizing anything
in the interrupting material as being of primary importance.
The longer the interruption lasts, the more likely
it becomes that the "interruptive" material actually
contains important information; but its structural
location will continue to brand it as merely interruptive.
Unfortunately, the reader will not discover its true
value until too lateuntil the sentence has ended without
having produced anything of much value outside of
that subject-verb interruption.
In this first sentence of the paragraph, the relative
importance of the intervening material is difficult
to evaluate. The material might conceivably be quite
significant, in which case the writer should have
positioned it to reveal that importance. Here is one
way to incorporate it into the sentence structure:
The smallest of the URF's is URFA6L, a 207-nucleotide
(nt) reading frame overlapping out of phase the NH2-terminal
portion of the adenosinetriphosphatase (ATPase) subunit
6 gene; it has been identified as the animal equivalent
of the recently discovered yeast H+-ATPase
subunit 8 gene.
On the other hand, the intervening material might
be a mere aside that diverts attention from more important
ideas; in that case the writer should have deleted
it, allowing the prose to drive more directly toward
its significant point: The smallest of the URF's
(URFA6L) has been identified as the animal equivalent
of the recently discovered yeast H+-ATPase
subunit 8 gene.
Only the author could tell us which of these revisions
more accurately reflects his intentions.
These revisions lead us to a second set of reader
expectations. Each unit of discourse, no matter what
the size, is expected to serve a single function,
to make a single point. In the case of a sentence,
the point is expected to appear in a specific place
reserved for emphasis.
The Stress Position
It is a linguistic commonplace that readers naturally
emphasize the material that arrives at the end of
a sentence. We refer to that location as a "stress
position." If a writer is consciously aware of this
tendency, she can arrange for the emphatic information
to appear at the moment the reader is naturally exerting
the greatest reading emphasis. As a result, the chances
greatly increase that reader and writer will perceive
the same material as being worthy of primary emphasis.
The very structure of the sentence thus helps persuade
the reader of the relative values of the sentence's
contents.
The inclination to direct more energy to that which
arrives last in a sentence seems to correspond to
the way we work at tasks through time. We tend to
take something like a "mental breath" as we begin
to read each new sentence, thereby summoning the tension
with which we pay attention to the unfolding of the
syntax. As we recognize that the sentence is drawing
toward its conclusion, we begin to exhale that mental
breath. The exhalation produces a sense of emphasis.
Moreover, we delight in being rewarded at the end
of a labor with something that makes the ongoing effort
worthwhile. Beginning with the exciting material and
ending with a lack of luster often leaves us disappointed
and destroys our sense of momentum. We do not start
with the strawberry shortcake and work our way up
to the broccoli.
When the writer puts the emphatic material of a sentence
in any place other than the stress position, one of
two things can happen; both are bad. First, the reader
might find the stress position occupied by material
that clearly is not worthy of emphasis. In this case,
the reader must discern, without any additional structural
clue, what else in the sentence may be the most likely
candidate for emphasis. There are no secondary structural
indications to fall back upon. In sentences that are
long, dense or sophisticated, chances soar that the
reader will not interpret the prose precisely as the
writer intended. The second possibility is even worse:
The reader may find the stress position occupied by
something that does appear capable of receiving emphasis,
even though the writer did not intend to give it any
stress. In that case, the reader is highly likely
to emphasize this imposter material, and the writer
will have lost an important opportunity to influence
the reader's interpretive process.
The stress position can change in size from sentence
to sentence. Sometimes it consists of a single word;
sometimes it extends to several lines. The definitive
factor is this: The stress position coincides with
the moment of syntactic closure. A reader has reached
the beginning of the stress position when she knows
there is nothing left in the clause or sentence but
the material presently being read. Thus a whole list,
numbered and indented, can occupy the stress position
of a sentence if it has been clearly announced as
being all that remains of that sentence. Each member
of that list, in turn, may have its own internal stress
position, since each member may produce its own syntactic
closure.
Within a sentence, secondary stress positions can
be formed by the appearance of a properly used colon
or semicolon; by grammatical convention, the material
preceding these punctuation marks must be able to
stand by itself as a complete sentence. Thus, sentences
can be extended effortlessly to dozens of words, as
long as there is a medial syntactic closure for every
piece of new, stress-worthy information along the
way. One of our revisions of the initial sentence
can serve as an example: The smallest of the
URF's is URFA6L, a 207-nucleotide (nt) reading frame
overlapping out of phase the NH2-terminal
portion of the adenosinetriphosphatase (ATPase) subunit
6 gene; it has been identified as the animal equivalent
of the recently discovered yeast H+-ATPase
subunit 8 gene.
By using a semicolon, we created a second stress
position to accommodate a second piece of information
that seemed to require emphasis.
We now have three rhetorical principles based on
reader expectations: First, grammatical subjects should
be followed as soon as possible by their verbs; second,
every unit of discourse, no matter the size, should
serve a single function or make a single point; and,
third, information intended to be emphasized should
appear at points of syntactic closure. Using these
principles, we can begin to unravel the problems of
our example prose.
Note the subject-verb separation in the 62-word third
sentence of the original passage: Recently, however,
immunoprecipitation experiments with antibodies to purified,
rotenone-sensitive NADH-ubiquinone oxido-reductase [hereafter
referred to as respiratory chain NADH dehydrogenase
or complex I] from bovine heart, as well as enzyme fractionation
studies, have indicated that six human URF's (that is,
URF1, URF2, URF3, URF4, URF4L, and URF5, hereafter referred
to as ND1, ND2, ND3, ND4, ND4L and ND5) encode subunits
of complex I.
After encountering the subject ("experiments"), the
reader must wade through 27 words (including three
hyphenated compound words, a parenthetical interruption
and an "as well as" phrase) before alighting on the
highly uninformative and disappointingly anticlimactic
verb ("have indicated"). Without a moment to recover,
the reader is handed a "that" clause in which the
new subject ("six human URF's") is separated from
its verb ("encode") by yet another 20 words.
If we applied the three principles we have developed
to the rest of the sentences of the example, we could
generate a great many revised versions of each. These
revisions might differ significantly from one another
in the way their structures indicate to the reader
the various weights and balances to be given to the
information. Had the author placed all stress-worthy
material in stress positions, we as a reading community
would have been far more likely to interpret these
sentences uniformly.
We couch this discussion in terms of "likelihood"
because we believe that meaning is not inherent in
discourse by itself; "meaning" requires the combined
participation of text and reader. All sentences are
infinitely interpretable, given an infinite number
of interpreters. As communities of readers, however,
we tend to work out tacit agreements as to what kinds
of meaning are most likely to be extracted from certain
articulations. We cannot succeed in making even a
single sentence mean one and only one thing; we can
only increase the odds that a large majority of readers
will tend to interpret our discourse according to
our intentions. Such success will follow from authors
becoming more consciously aware of the various reader
expectations presented here.
We cannot succeed in making
even a single sentence mean one and only one thing;
we can only increase the odds that a large majority
of readers will tend to interpret our discourse according
to our intentions.
Here is one set of revisionary decisions we
made for the example: The smallest of the URF's,
URFA6L, has been identified as the animal equivalent
of the recently discovered yeast H+-ATPase
subunit 8 gene; but the functional significance of other
URF's has been more elusive. Recently, however, several
human URF's have been shown to encode subunits of rotenone-sensitive
NADH-ubiquinone oxido-reductase. This is a large complex
that also contains many subunits synthesized in the
cytoplasm; it will be referred to hereafter as respiratory
chain NADH dehydrogenase or complex I. Six subunits
of Complex I were shown by enzyme fractionation studies
and immunoprecipitation experiments to be encoded by
six human URF's (URF1, URF2, URF3, URF4, URF4L, and
URF5); these URF's will be referred to subsequently
as ND1, ND2, ND3, ND4, ND4L and ND5.
Sheer length was neither the problem nor the solution.
The revised version is not noticeably shorter than
the original; nevertheless, it is significantly easier
to interpret. We have indeed deleted certain words,
but not on the basis of wordiness or excess length.
(See especially the last sentence of our revision.)
When is a sentence too long? The creators of readability
formulas would have us believe there exists some fixed
number of words (the favorite is 29) past which a
sentence is too hard to read. We disagree. We have
seen 10-word sentences that are virtually impenetrable
and, as we mentioned above, 100-word sentences that
flow effortlessly to their points of resolution. In
place of the word-limit concept, we offer the following
definition: A sentence is too long when it has more
viable candidates for stress positions than there
are stress positions available. Without the stress
position's locational clue that its material is intended
to be emphasized, readers are left too much to their
own devices in deciding just what else in a sentence
might be considered important.
In revising the example passage, we made certain
decisions about what to omit and what to emphasize.
We put subjects and verbs together to lessen the reader's
syntactic burdens; we put the material we believed
worthy of emphasis in stress positions; and we discarded
material for which we could not discern significant
connections. In doing so, we have produced a clearer
passage--but not one that necessarily reflects the
author's intentions; it reflects only our interpretation
of the author's intentions. The more problematic the
structure, the less likely it becomes that a grand
majority of readers will perceive the discourse in
exactly the way the author intended.
The information that begins a sentence establishes
for the reader a perspective for viewing the sentence
as a unit.
It is probable that many of our readers--and
perhaps even the authors--will disagree with some
of our choices. If so, that disagreement underscores
our point: The original failed to communicate its
ideas and their connections clearly. If we happened
to have interpreted the passage as you did, then we
can make a different point: No one should have to
work as hard as we did to unearth the content of a
single passage of this length.
The Topic Position
To summarize the principles connected with the stress
position, we have the proverbial wisdom, "Save the
best for last." To summarize the principles connected
with the other end of the sentence, which we will
call the topic position, we have its proverbial contradiction,
"First things first." In the stress position the reader
needs and expects closure and fulfillment; in the
topic position the reader needs and expects perspective
and context. With so much of reading comprehension
affected by what shows up in the topic position, it
behooves a writer to control what appears at the beginning
of sentences with great care.
The information that begins a sentence establishes
for the reader a perspective for viewing the sentence
as a unit: Readers expect a unit of discourse to be
a story about whoever shows up first. "Bees disperse
pollen" and "Pollen is dispersed by bees" are two
different but equally respectable sentences about
the same facts. The first tells us something about
bees; the second tells us something about pollen.
The passivity of the second sentence does not by itself
impair its quality; in fact, "Pollen is dispersed
by bees" is the superior sentence if it appears in
a paragraph that intends to tell us a continuing story
about pollen. Pollen's story at that moment is a passive
one.
Readers also expect the material occupying the topic
position to provide them with linkage (looking backward)
and context (looking forward). The information in
the topic position prepares the reader for upcoming
material by connecting it backward to the previous
discussion. Although linkage and context can derive
from several sources, they stem primarily from material
that the reader has already encountered within this
particular piece of discourse. We refer to this familiar,
previously introduced material as "old information."
Conversely, material making its first appearance in
a discourse is "new information." When new information
is important enough to receive emphasis, it functions
best in the stress position.
When old information consistently arrives in the
topic position, it helps readers to construct the
logical flow of the argument: It focuses attention
on one particular strand of the discussion, both harkening
backward and leaning forward. In contrast, if the
topic position is constantly occupied by material
that fails to establish linkage and context, readers
will have difficulty perceiving both the connection
to the previous sentence and the projected role of
the new sentence in the development of the paragraph
as a whole.
Here is a second example of scientific prose that
we shall attempt to improve in subsequent discussion:
Large earthquakes along a given fault segment do
not occur at random intervals because it takes time
to accumulate the strain energy for the rupture. The
rates at which tectonic plates move and accumulate strain
at their boundaries are approximately uniform. Therefore,
in first approximation, one may expect that large ruptures
of the same fault segment will occur at approximately
constant time intervals. If subsequent main shocks have
different amounts of slip across the fault, then the
recurrence time may vary, and the basic idea of periodic
mainshocks must be modified. For great plate boundary
ruptures the length and slip often vary by a factor
of 2. Along the southern segment of the San Andreas
fault the recurrence interval is 145 years with variations
of several decades. The smaller the standard deviation
of the average recurrence interval, the more specific
could be the long term prediction of a future mainshock.
This is the kind of passage that in subtle ways can
make readers feel badly about themselves. The individual
sentences give the impression of being intelligently
fashioned: They are not especially long or convoluted;
their vocabulary is appropriately professional but
not beyond the ken of educated general readers; and
they are free of grammatical and dictional errors.
On first reading, however, many of us arrive at the
paragraph's end without a clear sense of where we
have been or where we are going. When that happens,
we tend to berate ourselves for not having paid close
enough attention. In reality, the fault lies not with
us, but with the author.
We can distill the problem by looking closely at
the information in each sentence's topic position:
Large earthquakes
The rates
Therefore...one
subsequent mainshocks
great plate boundary ruptures
the southern segment of the San Andreas fault
the smaller the standard deviation...
Much of this information is making its first appearance
in this paragraph--in precisely the spot where the
reader looks for old, familiar information. As a result,
the focus of the story constantly shifts. Given just
the material in the topic positions, no two readers
would be likely to construct exactly the same story
for the paragraph as a whole.
If we try to piece together the relationship of each
sentence to its neighbors, we notice that certain
bits of old information keep reappearing. We hear
a good deal about the recurrence time between earthquakes:
The first sentence introduces the concept of nonrandom
intervals between earthquakes; the second sentence
tells us that recurrence rates due to the movement
of tectonic plates are more or less uniform; the third
sentence adds that the recurrence rates of major earthquakes
should also be somewhat predictable; the fourth sentence
adds that recurrence rates vary with some conditions;
the fifth sentence adds information about one particular
variation; the sixth sentence adds a recurrence-rate
example from California; and the last sentence tells
us something about how recurrence rates can be described
statistically. This refrain of "recurrence intervals"
constitutes the major string of old information in
the paragraph. Unfortunately, it rarely appears at
the beginning of sentences, where it would help us
maintain our focus on its continuing story.
In reading, as in most experiences, we appreciate
the opportunity to become familiar with a new environment
before having to function in it. Writing that continually
begins sentences with new information and ends with
old information forbids both the sense of comfort
and orientation at the start and the sense of fulfilling
arrival at the end. It misleads the reader as to whose
story is being told; it burdens the reader with new
information that must be carried further into the
sentence before it can be connected to the discussion;
and it creates ambiguity as to which material the
writer intended the reader to emphasize. All of these
distractions require that readers expend a disproportionate
amount of energy to unravel the structure of the prose,
leaving less energy available for perceiving content.
We can begin to revise the example by ensuring the
following for each sentence:
- The backward-linking old information appears in
the topic position.
- The person, thing or concept whose story it is
appears in the topic position.
- The new, emphasis-worthy information appears in
the stress position.
Once again, if our decisions concerning the relative
values of specific information differ from yours,
we can all blame the author, who failed to make his
intentions apparent. Here first is a list of what
we perceived to be the new, emphatic material in each
sentence: time to accumulate strain energy along
a fault
approximately uniform
large ruptures of the same fault
different amounts of slip
vary by a factor of 2
variations of several decades
predictions of future mainshock
Now, based on these assumptions about what deserves
stress, here is our proposed revision: Large
earthquakes along a given fault segment do not occur
at random intervals because it takes time to accumulate
the strain energy for the rupture. The rates at which
tectonic plates move and accumulate strain at their
boundaries are roughly uniform. Therefore, nearly constant
time intervals (at first approximation) would be expected
between large ruptures of the same fault segment. [However?],
the recurrence time may vary; the basic idea of periodic
mainshocks may need to be modified if subsequent mainshocks
have different amounts of slip across the fault. [Indeed?],
the length and slip of great plate boundary ruptures
often vary by a factor of 2. [For example?], the recurrence
intervals along the southern segment of the San Andreas
fault is 145 years with variations of several decades.
The smaller the standard deviation of the average recurrence
interval, the more specific could be the long term prediction
of a future mainshock.
Many problems that had existed in the original have
now surfaced for the first time. Is the reason earthquakes
do not occur at random intervals stated in the first
sentence or in the second? Are the suggested choices
of "however," "indeed," and "for example" the right
ones to express the connections at those points? (All
these connections were left unarticulated in the original
paragraph.) If "for example" is an inaccurate transitional
phrase, then exactly how does the San Andreas fault
example connect to ruptures that "vary by a factor
of 2"? Is the author arguing that recurrence rates
must vary because fault movements often vary? Or is
the author preparing us for a discussion of how in
spite of such variance we might still be able to predict
earthquakes? This last question remains unanswered
because the final sentence leaves behind earthquakes
that recur at variable intervals and switches instead
to earthquakes that recur regularly. Given that this
is the first paragraph of the article, which type
of earthquake will the article most likely proceed
to discuss? In sum, we are now aware of how much the
paragraph had not communicated to us on first reading.
We can see that most of our difficulty was owing not
to any deficiency in our reading skills but rather
to the author's lack of comprehension of our structural
needs as readers.
In our experience, the misplacement
of old and new information turns out to be he No.
1 problem in American professional writing today.
In our experience, the misplacement of old and
new information turns out to be the No. 1 problem
in American professional writing today. The source
of the problem is not hard to discover: Most writers
produce prose linearly (from left to right) and through
time. As they begin to formulate a sentence, often
their primary anxiety is to capture the important
new thought before it escapes. Quite naturally they
rush to record that new information on paper, after
which they can produce at their leisure contextualizing
material that links back to the previous discourse.
Writers who do this consistently are attending more
to their own need for unburdening themselves of their
information than to the reader's need for receiving
the material. The methodology of reader expectations
articulates the reader's needs explicitly, thereby
making writers consciously aware of structural problems
and ways to solve them.
Put in the topic position the
old information that links backward; put in the stress
position the new information you want the reader to
emphasize.
A note of clarification: Many people hearing
this structural advice tend to oversimplify it to
the following rule: "Put the old information in the
topic position and the new information in the stress
position." No such rule is possible. Since by definition
all information is either old or new, the space between
the topic position and the stress position must also
be filled with old and new information. Therefore
the principle (not rule) should be stated as follows:
"Put in the topic position the old information that
links backward; put in the stress position the new
information you want the reader to emphasize."
Perceiving Logical Gaps
When old information does not appear at all in a
sentence, whether in the topic position or elsewhere,
readers are left to construct the logical linkage
by themselves. Often this happens when the connections
are so clear in the writer's mind that they seem unnecessary
to state; at those moments, writers underestimate
the difficulties and ambiguities inherent in the reading
process. Our third example attempts to demonstrate
how paying attention to the placement of old and new
information can reveal where a writer has neglected
to articulate essential connections. The enthalpy
of hydrogen bond formation between the nucleoside bases
2'deoxyguanosine (dG) and 2'deoxycytidine (dC) has been
determined by direct measurement. dG and dC were derivatized
at the 5' and 3' hydroxyls with triisopropylsilyl groups
to obtain solubility of the nucleosides in non-aqueous
solvents and to prevent the ribose hydroxyls from forming
hydrogen bonds. From isoperibolic titration measurements,
the enthalpy of dC:dG base pair formation is -6.65±0.32
kcal/mol.
Although part of the difficulty of reading this passage
may stem from its abundance of specialized technical
terms, a great deal more of the difficulty can be
attributed to its structural problems. These problems
are now familiar: We are not sure at all times whose
story is being told; in the first sentence the subject
and verb are widely separated; the second sentence
has only one stress position but two or three pieces
of information that are probably worthy of emphasis--"solubility
...solvents," "prevent... from forming hydrogen bonds"
and perhaps "triisopropylsilyl groups." These perceptions
suggest the following revision tactics:
- Invert the first sentence, so that (a) the subject-verb-complement
connection is unbroken, and (b) "dG" and "dC" are
introduced in the stress position as new and interesting
information. (Note that inverting the sentence requires
stating who made the measurement; since the authors
performed the first direct measurement, recognizing
their agency in the topic position may well be appropriate.)
- Since "dG and "dC" become the old information
in the second sentence, keep them up front in the
topic position.
- Since "triisopropylsilyl groups" is new and important
information here, create for it a stress position.
- "Triisopropylsilyl groups" then becomes the old
information of the clause in which its effects are
described; place it in the topic position of this
clause.
- Alert the reader to expect the arrival of two
distinct effects by using the flag word "both."
"Both" notifies the reader that two pieces of new
information will arrive in a single stress position.
Here is a partial revision based on these decisions:
We have directly measured the enthalpy of hydrogen
bond formation between the nucleoside bases 2'deoxyguanosine
(dG) and 2'deoxycytidine (dC). dG and dC were derivatized
at the 5' and 3' hydroxyls with triisopropylsilyl groups;
these groups serve both to solubilize the nucleosides
in non-aqueous solvents and to prevent the ribose hydroxyls
from forming hydrogen bonds. From isoperibolic titration
measurements, the enthalpy of dC:dG base pair formation
is -6.65±0.32 kcal/mol.
The outlines of the experiment are now becoming visible,
but there is still a major logical gap. After reading
the second sentence, we expect to hear more about
the two effects that were important enough to merit
placement in its stress position. Our expectations
are frustrated, however, when those effects are not
mentioned in the next sentence: "From isoperibolic
titration measurements, the enthalpy of dC:dG base
pair formation is -6.65±0.32 kcal/mol." The authors
have neglected to explain the relationship between
the derivatization they performed (in the second sentence)
and the measurements they made (in the third sentence).
Ironically, that is the point they most wished to
make here.
At this juncture, particularly astute readers who
are chemists might draw upon their specialized knowledge,
silently supplying the missing connection. Other readers
are left in the dark. Here is one version of what
we think the authors meant to say, with two additional
sentences supplied from a knowledge of nucleic acid
chemistry: We have directly measured the enthalpy
of hydrogen bond formation between the nucleoside bases
2'deoxyguanosine (dG) and 2'deoxycytidine (dC). dG and
dC were derivatized at the 5' and 3' hydroxyls with
triisopropylsiyl groups; these groups serve both to
solubilize the nucleosides in non-aqueous solvents and
to prevent the ribose hydroxyls from forming hydrogen
bonds. Consequently, when the derivatized nucleosides
are dissolved in non-aqueous solvents, hydrogen bonds
form almost exclusively between the bases. Since the
interbase hydrogen bonds are the only bonds to form
upon mixing, their enthalpy of formation can be determined
directly by measuring the enthalpy of mixing. From our
isoperibolic titration measurements, the enthalpy of
dG:dC base pair formation is -6.65±0.32 kcal/mol.
Each sentence now proceeds logically from its predecessor.
We never have to wander too far into a sentence without
being told where we are and what former strands of
discourse are being continued. And the "measurements"
of the last sentence has now become old information,
reaching back to the "measured directly" of the preceding
sentence. (It also fulfills the promise of the "we
have directly measured" with which the paragraph began.)
By following our knowledge of reader expectations,
we have been able to spot discontinuities, to suggest
strategies for bridging gaps, and to rearrange the
structure of the prose, thereby increasing the accessibility
of the scientific content.
Locating the Action
Our final example adds another major reader expectation
to the list. Transcription of the 5S RNA
genes in the egg extract is TFIIIA-dependent. This is
surprising, because the concentration of TFIIIA is the
same as in the oocyte nuclear extract. The other transcription
factors and RNA polymerase III are presumed to be in
excess over available TFIIIA, because tRNA genes are
transcribed in the egg extract. The addition of egg
extract to the oocyte nuclear extract has two effects
on transcription efficiency. First, there is a general
inhibition of transcription that can be alleviated in
part by supplementation with high concentrations of
RNA polymerase III. Second, egg extract destabilizes
transcription complexes formed with oocyte but not somatic
5S RNA genes.
The barriers to comprehension in this passage are
so many that it may appear difficult to know where
to start revising. Fortunately, it does not matter
where we start, since attending to any one structural
problem eventually leads us to all the others.
We can spot one source of difficulty by looking at
the topic positions of the sentences: We cannot tell
whose story the passage is. The story's focus (that
is, the occupant of the topic position) changes in
every sentence. If we search for repeated old information
in hope of settling on a good candidate for several
of the topic positions, we find all too much of it:
egg extract, TFIIIA, oocyte extract, RNA polymerase
III, 5S RNA, and transcription. All of these
reappear at various points, but none announces itself
clearly as our primary focus. It appears that the
passage is trying to tell several stories simultaneously,
allowing none to dominate.
We are unable to decide among these stories because
the author has not told us what to do with all this
information. We know who the players are, but we are
ignorant of the actions they are presumed to perform.
This violates yet another important reader expectation:
Readers expect the action of a sentence to be articulated
by the verb.
Here is a list of the verbs in the example paragraph:
is
is...is
are presumed to be
are transcribed
has
is...can be alleviated
destabilizes
The list gives us too few clues as to what actions
actually take place in the passage. If the actions
are not to be found in the verbs, then we as readers
have no secondary structural clues for where to locate
them. Each of us has to make a personal interpretive
guess; the writer no longer controls the reader's
interpretive act.
As critical scientific readers,
we would like to concentrate our energy on whether
the experiments prove the hypotheses.
Worse still, in this passage the important actions
never appear. Based on our best understanding of this
material, the verbs that connect these players are
"limit" and "inhibit." If we express those actions
as verbs and place the most frequently occurring information--"egg
extract" and "TFIIIA"--in the topic position whenever
possible,* we can generate the following revision:
In the egg extract, the availability of TFIIIA
limits transcription of the 5S RNA genes. This
is surprising because the same concentration of TFIIIA
does not limit transcription in the oocyte nuclear extract.
In the egg extract, transcription is not limited by
RNA polymerase or other factors because transcription
of tRNA genes indicates that these factors are in excess
over available TFIIIA. When added to the nuclear extract,
the egg extract affected the efficiency of transcription
in two ways. First, it inhibited transcription generally;
this inhibition could be alleviated in part by supplementing
the mixture with high concentrations of RNA polymerase
III. Second, the egg extract destabilized transcription
complexes formed by oocyte but not by somatic 5S
genes.
[*We have chosen these two pieces of
old information as the controlling contexts for the
passage. That choice was neither arbitrary nor born
of logical necessity; it was simply an act of interpretation.
All readers make exactly that kind of choice in the
reading of every sentence. The fewer the structural
clues to interpretation given by the author, the more
variable the resulting interpretations will tend to
be.]
As a story about "egg extract," this passage still
leaves something to be desired. But at least now we
can recognize that the author has not explained the
connection between "limit" and "inhibit." This unarticulated
connection seems to us to contain both of her hypotheses:
First, that the limitation on transcription is caused
by an inhibitor of TFIIIA present in the egg extract;
and, second, that the action of that inhibitor can
be detected by adding the egg extract to the oocyte
extract and examining the effects on transcription.
As critical scientific readers, we would like to concentrate
our energy on whether the experiments prove the hypotheses.
We cannot begin to do so if we are left in doubt as
to what those hypotheses might be--and if we are using
most of our energy to discern the structure of the
prose rather than its substance.
Writing and the Scientific Process
We began this article by arguing that complex thoughts
expressed in impenetrable prose can be rendered accessible
and clear without minimizing any of their complexity.
Our examples of scientific writing have ranged from
the merely cloudy to the virtually opaque; yet all
of them could be made significantly more comprehensible
by observing the following structural principles:
- Follow a grammatical subject as soon as possible
with its verb.
- Place in the stress position the "new information"
you want the reader to emphasize.
- Place the person or thing whose "story" a sentence
is telling at the beginning of the sentence, in
the topic position.
- Place appropriate "old information" (material
already stated in the discourse) in the topic position
for linkage backward and contextualization forward.
- Articulate the action of every clause or sentence
in its verb.
- In general, provide context for your reader before
asking that reader to consider anything new.
- In general, try to ensure that the relative emphases
of the substance coincide with the relative expectations
for emphasis raised by the structure.
It may seem obvious that a scientific
document is incomplete without the interpretation
of the writer; it may not be so obvious that the document
cannot "exist" without the interpretation of each
reader.
None of these reader-expectation principles
should be considered "rules." Slavish adherence to
them will succeed no better than has slavish adherence
to avoiding split infinitives or to using the active
voice instead of the passive. There can be no fixed
algorithm for good writing, for two reasons. First,
too many reader expectations are functioning at any
given moment for structural decisions to remain clear
and easily activated. Second, any reader expectation
can be violated to good effect. Our best stylists
turn out to be our most skillful violators; but in
order to carry this off, they must fulfill expectations
most of the time, causing the violations to be perceived
as exceptional moments, worthy of note.
A writer's personal style is the sum of all the structural
choices that person tends to make when facing the
challenges of creating discourse. Writers who fail
to put new information in the stress position of many
sentences in one document are likely to repeat that
unhelpful structural pattern in all other documents.
But for the very reason that writers tend to be consistent
in making such choices, they can learn to improve
their writing style; they can permanently reverse
those habitual structural decisions that mislead or
burden readers.
We have argued that the substance of thought and
the expression of thought are so inextricably intertwined
that changes in either will affect the quality of
the other. Note that only the first of our examples
(the paragraph about URF's) could be revised on the
basis of the methodology to reveal a nearly finished
passage. In all the other examples, revision revealed
existing conceptual gaps and other problems that had
been submerged in the originals by dysfunctional structures.
Filling the gaps required the addition of extra material.
In revising each of these examples, we arrived at
a point where we could proceed no further without
either supplying connections between ideas or eliminating
some existing material altogether. (Writers who use
reader-expectation principles on their own prose will
not have to conjecture or infer; they know what the
prose is intended to convey.) Having begun by analyzing
the structure of the prose, we were led eventually
to reinvestigate the substance of the science.
The substance of science comprises more than the
discovery and recording of data; it extends crucially
to include the act of interpretation. It may seem
obvious that a scientific document is incomplete without
the interpretation of the writer; it may not be so
obvious that the document cannot "exist" without the
interpretation of each reader. In other words, writers
cannot "merely" record data, even if they try. In
any recording or articulation, no matter how haphazard
or confused, each word resides in one or more distinct
structural locations. The resulting structure, even
more than the meanings of individual words, significantly
influences the reader during the act of interpretation.
The question then becomes whether the structure created
by the writer (intentionally or not) helps or hinders
the reader in the process of interpreting the scientific
writing.
The writing principles we have suggested here make
conscious for the writer some of the interpretive
clues readers derive from structures. Armed with this
awareness, the writer can achieve far greater control
(although never complete control) of the reader's
interpretive process. As a concomitant function, the
principles simultaneously offer the writer a fresh
re-entry to the thought process that produced the
science. In real and important ways, the structure
of the prose becomes the structure of the scientific
argument. Improving either one will improve the other.
The methodology described in this article originated
in the linguistic work of Joseph M. Williams of the
University of Chicago, Gregory G. Colomb of the Georgia
Institute of Technology and George D. Gopen. Some
of the materials presented here were discussed and
developed in faculty writing workshops held at the
Duke University Medical School.
Bibliography
Williams, Joseph M. 1988. Style: Ten Lessons
in Clarity and Grace. Scott, Foresman, & Co.
Colomb, Gregory G., and Joseph M. Williams. 1985.
Perceiving structure in professional prose: a multiply
determined experience. In Writing in Non-Academic
Settings, eds. Lee Odell and Dixie Goswami. Guilford
Press, pp. 87-128.
Gopen, George D. 1987. Let the buyer in ordinary
course of business beware: suggestions for revising
the language of the Uniform Commercial Code. University
of Chicago Law Review 54:1178-1214.
Gopen, George D. 1990. The Common Sense of Writing:
Teaching Writing from the Reader's Perspective.
To be published.
|