More on "Meaningful" Information


In the ongoing debate about meaningful information at Uncommon Descent, one of the commentators asked:
"Does an of arrangement of nucleobases ‘adenine-cytosine-adenine’ in DNA mean anything?"
This is a surprisingly interesting and revealing question. To attempt to answer it, I would first like to put a limit on the question: let us consider the answer if the nucleotide sequence "adenine-cytosine-adenine" is in DNA (i.e. not RNA). If "meaningful" information is necessarily analogical, as I have suggested in the previous post on "Evolution, Information, and Teleology", then the answer to this question depends upon the circumstances in which the nucleotide sequence ACA is a part. If, for example, this sequence is part of a longer sequence of nucleotides in a longer DNA molecule, then there are several possible answers:

1) the DNA nucleotide sequence ACA could be located in a single strand of DNA that is suspended in a test tube (i.e. not in a living cell) and is therefore completely biologically inert (i.e. it is not binding to a complementary strand of DNA, nor being replicated, nor transcribed, nor translated);

2) the DNA nucleotide sequence ACA could be hydrogen bonded to the complementary sequence TGT (i.e. "thymine-guanine-thymine") in another strand of nucleotides that is anti-parallel with it and close enough to form hydrogen bonds between the nitrogenous bases;

3) the DNA nucleotide sequence ACA could be in a strand that is being replicated by DNA polymerase, which can synthesize the complementary sequence TGT in a newly synthesized strand of DNA;

4) the DNA sequence ACA could be in a strand of DNA that is being transcribed by RNA polymerase, which can synthesize the complementary sequence UGU in a newly synthesized strand of RNA;

5) the DNA sequence ACA could be in a strand of DNA that has already been transcribed by RNA polymerase into the complementary sequence UGU in a strand of mRNA that is bound to a ribosome and can be actively translated into an amino acid sequence in a polypeptide; or

6) the DNA sequence ACA could be in a strand of DNA that has already been transcribed by RNA polymerase into the complementary sequence UGU in a strand of mRNA that is bound to a ribosome and is being actively translated into an amino acid sequence in a polypeptide inside a living cell, within which the polypeptide has a biological function (i.e. participates in those biochemical reactions that keep the cell alive against the depredations of the second law of thermodynamics).

In case #1 the DNA nucleotide sequence ACA has no "meaning", in that it is not analogically related to anything. It also has no Shannon, Kolmogorov, or Orgel information, as it is not in the process of being transmitted or compressed, nor is it "specifying" anything.

In case #2 the DNA nucleotide sequence ACA has no "meaning" because its bonding with its complementary sequence is purely chemical, not analogical. Like the bonding together of water molecules in a snowflake (i.e. the regular crystalline solid form of water), the hydrogen bonding of the nitrogenous bases in complementary DNA sequences is wholly determined by "natural laws", and is therefore neither analogical nor meaningful.

Cases 3 and 4 appear to be the same as case 2; the relationships between the nucleotide sequences and the bonding patterns therein are entirely the result of chemistry, with neither analogical nor meaningful information involved.
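The pairing rules behind cases 2 through 4 can be written as fixed lookup tables, which is one way to see why they carry no arbitrary association. The following is only a minimal sketch: the function names are hypothetical, and strand orientation (the anti-parallel arrangement) is ignored, which happens not to matter for ACA because its complement reads the same in both directions.

```python
# Watson-Crick pairing rules as plain lookup tables.
# Nothing here is chosen by a "decoder"; the tables just restate the chemistry.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}


def dna_complement(template: str) -> str:
    """Base-by-base DNA complement of a DNA template (cases 2 and 3)."""
    return "".join(DNA_COMPLEMENT[base] for base in template)


def rna_transcript(template: str) -> str:
    """Base-by-base RNA complement of a DNA template (case 4)."""
    return "".join(RNA_COMPLEMENT[base] for base in template)


print(dna_complement("ACA"))  # TGT
print(rna_transcript("ACA"))  # UGU
```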

However, in cases 5 and 6 we seem to come to a radical discontinuity. In both of these cases, there can be an analogical (and therefore "meaningful") relationship between the nucleotide sequence ACA in DNA and the corresponding amino acid sequence in a translated polypeptide, either in vitro or in a cell. What makes this difference possible (and what may make it necessary) is the analogical relationship between the nucleotide sequence and the corresponding amino acid sequence (if one exists). If the DNA sequence ACA is located in the template strand of an actively transcribed DNA sequence (i.e. a DNA sequence beginning with a promoter to which RNA polymerase can bind) and furthermore its complementary RNA analog is located in an mRNA molecule following the "start" codon AUG but not following a "stop" codon (either UAA, UAG, or UGA, assuming a three-base reading frame), then the DNA sequence does indeed contain "meaningful" information: it is encoded in one medium, is translated into another medium, and has a function in the system of which it is a part.
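The reading-frame condition just described can be sketched in a few lines. This is a simplified illustration, not a model of real translation: it assumes the first AUG in the string is the start codon, ignores promoters and ribosome binding, and uses hypothetical function names and made-up example sequences.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_in_frame(mrna: str) -> list[str]:
    """Codons read from the first AUG up to, but not including, the first in-frame stop."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is read
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


def is_translated(mrna: str, codon: str = "UGU") -> bool:
    """True if the codon lies in frame, after the start codon and before any stop."""
    return codon in codons_in_frame(mrna)


print(is_translated("GGAUGUGUUAAUGU"))   # UGU directly follows AUG, in frame
print(is_translated("AUGCCCUAGUGUGGG"))  # UGU only appears after the stop codon
print(is_translated("AUGCUGUCCUAA"))     # UGU is present but out of frame
```

The same three letters are read in the first string and skipped in the other two, which is the sense in which the surrounding context, not the sequence alone, decides whether it is decoded.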

It is not yet clear from current research whether or not the amino acid that is "translated" from the DNA sequence ACA (i.e. from the mRNA sequence UGU, assuming that the DNA sequence ACA is in a template strand) is necessarily related to that mRNA sequence. That is, we do not know with confidence whether the relationship between mRNA codons and the amino acids for which they code is purely arbitrary (i.e. the result of a "frozen accident") or if there is some as-yet-undetected necessary (i.e. "natural") relationship between them.

What we can say with reasonable assurance is that what distinguishes "meaningful" information from any other kind of information is not the material into which it is encoded, but rather the relationship between the information encoded in one physical medium and its decoded complement in a related physical medium. As Gregory Bateson pointed out many years ago, meaning is entirely in the relationship between material things; it is not the things themselves. Or, as Alfred Korzybski pointed out,
"The map is not the territory"
In the same way, meaningful information is not the medium in which it is encoded, transmitted, and decoded.

************************************************

As always, comments, criticisms, and suggestions are warmly welcomed!

--Allen

Evolution, Information, and Teleology in Biology


I am currently in the middle of a debate at Uncommon Descent, the leading "intelligent design" website. The debate focuses on the concept of "information": what it is, where it comes from, and what its properties are. In thinking about these questions, I have been struck by how central they are to biology in general and evolutionary biology in particular.

When one uses the term "information", one can be referring to at least four different phenomena: Shannon information, Kolmogorov information, complex specified ("Orgel") information[1], and meaningful information. To me, it appears that the first three types of information – Shannon, Kolmogorov, and complex specified information – are fundamentally different from meaningful information.

What do we "mean" when we say that something is "meaningful"? To me, "meaningful" information is encoded information in which the "bits" of information "encode" (or "stand for") other bits of information via analogy. A meaningful "bit" therefore "stands for" some other bit.

Furthermore, two bits of information that stand for each other are necessarily not identical, even if they are written (i.e. symbolized) using exactly the same symbols. That is, two copies of the same symbol may "mean" the same thing, but they are not the "same" symbol, except via analogy. To be the "same" symbol, there could only be one symbol which "stands for itself". This is simply a reinterpretation of Aristotle's law of non-contradiction.

Moreover, it seems to me that not only is meaningful information necessarily analogical, it is also necessarily arbitrary, in the sense that the analogical relationship between the bits of a message and the concept with which those bits are associated is not "natural" (i.e. it is not the result of physical necessity), but rather "non-natural" (i.e. the result of arbitrary semantic association).

For example, consider the meaningful word "two". I can substitute the numeral "2" for the English word "two" without changing the meaning of the word. Indeed, the following words all "mean" the same thing: 2, ii, II, 10 (binary), due, deux, duo, twa, zwei, etc. [see http://en.wikipedia.org/wiki/2_%28number%29 ] This list can be infinitely extended: 0 + 2, 1 + 1, 2 + 0, 3 - 1, 4 - 2, etc. (and, of course, zero plus two, one plus one, two plus zero, three minus one, four divided by two, ten divided by five, etc.). All of these words and phrases "mean" exactly the same thing: that which we refer to with the English word "two" (or, if you prefer, the Arabic numeral "2").

In the previous example, all of the words and phrases are encoded analogies of the concept of "twoness"; none of them is more or less "twoish" than any other (You're twoish? That's funny, you don't look twoish), and indeed none of them is necessarily "twoish" at all. That is, the meaningful relationship between the various words and phrases and "twoishness" is arbitrary or, more precisely, non-natural. We may refer to such meaningful (and ultimately arbitrary) relationships between the "name" and "the thing named" as semantic associations, to distinguish them from non-arbitrary natural relationships.
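The arbitrariness of these associations can be made concrete with a small sketch. Every mapping below (the word table, the Roman numeral values, the choice of base) is a convention supplied by the decoder; the names are hypothetical and the Roman numeral helper handles only simple additive forms.

```python
def roman_to_int(numeral: str) -> int:
    """Convert a simple additive Roman numeral such as 'ii' (no subtractive forms)."""
    values = {"I": 1, "V": 5, "X": 10}
    return sum(values[ch] for ch in numeral.upper())


# A decoder's lookup table: nothing about the letters themselves is "twoish".
WORDS = {"two": 2, "deux": 2, "duo": 2, "twa": 2, "zwei": 2}

representations = {
    "2": int("2"),
    "10 (binary)": int("10", 2),
    "ii": roman_to_int("ii"),
    "II": roman_to_int("II"),
    "1 + 1": 1 + 1,
    "3 - 1": 3 - 1,
    "four divided by two": 4 // 2,
    "ten divided by five": 10 // 5,
    **WORDS,
}

for symbol, value in representations.items():
    print(f"{symbol!r} decodes to {value}")
```

Swapping any entry in `WORDS` for a different spelling changes nothing about the decoded value, provided the decoder's table is changed to match.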

It appears to me that arbitrary semantic associations such as those symbolized by the numeral "2" are fundamentally different from the natural relationship between the number of protons in an atomic nucleus and its chemical properties. Regardless of what one "calls" a nucleus with two protons ("helium" is the most common name for it, but there are others), and no matter which of the words or phrases one chooses to refer to the number of protons in the nucleus, the chemical and physical properties of the nucleus remain the same [see helium for more about the properties of this element]. Ergo, the "twoness" of the protons in the nucleus of helium is a non-arbitrary, "natural" property of such nuclei, and is therefore not a form of meaningful information.

By contrast, saying that there are two protons in the nucleus of an atom of helium has no more effect on the natural properties of such a nucleus than saying that there are deux (or twa or zwei) protons in it. No matter what you call it or how you refer to the number of protons in its nucleus, helium is helium is helium (pace Gertrude Stein).

Given the foregoing, it should be clear that the first three types of information I listed at the beginning of this comment are not necessarily meaningful. That this is the case for Shannon and Kolmogorov information is widely accepted. However, it is also the case for some (but not all) forms of complex specified ("Orgel") information. For example, if one constructs a string of random nucleotides (or any random string of bits) and that string does not subtend a promoter sequence, it will not "code" for the amino acid sequence of a polypeptide. Furthermore, unless such a string subtends a "binding region" (i.e. a sequence to which a protein or RNA molecule may bind via hydrogen bonding), it will also not have a regulatory function in a larger biochemical/cellular system. Under these circumstances, such a random string will not "encode" any structure or function, but still possesses what Leslie Orgel [1] referred to as "complex specified information".
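That Shannon and Kolmogorov measures are indifferent to meaning can be illustrated with a short sketch. It computes the empirical Shannon entropy of a string from its symbol frequencies and uses the zlib-compressed size as a rough stand-in for Kolmogorov complexity (true Kolmogorov complexity is uncomputable, so the compressed size is only an upper-bound proxy). The sequences and function names are assumptions made for illustration.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy_bits(sequence: str) -> float:
    """Empirical Shannon entropy in bits per symbol, from symbol frequencies."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compressed_size(sequence: str) -> int:
    """Size in bytes after zlib compression: a crude proxy for Kolmogorov complexity."""
    return len(zlib.compress(sequence.encode("ascii"), 9))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
random_dna = "".join(rng.choice("ACGT") for _ in range(3000))
repeated_dna = "ACA" * 1000

for label, seq in (("random", random_dna), ("repeated ACA", repeated_dna)):
    print(label, round(shannon_entropy_bits(seq), 3), compressed_size(seq))
```

The random string scores high on both measures and the repeated string scores low, yet neither number says anything about whether either string codes for a polypeptide or regulates anything.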

Ergo, "meaningful information" is analogical information; it "stands for" something else. Furthermore, the relationship between a bit of meaningful information and the thing it stands for is a functional relationship. That is, the meaningful bit specifies the function of the thing for which it stands (i.e. not "Richard Stans"). This means that meaningful information is necessarily teleological, as "functions" are semantically equivalent to "goals" which are semantically equivalent to "ends".

So, teleology must exist in any functional relationship, including those in biology. The question is not "is there teleology in biology"; no less an authority on evolutionary biology than the late Ernst Mayr (not to mention Francisco Ayala) emphatically stated "yes"! The real question (and the real focus of the dispute between EBers and IDers) is the answer to the question, "where does the teleology manifest in biology come from"? EBers such as Ernst Mayr assert that it is an emergent property of natural selection, whereas IDers assert that it comes from an "intelligent designer". It has never been clear to me how one would distinguish between these two assertions, at least insofar as they can be empirically tested. Rather, the choice of one or the other seems to me to be a choice between incommensurate metaphysical world views, which are not empirically verifiable by definition.

This is not, however, to say that the distinction between evolutionary and non-evolutionary models of reality is purely and solely a matter of choice of metaphysics. On the contrary, the empirical evidence for evolution is overwhelming, as is the evidence for at least some of the characteristics of living organisms having arisen as the result of natural selection. What is still a matter of dispute is where meaningful information "comes from": does it arise as an emergent property of natural processes (such as natural selection), or must it be "read into nature" from some non-natural source?

That is the question...

REFERENCE CITED:

[1] Orgel, L. (1973) The origins of life, Chapman & Hall, London, UK, pg. 189:
"...living organisms are distinguished by their specified complexity. Crystals are usually taken as the prototypes of simple well-specified structures, because they consist of a very large number of identical molecules packed together in a uniform way. Lumps of granite or random mixtures of polymers are examples of structures that are complex but not specified. The crystals fail to qualify as living because they lack complexity; the mixtures of polymers fail to qualify because they lack specificity."
P.S. Shannon information, Kolmogorov information, and Orgel information need not be perceived to exist, but meaningful information does.

P.P.S. As for the second law of thermodynamics, it seems clear to me from what I know about biology (the only natural science that deals with meaningful information) that both encoding and decoding meaningful information require the transformation of energy from a condition of lower to higher entropy. This is always the case when meaningful information is “transformed”, whether one is referring to the replication of DNA, the transcription of DNA into RNA, the translation of mRNA into polypeptides, the catalysis of biochemical reactions via enzymes, the transduction of changes in the physical environment into action potentials in the sensory nervous system, the transduction of action potentials in the motor nervous and musculoskeletal systems into behaviors, or the playing of a game of chess (regardless of whether one uses a board and pieces).


Gauss, ID, and the Red Queen Hypothesis


Robert Sheldon has posted a blog entry at Uncommon Descent that is a masterpiece of misdirection, misunderstanding, and mendacity. His post is linked to a longer post at TownHall.com, which I would like to analyze in some detail, as it represents a paradigm of the kind of twisted "logic" that passes for "science" among supporters of "intelligent design". Let's start at the beginning:

First of all, Sheldon asserts that
"a "Gaussian" or "normal" distribution...is the result of a random process in which small steps are taken in any direction."
This is a gross distortion of the definition of a Gaussian distribution. To be specific, a Gaussian distribution is not "the result of a random process in which small steps are taken in any direction". On the contrary, a Gaussian distribution is "a continuous probability distribution that often gives a good description of data that cluster around [a] mean" (see http://en.wikipedia.org/wiki/Gaussian_distribution). There is a huge difference between these two "definitions".
• The first – the one invented by Robert Sheldon – completely leaves out any reference to a mean value or the concept of variation from a mean value, and makes it sound like a Gaussian distribution is the result of purely random processes.

• The second – the one defined by Gauss and used by virtually all statisticians and probability theorists – assumes that there is a non-random mean value for a particular measured variable, and illustrates the deviation from this mean value.
Typically, a researcher counts or measures a particular environmental variable (e.g. height in humans), collates these data into discrete cohorts (e.g. meters), and then constructs a histogram in which the abscissa/x axis is the counted/measured variable (e.g. meters) and the ordinate/y axis is the number of individual data points per cohort (e.g. the number of people tallied at each height in meters). Depending on how broad the data cohorts are, the resulting histogram may be very smooth (i.e. exhibiting “continuous variation”) or “stepped” (i.e. exhibiting “discontinuous variation”).
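The procedure just described can be sketched as follows. The sample is simulated rather than measured: the mean of 1.6 m, the spread of 0.10 m, and the two cohort widths are illustrative assumptions, and the `histogram` helper is hypothetical.

```python
import math
import random
from collections import Counter

rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Simulated heights in meters; mean and spread are assumed for illustration.
heights = [rng.gauss(1.6, 0.10) for _ in range(10_000)]


def histogram(values, bin_width):
    """Tally values into cohorts of the given width, keyed by each cohort's lower edge."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {round(k * bin_width, 2): counts[k] for k in sorted(counts)}


for width in (0.25, 0.05):
    print(f"cohort width {width} m")
    for lower_edge, count in histogram(heights, width).items():
        print(f"  {lower_edge:4.2f} {'#' * (count // 100)}")
```

With broad cohorts the same data look "stepped"; with narrow cohorts the bell shape becomes visible.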

Graphs of variables exhibiting continuous variation approximate what is often referred to as a “normal distribution” (also called a “bell-shaped curve”). This distribution is formally referred to as a Gaussian distribution, in honor of its discoverer, Carl Friedrich Gauss (this, by the way, is one of only three accurate statements conveyed by Sheldon in the post at TownHall.com). While it is the case that Gaussian distributions are the result of random deviations, they are random deviations from a mean value, which is assumed to be the result of a determinative process.

In the example above, height in humans is not random the way Sheldon defines “random”. If it were, there would be no detectable pattern in human height at all, and we would observe a purely random distribution of human heights from about 0.57 meters to about 2.5 meters. Indeed, we would see no pattern at all in human height, and every possible height would be approximately equally likely.

Instead, we see a bell-shaped (i.e. “normal” or “Gaussian”) distribution of heights centered on a mean value (around 1.6 meters for adults, disregarding gender). The “tightness” of the normal distribution around this mean value can be expressed as either the variance or (more typically) as the standard deviation, both of which are a measure of the deviation from the mean value, and therefore of the variation between the measured values.
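The contrast between the two pictures can be checked numerically. This sketch compares a simulated bell-shaped sample with a sample in which every height between 0.57 m and 2.5 m is equally likely, and reports the mean, variance, standard deviation, and the fraction of values within one standard deviation of the mean. The 0.10 m standard deviation is an assumption chosen for illustration, as are the names used.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable
N = 20_000
gaussian_heights = [rng.gauss(1.6, 0.10) for _ in range(N)]   # assumed SD of 0.10 m
uniform_heights = [rng.uniform(0.57, 2.5) for _ in range(N)]  # every height equally likely


def summarize(values):
    """Return (mean, variance, standard deviation, fraction within one SD of the mean)."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    sd = variance ** 0.5
    within_one_sd = sum(abs(v - mean) <= sd for v in values) / len(values)
    return mean, variance, sd, within_one_sd


for label, sample in (("gaussian", gaussian_heights), ("uniform", uniform_heights)):
    mean, variance, sd, frac = summarize(sample)
    print(f"{label}: mean={mean:.3f} var={variance:.4f} sd={sd:.3f} within 1 SD={frac:.3f}")
```

Roughly 68% of a Gaussian sample falls within one standard deviation of the mean, against roughly 58% for a uniform sample, so the two are easy to tell apart even from summary figures.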

Sheldon goes on to state in the post at TownHall.com that “[s]o universal is the "Gaussian" in all areas of life that it is taken to be prima facie evidence of a random process.” This is simply wrong; very, very wrong – in fact, profoundly wrong and deeply misleading. A Gaussian distribution is evidence of random deviation from a determined value (i.e. a value that is the result of a determinative process). Indeed, discovering that a set of measured values exhibits a Gaussian distribution indicates that there is indeed some non-random process determining the mean value, but that there is some non-determined (i.e. “random”) deviation from that determined value.

Why does Sheldon so profoundly misrepresent the definitions and implications of Gaussian distributions? He says so himself:
“Because many people predict that Darwinian evolution is driven by random processes of small steps. This implies that there must be some Gaussians there if we knew where to look.”
This is only the second accurate statement conveyed in the OP, but Sheldon goes on to grossly misrepresent it. It is the case that the “modern evolutionary synthesis” is grounded upon R. A. Fisher’s mathematical model for the population genetics of natural selection, in which the traits of living organisms are both assumed and shown to exhibit exactly the kind of “continuous variation” that is reflected in Gaussian distributions. Fisher showed mathematically that such variation is necessary for evolution by natural selection to occur. In fact, he showed mathematically that there is a necessary (i.e. determinative) relationship between the amount of variation present in a population and the rate of change due to natural selection, which he called the fundamental theorem of natural selection.
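The flavor of that relationship can be seen in a deliberately simplified model. The sketch below is a textbook haploid, asexual population with discrete generations, not Fisher's own derivation (his theorem concerns additive genetic variance in fitness). In this simple model the per-generation change in mean fitness equals the variance in fitness divided by the mean fitness. The starting frequencies and fitness values are made up for illustration.

```python
def mean_fitness(freqs, fitness):
    """Population mean fitness."""
    return sum(p * w for p, w in zip(freqs, fitness))


def fitness_variance(freqs, fitness):
    """Variance in fitness among types in the population."""
    w_bar = mean_fitness(freqs, fitness)
    return sum(p * (w - w_bar) ** 2 for p, w in zip(freqs, fitness))


def next_generation(freqs, fitness):
    """One round of selection: each type's frequency is weighted by its relative fitness."""
    w_bar = mean_fitness(freqs, fitness)
    return [p * w / w_bar for p, w in zip(freqs, fitness)]


freqs = [0.5, 0.3, 0.2]    # hypothetical starting frequencies of three types
fitness = [1.0, 1.1, 1.3]  # hypothetical fitness of each type

for generation in range(5):
    w_bar = mean_fitness(freqs, fitness)
    predicted = fitness_variance(freqs, fitness) / w_bar
    new_freqs = next_generation(freqs, fitness)
    observed = mean_fitness(new_freqs, fitness) - w_bar
    print(f"gen {generation}: observed change {observed:.6f}, variance/mean {predicted:.6f}")
    freqs = new_freqs
```

When the variance in fitness is zero, the change in mean fitness is zero as well: in this model, selection has nothing to act on without variation.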

But in his post at TownHall.com Sheldon goes on to strongly imply that such Gaussian distributions are not found in nature, and that instead most or all variation in nature is “discontinuous”. Along the way, Sheldon also drops a standard creationist canard: “Darwin didn't seem to produce any new species, or even any remarkable cultivars.” Let’s consider these one at a time.

First, most of the characteristics of living organisms exhibit exactly the kind of variation recognized by Gauss and depicted in “normal” (i.e. “bell-shaped”) distributions. There are exceptions: the traits that Mendel studied in his experiments on garden peas are superficially discontinuous (this is Sheldon’s third and only other accurate statement in his post). However, almost any other characteristic (i.e. “trait”) that one chooses to quantify in biology exhibits Fisherian “continuous variation”.

I have already given the example of height in humans. To this one could add weight, skin color, density of hair follicles, strength, hematocrit, bone density, life span, number of children, intelligence (as measured by IQ tests), visual acuity, aural acuity, number of point mutations in the amino acid sequence for virtually all enzymes...the list for humans is almost endless, and is similar for everything from the smallest viruses to the largest biotic entities in the biosphere.

Furthermore, Darwin did indeed produce some important results from his domestic breeding programs. For example, he showed empirically that, contrary to the common belief among Victorian pigeon breeders, all of the domesticated breeds of pigeons are derived from the wild rock dove (Columba livia). He used this demonstration as an analogy for the "descent with modification" of species in the wild. Indeed, much of his argument in the first four chapters of the Origin of Species was precisely to this point: that artificial selection could produce the same patterns of species differences found in nature. No, Darwin didn’t produce any new “species” as the result of his breeding experiments, but he did provide empirical support for his theory that “descent with modification” (his term for “evolution”) could indeed be caused by unequal, non-random survival and reproduction; that is, natural selection.

To return to the main line of argument, by asserting that Mendel’s discovery of “discontinuous variation” undermined Darwin’s assumption that variation was “continuous”, Sheldon has revived the “mutationist” theory of evolution of the first decade of the 20th century. In doing so, he has (deliberately?) misrepresented both evolutionary biology and population genetics. He admits that the “modern evolutionary synthesis” did indeed show that there is a rigorously mathematical way to reconcile Mendelian genetics with population genetics, but he then states
“…finding Gaussians in the spatial distribution of Mendel's genes would restore the "randomness" Darwin predicted….But are Gaussians present in the genes themselves? Neo-Darwinists would say "Yes", because that is the way new information should be discovered by evolution. After all, if the information were not random, then we would have to say it was "put" there, or (shudder) "designed".”
And then he makes a spectacular misrepresentation, one so spectacular that one is strongly tempted toward the conclusion that this massive and obvious error is not accidental, but rather is a deliberate misrepresentation. What is this egregious error? He equates the “spatial distribution of Mendel's genes” (i.e. the Gaussian distribution of “continuous variation” of the heritable traits of organisms) with the distribution of “forks” (i.e. random genetic changes, or “mutations”) in time (i.e. in a phylogenetic sequence).

He does so in the context of Venditti, Meade, and Pagel’s recent letter to Nature on phylogenies and Van Valen’s “red queen hypothesis”. Venditti, Meade, and Pagel’s letter outlined the results of a meta-analysis of speciation events in 101 phylogenies of multicellular eukaryotes (animals, fungi, and plants). Van Valen’s “red queen hypothesis” states (among other things) that speciation is a continuous process in evolutionary lineages as the result of “coevolutionary arms races”.

Van Valen suggested (but did not explicitly state) that the rate of speciation would therefore be continuous. Most evolutionary biologists have assumed that this also meant that the rate of formation of new species would not only be continuous, but that it would also be regular, with new species forming at regular, widely spaced intervals as the result of the accumulation of relatively small genetic differences that eventually resulted in reproductive incompatibility. This assumption was neither rigorously derived from first principles nor empirically derived, but rather was based on the assumption that “continuous variation” is the overwhelming rule in both traits and the genes that produce them.

What Venditti, Meade, and Pagel’s analysis showed was that
“… the hypotheses that speciation follows the accumulation of many small events that act either multiplicatively or additively found support in 8% and none of the trees, respectively. A further 8% of trees hinted that the probability of speciation changes according to the amount of divergence from the ancestral species, and 6% suggested speciation rates vary among taxa.”
That is, the original hypothesis that speciation rates are regular (i.e. “clock-like”) as the result of the accumulation of small genetic changes was not supported.

Instead, Venditti, Meade, and Pagel’s analysis showed that
“…78% of the trees fit the simplest model in which new species emerge from single events, each rare but individually sufficient to cause speciation.”
In other words, the genetic events that cause reproductive isolation (and hence splitting of lineages, or “cladogenesis”) are not cumulative, but rather occur at random intervals throughout evolving lineages, thereby producing “…a constant rate of speciation”. Let me emphasize that conclusion again:
The genetic events that cause reproductive isolation…occur at random intervals throughout evolving lineages, thereby producing “…a constant rate of speciation”.
In other words (and in direct and complete contradiction to Sheldon’s assertions in his blog post), Venditti, Meade, and Pagel’s results fully support the assumption that the events that cause speciation (i.e. macroevolution) are random:
“…speciation [is the result of] rare stochastic events that cause reproductive isolation.”
But it’s worse than that, if (like Sheldon) one is a supporter of “intelligent design”. The underlying implication of the work of Venditti, Meade, and Pagel is not that the events that result in speciation are “designed”, nor even that they are the result of a determinative process like natural selection. Like Einstein’s anathema, a God who “plays dice” with nature, the events that result in speciation are, like the spontaneous decay of the nucleus of a radioactive isotope, completely random and unpredictable. Not only is there no “design” detectable in the events that result in speciation, there is no regular pattern either. Given enough time, such purely random events eventually happen within evolving phylogenies, causing them to branch into reproductively isolated clades, but there is no deterministic process (such as natural selection) that causes them.
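The difference between the two pictures of speciation can be illustrated with simulated waiting times. The sketch below assumes that a single rare, individually sufficient event arrives at a constant rate (the same assumption used for radioactive decay, giving exponentially distributed waits), and contrasts it with a model in which many small events must accumulate additively before speciation occurs. The rates, the number of events, and the function names are all illustrative assumptions; this is not the statistical method of the paper.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is repeatable


def single_event_wait(rate):
    """Waiting time until one rare, individually sufficient event (constant rate)."""
    return rng.expovariate(rate)


def accumulated_wait(rate, events_needed):
    """Waiting time until many small events have accumulated additively."""
    return sum(rng.expovariate(rate) for _ in range(events_needed))


def coefficient_of_variation(samples):
    """Standard deviation divided by the mean: near 1 for exponential waits."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


# Both models are tuned to the same average wait of about 1 time unit.
single = [single_event_wait(1.0) for _ in range(5000)]
accumulated = [accumulated_wait(50.0, 50) for _ in range(5000)]

print("single rare event: CV =", round(coefficient_of_variation(single), 3))
print("accumulated events: CV =", round(coefficient_of_variation(accumulated), 3))
```

Under accumulation the waits cluster tightly around their mean (clock-like); under the single-event model the average rate is constant but individual waits are highly irregular, which is the pattern the passage above describes.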

Here is Venditti, Meade, and Pagel's conclusion in a nutshell:
Speciation is not the result of natural selection or any other “regular” determinative process. Rather, speciation is the result of “rare stochastic events that cause reproductive isolation.”
And stochastic events are not what Sheldon tried (and failed) to assert they are: they are neither regular, determinative events resulting from the deliberate intervention in nature by a supernatural “designer” nor the result of a regular, determinative process such as “natural selection”. No, they are the result of genuinely random, unpredictable, unrepeatable, and irregular “accidents”. Einstein’s God may not “play dice” with nature (although a century of discoveries in quantum mechanics all point to the opposite conclusion), but Darwin’s most emphatically does.
