DECIPHERMENT ALGORITHMS

by B.V. Sukhotin

Translated and adapted by Jacques Guy

"Our knowledge of language X" is twofold. It is the knowledge we use to translate the signs of X into their respective referents in the physical and ideal worlds. It is also the knowledge we use to identify some properties of the signs of X without resorting to their referents. It is for instance the knowledge of English by which, presented with the sentence "I was charded by its unflemming thurn", we identify "charded" as the past participle of "to chard", "unflemming" as the opposite of "flemming", itself derived from "to flem", and "thurn" as a substantive. It is also the knowledge by which we recognize three syllables and ten letters in "unflemming". In general terms, it is the knowledge by which the phenomena of X are identified.

Our knowledge of language X in either acceptation, however, is useless for dealing with language Y, unless X and Y have at least some features in common. Thus a knowledge of Dutch is useful for understanding Hawaiian only insofar as they have some, if few, features in common: both use the Roman alphabet, and common letters are pronounced rather alike. On the other hand, a knowledge of English is utterly useless for Chinese: armed with it we can tell what "goes on" in an English sentence, but nothing in a Chinese sentence -- not even where words end and start (there are no spaces to indicate breaks between words), not even whether it is language at all.

An algorithm which identifies a linguistic phenomenon provides a convenient definition of that phenomenon, in the general form: "Phenomenon P is the one identified by algorithm A when applied to any text in any language". We shall call such an algorithm a "decipherment algorithm".
Definitions of linguistic phenomena by their decipherment algorithms have very desirable properties:

1) the definitions, valid for all languages, are UNIVERSAL,
2) being algorithms, they are PRECISE,
3) finally, they are EFFICACIOUS, for they allow those phenomena to be recognized without fail if they occur in any particular language.

The importance of decipherment algorithms for linguistic theory, then, can hardly be overemphasized -- if they exist. If decipherment algorithms do exist, some of their properties can be inferred. If an algorithm B needs information produced by another algorithm A, it is dependent upon A, and the phenomenon P(B) which it defines cannot be identified until the phenomenon P(A) defined by A is identified. For a set of decipherment algorithms to be efficacious, its elements must be efficacious. If an algorithm A depends for all its information upon another algorithm B, which in turn needs all its information from A, then the phenomena which they define cannot be identified. Therefore A and B are not efficacious and must be rejected. For a set of decipherment algorithms to be efficacious, it must be at least effective, and therefore finite. Therefore there must exist at least one algorithm in the set which needs no input from any other.

One can imagine the following hierarchy of decipherment algorithms in an efficacious set applying to the written word:

1) an algorithm to identify the individual symbols on the printed page,
2) an algorithm to classify those symbols, for instance grouping all occurrences of 'a' in the same category, regardless of their position in the text and of non-distinctive variations (e.g. the fonts used),
3) an algorithm to identify the boundaries of the words,
4) an algorithm to classify those words into grammatical categories,
5) an algorithm to parse the text on the basis of the grammatical categories to which its words belong,

and so on.
Although the feasibility of algorithm (1) cannot be doubted, none has been proposed to date. The feasibility of algorithms (3) to (5), on the other hand, can be doubted, and this is why some algorithms will be elaborated and discussed here, which should go some way towards dispelling doubts about the feasibility of high-order decipherment algorithms.

DISTINCTIVE FEATURES

It has been seen that the definition of a linguistic phenomenon and the decipherment algorithm by which it is identified are one and the same. The list of the distinctive features of a linguistic phenomenon (that is, the properties of this phenomenon which enable us to identify its occurrences in a text) is also a definition of this phenomenon. But whereas the decipherment algorithm that identifies a given phenomenon is not necessarily unique, the list of its distinctive features is. In this respect, a definition of a linguistic phenomenon consisting of the list of its distinctive features is more general than one consisting of its decipherment algorithm (or one of its decipherment algorithms). Lists of distinctive features, however, generally do not have the desirable properties of decipherment algorithms, least of all efficacy. There are two kinds of distinctive features: text-dependent and text-independent.

THE SET OF ACCEPTABLE SOLUTIONS

A list of text-independent features specifies a set of possible interpretations of a text, ANY text. Those possible interpretations constitute the set of ACCEPTABLE SOLUTIONS to the problem of understanding the text. Text-independent features are therefore definable without computation on any text.

THE OBJECTIVE FUNCTION

How well each of those acceptable solutions fits any particular text is determined by properties of this text, that is, by text-dependent features. "How well" is then an objective function on the set of acceptable solutions.
The set of acceptable solutions and the objective function must be defined so that the phenomenon which they identify corresponds to an element of the set of acceptable solutions for which the objective function is minimum or maximum. The objective function is computed from the text analyzed. Definitions of linguistic phenomena by distinctive features therefore, at this stage, comprise two parts:

1) a set of acceptable solutions,
2) an objective function.

The two parts must provide enough information for devising a procedure which identifies the acceptable solution(s) for which the objective function reaches an extreme. This identification procedure is the decipherment algorithm, which constitutes the third part of the definition of the phenomenon.

EQUIVALENCE AND CORRECTNESS: MATHEMATICAL VS LINGUISTIC

Given a set of acceptable solutions and an objective function, a large number of algorithms may be devised which find an acceptable solution for which the objective function reaches an extreme. Such algorithms are MATHEMATICALLY EQUIVALENT. Two objective functions that reach the same extreme for the same element of a given set of acceptable solutions are LINGUISTICALLY EQUIVALENT. A definition of a linguistic phenomenon is MATHEMATICALLY CORRECT if its decipherment algorithm yields the absolute extreme of its objective function calculated over its set of acceptable solutions. A definition of a linguistic phenomenon is LINGUISTICALLY CORRECT if its set of acceptable solutions and its objective function are such that the (intuitively) right phenomenon is effectively identified. Those distinctions provide a basis for collaboration between mathematicians and linguists. It must be emphasized that almost all the algorithms to which we have had access are mathematically incorrect: they cannot be guaranteed to find the extremes of their objective functions.
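The three-part scheme -- a set of acceptable solutions, an objective function over them, and a search procedure -- can be sketched abstractly. The following is only an illustrative skeleton with a toy objective of my own invention, not any algorithm from the text:

```python
# A hypothetical skeleton of a three-part definition: the decipherment
# algorithm searches the acceptable solutions for an extreme of the
# objective function computed on the text.

def decipher(text, acceptable_solutions, objective):
    """Return the acceptable solution(s) minimizing the objective."""
    best = min(objective(s, text) for s in acceptable_solutions)
    return [s for s in acceptable_solutions if objective(s, text) == best]

# Toy illustration: the "solutions" are candidate segment lengths, and
# the objective measures how badly each length divides a 12-letter text.
text = "abcdefghijkl"
solutions = [2, 3, 4, 5]
objective = lambda n, t: len(t) % n   # letters left over
print(decipher(text, solutions, objective))  # -> [2, 3, 4]
```

An exhaustive search like this is mathematically correct in the article's sense; the procedures discussed later trade that guarantee away for tractability.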
The results obtained are not linguistically perfect, which should be expected given the mathematical imperfection of the algorithms, even though some are of surprisingly high quality.

IDENTIFICATION OF VOWELS AND CONSONANTS

We assume that the alphabet of a given language is known, that is, that we are able to identify all occurrences of a given letter and to establish their inventory. We also assume that the text is written in a phonemic, not syllabic or hieroglyphic, system. It remains to be found which letters correspond to vowels, and which to consonants. The algorithm works with sounds as well as with letters, so that the letter/sound dichotomy is not relevant here. Therefore, the definition of vowels and consonants will be valid for both the spoken and the written language. Let us first describe the set of acceptable solutions.

SET OF ACCEPTABLE SOLUTIONS

We consider vowels and consonants to be obtained by a partition of the alphabet into two subsets: V, constituted of vowels, and C, constituted of consonants. The intersection of V and C is empty, and their union is the alphabet A. In some languages, however, the intersection of V and C would not be empty, since a letter can be a vowel as well as a consonant. Thus in Latin i and u are sometimes vowels, sometimes consonants, e.g.:

   iam que opus exegi

In such cases the model proposed seems to contradict intuition; but constructing a decipherment algorithm for the more general model is far more difficult. The information according to which vowels and consonants constitute a partition of an alphabet is of course not enough. If an alphabet consists of n letters, there are 2^n possible partitions of this alphabet -- in the case of the 24 letters of the Latin alphabet, some 17 million solutions. Even so, such a definition does restrict to a certain extent the notion of vowel and consonant.
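Under this definition, the set of acceptable solutions can be enumerated directly for a small alphabet. A minimal sketch (function name hypothetical), confirming the 2^n count:

```python
# Every subset V of the alphabet, with C as its complement, is one
# acceptable solution, so an alphabet of n letters yields 2**n
# candidate partitions.

def partitions(alphabet):
    n = len(alphabet)
    for mask in range(2 ** n):
        vowels = {alphabet[i] for i in range(n) if mask >> i & 1}
        consonants = set(alphabet) - vowels
        yield vowels, consonants

print(len(list(partitions("abcd"))))  # -> 16, i.e. 2**4
```

For 24 letters the same enumeration yields 2^24 = 16,777,216 partitions, the "some 17 million solutions" of the text.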
OBJECTIVE FUNCTION

It is obvious that, in any text, it is less probable that a vowel will appear after another vowel than after a consonant, just as it is less probable that a consonant will appear after another consonant than after a vowel. On the contrary, it is quite frequent that a vowel and a consonant should appear next to each other. If we consider a random partition of an alphabet, it is unlikely that it should have such a property, and an arbitrarily chosen partition loses this property the further it diverges from the correct partition.

We shall describe the combinatorial properties of letters using a square matrix the rows and columns of which correspond to the letters of the alphabet. In the cell at the intersection of row x and column y we shall record the number of times a pair formed of letters x and y appears (without taking into account the order of occurrence of these letters) in a given text. Consider a given partition of an alphabet into two subsets. Move all the rows and columns corresponding to vowels into the "northwest corner" of the matrix and draw the limits separating the vowels from the consonants. We have a matrix of the type:

                  vowels    consonants
                .-----------------------.
    vowels      |     1     |     2     |
                |-----------+-----------|
    consonants  |     4     |     3     |
                '-----------------------'

Cell #1 will contain the number of combinations of vowels with vowels, cell #3 the same information for consonants, and the number of combinations between consonants and vowels will be found in cells #2 and #4. If the partition is close to the correct partition, then, according to our hypothesis, the numbers in 1 and 3 will be small, and those in 2 and 4 will be large. The worth of a partition can then be estimated, for instance, from the sum of the numbers in 1 and 3 (the sum of the whole matrix being constant and equal to twice the number of the letters of the whole text).
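The counting scheme just described can be sketched as follows (the sample string and the function names are mine, for illustration). For "banana" the correct partition, with {a} as the vowels, leaves the homogeneous cells 1 and 3 empty:

```python
from collections import Counter

def pair_counts(text):
    """Count unordered pairs of adjacent letters: the matrix [f(x,y)]."""
    pairs = Counter()
    for x, y in zip(text, text[1:]):
        pairs[frozenset((x, y))] += 1  # order of x and y is ignored
    return pairs

def homogeneous_sum(pairs, vowels):
    """Sum of cells 1 (vowel-vowel) and 3 (consonant-consonant)."""
    total = 0
    for pair, count in pairs.items():
        letters = set(pair)
        all_vowels = letters <= set(vowels)
        all_consonants = not (letters & set(vowels))
        if all_vowels or all_consonants:
            total += count
    return total

pairs = pair_counts("banana")
print(homogeneous_sum(pairs, vowels="a"))  # -> 0 (correct partition)
print(homogeneous_sum(pairs, vowels="b"))  # -> 4 ("an" pairs now homogeneous)
```

Because only unordered adjacent pairs are counted once here, the objective differs from the symmetric-matrix sum by a constant factor, which does not affect where the minimum falls.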
The smaller the sum of 1 and 3, the better the partition; the minimum of the sum corresponds to the best partition. The process of defining vowels and consonants is now complete: we can say that vowels and consonants form the subsets of a partition of the alphabet into two classes such that the number of homogeneous pairs in a given text is minimum. Which subset corresponds to vowels and which to consonants remains to be determined. Now, it is extremely probable that the most frequent letter is a vowel; we can therefore assume that the subset which contains this letter corresponds to the vowels.

THE ALGORITHM

The simplest procedure using the distinctive features mentioned is trivial. It is sufficient to calculate the matrix of absolute frequencies of digraphs, to examine each possible solution (that is, each partition into two subsets), and to calculate for each the value of the objective function. The partition which corresponds to the minimum of the function is then chosen as the best, and the procedure terminates (should more than one partition correspond to this criterion, all are retained). Unfortunately, such an algorithm requires considerable calculations which are far beyond the capacity of current computers. We propose therefore a simpler procedure which is practically equivalent to the previous one. It should be borne in mind that this procedure cannot guarantee that the minimum will be found in every case, but the probability of error seems fairly low. We shall represent this procedure in the form of a list of instructions:

(1) Build for a given text the matrix [f(x,y)] where f(x,y) represents the absolute frequency of the pairs consisting of x and y, whatever the order of occurrence of x and y.

(2) Erase the numbers of the form f(x,x). (This operation is justified by the fact that the sum of these numbers is constant whatever the partition, and is therefore not relevant to the calculation of the minimum of the objective function.
These numbers would even have a disturbing effect owing to certain properties of the procedure.)

(3) For each letter a, calculate the sum f(a,a[1]) + f(a,a[2]) + ... + f(a,a[n]) over all the letters a[1], ..., a[n] of the alphabet.

IDENTIFICATION OF MORPHEMES

(* The algorithm effectively partitions a text into its morph components, not morphemes, but it will turn out to be far more powerful than that: it can be expected to do a morphological analysis of an unknown text. "Morph" should be read for "morpheme" whenever the latter occurs in the text *)

We will assume that the text to be partitioned has no spaces to separate words, and we also assume that the spelling used is phonemic. (* The first assumption makes the problem more general and somewhat more difficult. There is no need for the other: the algorithm will apply to strings of ideograms as well as to strings of phonemes equally successfully *) As in the case of the consonant/vowel algorithm, we start by defining the set of acceptable solutions.

SET OF ACCEPTABLE SOLUTIONS

A partition of a text into disjoint strings is an acceptable solution. A few definitions need to be introduced before going further. A text is a mapping of an alphabet A (a set of letters) onto a set of locations L. In other words, a text is a set of 2-tuples of the type (a,p) where a is a letter and p its position in the text. A string Str(s,t) (where s and t are integer numbers and s >= 0) is a set of 2-tuples of the type {(a[j],p[j])} in which s < p[j] <= t. To each string Str[i](s,t) one can associate a string Str[i+1](t,v). Str(s,t) and Str(v,w) have a non-empty intersection if s
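The definitions above can be sketched concretely (the representation and the function names are my own choices, for illustration): a text as a set of (letter, position) pairs, and Str(s,t) as the pairs whose position p satisfies s < p <= t, so that consecutive strings Str(s,t) and Str(t,v) share only the boundary index and are disjoint as sets:

```python
# A text is a set of 2-tuples (a, p): letter a at position p,
# positions counted from 1.

def as_text(s):
    return {(letter, pos) for pos, letter in enumerate(s, start=1)}

def Str(text, s, t):
    """The string Str(s,t): all pairs whose position p satisfies s < p <= t."""
    return {(a, p) for (a, p) in text if s < p <= t}

text = as_text("undoing")
print(sorted(Str(text, 0, 2)))  # -> [('n', 2), ('u', 1)], the segment "un"
# Consecutive strings share the boundary t = 2 and do not overlap:
assert Str(text, 0, 2) & Str(text, 2, 6) == set()
```

Representing strings as sets of (letter, position) pairs makes "disjoint" and "non-empty intersection" in the text literal set operations.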