Excerpt from my  "to-do" outline for GELLMU:
(Generalized Extensible LaTeX-Like MarkUp)
[http://www.albany.edu/~hammond/gellmu/veterans.html]

Provide a didactic note explaining why, except for *mistakes* and
foibles, the legacy of more than 200 hundred years of mainstream
typeset mathematical notation is NOT one of ambiguity.  This will not
be prescriptive for future notation but descriptive of how one may
"parse" past mainstream notation.  After all, human readers do it all
the time.  The key is that one needs "type" information for each
expression.  This rests on having type information for each symbol.
Symbols are the (Democritus) "atoms" among expressions.  When given type
information for each symbol, a robot needs to be able to deduce type
information for each expression.


MATHEXPR outline, Very Draft 1.3.1

The URL of the Current Very Draft is:

        http://www.albany.edu/~hammond/gellmu/notation

Actually, one will also need a separate syntax (that is prescribeable)
for "type expression".  It will govern how expression types are built up
recursively from symbol types as expressions are formed.  In this syntax
names need to be pretty much entirely alpha-numeric, perhaps augmented
by '(', ')', and '/' in order to have sanity with an overall design
for "multiple independently-reusable documents (MIRD)".  [ :-) ]

Of course, the long legacy of typeset documents is that authors have
always, relative to their intended audiences, provided adequate type
information in text description apart from formal notation.

This is not yet intended to be as complete as the documentation of
"regular expression" in the E-lisp manual, but I hope that it covers
the most essential points.  The task ultimately is to have a
definition of "mathexpr" that is parallel to the E-lisp definition of
"regexp".  Here I am trying to provide a guide for an author of such a
more formal definition based on my personal observation of legacy
practice.

I would hope that the computer algebra organizations and
representative mathematicians, physicists, etc. at some point soon get
together and furnish a draft for an ultimate public standards
document.  No author will have incentive to create "smart documents"
without a good public standard that is useful for the author's
context.  To cover everyone it needs to be very general.  The "type"
concept used below is only metaphorical.  Its metaphorical nature is
intended to leave room for everyone in a broad general system that
will have many more specific community-level arrangements.

I have avoided below using the "C" word that John Baez has had the
courage to use in his UseNet News series of articles "This Week's
Finds in Mathematical Physics" in order to deflect accusations of
"abstract nonsense".  That said, we have known since 1950, if not
earlier, that anyone who is concerned with *algorithms* involving the
solution of finitely many equations in a finite number of variables
(even a single cubic in two variables) will ultimately confront
something, namely, the array of solutions viewed as a "functor" of how
"non-rational" those solutions are, that is equivalent to what is for
others "abstract nonsense".

It's not really bad news, because, as with combinatorics, there is
very little that most need to know.  One just needs to know that:

  (1)  "objects" stand in for sets.

  (2)  "morphisms" stand in for maps (or functions).

  (3)  a morphism from X to Y may be composed in one and only one way
       with a morphism from Y to Z.

  (4)  composition of morphisms is associative.

  (5)  each object X has a unique identity morphism that composes on
       either side with another another morphism to give back the
       other morphism.

One may not be able to speak about x being "in" X and even if one can,
the more useful thing to consider is often a "point" of X which is
simply a morphism from some other contextually "small" object T that
*may* vary from point to point.  For the primordial "category of sets",
where not much additional structure is involved, one very often looks
only at "points" emanating from a standard one-point set I, and then
a "point" of X is just a member of X.

At one time I was privately disappointed that the W3C MathML group did
not go through this, but I now realize that it was simply too much.
What they did give us is an important target for us, but we still need
to have (1) broader provisions in a public MATHEXPR standard and (2)
succinct LaTeX-like (or TeX-like) mathematical notation for authoring
*with* (for those authors who are so inclined) one-time "type"
declarations in document headers of those symbols that are not
publicly typed (such as "2" or "+" even if such symbols are
over-burdened as both of these examples are).

THE MAJOR POINTS HERE:

    +   But for the choice of "handedness" in the notation for
        "composition", which usually is juxtapostion, with or without
        balanced logical grouping, but which *may* be indicated in
        the literature with some *type* of explicit midline small
        circle or thickened dot (but not with a symbol that in the
        context could be interpreted as an explicit symbol for another
        binary operation), there is no acceptable use in legacy
        notation of arbitrary and capricious choices.

    +  The success of Richard Fateman with a big table of integrals
        and other similar successes, even though procedures used
        may have involved guessing, offer circumstantial evidence
        that legacy mathematical notation is sound.

    +   The missing ingredient for robots is *type information* for
        each symbol.  With type information for each symbol an automatic
        processor may derive the type of each expression.

THE YOGA OF THE "MATHEXPR":

    +   The word symbol denotes an "irreducible" expression.  Each
        symbol has a type.  (A symbol, though mathematically atomic,
        may be represented in visual form as a combination of things.)

    +   Symbols and expressions may be juxtaposed to form expressions.

    +   Each expression derives a type that must be unambiguous as it
        is built up recursively in the same way that each finite string
        of matrix multiplications require matrix sizes to fit.

    +   Expressions may be logically grouped with parentheses.
        The expression  "(x)"  represents the same thing as the
        expression  "x" .  The insertion of parentheses for logical
        grouping may be used to eliminate possible ambiguity or may be
        gratuitous and redundant.

    +   Every expression resolves recursively to a finite (ordered)
        sequence of symbols.

    +   Each expression has a type that is recursively computable from
        the types of its juxtaposed component expressions.

    +   "Composition" (the wellspring of associativity) is the
        default meaning of "juxtaposition".   Of course,
        composition like matrix multiplication is not always defined,
        and there is a hierarchy of four possibilities to consider
        in parsing a pair of juxtaposed expressions.  (Due to the
        assumptions that juxtaposition is composition and that
        composition is associative it is enough to know how to parse
        two in order to know how to parse a finite sequence of juxtaposed
        expressions.)  Let x and y be expressions.

        #   So long as "xy" is not a declared symbol, the expression
            "xy" is equivalent to the expression "x y" or to "x(y)",
            "x (y)", etc.

        #   if  x  has type of the form "B > C" and  y  has type of the
            form "A > B", then  xy  is the composition (in the handedness
            of "U.S. Calculus", i.e., "x after y") of type "A > C".

        #   absent that, the "evaluation" gambit:  if  x  has type  "B > C"
            and  y  has type "B", AND if the types  B  and  C  are
            both under a pointed context with initial object denoted,
            say, I, then an object of type "I > B" is *naturally*
            identified with one of type B and one of type "I > C" is
            *naturally* identified with one of type C, so then one
            sees, with the first identification,  xy  of type "I > C"
            and then, with the second identification,  xy  of type C.

        #   absent that, the "dual-evaluation" gambit (relatively rare,
            I think, but on the way "up" and compelling at this point):
            which is the preceding where all arrows get reversed when
            there is a "final" context.

        #   absent that, the "semi-group action" gambit: if  x  has type
            "G" and  y  has type  "X"  in a context where a semi-group
            action, which is an object  f  of type  "G,X -> X",  is
            available, then  xy  denotes the "evaluation"  f(x,y) ,
            which is of type  "X".  In a context with more than one
            semi-group action available, where there is no explicit
            contextual order of precedence, the following gambit
            applies.

        #   the "binary operation" gambit:
            if x has type G and y has type G where G is a semigroup
            context, i.e., there is an associative operation, say *,
            that is itself of type "G,G > G" with G having an "identity"
            object e, then by long tradition  xy  denotes  "x * y".
            If there is a "semiring" context where one has both + and
            * with both operators providing semigroup context and with
            * distributive with respect to +, then by long tradition
            any use of + must be explicit (and less tightly binding than)
            any use of *, and, therefore, "xy" denotes "x * y" in a
            semiring context.  It is important to establish this usage
            as a composition in order to be able to have assurance that
            mere juxtaposition of a finite list of expressions is not
            ambiguous.  That is done by identifying each object z of G
            with the object of type "G > G" that is given by "sending"
            x to z*x.  (This depends on U.S. calculus handedness; if
            that is reversed, use x*z.)  The object of type "G > G"
            arising this way from z might be denoted "M_z" where the
            subscript means, of course, that M is of type

                                "G > (G > G)" .

            One has:

                             M_x M_y = M_(x*y) ,

            where the juxtaposition on the left side is straightforward
            composition and where, without parentheses on the right side
            one would have

                              M_x*y = (M_x) y ,

            which is the evaluation gambit.

            The preceding statement about the type of M may not be
            literally true if one is in a context involving something
            non set-theoretic like "regularity".  The "identity" object
            is needed here in order to know that  z  is determined by M_z.

        #   There are a few other gambits that one might care to
            formalize or prioritize as an implicit default such as the
            "module gambit" that pertains when vectors are multiplied
            by scalars.  In most contexts, this will automatically fall
            under the semi-group action gambit in a way that is not
            ambiguous if one allows for canonical identifications.
            Most of the other gambits in this category are used only in
            special areas.

        #   The fact that composition is associative means that each
            expression formed is unambiguous if each first order
            subexpression does not involve the exercise of a gambit.  In
            a sequence of expressions of length greater than 2 there can
            be more than one way of descending the gambit rules
            depending on logical grouping.  Explicit grouping does block
            such ambiguity, and for the most part authors have been
            reasonably careful to provide explicit grouping.  In very
            difficult situations sometimes the practice is for an author
            to assign a symbol to what otherwise would be a subexpression
            in order to avoid such ambiguity.


    +   The use of subscripts always resolves to composition with
        the base applied first and the subscript second if it is
        defined, or the reverse order (very idiosyncratic?) if it
        is defined, or otherwise is undefined.

    +   The use of superscripts resolves first to context-appropriate
        exponentiation ONLY if it is rigorously defined and, if not,
        resolves to composition with, in modern notation (e.g., "Galois
        conjugations"), the base applied first and the superscript second
        if that is defined, and otherwise, the reverse composition order
        if defined, or otherwise is undefined.  In older literature the
        composition order precedence may be reversed.

    +   Subscripts bind more tightly than superscripts when superscripts
        are exponential, and both bind more tightly than any other
        implicit notational artifice.

    +   In the presence of subscripts, the use of superscripts as
        arguments without explicit grouping must commute with the
        use of subscripts, i.e., one must have, for expressions x, y, z

                               x_y^z = x^y_z

        in order to use either notation without grouping.

    +   Mixed use of binary operators without grouping is generally an
        error;  for example, the usage in a semiring context of

                              x - y + z

        should be permitted only if it may be understood as

                             x + (-y) + z ,

        that is, if the semiring context is a ring context, which means
        that the symbol - may be regarded as a unary operator.