^^Parsing timeline.

Questo e' un estratto di

credits: Parsing Timeline, di jeffreykegler Version 3.1 Revision 2, 23 October 2019



wp/ACE Automatic_Computing_Engine of Alan Turing


4th BCE: Pannini's description of Sanskrit

In India, Pannini creates an exact and complete description of the Sanskrit language, including pronunciation. Sanskrit could be recreated using nothing but Pannini's grammar. Pannini's grammar is probably the first formal system of any kind, predating Euclid. In the future, nothing like it will exist for any other natural language of significant size or corpus. By 2018, Pannini will be the object of serious study. But in the 1940's and 1950's Pannini will be almost unknown in the West. This means that Pannini will have no direct effect on the other events in this timeline.

wp/Pāṇini Panini Pannini

1854: Ada discovers computer "language"

In her translator's note on an article on Babbage's computer, Ada Lovelace becomes the first person to clearly see that programming a computer is a distinct discipline from building the computer itself.

Ada is also the first to think of software in linguistic terms -- she originates the idea that, in writing software, we communicate with the computer, and that the means that we use to do that is a "language".

1906: Markov's chains  >>>

Andrey Markov introduces his chains -- a set of states with transitions between them.

One offshoot of Markov's work will be what will come to be known as regular expressions.

Markov uses his chains, not for parsing, but to deal with a problem in probability -- does the law of large numbers require that events be independent? Indirectly, Markov is addressing the question of the existence of free will.

1913: Markov and Eugene Onegin (ref: Markov chain text generator)

In 1913, Markov revisits his chains, applying them to the sequence of vowels and consonants in Pushkin's Eugene Onegin. Again, Markov's interest is not in parsing. Nonetheless, this is an application to language of what later will be regarded as a parsing technique, and apparently for the first time in the West.

1929 Bloomfield's "Postulates". Structural linguistics. "Language" as of 1929.

Leonard Bloomfield, as part of his effort to create a linguistics that would be taken seriously as a science, publishes his "Postulates". Known as structural linguistics, Bloomfield's approach will dominate American lingustics for over two decades.


Bloomfield's "Postulates" defines a "language" as (ref: Linguaggio formale)

the totality of utterances that can be made in a speech community

Note: no reference to semantic, the usual view: utterances of a language "mean" something.

This omission is not accidental. Bloomfield excludes meaning from his definition of language because he wants linguistics to be taken seriously as science. Behaviorist thought is very influential at this time and behavorists believe that, while human behaviors can be observed and verified and therefore made the subject of science, mental states cannot be verified. Claiming to know what someone means by a word is claiming to read his mind. And "mind-reading" is not science.

1943 rewriting system >>>

Emil Post defines and studies a formal rewriting system using productions. With this, the process of rediscovering Pannini in the West begins.

1949 Operator Issue. Arithmetic expression parsing.

In the form of arithmetic expressions,
operator expressions are the target of the first efforts at automatic parsing.

We will not see this issue go away. Very informally, we can say that

The archetypal examples of operator expressions are arithmetic expressions. >>>

Rutishauser's language is structured line-by-line, as will be all languages until LISP. No language until ALGOL will be truly block-structured.
The line-by-line languages are parsed using string manipulations. A parsing theory is not helpful for describing these ad hoc string manipulations, so they don't give rise to one.

Arithmetic expression parsing.

The only logic in these early compilers that really deserves to be called a parsing method is that which tackles arithmetic expressions.

1954: The FORTRAN project begins

At IBM, a team under John Backus begins working on the language which will be called FORTRAN.

1954 Compiler. "First Glossary of Programming Terminology"

June 1954, ACM committee including John Backus and Grace Hopper
(ACM Association for Computing Machinery www.acm.org)

a executive routine which, before the desired computation is started, translates a program expressed in pseudo-code into machine code (or into another pseudo-code for further translation by an interpreter)."

The "pseudo-code" of the Committee is what in 2018 will be called a "programming langauge".

The definition goes on to detail the various tasks a "compiler" might perform. These include what in 2018 will be called "linking" and "loading", so that the ACM committee's redefinition of compiler can be seen as a broadening of Hopper's use of the term. It is not clear if the ACM committee's new definition captures a current usage of the term "compiler", or if it is a prescriptive redefinition, intended to change usage.

As the computer field develops, the textbook definition will expand beyond the ACM Committee's implementation-focused one, so that a

any translator from one language (the "source language") to another (the "target language")

And linkers and loaders will often not be consider a part of the compiler in the more proper sense. But in its essentials, the most common usage of the term "compiler" in 2018 will be as it is defined by the ACM committee.

automatic coding   codification automatique   Rechenplanfertigung
were programs envisioned even before Hopper first used the term "compiler",
were compilers in the ACM committee sense of the term,
but nobody had called them "compilers".

1956: The Chomsky hierarchy. L'articolo fondante i linguaggi formali

3 Models for the Description of Language. IRE Transactions on Information Theory. September, 1956 - Noam Chomsky (chomsky).

Chomsky publishes his "Three models" paper, which is usually considered the foundation of Western formal language theory. Chomsky demolishes the idea that natural language grammar can be modeled using only Markov chains. Instead, the paper advocates a natural language approach that uses 3 layers:

  1. Chomsky uses Markov's chains as his bottom layer. This will come to be called a compiler's lexical phase.
  2. Chomsky's middle layer uses context-free grammars and context-sensitive grammars. These are his own discoveries. This middle layer will come to be called a compiler's syntactic phase.
  3. Chomsky's top layer, again his own discovery, maps or transforms the output of the middle layer. In years to come, Chomsky's top layer will be the inspiration for the AST transformation phase of compiler

1956 Language term (ref: Linguaggio formale)

In his "Three models" paper, Chomsky says that

By a language then, we shall mean a set (finite or infinite) of sentences, each of finite length, all constructed from a finite alphabet of symbols.

This is exactly Bloomfield's definition, restated using set theory.

Nonetheless signs of departure from the behaviorist orthodoxy are apparent in "Three Models" -- Chomsky is quite willing to talk about what sentences mean, when it serves his purposes. For a utterance with multiple meanings, Chomsky's new model produces multiple syntactic derivations. Each of these syntactic derivations "looks" like the natural representation of one of the meanings. Chomsky points out that the insight into semantics that his new model provides is a very desirable property to have.

1956 Parser term >>>

Chomsky is a turning point, so much so that he establishes or settles the meaning of many of the terms we are using. A parser, for our purposes, is

1956 Recognizer term

In contrast to a parser, a recognizer is a something or someone that takes a string and answers "yes" or "no"

Note that, if we intend to do semantics, a recognizer alone is not sufficient.

In the strict sense, a recognizer cannot drive a compiler -- a compiler needs a parser.

But recognizers can be far easier to write, and are often hacked up and pressed into service as parsers. For example, if your semantics is simple and non-recursive, you might be able to drive your semantics phase with the output of a regex engine, using captures.

1957: Chomsky publishes Syntactic Structures

Noam Chomsky publishes Syntactic Structures, one of the most important books of all time. The orthodoxy in 1957 is structural linguistics which argues, with Sherlock Holmes, that it is a capital mistake to theorize in advance of the facts. Structuralists start with the utterances in a language, and build upward.

But Chomsky claims that without a theory there are no facts: there is only noise. The Chomskyan approach is to start with a grammar, and use the corpus of the language to check its accuracy. Chomsky's approach will soon come to dominate linguistics.

Syntactic Structures, London: Mouton, 1957. Limited preview available in English

1960: Oettinger discovers pushdown automata

Oettinger's paper is full of evidence that stacks are still very new.

stacks  called  "pushdown stores".

DPDA Deterministic PushDown Sutomata, will come to be called Oettinger's mathematical model of stacks.

DPDA's will become the subject of a substantial literature.

1965: Knuth discovers LR grammars

LR(k) grammar  "corresponds with the intuitive notion of

1965 Language

Knuth 1965 uses the Bloomfieldian definition of language, following the rest of the parsing literature at this time. In other words, Knuth defines a language a "set of strings", without regard to their meaning. The Bloomfieldian definition, while severely restrictive, has yet to cause any recognizable harm.

1965 Parsing Problem

eterministic stack-based parsing is "exactly" LR-parsing.

It is also the case that non-deterministic stack-based parsing is "exactly" context-free parsing. This symmetry is very elegant, and suggests that the theoreticians have uncovered the basic laws behind parsing.

While nobody had been more authoritative about the limits of extensions as a definition of language than Chomsky (he had literally written the book), Chomsky himself had used extensions to produce some of his most impressive results.

1968: Lewis and Stearns discover LL

When Knuth discovered the LR grammars, he announced them to the world with a full-blown mathematical description. The top-down grammars, which arose historically, have lacked such a description. In 1968, Lewis and Stearns fill that gap by defining the LL(k) grammars.

Terms: "LL" and "LR"

When LL is added to the vocabulary of parsing, the meaning of LR shifts slightly.

in response, the meaning of LR shifts to become

LL(k) means "scan from the left, using left reductions with k character of lookahead".

LL(1) is important.




The "Perl and parsing" series

I am not one of those who believes that mathematics takes place in vacuum -- it follows trends and even what are more accurately called fashions.


This post is about the influence of minimalism on computer languages and parsing theory. Larry Wall is perhaps the only language design who has managed to escape the influence of minimalism and that is the reason Perl is so different from other languages.


If you require evidence that parsing theory can be as much about trends, ideology and fashion as it is about mathematics and techology, here it is. The dominant parsing algorithm for production compilers today is hand-written recursive descent, exactly the one that was displaced in 1975, when the technology was almost unimaginably less powerful.

credits: jeffreykegler.github.io/blog Posts about Marpa: an annotated list



  1. Parsing Timeline  jeffreykegler
    1. jeffreykegler.github.io/blog/A Marpa-based HTML reformatter
    2. jeffreykegler.github.io/Marpa-web-site/
  2. wp/Pāṇini Panini Pannini



RHS  Right Hand Side

LHS  Left Hand Side

A "blank mind" can observe nothing.

There is no "data" without theory, just white noise. Every "fact" gathered relies on many prejudgements about what is relevant and what is not. And you certainly cannot characterize anything as "impossible", unless you have, in advance, a theory about what is possible.
http://jeffreykegler.github.io/Ocean-of-Awareness-blog/  Sun, 31 Mar 2019


HTML <tt> Tag, "teletype tag".

Not Supported in HTML5.

The <tt> tag was used in HTML 4 to define teletype text.
What to Use Instead?

Consider the <kbd> element (for keyboard input), the <var> element (for variables), the <code> element (for computer code), the <samp> element (for computer output), or use CSS instead.




Ho preferito mettere qui i 2 articoletti

invece che dove vengono richiamati (in Linguaggi formali) poiche' l'articolo mi e' piaciuto molto, e mi ha spinto a mantenerne l'integrita'.