
Parallel Parsing

Parallel Ontleden

Dissertation

for the degree of doctor at the Technische Universiteit Delft, by authority of the Rector Magnificus, prof.drs. P.A. Schenck, to be defended in public before a committee appointed by the Board of Deans

on Tuesday 29 June 1993 at 14:00

by

Joannes Paulus Maria de Vreught

engineer in informatics and engineer in higher informatics


Promotor: prof.dr. S.C. van Westrhenen

Added promotor: dr.ir. R. Sommerhalder

The members of the doctoral committee are:

prof.dr.ir. J. van Katwijk, Technische Universiteit Delft
prof.ir. S.P.J. Landsbergen, Rijksuniversiteit Utrecht
prof.dr. J. van Leeuwen, Rijksuniversiteit Utrecht
prof.dr.ir. A. Nijholt, Universiteit van Twente
prof.dr.ir. H.J. Sips, Katholieke Universiteit Brabant / Technische Universiteit Delft
dr.ir. R. Sommerhalder, Technische Universiteit Delft
prof.dr. S.C. van Westrhenen, Technische Universiteit Delft


A reduced reproduction of Escher's litho 'Ascending and Descending'

© 1960 M.C. Escher / Cordon Art - Baarn - Holland. Klimmen en Dalen (Ascending and Descending).


J.P.M. de Vreught
Louis Davidsstraat 98
2551 EX The Hague
The Netherlands

CIP-DATA KONINKLIJKE BIBLIOTHEEK, DEN HAAG

Vreught, J.P.M. de

Parallel Parsing / J.P.M. de Vreught. - [S.l. : s.n.]. - Ill.

Thesis Technische Universiteit Delft. - With ref.
ISBN 90-9006256-4

Subject headings: parallel parsing.

Picture on cover: © 1993, J.J.C.M. de Vreught

© 1993, J.P.M. de Vreught

All rights reserved. No part of this thesis may be reproduced in any form or by any means without prior written permission of the publisher.


Preface

A decade ago when I began my ‘informatics’ study (E.W. Dijkstra once said: ‘informatics is the non Anglo-Saxon English word for computer science’) at the polytechnic of The Hague, no layman over here knew what ‘informatics’ was. Times have changed. Nowadays ‘informatics’ has such a major impact on life that everybody knows the term.

At the polytechnic I found the courses on formal languages and compiler design fascinating. When I continued my study at the university it was a natural choice for me to specialize in these fields. My interest in parallel systems is a consequence of the assignments on distributed systems I've been working on during my studies: it's like a second skin.

Therefore I was glad to be able to do my M.Sc. project on a parallel parser for a real-life-size natural language grammar. Shortly after the start of this project it turned out to be an entirely theoretical project. My Ph.D. project is a continuation of this.

The order in which this Ph.D. thesis is presented isn't the order in which the research has been carried out. During the last few years I have constantly switched between the more theoretical and the more practical parts.

Chapter 2 and chapter 3 form one coherent theory on parsing context-free grammars in which three algorithms are described, first in the slow parallel case and then in the fast parallel case. In these two chapters everything is tied up to essentially three similar partial parse relations. The presentation of each variant is kept as closely in line with the others as possible. That is exactly what I want to demonstrate: all these algorithms are closely related. The disadvantage for the reader might be a feeling of déjà vu, although every completely analogous proof has been removed from this thesis. The remaining proofs only differ in detail.

Chapter 4 describes a sharpest upper bound for the derivation length of acyclic context-free grammars and, almost as a corollary, its implication on acyclic noncircular attribute grammars. I hesitated where to place it: in the part on context-free grammars or in the part on attribute grammars.


I’ve decided to place it in the part on attribute grammars.

Chapter 5 was the last chapter I’ve been working on and it is on the slow parallel decorating of all parses simultaneously. When I was working on that chapter I constantly found new problems emerging. Regrettably, there wasn’t enough time for a more thorough investigation.

The bulk of chapter 6 is on a parallel dag contraction algorithm. Working on such an algorithm is quite difficult because of the requirement of low time complexity. There is very little room to manoeuvre; even trivial steps must be analyzed for their contribution to the overall time complexity.

Due to the theoretical subject of this thesis I have included some practical results in an appendix. These kinds of results are quite important but are seldom discussed in scientific literature. The impressions gained from these results have been used to steer the research in the right direction, and hence I have included them.

Writing a thesis is seldom the work of one person alone. Therefore, I want to thank everybody who helped me during my Ph.D. project. However, many people deserve to be mentioned explicitly.

I want to thank my promotor prof. Chris van Westrhenen for offering me the opportunity to write this thesis. I want to thank my mentor Ruud Sommerhalder for reading the various versions of this thesis and my papers. He has the rare gift of cross-linking his constructive remarks throughout entire papers.

I also want to thank my M.Sc. mentor Job Honig. I've enjoyed the many discussions we had on parsing. They helped me a lot since they kept me on the right track. His knowledge of computational linguistics has also been a welcome supplement to my own. Similarly, the discussions with Matthew Hurst were quite enlightening. Since he has been working on the practice of parallel parsing, his results were very interesting to me.

I also want to thank Niek IJzinga for his comments on a draft version of this thesis. My gratitude also goes to Trudie Stoute, who frequently helped me correct my English. I want to thank Koos Schonewille for his help with the reproduction of this thesis. I also want to thank my uncle Joop de Vreught, whose picture, a visual pun on the 'dragon' books, decorates the cover.

I want to thank Paweł Kruszyńsky, Jan Hoogerbrugge, Arno Roelofs, Eddy Olk, and Klaas Sikkel. Also the help of prof. Wojciech Rytter, prof. Larry Ruzzo, prof. Jim Hoover, and prof. Satoru Miyano was appreciated. Finally, I want to thank all my roomies, the MMohicans, my colleagues, the secretaries, my friends, and my relatives.

Hans de Vreught
The Hague


Contents

1 Introduction
  1.1 Historical Background
  1.2 Parallelism
  1.3 Formal Languages
  1.4 The Outline

I Context-Free Grammars

2 Slow Parallelism
  2.1 Introduction
  2.2 The CYK Algorithm
  2.3 Earley's Algorithm
  2.4 The Double Dotted Algorithm
    2.4.1 The Partial Parse Relation
    2.4.2 Compositions
    2.4.3 Filters
    2.4.4 Index graphs
    2.4.5 The Recognizer
    2.4.6 The Parser
  2.5 Some Further Remarks

3 Fast Parallelism
  3.1 Introduction
  3.2 Rytter's Algorithm
    3.2.1 Rytter's Pebble Game
    3.2.2 The Partial Parse Relation
    3.2.3 The Recognizer
    3.2.4 The Parser
  3.3 Earley's Algorithm
    3.3.1 Preliminary Results
    3.3.2 The Partial Parse Relation
    3.3.3 The Recognizer
    3.3.4 The Parser
  3.4 The Double Dotted Algorithm
    3.4.1 The Partial Parse Relation
    3.4.2 The Recognizer
    3.4.3 The Parser
  3.5 Some Further Remarks

II Attribute Grammars

4 Taxonomy
  4.1 Introduction
  4.2 Derivation Lengths of Acyclic CFGs
  4.3 Acyclic Noncircular Attribute Grammars

5 Slow Parallelism
  5.1 Constructing a Parse Graph
  5.2 Choice Points in Parses
  5.3 Decorating Multiple Parse Trees

6 Fast Parallelism
  6.1 Introduction
  6.2 The Borderline
  6.3 Rytter's Pebble Game
  6.4 Binary Dag Contraction
    6.4.1 Introduction
    6.4.2 Preliminaries and Requirements
    6.4.3 The Contraction Algorithm
    6.4.4 Correctness
    6.4.5 The Number of Iterations
    6.4.6 Complexity
    6.4.7 Some Further Remarks

7 Epilogue
  7.1 Context-Free Grammars
  7.2 Attribute Grammars

1 Introduction

1.1 Historical Background

When we trace back the origins of this thesis we come to a point where three rivers meet, two large ones and a smaller third one flowing like a meander around the second one:

• Parallel algorithms

• Formal languages

• Natural languages

The theory of parallel algorithms has been greatly influenced by [Bre74, Coo79, Pip79]. Sequential parsing has flourished through the need for good construction techniques for building compilers. Parsing theory is therefore mainly described in books on compiler design. Algorithms for natural languages, however, are often precursors of practical compiler algorithms [Ear68, Ear70, Hay62, You67, Kas65]. These algorithms work for a very wide class of grammars, including ambiguous ones. They are too inefficient to be used for compilers. For natural languages, however, these algorithms are well suited since they can deal with ambiguities.

In the early 1980's much progress was made on the subject of parallel computing and parallel parsing in particular. In [Ruz80, Ruz81] Ruzzo was the first to show that parallel recognition of context-free languages is possible in O(log² n) time. Many other people presented their algorithms for various types of language classes or generating devices:

• In [BG84] a recognizer for grammars in Chomsky normal form is described.


• In [CF84] a VLSI implementation of a parallel version of Earley’s algorithm is described.

• In [Kle85, KR88] recognizers for deterministic context-free languages are described.

• In [BOV85] an optimal parallel algorithm is described that can transform any arithmetical expression into its corresponding syntax tree.

• In [MR85] a parallel tree contraction algorithm is described.

• In [GR86] a parallel pebble game is given.

• In [Ryt85b, Ryt85a, Ryt87] several recognizers and parsers for ambiguous and unambiguous grammars in CNF are given.

On these foundations [Vre90b, VH91] were written, on which this thesis is partly based. From roughly the same origin Sikkel started [Sik90, SN90, SN91] but his work took a more linguistic direction while the author of this thesis took a more theoretical direction.

1.2 Parallelism

The field of parallel algorithms is relatively new. In this field there are two mainstreams. One mainstream tries to parallelize sequential algorithms in a rather straightforward way keeping the processor-time product constant (see [Qui87, Akl89]), while the other tries to find parallel algorithms that run in polylogarithmic time (see [GR88, JáJ92]).

Sequential algorithms are considered fast (or feasible) when they solve their problems in polynomial time. Parallel algorithms, however, are only called fast when they solve their problems in polylogarithmic time. Many sequential algorithms can be transformed into fast parallel algorithms when an exponential number of processors is used. This isn't very realistic. A parallel algorithm is called feasible when it uses a polynomial number of processors. A parallel algorithm that is both fast and feasible is called efficient. The class of problems with an efficient parallel algorithm is called Nick's Class NC (in honor of Nick Pippenger [Pip79]). It is an open question whether or not NC ≠ P.

Note that a feasible parallel algorithm running in polynomial time is called slow parallel, while the same algorithm performed by a single processor (in a round-robin fashion) is called fast (sequentially). Although we call an algorithm feasible if it uses a polynomial number of processors, most people would regard that as very generous. Many people call a parallel algorithm practical if it uses a linear number of processors.


Keeping the processors-time product (p(n) × T(n)) constant is a desirable feature of an algorithm. Such a property is called scalability. If the algorithm is scalable, we can trade processors for time; halving the number of available processors doubles the running time. For some algorithms the processors-time product is of the same order as the time complexity of the best sequential algorithm known. In such a case the parallel algorithm is said to be optimal.

There are several types of parallel RAM models:

• Concurrent Read Concurrent Write Parallel RAM (CRCW PRAM)

• Concurrent Read Exclusive Write Parallel RAM (CREW PRAM)

• Exclusive Read Exclusive Write Parallel RAM (EREW PRAM)

By definition any EREW PRAM is also a CREW PRAM and any CREW PRAM is also a CRCW PRAM. It is well known that an EREW PRAM can simulate a CRCW PRAM with only a logarithmic overhead [GR88].

In this thesis Brent's scheduling principle is often used. Without proof we will state Brent's theorem [Bre74].

Theorem 1.2.1 (Brent) Let A be a given algorithm with parallel computation time t. Assume that A involves a total number of m computational operations. Then A can be implemented using p processors in O(m/p + t) parallel time.

Example 1.2.1 Assume we want to sum n integers. A straightforward parallel algorithm performs this job with O(n) processors in O(log n) time. But when we use Brent's scheduling principle we could do the same job with O(n/log n) processors in O(log n) time. Thus we use a factor O(log n) fewer processors, while we maintain the same order of time complexity.
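To make Brent's scheduling principle concrete, here is a small sequential Python sketch of the summation example (the function name and the simulation of processors by blocks are ours, not the thesis's): each of the roughly n/log n simulated 'processors' first sums a block of about log n values, after which the partial sums are combined by a binary-tree reduction in O(log n) rounds.

    import math

    def brent_sum(xs):
        # Sum n integers with about n/log2(n) simulated processors:
        # phase 1 does one sequential block sum per processor,
        # phase 2 is a binary-tree reduction of the partial sums.
        n = len(xs)
        if n <= 2:
            return sum(xs)
        block = max(1, int(math.log2(n)))
        partial = [sum(xs[i:i + block]) for i in range(0, n, block)]
        while len(partial) > 1:          # each round halves the list
            partial = [partial[i] + (partial[i + 1] if i + 1 < len(partial) else 0)
                       for i in range(0, len(partial), 2)]
        return partial[0]

    assert brent_sum(list(range(100))) == sum(range(100))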

A second technique which is heavily used is Rytter's pebble game. We will discuss this pebble game in full depth when we discuss Rytter's algorithm (section 3.2).

A third technique that is presumed to be well known is using parallel tree contraction algorithms operating by means of rake operations. We will use the description in [KR90] as the prime example of a parallel tree contraction algorithm (although there the ‘rake’ operation is called ‘shunt’):

Tree Contraction:
    Number the n leaves from left to right as 1 . . . n
    repeat ⌈lg n⌉ times
        Rake in parallel all odd numbered left leaves
        Rake in parallel all odd numbered right leaves
        Shift out the least significant bit in the leaf numbers

Rake leaf v:
    Let u be v's father, w be v's brother
    if v has a grandparent x then
        Remove v and u and connecting edges from the tree
        Add edge x → w to the tree
    else
        Remove v and u and connecting edges from the tree
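For illustration, a minimal sequential Python sketch of a single rake step on an explicit pointer tree (the Node class and all names are ours; in the parallel algorithm many such steps are applied independently within one iteration):

    class Node:
        def __init__(self, label, left=None, right=None):
            self.label, self.left, self.right = label, left, right
            self.parent = None
            for c in (left, right):
                if c:
                    c.parent = self

    def rake(v):
        # Splice out leaf v and its father u, promoting v's brother w.
        u = v.parent
        w = u.right if u.left is v else u.left
        x = u.parent                     # grandparent, may not exist
        if x is not None:
            if x.left is u:              # replace the edge x -> u by x -> w
                x.left = w
            else:
                x.right = w
            w.parent = x
        else:
            w.parent = None              # u was the root, so w becomes the root
        return w

    # Raking leaf a in the tree ((a, b), c) promotes its brother b:
    t = Node('r', Node('u', Node('a'), Node('b')), Node('c'))
    rake(t.left.left)
    assert t.left.label == 'b'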

1.3 Formal Languages

The theory of formal languages is discussed in many classical textbooks [Sal73, Har78, HU79, Sal85]. The theory of recognizing and parsing is described in [AU72, AU73, SSS88, SSS90] and in the ‘dragon’ books [AU77, ASU86]. The most extensive treatment of attribute grammars can be found in [DJL88]. For a wealth of references on parallel recognizing and parsing the reader is referred to [AANL89, Vre91a].

In the first part of this thesis we will investigate the parsing of context-free languages. A context-free grammar (CFG) is a 4-tuple G = (V, Σ, P, S) with vocabulary V, terminal alphabet Σ, productions P, and start symbol S. In derivations we will denote by αβγ ⇒_β αδγ that only β is allowed to be rewritten, thus β ⇒ δ. Furthermore we will assume that a_1 ... a_n is the input string.

A grammar is said to be in Chomsky Normal Form (CNF) iff all production rules in P are of the form A → a or A → BC with A, B, C ∈ (V \ Σ) and a ∈ Σ. It can be proved that any context-free language not containing the empty string λ can be described by a grammar in CNF. Since for any context-free language L the languages L \ {λ} and L ∪ {λ} are also context-free [Har78], we will say that any context-free language can be generated by a CFG in CNF, neglecting the empty string.

The process of deciding whether or not a sentence belongs to a given language (described by a grammar) is called recognizing. When the string is recognized as a word in the language, it is often desirable to have the justification of that. The process of obtaining this justification is called parsing. Often the justification is a derivation tree or a description from which this tree can be easily obtained (in this thesis a left parse will be frequently used).

In the realms of parsing theory there exist two major application areas, compiler design and natural language processing (NLP). When we look at the context-free grammars and input strings involved, we note several differences:


• Programs usually consist of thousands of lines of code, while the sentence length in NLP is usually well below 30 words.

• Ambiguous sentences are very common in natural languages (e.g. 'the man saw the girl in the park with the telescope'), while in programming languages ambiguities are unwanted.

• Derivation trees of programs tend to be very deep and highly skewed. In natural languages trees are fairly balanced.

In this thesis we will focus our attention on grammars for natural languages. Some of the oldest algorithms for recognizing and parsing context-free languages are the Cocke-Younger-Kasami (CYK) algorithm (see [Hay62, You67, Kas65]) and Earley's algorithm (see [Ear68, Ear70]). Both are tabular algorithms that use upper triangular matrices to encode the possible partial parses. The CYK algorithm requires a grammar in CNF; it will fill the matrix with sets of nonterminals. Earley's algorithm can be used for any CFG; it will fill the matrix with single dotted items. These are production rules with a dot appearing somewhere in the right-hand sides. This dot denotes the boundary of what has already been recognized and what still has to be processed. In the next chapter we will extend this to double dotted items where the dots play similar roles.

For attribute grammars we will follow the definitions given in [DJL88]. An attribute grammar (AG) consists of an underlying context-free grammar G = (V, Σ, P, S) and an associated attribute system. In the attribute system each nonterminal is augmented by a set of attributes, for each production rule there is a set of semantic rules describing how attribute values can be obtained, and for each production rule there is a set of semantic conditions which must be satisfied by the attributes involved. The most important things to note are:

• Each production rule has a unique set of semantic rules.

• Each production rule has a unique set of semantic conditions.

The process of computing all attribute values in a parse tree is called decorating. There are two ways to decorate a derivation tree:

• Dynamically: the evaluation order is computed during the decoration process and is dependent on both the grammar and the string under consideration. Although dynamic decorators can make the best choice when to evaluate the attribute values, they tend to be inefficient due to the extra computations needed to determine the evaluation order.


• Statically: an evaluation order is determined which is only dependent on the grammar. These decorators tend to be efficient although they seem to impose severe restrictions on the generative power of the attribute grammars.

In [Kui89] static parallel decorators are investigated for general attribute grammars although the emphasis is on attribute grammars used for programming languages.
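As a small illustration of an attribute system (a toy encoding of ours, not taken from the thesis), consider a grammar of binary numerals with a single synthesized attribute val; the decorator below evaluates the semantic rules in one static bottom-up sweep over a parse tree:

    # One set of semantic rules per production; all attributes here are
    # synthesized, so a single bottom-up sweep decorates the tree.
    semantic_rules = {
        ('N', ('N', 'B')): lambda n, b: 2 * n + b,   # N.val = 2*N1.val + B.val
        ('N', ('B',)):     lambda b: b,
        ('B', ('0',)):     lambda: 0,
        ('B', ('1',)):     lambda: 1,
    }

    def decorate(tree):
        # tree = (lhs, children); children are subtrees or terminal strings
        lhs, children = tree
        args = [decorate(c) for c in children if isinstance(c, tuple)]
        rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
        return semantic_rules[(lhs, rhs)](*args)

    # The parse tree of the numeral "10" decorates to the value 2:
    assert decorate(('N', [('N', [('B', ['1'])]), ('B', ['0'])])) == 2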

1.4 The Outline

This thesis is divided into two parts. The first is on context-free grammars while the second is on attribute grammars.

The part on context-free grammars consists of two chapters. Chapter 2 describes three tabular algorithms for recognizing and parsing context-free languages. These algorithms are closely related. For instance, all three recognizers run with p(n) × T(n) = Θ(n²) × Θ(n) on a CRCW PRAM. All these algorithms are connected by three partial parse relations.

Chapter 3 describes the fast parallel versions of the three algorithms discussed in chapter 2. These algorithms attain their speed by using propositions whose truth values have not yet been determined. If the proposition is valid, so are all the consequences of this proposition, but if the proposition turns out to be wrong, all the work concerning this proposition has been in vain. The three algorithms given in this chapter have several resemblances: for instance, the recognizers run with p(n) × T(n) = Θ(n⁶) × Θ(log n) on a CRCW PRAM.

The part on context-free grammars is highly coherent: the reader might get a feeling of déjà vu. This is partly done on purpose. The descriptions and their proofs are kept in the same style and presentation order. For this reason the reader is advised to focus his attention on sections 2.4 and 3.3. In the other descriptions only some details are different.

The part on attribute grammars starts with chapter 4, which discusses the consequences of cyclicity of the underlying context-free grammar of a noncircular attribute grammar. [Har78] describes a proof that the derivation length of an acyclic context-free grammar is linear in the length of the generated string. The larger part of chapter 4 is devoted to deriving a sharpest upper bound for the derivation length of acyclic context-free grammars. In that chapter also grammars and strings are given that reach the upper bound. That upper bound has a consequence for noncircular attribute grammars that have such an underlying acyclic context-free grammar: it puts a complexity bound on the attribute values that can be computed.


Although it is customary to describe the decorating process for a single derivation tree, it is not the most desirable strategy in the case of NLP. In natural language processing it is often necessary to decorate a large number of different derivation trees. Chapter 5 describes a way to pack all parses together into one data structure. We can pack that data structure even further in such a way that we obtain a succinct data structure of size O(n³). When decorating that structure, we might get to a point where two parses dictate two mutually different values for one certain attribute: an attribute value clash. Unfortunately, the problem whether a grammar is free of such clashes is recursively unsolvable.

When we know how to handle such clashes during the decorating stage, we can do the decorating on the succinct data structure. Unfortunately, we will only present a parallel decorator for single sweep attribute grammars. Chapter 6 describes several examples of P-complete problems that can be easily constructed by attribute grammars. So we can't seriously expect to do the decorating stage in polylogarithmic time using only a polynomial number of processors. However, we will define one group of attribute grammars for which efficient parallel decorators do exist.

The larger part of chapter 6 is devoted to a parallel dag contraction algorithm. This algorithm is similar to the various tree contraction algorithms. For dags of reasonable size (i.e. for which the 'tree size' is polynomial in the size of the dag) the contraction algorithm runs with O(n/log n) processors in time O(log² n) on an EREW PRAM. In comparison with the tree algorithms this is, for a tree, only a factor O(log n) slower.

For arithmetical dags (i.e. dags computing arithmetical expressions) of reasonable sizes there do exist 'dag' versions of the tree contraction algorithms that don't have that factor O(log n) overhead. But these algorithms won't work in the case of non-arithmetical operations.

The final chapter is reserved for the epilogue. We will look in retrospect at the facts we have established and give some directions for further research.


Part I: Context-Free Grammars


2 Slow Parallelism

2.1 Introduction

In sections 2.2 and 2.3 the parallel CYK algorithm and the parallel Earley algorithm are described. These algorithms are based on existing literature and they are only presented for the sake of completeness. Section 2.4, on the parallel double dotted algorithm, is the cardinal part of this chapter.

2.2 The CYK Algorithm

The parallel CYK algorithm is a parallelized version of the Cocke-Younger-Kasami algorithm for grammars in CNF. The CYK algorithm was originally described in [Hay62, You67, Kas65]. The algorithm (the recognizer and the parser) can be characterized by the following relation.

Definition 2.2.1 The relation CYK ⊆ {0, . . . , n}² × (V \ Σ) is defined as follows:

• If A → a_j ∈ P then (j − 1, j, A) ∈ CYK for any j ∈ {1, . . . , n}.

• If A → BC ∈ P and (i, k, B) ∈ CYK and (k, j, C) ∈ CYK then (i, j, A) ∈ CYK.

• Nothing is in CYK except those elements which must be in CYK by applying the preceding rules finitely often.

An item of the form (0, n, S) is called a root item. The item (j − 1, j, A) in the first rule is called a base item, while the item (i, j, A) in the second rule is said to be obtained by means of a combination operation (see figure 2.1). Note that combination is a binary operation.

[Figure 2.1: Parallel CYK operation]

The relation CYK is called a partial parse relation. It can be shown that CYK = {(i, j, A) | A ⇒* a_{i+1} ... a_j} and thus S ⇒* a_1 ... a_n iff (0, n, S) ∈ CYK.

CYK-Recognizer:
    CYK := ∅
    for i := 1 to n do
        for each j := 0 to n − i in parallel do
            if i = 1 then
                CYK := CYK ∪ {(j, j + i, A) | A → a_{j+i} ∈ P}
            else
                for each k := 1 to i − 1 in parallel do
                    CYK := CYK ∪ {(j, j + i, A) | A → BC ∈ P and (j, j + k, B) ∈ CYK and (j + k, j + i, C) ∈ CYK}
    return whether or not (0, n, S) ∈ CYK

It can be proved that this algorithm correctly computes the relation CYK (see definition 2.2.1). In case of name conflicts we will refer to the relation computed by the algorithm as CYK.
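For illustration, the following sequential Python sketch computes the same relation bottom-up (the grammar encoding and all names are ours, not the thesis's; the two inner loops are exactly the ones executed in parallel above):

    def cyk_items(word, unary, binary):
        # unary:  pairs (A, a)      for rules A -> a
        # binary: triples (A, B, C) for rules A -> B C
        # An item (i, j, A) means that A derives a_{i+1} ... a_j.
        n = len(word)
        items = {(j - 1, j, A) for j in range(1, n + 1)
                 for (A, a) in unary if a == word[j - 1]}
        for width in range(2, n + 1):           # sequential outer loop
            for i in range(0, n - width + 1):   # 'in parallel' above
                j = i + width
                for k in range(i + 1, j):       # 'in parallel' above
                    items |= {(i, j, A) for (A, B, C) in binary
                              if (i, k, B) in items and (k, j, C) in items}
        return items

    # CNF grammar for {a^n b^n | n >= 1}: S -> AT | AB, T -> SB, A -> a, B -> b
    unary = {('A', 'a'), ('B', 'b')}
    binary = {('S', 'A', 'T'), ('S', 'A', 'B'), ('T', 'S', 'B')}
    items = cyk_items("aabb", unary, binary)
    assert (0, 4, 'S') in items                 # root item: "aabb" is recognized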

CYK-Parser(i, j, A):
    if i + 1 = j then
        output A → a_j
    else
        find in parallel a k with i < k < j and A → BC ∈ P such that
            • (i, k, B) ∈ CYK
            • (k, j, C) ∈ CYK
        output A → BC
        CYK-Parser(i, k, B)
        CYK-Parser(k, j, C)

It can be shown that this algorithm correctly computes a left parse when called as CYK-Parser(0, n, S) if (0, n, S) ∈ CYK.
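Continuing the toy encoding of the recognizer sketch above, a sequential rendering of CYK-Parser might look as follows:

    def cyk_parse(i, j, A, word, binary, items, out):
        # Emit a left parse for a valid item (i, j, A).
        if i + 1 == j:
            out.append((A, word[i]))            # output A -> a_j
            return
        for k in range(i + 1, j):               # the 'find in parallel' step
            for (A2, B, C) in binary:
                if A2 == A and (i, k, B) in items and (k, j, C) in items:
                    out.append((A, B, C))       # output A -> B C
                    cyk_parse(i, k, B, word, binary, items, out)
                    cyk_parse(k, j, C, word, binary, items, out)
                    return

    parse = []
    cyk_parse(0, 4, 'S', "aabb", binary, items, parse)
    # parse now lists the productions of a left parse of "aabb"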

2.3 Earley's Algorithm

The parallel Earley algorithm is a parallelized version of Earley's algorithm. Earley's algorithm was originally described in [Ear68, Ear70]. The sequential algorithm (the recognizer and the parser) can be characterized by the following (partial parse) relation.

Definition 2.3.1 The relation E_TD ⊆ {0, . . . , n}² × {A → α·β | A → αβ ∈ P} is defined as follows:

• If S → α ∈ P then (0, 0, S → ·α) ∈ E_TD.

• If (i, j − 1, A → α·a_jγ) ∈ E_TD then (i, j, A → αa_j·γ) ∈ E_TD.

• If (i, j, A → α·Bγ) ∈ E_TD and B → β ∈ P then (j, j, B → ·β) ∈ E_TD.

• If (i, k, A → α·Bγ) ∈ E_TD and (k, j, B → β·) ∈ E_TD then (i, j, A → αB·γ) ∈ E_TD.

• Nothing is in E_TD except those elements which must be in E_TD by applying the preceding rules finitely often.

It can be shown that E_TD = {(i, j, A → α·β) | S ⇒* a_1 ... a_i Aγ ⇒_A a_1 ... a_i αβγ ⇒*_α a_1 ... a_j βγ} and thus S ⇒* a_1 ... a_n iff (0, n, S → α·) ∈ E_TD for some S → α ∈ P.

The first and third rules enforce an algorithm that will compute E_TD to be sequential. These two rules will be replaced by a single rule that won't enforce sequential behavior. The new rule (the first one in the next definition) slightly changes the partial parse relation, however. The modified partial parse relation is presented in the following definition.

Definition 2.3.2 The relation E ⊆ {0, . . . , n}² × {A → α·β | A → αβ ∈ P} is defined as follows:

• If A → α ∈ P then (j, j, A → ·α) ∈ E for any j ∈ {0, . . . , n}.

• If (i, j − 1, A → α·a_jγ) ∈ E then (i, j, A → αa_j·γ) ∈ E.

• If (i, k, A → α·Bγ) ∈ E and (k, j, B → β·) ∈ E then (i, j, A → αB·γ) ∈ E.

• Nothing is in E except those elements which must be in E by applying the preceding rules finitely often.

An item of the form (0, n, S → α·) is called a root item. The item (j, j, A → ·α) in the first rule is called a base item. The item (i, j, A → αa_j·γ) in the second rule is said to be obtained by means of a scanner operation, while the item (i, j, A → αB·γ) in the third rule is said to be obtained by means of a completor operation (see figure 2.2). Note that the scanner operation is a unary operation, whereas the completor operation is a binary one.

[Figure 2.2: Parallel Earley operations]

It can be shown that E = {(i, j, A → α·β) | A ⇒ αβ ⇒*_α a_{i+1} ... a_j β} and thus S ⇒* a_1 ... a_n iff (0, n, S → α·) ∈ E for some S → α ∈ P.

Earley's-Recognizer:
    E := ∅
    for i := 0 to n do
        for each j := 0 to n − i in parallel do
            if i = 0 then
                E := E ∪ {(j, j, A → ·α) | A → α ∈ P}
            else
                E := E ∪ {(j, j + i, A → αa_{j+i}·γ) | (j, j + i − 1, A → α·a_{j+i}γ) ∈ E}
                for each k := 1 to i − 1 in parallel do
                    E := E ∪ {(j, j + i, A → αB·γ) | (j, j + k, A → α·Bγ) ∈ E and (j + k, j + i, B → β·) ∈ E}
            while E still changes do
                E := E ∪ {(j, j + i, A → αB·γ) | (j, j, A → α·Bγ) ∈ E and (j, j + i, B → β·) ∈ E}
                E := E ∪ {(j, j + i, A → αB·γ) | (j, j + i, A → α·Bγ) ∈ E and (j + i, j + i, B → β·) ∈ E}
    return whether or not (0, n, S → α·) ∈ E for some S → α ∈ P

It can be proved that this algorithm correctly computes the relation E (see definition 2.3.2). In case of name conflicts we will refer to the computed relation in the algorithm as E.
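For illustration, a naive sequential Python sketch that computes the relation E of definition 2.3.2 as a fixed point (the item encoding is ours and assumes that terminals, i.e. characters of the input, and nonterminal names are disjoint):

    def earley_items(word, grammar):
        # grammar: list of (A, rhs) with rhs a tuple of symbols.
        # An item (i, j, A, rhs, d) is (i, j, A -> alpha . gamma), alpha = rhs[:d].
        n = len(word)
        items = {(j, j, A, rhs, 0) for (A, rhs) in grammar for j in range(n + 1)}
        changed = True
        while changed:                  # naive closure; the algorithm above
            changed = False             # interleaves this with the loops on i, j, k
            new = set()
            for (i, j, A, rhs, d) in items:
                if d == len(rhs):
                    continue
                s = rhs[d]
                if j < n and s == word[j]:                 # scanner
                    new.add((i, j + 1, A, rhs, d + 1))
                for (k, j2, B, rhs2, d2) in items:         # completor
                    if k == j and B == s and d2 == len(rhs2):
                        new.add((i, j2, A, rhs, d + 1))
            if not new <= items:
                items |= new
                changed = True
        return items

    grammar = [('S', ('S', 'S')), ('S', ('a',))]
    items = earley_items("aa", grammar)
    assert any(d == len(rhs) for (i, j, A, rhs, d) in items
               if (i, j, A) == (0, 2, 'S'))     # some (0, n, S -> alpha .) exists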

Earley's-Parser(i, j, A → α·γ):
    if α = λ then
        output A → γ
    else if α = α′a_j then
        Earley's-Parser(i, j − 1, A → α′·a_jγ)
    else
        let α = α′B with B ∈ V \ Σ
        find in parallel a k with i ≤ k ≤ j and a B → β ∈ P such that
            • (i, k, A → α′·Bγ) ∈ E
            • (k, j, B → β·) ∈ E
        Earley's-Parser(k, j, B → β·)
        Earley's-Parser(i, k, A → α′·Bγ)

It can be shown that this algorithm correctly computes a left parse for acyclic CFGs when called as Earley's-Parser(0, n, S → α·) for some (0, n, S → α·) ∈ E.
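Building on the earley_items sketch above, a sequential rendering of Earley's-Parser could look like this (ours, for valid items of an acyclic CFG; otherwise the recursion need not terminate):

    def earley_parse(i, j, A, rhs, d, word, grammar, items, out):
        # Unwind the item (i, j, A -> alpha . gamma) with alpha = rhs[:d].
        if d == 0:
            out.append((A, rhs))                 # output A -> gamma
            return
        s = rhs[d - 1]
        if j >= 1 and s == word[j - 1]:          # alpha = alpha' a_j: undo a scan
            earley_parse(i, j - 1, A, rhs, d - 1, word, grammar, items, out)
            return
        for k in range(i, j + 1):                # alpha = alpha' B: undo a completion
            for (B, beta) in grammar:
                if (B == s and (i, k, A, rhs, d - 1) in items
                        and (k, j, B, beta, len(beta)) in items):
                    earley_parse(k, j, B, beta, len(beta), word, grammar, items, out)
                    earley_parse(i, k, A, rhs, d - 1, word, grammar, items, out)
                    return

    out = []
    earley_parse(0, 2, 'S', ('S', 'S'), 2, "aa", grammar, items, out)
    # out now lists the productions justifying S =>* aa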

2.4 The Double Dotted Algorithm

We will start with the partial parse relation on which the double dotted algorithm is based. We will investigate in full depth which properties this partial parse relation has. These properties will be heavily used in the rest of this and the next chapter. Due to these properties the description of the recognizer and the parser can be relatively short.

2.4.1 The Partial Parse Relation

In this section we define a partial parse relation D similar to CYK and E. It is used to describe what the double dotted algorithm computes and to construct and prove the correctness of the fast parallel algorithm of section 3.4. Although much of the basic theory of the partial parse relations is only relevant to the next chapter, we prefer to give all basic theory here.

The inductive definitions of the different partial parse relations offer a way of proving whether some item is valid or not. Such a proof can be represented by a proof tree, showing why the item is in the relation according to the definition. We will call such trees composition trees since they show how an item is composed from other items in the relation.

Parsing can be regarded as extracting a composition tree for a root item, since we can easily transform a composition tree into a derivation tree. In general there are items in the partial parse relation that are never used in any composition tree of any root item; those items are superfluous. Therefore we would like to preclude those items from the partial parse relations. This is called filtering. Note that after filtering we are still able to retrieve all parses.

In the next chapter we will drop this requirement. Since the objective of being able to retrieve any parse is too ambitious, we only want to retrieve one parse. After having computed the partial parse relation, we will try to reduce this relation in such a way that we only have the items of one composition tree of a root item. Transforming the resulting items into a composition tree is trivial.

Definition 2.4.1.1 The relation D ⊆ {0, . . . , n}² × {A → α·β·γ | A → αβγ ∈ P} is defined as follows:

• If A → λ ∈ P then (j, j, A → ··) ∈ D for any j ∈ {0, . . . , n}.

• If A → αa_jγ ∈ P then (j − 1, j, A → α·a_j·γ) ∈ D for any j ∈ {1, . . . , n}.

• If A → αBγ ∈ P and (i, j, B → ·β·) ∈ D then (i, j, A → α·B·γ) ∈ D.

• If (i, k, A → α·β₁·β₂γ) ∈ D and (k, j, A → αβ₁·β₂·γ) ∈ D then (i, j, A → α·β₁β₂·γ) ∈ D.

• Nothing is in D except those elements which must be in D by applying the preceding rules finitely often.

An item of the form (0, n, S → ·α·) is called a root item. The items (j, j, A → ··) in the first rule and (j − 1, j, A → α·a_j·γ) in the second rule are called base items. The item (i, j, A → α·B·γ) of the third rule is said to be obtained by means of the inclusion operation, while the item (i, j, A → α·β₁β₂·γ) is said to be obtained by means of the concatenation operation (see figure 2.3). Note that inclusion is a unary operation, while concatenation is a binary operation.

[Figure 2.3: Double dotted operations]
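To make the two operations concrete, here is a naive sequential Python sketch of the closure of definition 2.4.1.1 (the encoding is ours; base items, inclusion, and concatenation are spelled out):

    def double_dotted_items(word, grammar):
        # grammar: list of (A, rhs).  An item (i, j, A, rhs, d1, d2) is
        # (i, j, A -> alpha . beta . gamma), alpha = rhs[:d1], beta = rhs[d1:d2].
        n = len(word)
        items = set()
        for (A, rhs) in grammar:                 # base items
            if rhs == ():
                items |= {(j, j, A, rhs, 0, 0) for j in range(n + 1)}
            items |= {(j - 1, j, A, rhs, d, d + 1) for d in range(len(rhs))
                      for j in range(1, n + 1) if rhs[d] == word[j - 1]}
        changed = True
        while changed:
            changed = False
            new = set()
            for (i, j, B, brhs, d1, d2) in items:
                if d1 == 0 and d2 == len(brhs):  # inclusion of (i, j, B -> .beta.)
                    new |= {(i, j, A, rhs, d, d + 1) for (A, rhs) in grammar
                            for d in range(len(rhs)) if rhs[d] == B}
                for (k, j2, A2, rhs2, e1, e2) in items:   # concatenation
                    if (A2, rhs2, k, e1) == (B, brhs, j, d2) and d2 > d1 and e2 > e1:
                        new.add((i, j2, B, brhs, d1, e2))
            if not new <= items:
                items |= new
                changed = True
        return items

    grammar = [('S', ('a', 'S')), ('S', ('a',))]
    items = double_dotted_items("aa", grammar)
    assert any(d1 == 0 and d2 == len(rhs) for (i, j, A, rhs, d1, d2) in items
               if (i, j, A) == (0, 2, 'S'))      # root item (0, n, S -> .alpha.)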

Definition 2.4.1.1 offers a way of justifying the presence of an item x in the relation D. A justification is a sequence of rules corresponding to a proof showing why x ∈ D. Sometimes an item x can be justified in more than one way. We will consider justifications one at a time. A complete justification of an item x in D will be called a composition for x. Such a composition can be represented by a composition tree, denoted by T_x. The nodes in T_x are labeled with the items mentioned in the antecedents of the applied rules of definition 2.4.1.1; the root is labeled x.

Example 2.4.1.1 Suppose w is the result of an inclusion of x, x is the result of a concatenation of y and z, and y and z are base items. The composition tree T_w then has root w with single son x, and x has sons y and z.

Next we will show that D = {(i, j, A → α·β·γ) | A ⇒ αβγ ⇒*_β αa_{i+1} ... a_jγ and if β = λ then αγ = λ} and thus S ⇒* a_1 ... a_n iff (0, n, S → ·α·) ∈ D for some S → α ∈ P.

Lemma 2.4.1.1 Let D′ = {(i, j, A → α·β·γ) | A ⇒ αβγ ⇒*_β αa_{i+1} ... a_jγ and if β = λ then αγ = λ}. Then D ⊆ D′.

Proof:
We proceed by induction on the number of operations m of a minimal composition (with respect to the number of operations) for (i, j, A → α·β·γ) ∈ D, showing that (i, j, A → α·β·γ) ∈ D′. Assume (i, j, A → α·β·γ) ∈ D.

• First the base case where m = 0. There are two possibilities:

  ◦ i = j and β = λ
    Thus (j, j, A → ··) ∈ D, but then A → λ ∈ P and therefore (j, j, A → ··) ∈ D′.

  ◦ i = j − 1 and β = a_j
    Thus (j − 1, j, A → α·a_j·γ) ∈ D, but then A → αa_jγ ∈ P and therefore (j − 1, j, A → α·a_j·γ) ∈ D′.

  Hence the base case is proved.

• Now the inductive case. Assume that the induction hypothesis holds for all ℓ < m and assume m > 0. There are two possibilities:

  ◦ The last operation is an inclusion. In that case we have for some B → β′:
    ∗ β is a nonterminal B, i.e. the item is (i, j, A → α·B·γ)
    ∗ A → αBγ ∈ P
    ∗ (i, j, B → ·β′·) ∈ D
    Suppose αγ = λ and A = B = β′. Then we have that the item (i, j, A → α·B·γ) = (i, j, B → ·β′·) = (i, j, A → ·A·) is already in D, contradicting the minimality of m. Otherwise we have (i, j, B → ·β′·) ∈ D′ by the induction hypothesis. Thus we have B ⇒ β′ ⇒* a_{i+1} ... a_j and so (i, j, A → α·B·γ) ∈ D′.

  ◦ The last operation is a concatenation. In that case we have for some k:
    ∗ β = β₁β₂
    ∗ |β₁|, |β₂| ≥ 1
    ∗ A → αβ₁β₂γ ∈ P
    ∗ (i, k, A → α·β₁·β₂γ) ∈ D
    ∗ (k, j, A → αβ₁·β₂·γ) ∈ D
    By the induction hypothesis we have (i, k, A → α·β₁·β₂γ) ∈ D′ and (k, j, A → αβ₁·β₂·γ) ∈ D′. Thus we have β₁ ⇒* a_{i+1} ... a_k and β₂ ⇒* a_{k+1} ... a_j, which can be combined to β₁β₂ ⇒* a_{i+1} ... a_j and so (i, j, A → α·β₁β₂·γ) ∈ D′.

Hence the lemma is proved. □

Now the opposite direction.

Lemma 2.4.1.2 Let D′ = {(i, j, A → α·β·γ) | A ⇒ αβγ ⇒*_β αa_{i+1} ... a_jγ and if β = λ then αγ = λ}. Then D ⊇ D′.

Proof:
The proof is doubly inductive. We will prove by induction on the number of steps m in the derivation β ⇒^m a_{i+1} ... a_j for an item (i, j, A → α·β·γ) ∈ D′ that (i, j, A → α·β·γ) ∈ D. Assume (i, j, A → α·β·γ) ∈ D′.

• First the overall base case where m = 0. We will prove the base case by induction on the length of β.

  ◦ For the base case there are two possibilities:
    ∗ |β| = 0
      So we have i = j and β = λ. As a result of this A → λ ∈ P and therefore (j, j, A → ··) ∈ D.
    ∗ |β| = 1
      So we have i = j − 1 and β = a_j. As a result of this A → αa_jγ ∈ P and therefore (j − 1, j, A → α·a_j·γ) ∈ D.
    Hence for the nested base case (i, j, A → α·β·γ) ∈ D.

  ◦ Now the inductive case. Assume the induction hypothesis holds for m = 0 and for all strings smaller than β. Assume |β| > 1. In that case we have for some k:
    ∗ β = β₁β₂ with |β₁|, |β₂| ≥ 1
    ∗ β₁ ⇒* a_{i+1} ... a_k
    ∗ β₂ ⇒* a_{k+1} ... a_j
    and thus (i, k, A → α·β₁·β₂γ) ∈ D′ and (k, j, A → αβ₁·β₂·γ) ∈ D′. We have by the hypothesis that (i, k, A → α·β₁·β₂γ) ∈ D and (k, j, A → αβ₁·β₂·γ) ∈ D, because |β₁|, |β₂| < |β|. But then we have with a concatenation that (i, j, A → α·β₁β₂·γ) = (i, j, A → α·β·γ) ∈ D. Hence the base case is proved.

• Now the overall inductive case. Assume that the induction hypothesis holds for all ℓ < m and assume that m > 0 (thus β ∉ Σ*). The proof of the inductive case proceeds by induction on the length of β.

  ◦ The base case is where the length of β is 1 and in that case β is a nonterminal B. Assume B ⇒^m a_{i+1} ... a_j for some B → β′ ∈ P with:
    ∗ B ⇒ β′ ⇒* a_{i+1} ... a_j
    ∗ β′ ⇒^{m−1} a_{i+1} ... a_j
    and thus we have (i, j, B → ·β′·) ∈ D′. But then we have by the induction hypothesis that (i, j, B → ·β′·) ∈ D. But then we have with an inclusion that (i, j, A → α·B·γ) = (i, j, A → α·β·γ) ∈ D.

  ◦ Now the inductive case. Assume the induction hypothesis holds for all strings smaller than β. Assume |β| > 1. In that case we have for some k:
    ∗ β = β₁β₂ with |β₁|, |β₂| ≥ 1
    ∗ β₁ ⇒* a_{i+1} ... a_k
    ∗ β₂ ⇒* a_{k+1} ... a_j
    and thus (i, k, A → α·β₁·β₂γ) ∈ D′ and (k, j, A → αβ₁·β₂·γ) ∈ D′. We have by the hypothesis that (i, k, A → α·β₁·β₂γ) ∈ D and (k, j, A → αβ₁·β₂·γ) ∈ D, because |β₁|, |β₂| < |β|. But then we have with a concatenation that (i, j, A → α·β₁β₂·γ) = (i, j, A → α·β·γ) ∈ D. Hence the inductive case is proved.

□

Combining the two previous lemmas we get the following corollary.

Corollary 2.4.1.1 D = {(i, j, A → α·β·γ) | A ⇒ αβγ ⇒*_β αa_{i+1} ... a_jγ and if β = λ then αγ = λ}.

2.4.2 Compositions

For each item we can show that there exists a composition of a very special form: each item mentioned in that composition does not depend (directly or indirectly) on its own presence. Such compositions are said to be acyclic. One pleasant fact about acyclic compositions is that they are linear in size (with respect to the string length).

This will not only be true for D but also for CYK (trivially) and for E. Most of the proofs for the other two relations are, apart from the names of the operations, literally the same. As a matter of fact the arity of the operations tends to be more important than the operations themselves. In such a case we will explicitly mention the arity of the operation.

Note that CYK only has a binary operation called combination, E has a unary scanner operation and a binary completor operation, and D has a unary operation called inclusion and a binary operation called concatenation.

The presence of unary operations complicates the proofs of the fast parallel algorithms considerably. These proofs will be discussed in the next chapter. We can partially avoid this by replacing the unary operations by binary ones. This will be done by introducing dummy nodes that can act as dummy operands for the new binary operations. In other words, one operand will be a dummy item ∆ and the other item will be the old operand of the old unary operation.

In the case of D we will get the following situation. In definition 2.4.1.1 we replace the rule:

• If A → αBγ ∈ P and (i, j, B → ·β·) ∈ D then (i, j, A → α·B·γ) ∈ D

by the two rules:

• ∆_i ∈ D for any i ∈ {0, . . . , n}.

• If ∆_i ∈ D, A → αBγ ∈ P, and (i, j, B → ·β·) ∈ D then (i, j, A → α·B·γ) ∈ D.

Clearly, apart from the dummy items, this will not alter D. A composition tree for item x according to this new definition will be called a binary composition tree B_x. Having such a binary composition tree we can make use of Rytter's pebble game, which is defined for binary trees. It is a parallel combinatorial game that starts with a binary tree in which every leaf is pebbled. In successive iterations other nodes of the tree get pebbled. The game stops when all nodes are pebbled. For this pebble game Rytter established a link between the number of leaves and the moment when the root gets pebbled. The structure of the fast parallel algorithms will be such that without much difficulty we can link the result of the pebble game to the moment when an item is entered into the partial parse relation.

Example 2.4.2.1 Suppose w is the result of an inclusion of ∆_i and x, x is the result of a concatenation of y and z, and y and z are base items. The binary composition tree B_w then has root w with sons ∆_i and x, and x has sons y and z.


There are some correspondences between the sizes of a composition, its composition tree, and its binary composition tree. The number of operations in a composition is equal to the number of internal nodes of both a composition tree and a binary composition tree. Moreover the number of leaves of a binary composition tree is one more than the number of its internal nodes (a property of any binary tree).

A composition tree T_x can be compacted when we merge nodes that have the same labels. The resulting graph is called the composition graph G_x. We may and will assume that the nodes of G_x are items. For grammars that allow cyclic derivations like S ⇒ S ⇒ λ we have an infinite number of different composition trees for (0, 0, S → ·S·) but only one composition graph for (0, 0, S → ·S·). We will see that for each item only a finite number of composition graphs exists. In fact, apart from the edges we can regard D as the union of all valid composition graphs. Composition graphs will serve as an important tool for proving the correctness of the various algorithms.

Example 2.4.2.2 Suppose w is the result of an inclusion of w and also of x, x is the result of a concatenation of y and z, and y and z are base items. The composition graph G_w then has root w with an edge to itself and an edge to x, and x has edges to y and z.

Lemma 2.4.2.1 Let G = (V, E) be the union of every possible composition graph. Then V = D.

Proof:
The inclusion V ⊇ D is immediate. So we only have to prove that V ⊆ D. Assume x ∈ V. Then x is a node in a composition graph, say, G_w. Let G′ be the subgraph rooted at x. Then G′ is a composition graph G_x for x. But since G_x is obtained from a certain T_x, we have a justification for x ∈ D. □

Now it is clear why only a finite number of composition graphs for a certain item can exist. Each composition graph must be a subgraph of G. Since the number of subgraphs of G is finite, so is the number of composition graphs.

Some composition trees can be regarded as being more elementary than others. In some composition trees certain items can occur on a path many times. This can be regarded as a circularity in the proof tree: the item higher in the tree is a consequence of the same item lower in the tree. We will show that there exists a composition tree without such circularities for any item x ∈ D. Furthermore we will show that the number of operations in such a composition is linear in the length of the string to be recognized.

Definition 2.4.2.1 A composition for x is called cyclic iff the composition graph G_x is cyclic. In all other cases the composition is acyclic. The composition tree T_x and the binary composition tree B_x are called cyclic iff on a path in that tree there exist two distinct nodes with the same label. In all other cases the trees are called acyclic.

Note that a ‘cyclic tree’ is a contradiction in terms. A tree is acyclic by definition. In this case the predicate cyclic does not refer to the shape of the tree but to the composition of that composition tree. In figure 2.4 two trivial examples of acyclic and cyclic composition trees are given. Note that if we replace the cyclic x → y → x part by just x, we have transformed a cyclic composition tree into an acyclic one. In these trivial examples only inclusions were used. We will handle the general case later in this section. If a cyclic grammar also has nullable nonterminals, concatenations can occur in the cyclic part. To be able to have a cyclic part the other son must have a nullable string between its two dots, of course. This will ensure that a similar cycle x → . . . → x can be replaced by x in those cyclic composition trees without any problems.

[Figure 2.4: Composition trees (an acyclic tree and a cyclic tree with a repeated item x on a path)]

Theorem 2.4.2.1 Let x ∈ D with a given composition and let T_x, B_x, and G_x be according to that composition. A composition for x is cyclic iff the composition graph G_x is cyclic iff the composition tree T_x is cyclic iff the binary composition tree B_x is cyclic.

Proof:

• A composition for x is cyclic iff the composition graph G_x is cyclic: by definition.

• The composition graph G_x is cyclic iff the composition tree T_x is cyclic.

  ◦ First the if part. Assume T_x is cyclic and that an item y occurs at least twice as a label on a path (in other words: there exists a non-trivial path in T_x from a node with label y to another node with label y). In G_x those two nodes are merged into one node y. But then there exists a non-trivial path from y to y in G_x. So G_x is cyclic as well.

  ◦ For the only if part assume G_x is cyclic. Let y, possibly x itself, be a node nearest to x (following the direction of the edges) that is element of a cycle. The cycle in G_x can only arise if a descendant of y, possibly y itself, makes a reference to y in an operation. But that means that in T_x there must exist a non-trivial path from a node with label y to another node with label y. Therefore T_x is cyclic too.

• The composition tree T_x is cyclic iff the binary composition tree B_x is cyclic. This is obvious.

□

Next we will define the height of a tree; it is the length of the longest path in the tree.

Definition 2.4.2.2 The height H of a tree T is defined as follows:

• If T is a leaf then H(T) = 0.

• If T is not a leaf then H(T) = 1 + max{H(T′) | T′ is a subtree of T and T′ ≠ T}.

By the pigeon hole principle we obviously have the following results.

Lemma 2.4.2.2 Let T_x be a composition tree. If H(T_x) ≥ |D| then T_x is cyclic.

Corollary 2.4.2.1 Let B_x be a binary composition tree. If H(B_x) ≥ |D| then B_x is cyclic.

Definition 2.4.2.3 An edge x → y is called a brother edge of x → z iff at least one of the following assertions holds:

• x can be obtained by the unary inclusion of y = z (i.e. x → y is its own brother edge),

• x can be obtained by the binary concatenation of y and z (or z and y).

An edge x → y and its brother edge x → z contribute to one operation in the composition graph. We assume a virtual brother edge in the case of the unary operation. Note that x → y is one of the brother edges of its brother edges.

Theorem 2.4.2.2 Every item x ∈ D has an acyclic composition.

Proof:
Let x ∈ D have a cyclic composition (for otherwise there is nothing to prove). Then the corresponding G_x is cyclic too. The acyclic graph G is defined as follows:

• If y is a leaf (or sink) in G_x then y is a node in G.

• If y → z is an edge in G_x and z is a node in G then, if adding the edge y → z to G does not create a cycle, y is a node and y → z is an edge in G.

• Nothing is in G except those elements which must be in G by applying the preceding rules finitely often.

Clearly G is an acyclic subgraph of G_x and it is easy to verify that x is a node in G. We transform G into a composition graph G′_x as follows:

• x is a node in G′_x.

• If y is a node in G′_x and both y → z and one of its brother edges are in G, then z is a node in G′_x and y → z is an edge in G′_x.

• Nothing is in G′_x except those elements which must be in G′_x by applying the preceding rules finitely often.

Clearly G′_x is indeed a composition graph and obviously acyclic. But then there exists an acyclic composition for x. □

Corollary 2.4.2.2 The trees T_x and B_x that go with the cyclic composition for x have more nodes than the trees T′_x and B′_x that go with the acyclic G′_x.


[Figure 2.5: Reducing a cyclic composition tree (the cycle through y is removed)]

The previous theorem and corollary describe a way of reducing a cyclic composition tree to an acyclic composition tree by successively removing every cycle. As can be seen in figure 2.5, removing a cycle reduces the size of the composition tree. Therefore we have the following corollary.

Corollary 2.4.2.3 The minimal composition (with respect to the number of operations) for any item is acyclic.

Lemma 2.4.2.2 gives an upper bound on the size of an acyclic composition. We will now strengthen that upper bound. Since we only want to show that the number of operations in an acyclic composition is linear in the length of the input string, we don't need a very tight upper bound, however.

We will write 'x ∈ D_ij' iff x ∈ D and x is of the form (i, j, A → α·β·γ). We will now derive an upper bound a_ij on the number of operations in acyclic compositions for items in D_ij.

Definition 2.4.2.4 N = |{A → α·β·γ | A → αβγ ∈ P}|

Without proof we give the following lemma.

Lemma 2.4.2.3 Let m = max({0} ∪ {|α| : A → α ∈ P}) and p = |P|. Then N ≤ (1/2)p(m + 1)(m + 2).

Since we don't need a tight upper bound, we will not use an actual acyclic composition. We will assume that every step is a worst case step. This may lead to a 'case' that is worse than the actual worst case. We will assume that every internal node of a composition tree is the result of a concatenation. Any path in an acyclic composition tree that travels entirely through, say, D_ij is strictly bounded in length by N. Since the case where N ≤ 1 is trivial, we will insist N > 1.

When we examine the behavior of those paths more closely (see figure 2.6), we can find the following equations for a_ij.


[Figure 2.6: A simplified partial subtree of an acyclic composition tree]

• a_jj = 2^{N−1} − 1
  When i = j, the height of the composition tree is bounded by N − 1. A completely balanced binary tree of height N − 1 has 2^{N−1} − 1 internal nodes, so we choose a_jj = 2^{N−1} − 1.

• a_{j−1,j} = (N − 1) max{a_{j−1,j−1}, a_jj} + N − 1
  When i = j − 1, the son of the node with a label in D_{j−1,j} has a label in either D_{j−1,j−1} or D_jj. Furthermore, we know that if there exists an item in D_{j−1,j} then there exists a sink in D_{j−1,j}. Therefore we choose a_{j−1,j} = (N − 1) max{a_{j−1,j−1}, a_jj} + N − 1.

• a_ij = (N − 1) max{a_ii, a_jj} + N + max{a_ik + a_kj | i < k < j}
  The case where i < j − 1 is similar to the previous one except that there does not exist a sink in D_ij. Instead of being a sink, the last item in D_ij has a son in D_ik and one in D_kj (with i < k < j). Therefore we choose a_ij = (N − 1) max{a_ii, a_jj} + N + max{a_ik + a_kj | i < k < j}.

It is easy to see that a_ij = a_{i′j′} whenever j − i = j′ − i′, so we will simplify the equations by letting A_k = a_{i,i+k}. Without proof we give the following lemma.

Lemma 2.4.2.4 Assume N > 1. The equations:

  A_0 = 2^{N−1} − 1
  A_1 = (N − 1)A_0 + N − 1
  A_m = (N − 1)A_0 + N + M_m    (m > 1)

with M_m = max{A_k + A_{k′} | 0 < k, k′ < m and m = k + k′} have a solution:

  A_0 = 2^{N−1} − 1
  A_m = ((N − 1)2^N − 1)m − (N − 1)2^{N−1} + 1    (m > 0)

(38)

The quantities aij = Aj−i form upper bounds for the number of

oper-ations in an acyclic composition of an item in Dij. Therefore we have the

following corollary.

Corollary 2.4.2.4 The number of operations needed for a minimal com-position of an item in Dij is linear in j − i.

2.4.3 Filters

We have already stated that D is the vertex set of the union of every possible composition graph. However, we are only interested in certain composition graphs. For the slow parallel algorithm we are only interested in those composition graphs which ensure that the entire input string can be derived.

We can regard D as the biggest possible partial parse relation D_⊤. In general D_⊤ will be too large to be practical during computations. On the other hand, if we don't have all the items mentioned in the composition graphs that ensure that the entire string can be derived, we aren't able to retrieve every parse. So the relation that contains all those items, D_⊥, can be regarded as the smallest possible partial parse relation. The following result is obvious.

Lemma 2.4.3.1 Let D_⊤ = D and let D_⊥ consist of those items that occur in compositions for root items. The set 𝒟 = {D′ | D_⊥ ⊆ D′ ⊆ D_⊤} with ∨ = ∪ and ∧ = ∩ is a lattice.

Any member D′ of 𝒟 can be regarded as a filtered version of D. Often a filter can be very simple so that, instead of computing D, we can compute D′ directly. Using a filter not only reduces the partial parse relation but also the computation time, and often more so. When |D_⊤| ≫ |D_⊥| filtering is most likely a profitable thing to do. Unfortunately, there exist grammars and strings for which D_⊤ = D_⊥. For these grammars and strings filtering is useless.

Although several filtering techniques exist (e.g. [SN91, Hur92]), we will use the context of the substring under investigation as a guide to filter out 'bad items.' We will use a left context of length ℓ (i.e. a lookback) and a right context of length r (i.e. a lookahead). We will define the context relation C^D_{ℓ,r} and a new partial parse relation D_{ℓ,r}.

Definition 2.4.3.1 We define the following functions (with [, ] ∉ V):

• First_{ℓ,r}(α) = {w ∈ (Σ ∪ {[, ]})^r | [^ℓS]^r ⇒* βαγ ⇒*_{αγ} βwδ for some β, γ, and δ ∈ (V ∪ {[, ]})*}

• Last_{ℓ,r}(α) = {w ∈ (Σ ∪ {[, ]})^ℓ | [^ℓS]^r ⇒* βαγ ⇒*_{βα} δwγ for some β, γ, and δ ∈ (V ∪ {[, ]})*}

• Precede_{ℓ,r}(α) = {w ∈ (Σ ∪ {[, ]})^ℓ | [^ℓS]^r ⇒* βαγ ⇒*_β δwαγ for some β, γ, and δ ∈ (V ∪ {[, ]})*}

• Follow_{ℓ,r}(α) = {w ∈ (Σ ∪ {[, ]})^r | [^ℓS]^r ⇒* βαγ ⇒*_γ βαwδ for some β, γ, and δ ∈ (V ∪ {[, ]})*}

Let Name stand for First or Last:

  Name_{ℓ,r}(X) = ⋃_{α ∈ X} Name_{ℓ,r}(α)

We will assume that a_{1−ℓ} ... a_0 = [^ℓ and a_{n+1} ... a_{n+r} = ]^r.

Definition 2.4.3.2 We define the context relation C^D_{ℓ,r} ⊆ {0, . . . , n}² × {A → α·β·γ | A → αβγ ∈ P} as follows:

• C^D_{ℓ,r} = {(i, j, A → α·β·γ) | [^ℓS]^r ⇒* δ₁Aδ₂ ⇒_A δ₁αβγδ₂ ⇒*_{δ₁α} δ₃a_{i−ℓ+1} ... a_iβγδ₂ ⇒*_{γδ₂} δ₃a_{i−ℓ+1} ... a_iβa_{j+1} ... a_{j+r}δ₄ for some δ₁, δ₂, δ₃, and δ₄ ∈ (V ∪ {[, ]})*}

It can easily be verified that (i, j, A → α·β·γ) ∈ C^D_{ℓ,r} iff a_{i−ℓ+1} ... a_i ∈ Last_{ℓ,r}(Precede_{ℓ,r}(A)α) and a_{j+1} ... a_{j+r} ∈ First_{ℓ,r}(γFollow_{ℓ,r}(A)). Note that for fixed ℓ and r we can check in O(1) time whether or not x ∈ C^D_{ℓ,r}.
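A sketch of such a constant-time membership test, assuming the two families of context sets have been precomputed from the grammar (all names and the encoding are ours):

    def in_context(item, last_sets, first_sets, padded, ell, r):
        # last_sets[(A, alpha)]  plays the role of Last(Precede(A) alpha),
        # first_sets[(A, gamma)] plays the role of First(gamma Follow(A)),
        # padded = '[' * ell + word + ']' * r, so a_k sits at padded[k - 1 + ell].
        i, j, A, alpha, gamma = item
        left = padded[i:i + ell]                 # a_{i-ell+1} ... a_i
        right = padded[j + ell:j + ell + r]      # a_{j+1} ... a_{j+r}
        return left in last_sets[(A, alpha)] and right in first_sets[(A, gamma)]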

Definition 2.4.3.3 The relation D_{ℓ,r} ⊆ {0, . . . , n}² × {A → α·β·γ | A → αβγ ∈ P} is defined as follows:

• If A → λ ∈ P and (j, j, A → ··) ∈ C^D_{ℓ,r} then (j, j, A → ··) ∈ D_{ℓ,r} for any j ∈ {0, . . . , n}.

• If A → αa_jγ ∈ P and (j − 1, j, A → α·a_j·γ) ∈ C^D_{ℓ,r} then (j − 1, j, A → α·a_j·γ) ∈ D_{ℓ,r} for any j ∈ {1, . . . , n}.

• If A → αBγ ∈ P and (i, j, A → α·B·γ) ∈ C^D_{ℓ,r} and (i, j, B → ·β·) ∈ D_{ℓ,r} then (i, j, A → α·B·γ) ∈ D_{ℓ,r}.

• If (i, k, A → α·β₁·β₂γ) ∈ D_{ℓ,r} and (k, j, A → αβ₁·β₂·γ) ∈ D_{ℓ,r} then (i, j, A → α·β₁β₂·γ) ∈ D_{ℓ,r}.

• Nothing is in D_{ℓ,r} except those elements which must be in D_{ℓ,r} by applying the preceding rules finitely often.

It can be proved that D_{ℓ,r} = {(i, j, A → α·β·γ) | [^ℓS]^r ⇒* δ₁Aδ₂ ⇒_A δ₁αβγδ₂ ⇒*_{δ₁α} δ₃a_{i−ℓ+1} ... a_iβγδ₂ ⇒*_{γδ₂} δ₃a_{i−ℓ+1} ... a_iβa_{j+1} ... a_{j+r}δ₄ ⇒*_β δ₃a_{i−ℓ+1} ... a_{j+r}δ₄ for some δ₁, δ₂, δ₃, and δ₄ ∈ (V ∪ {[, ]})* and if β = λ then αγ = λ}.

Now we will prove that 𝒟_C = {D_{ℓ,r} | ℓ, r ≥ 0} is a sub lower semilattice of 𝒟.

Theorem 2.4.3.1 Let 𝒟_C = {D_{ℓ,r} | ℓ, r ≥ 0} with ∧ = ∩. Then 𝒟_C is a sub lower semilattice of 𝒟.

Proof:
Clearly we have 𝒟_C ⊆ 𝒟 and the ∧ operation is the same as the ∧ operation of 𝒟 for the elements in 𝒟_C. Thus we only have to show that D_{ℓ,r} ∧ D_{ℓ′,r′} ∈ 𝒟_C for any D_{ℓ,r}, D_{ℓ′,r′} ∈ 𝒟_C. We show D_{ℓ,r} ∧ D_{ℓ′,r′} = D_{max{ℓ,ℓ′},max{r,r′}}. Assume (i, j, A → α·β·γ) ∈ D_{ℓ,r} ∧ D_{ℓ′,r′}. Then by definition:

• If β = λ then αγ = λ.

• For some δ₁, δ₂, δ₃, and δ₄ ∈ (V ∪ {[, ]})* we have:
  [^ℓS]^r ⇒* δ₁Aδ₂ ⇒_A δ₁αβγδ₂ ⇒*_{δ₁α} δ₃a_{i−ℓ+1} ... a_iβγδ₂ ⇒*_{γδ₂} δ₃a_{i−ℓ+1} ... a_iβa_{j+1} ... a_{j+r}δ₄ ⇒*_β δ₃a_{i−ℓ+1} ... a_{j+r}δ₄

• For some δ₁, δ₂, δ₃, and δ₄ ∈ (V ∪ {[, ]})* we have:
  [^ℓ′S]^r′ ⇒* δ₁Aδ₂ ⇒_A δ₁αβγδ₂ ⇒*_{δ₁α} δ₃a_{i−ℓ′+1} ... a_iβγδ₂ ⇒*_{γδ₂} δ₃a_{i−ℓ′+1} ... a_iβa_{j+1} ... a_{j+r′}δ₄ ⇒*_β δ₃a_{i−ℓ′+1} ... a_{j+r′}δ₄

Clearly then:

  [^{max{ℓ,ℓ′}}S]^{max{r,r′}} ⇒* δ₁Aδ₂ ⇒_A δ₁αβγδ₂ ⇒*_{δ₁α} δ₃a_{i−max{ℓ,ℓ′}+1} ... a_iβγδ₂ ⇒*_{γδ₂} δ₃a_{i−max{ℓ,ℓ′}+1} ... a_iβa_{j+1} ... a_{j+max{r,r′}}δ₄ ⇒*_β δ₃a_{i−max{ℓ,ℓ′}+1} ... a_{j+max{r,r′}}δ₄

for some δ₁, δ₂, δ₃, and δ₄ ∈ (V ∪ {[, ]})*. Thus (i, j, A → α·β·γ) ∈ D_{max{ℓ,ℓ′},max{r,r′}}. Since the proof of this direction can be reversed without any problems, we have proved equality of the two sets D_{ℓ,r} ∧ D_{ℓ′,r′} and D_{max{ℓ,ℓ′},max{r,r′}}. Thus 𝒟_C is indeed a lower semilattice. □

Practical evidence with a moderately sized natural language grammar (see [HV91, RV92, Vre92b]) suggests that even for ℓ = r = 1 the amount of space is reduced by roughly a factor 5, and the time spent is reduced even more dramatically, by roughly a factor 20. The reduced size of D_{1,1} speeds up the loops over the various items in the cells (D_{1,1})_{ij}.

2.4.4 Index graphs

Finally we come to a part that is of interest for fast parallel parsing. For the fast parallel algorithm we will only be interested in finding one composition graph that assures us that the entire input string can be derived.

Definition 2.4.4.1 A graph G_D = (V, E) is called an index graph iff

• V = D

• There exists a brother edge x → z in E for every x → y ∈ E.

• There exists a composition graph G_x such that G_x is a subgraph of G_D for every x ∈ D.

The graph G_D can be regarded as an index for composition graphs of any item present in D.

Definition 2.4.4.2 An index graph G_D is maximal iff for every graph G such that G_D is a proper subgraph of G, we have that G is not an index graph.

Theorem 2.4.4.1 Let G_D be a maximal index graph. Then every G_x is a subgraph of G_D, for every x ∈ D.

Proof:
Suppose that for some x ∈ D, G_x is not a subgraph of G_D. But then the graph G obtained by taking the union of G_D and G_x is an index graph which has G_D as a proper subgraph. This contradicts the fact that G_D is maximal. Thus there cannot be a G_x which is not a subgraph of G_D. □

Definition 2.4.4.3 An index graph G_D is minimal iff for every graph G such that G is a proper subgraph of G_D, we have that G is not an index graph.

Theorem 2.4.4.2 Let G_D be a minimal index graph. Then exactly one G_x is a subgraph of G_D for every x ∈ D.

Proof:
It is evident from the definition of an index graph that there must exist at least one G_x for every x ∈ D with the desired property. Suppose that for some x ∈ D, G_x and G′_x are two distinct composition graphs which are subgraphs of G_D. We may assume without loss of generality that x is the result of two different concatenation operations in G_x and G′_x. Suppose x → y and x → z are in G_x and x → y′ and x → z′ are in G′_x. There are two possibilities:

• x → y = x → y′ or x → z = x → z′
  Assume without loss of generality x → y = x → y′ (and thus z ≠ z′). Consider G_D without x → z′. The resulting graph is still an index graph, which contradicts the minimality of G_D.

• x → y ≠ x → y′ and x → z ≠ x → z′
  Consider G_D without x → y′ and x → z′. The resulting graph is still an index graph, so again we have derived a contradiction.

Hence there can be at most one G_x for each x ∈ D with the desired property. □

After having computed the partial parse relation D, we can quite easily compute G_D. Its computation is based on the following lemma, which we state without proof.

Lemma 2.4.4.1 Let the graph G = (V, E) be defined as follows:

• V = D

• If x ∈ D is the result of an inclusion operation on y ∈ D then x → y ∈ E.

• If x ∈ D is the result of a concatenation operation on y ∈ D and z ∈ D then x → y ∈ E and x → z ∈ E.

Then the graph G is a maximal index graph.
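By the lemma, G_D can be obtained by recording the producing operations whenever an item is inserted. Since D is closed under both operations, one may equivalently recompute all possible producing operations afterwards. A sketch of the latter, with items encoded as in the recognizer sketch of section 2.4.5 below (our encoding, not the thesis's):

    def maximal_index_graph(D):
        # Nodes are the items of D. For x = (i, j, (A, alpha, beta, gamma)):
        #   inclusion edge  x -> y for every complete item
        #                   y = (i, j, (B, (), delta, ())) with beta = (B,);
        #   brother edges   x -> y and x -> z for every split of beta
        #                   witnessed by items y and z present in D.
        E = set()
        for x in D:
            i, j, (A, alpha, beta, gamma) = x
            if len(beta) == 1:
                for y in D:
                    i2, j2, (B, a2, b2, g2) = y
                    if (i2, j2) == (i, j) and B == beta[0] and a2 == () and g2 == ():
                        E.add((x, y))
            if len(beta) > 1:
                for p in range(1, len(beta)):
                    for k in range(i, j + 1):
                        y = (i, k, (A, alpha, beta[:p], beta[p:] + gamma))
                        z = (k, j, (A, alpha + beta[:p], beta[p:], gamma))
                        if y in D and z in D:
                            E.add((x, y))
                            E.add((x, z))
        return set(D), E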

However, for the fast parallel algorithm we are interested in minimal index graphs. For any x ∈ D we know that G_x is formed by exactly the nodes (and edges) accessible from x in G_D. Although for a given D the maximal index graph is unique, this does not have to be the case for a minimal index graph. Note that the set of possible index graphs for D, with ∨ being the graph union operation, is an upper semilattice.
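Extracting a composition graph for an item x is then a simple reachability computation; a minimal sketch:

    def composition_graph(x, E):
        # Collect the nodes and edges reachable from x by depth first search.
        # In a minimal index graph this yields exactly G_x; in the maximal
        # index graph it yields the union of all composition graphs of x.
        succ = {}
        for u, v in E:
            succ.setdefault(u, []).append(v)
        nodes, edges, stack = {x}, set(), [x]
        while stack:
            u = stack.pop()
            for v in succ.get(u, []):
                edges.add((u, v))
                if v not in nodes:
                    nodes.add(v)
                    stack.append(v)
        return nodes, edges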


2.4.5 The Recognizer

Now we are ready to present and prove the correctness of the double dotted recognizer.

    Double-Dotted-Recognizer:
        D := ∅
        for i := 0 to n do
            for j := 0 to n − i in parallel do
                case i
                = 0 : D := D ∪ {(j, j, A → ••) | A → λ ∈ P}
                = 1 : D := D ∪ {(j, j+1, A → α•a_{j+1}•γ) | A → α a_{j+1} γ ∈ P}
                > 1 : for k := 1 to i − 1 in parallel do
                          D := D ∪ {(j, j+i, A → α•β1β2•γ) |
                              (j, j+k, A → α•β1•β2γ) ∈ D and
                              (j+k, j+i, A → αβ1•β2•γ) ∈ D}
                while D still changes do
                    D := D ∪ {(j, j+i, A → α•β1β2•γ) |
                        (j, j, A → α•β1•β2γ) ∈ D and (j, j+i, A → αβ1•β2•γ) ∈ D}
                    D := D ∪ {(j, j+i, A → α•β1β2•γ) |
                        (j, j+i, A → α•β1•β2γ) ∈ D and (j+i, j+i, A → αβ1•β2•γ) ∈ D}
                    D := D ∪ {(j, j+i, A → α•B•γ) |
                        A → αBγ ∈ P and (j, j+i, B → •β•) ∈ D}
        return whether or not (0, n, S → •α•) ∈ D for some S → α ∈ P
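The following sequential Python sketch of the recognizer may clarify the bookkeeping. It is ours, not the thesis's: the parallel loops are simulated by ordinary loops (so nothing is claimed here about the PRAM bounds of theorem 2.4.5.1), an item (i, j, A → α•β•γ) is represented as the tuple (i, j, (A, α, β, γ)), and the grammar is a list of (lhs, rhs) pairs with rhs a tuple of symbols.

    def concatenate(D, i, k, j, add):
        # (i,k, A -> alpha.beta1.beta2 gamma) + (k,j, A -> alpha beta1.beta2.gamma)
        # yields (i,j, A -> alpha.beta1 beta2.gamma)
        changed = False
        left = [x for x in D if x[0] == i and x[1] == k]
        right = [x for x in D if x[0] == k and x[1] == j]
        for _, _, (A, alpha, beta1, rest) in left:
            for _, _, (A2, alpha2, beta2, gamma) in right:
                if A2 == A and alpha2 == alpha + beta1 and rest == beta2 + gamma:
                    changed |= add((i, j, (A, alpha, beta1 + beta2, gamma)))
        return changed

    def include(grammar, D, i, j, add):
        # a complete item (i,j, B -> .beta.) licenses (i,j, A -> alpha.B.gamma)
        changed = False
        complete = {it[2][0] for it in D
                    if it[0] == i and it[1] == j and it[2][1] == () and it[2][3] == ()}
        for A, rhs in grammar:
            for p, X in enumerate(rhs):
                if X in complete:
                    changed |= add((i, j, (A, rhs[:p], (X,), rhs[p + 1:])))
        return changed

    def double_dotted_recognizer(grammar, start, a):
        n = len(a)
        D = set()

        def add(item):
            new = item not in D
            D.add(item)
            return new

        for i in range(n + 1):
            for j in range(n - i + 1):           # 'in parallel' in the thesis
                if i == 0:
                    for A, rhs in grammar:
                        if rhs == ():
                            add((j, j, (A, (), (), ())))
                elif i == 1:
                    for A, rhs in grammar:
                        for p, X in enumerate(rhs):
                            if X == a[j]:        # a[j] is a_{j+1}, 1-based
                                add((j, j + 1, (A, rhs[:p], (X,), rhs[p + 1:])))
                else:
                    for k in range(1, i):        # 'in parallel' in the thesis
                        concatenate(D, j, j + k, j + i, add)
                changed = True                   # 'while D still changes'
                while changed:
                    changed = concatenate(D, j, j, j + i, add)
                    changed |= concatenate(D, j, j + i, j + i, add)
                    changed |= include(grammar, D, j, j + i, add)
        return any((0, n, (A, (), rhs, ())) in D
                   for A, rhs in grammar if A == start)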

Before we give some examples and prove that the algorithm computes D correctly, we first comment on the algorithm (roughly the same comments apply to the CYK algorithm and to Earley's algorithm as well).

The customary way of coding D is by means of an upper triangular matrix D: (i, j, A → α•β•γ) ∈ D iff A → α•β•γ ∈ D_{ij}.

In the algorithm, statements like D := D ∪ {(i, j, ...) | ...} occur frequently. Since D can be regarded as a matrix, such a statement affects only the cell D_{ij} of D, and only that cell needs to be affected.

Note that in the loop over k and in the while loop read conflicts can occur, whereas write conflicts can only occur in the loop over k. It is also possible that an operation is performed on the contents of a certain cell in the matrix while the result is added to that same cell (the three operations in the while loop). If we want to do so, we may assume that immediately before each operation we extract a fresh copy of the cell that is used, and use the copy in the right-hand side of the assignment. Since the while loop only terminates when D doesn't change any more, this will result in the same D as the one we would have obtained otherwise.

Finally, an item (i, j, A → α•β1β2•γ) obtained by means of a concatenation will be inserted d(|β1β2| − 1) times, where d is the degree of local ambiguity (d = 1 if there is no ambiguity). When we insist that only items with |β1| = 1 may be entered in D, then those items are only entered d times. The correctness of this restriction is a corollary of the following lemma.

Lemma 2.4.5.1 Let (i, j, A → α•β•γ) ∈ D. If |β| > 1 then there exists a k ∈ {i, ..., j} such that:

• β = β1β2

• |β1| = 1 and |β2| ≥ 1

• (i, k, A → α•β1•β2γ) ∈ D

• (k, j, A → αβ1•β2•γ) ∈ D

Proof:
Assume (i, j, A → α•β•γ) ∈ D where |β| > 1. The tuple can only be inserted in D by means of a concatenation operation. We prove the lemma with induction on the length of β.

• First the base case. Assume |β| = 2. There is only one way to split β, and that decomposition gives the right k to satisfy the lemma.

• Now the inductive case. Assume that the induction hypothesis holds for all strings shorter than β and that |β| > 2. We may assume that there exists an ℓ ∈ {i, ..., j} such that:

  ◦ β = δ1δ2
  ◦ |δ1|, |δ2| ≥ 1
  ◦ (i, ℓ, A → α•δ1•δ2γ) ∈ D
  ◦ (ℓ, j, A → αδ1•δ2•γ) ∈ D

  For |δ1| = 1 there is nothing more to prove, so assume |δ1| > 1. Since the induction hypothesis holds for |δ1| < |β|, there must be a k ∈ {i, ..., ℓ} ⊆ {i, ..., j} such that:

  ◦ δ1 = β1β2
  ◦ |β1| = 1 and |β2| ≥ 1
  ◦ (i, k, A → α•β1•β2δ2γ) ∈ D
  ◦ (k, ℓ, A → αβ1•β2•δ2γ) ∈ D

  Hence we can assert that (k, j, A → αβ1•β2δ2•γ) ∈ D with the concatenation operation.

Therefore we can find a k such that this k satisfies the lemma. □
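In the recognizer sketch above, the restriction to |β1| = 1 amounts to a one-line filter on the left operands (our formulation, not the thesis's):

    def concatenate_restricted(D, i, k, j, add):
        # as concatenate, but only left items whose dotted middle part is a
        # single symbol are combined; by the lemma the computed relation D
        # is unchanged, while duplicate insertions are avoided
        changed = False
        left = [x for x in D if x[0] == i and x[1] == k and len(x[2][2]) == 1]
        right = [x for x in D if x[0] == k and x[1] == j]
        for _, _, (A, alpha, beta1, rest) in left:
            for _, _, (A2, alpha2, beta2, gamma) in right:
                if A2 == A and alpha2 == alpha + beta1 and rest == beta2 + gamma:
                    changed |= add((i, j, (A, alpha, beta1 + beta2, gamma)))
        return changed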

Example 2.4.5.1 We will show the computed matrices for four grammars and input strings in the four tables 2.1, 2.2, 2.3, and 2.4.

    D_{0,1} = {A → •a•B}
    D_{0,5} = {A → •aB•, B → •A•cc}
    D_{1,2} = {A → •a•B}
    D_{1,3} = {A → •aB•, B → •A•cc}
    D_{1,4} = {B → •Ac•c}
    D_{1,5} = {B → •Acc•, A → a•B•}
    D_{2,3} = {B → •b•, A → a•B•}
    D_{3,4} = {B → A•c•c, B → Ac•c•}
    D_{3,5} = {B → A•cc•}
    D_{4,5} = {B → A•c•c, B → Ac•c•}

Table 2.1: Recognition matrix (nonempty cells D_{ij}) for input string aabcc and grammar ({A, B, a, b, c}, {a, b, c}, {A → aB, B → Acc, B → b}, A)
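As a check, the recognizer sketch given above accepts this grammar and input (the encoding of the grammar is the sketch's own):

    grammar = [('A', ('a', 'B')), ('B', ('A', 'c', 'c')), ('B', ('b',))]
    print(double_dotted_recognizer(grammar, 'A', "aabcc"))   # prints True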

Before proving the correctness of the algorithm, we will first analyze its complexity. We will assume throughout that the algorithm computes the matrix D. In the proofs, D refers to the relation of definition 2.4.1.1.

Theorem 2.4.5.1 The algorithm runs in S(n) = O(n²) space and with costs p(n) × T(n) = Θ(n²) × Θ(n) on a CRCW PRAM.

Proof:
Since {A → α•β•γ | A → αβγ ∈ P} is a finite set independent of n, we clearly have |D| = O(n²). The rest of the space is consumed by the various loop counters, thus S(n) = O(n²). The cost of the algorithm, p(n) × T(n), is the maximum of Θ(n³) and Θ(n²) multiplied by the time spent in the while loop. But the latter is Θ(1), since a cell D_{ij} can only assume Θ(1) different values: as no items are ever retracted from D, the while loop must terminate within Θ(1) iterations. So the total costs are Θ(n³). The amount of processors needed to run the algorithm in parallel is p(n) = Θ(n²), so that the algorithm runs in T(n) = Θ(n) time. □
