Parsing free word order languages in Prolog


Janusz Stanisław Bień +    Krystyna Laus-Mączyńska ++

Stanisław Szpakowicz +

+ Institute of Informatics, Warsaw University, Warsaw, Poland

++ Institute for Scientific, Technical and Economic Information, Warsaw, Poland

The Prolog programming language allows the user to write powerful parsers in the form of metamorphosis grammars. However, metamorphosis grammars, as defined by Colmerauer [2], have to specify strictly the order of terminal and nonterminal symbols. A modification of Prolog has been implemented which allows "floating terminals" to be included in a metamorphosis grammar, together with some information that makes it possible to control the search for such a terminal in the unprocessed part of the input. The modification is illustrated by several examples from the Polish language, and some open questions are discussed.

Metamorphosis grammars [2, 3] are a convenient tool for the formal description of the syntax of natural languages. Their convenience is due to their straightforward relation to the programming language Prolog. A metamorphosis grammar is an ordinary part of a Prolog program. It defines a language as well as a parser for it.

We suggest here modifications to the way metamorphosis grammars are handled in Prolog which allow these grammars to analyse constructions whose components have no strictly specified order.

Let us consider an example. The following Polish sentence:

(1) PRACOWAĆ BYŁO BARDZO PRZYJEMNIE
    'to work' 'it was' 'very' 'nice'

"It was very nice to work."

is accepted by the metamorphosis grammar given below (nonterminals are prefixed by %, terminals by ~, and == stands for an arrow):

%S == %INF %V %ADVP.
%INF == ~PRACOWAĆ.
%V == ~BYŁO.
%ADVP == ~BARDZO ~PRZYJEMNIE.

In order to simplify the example, we neglect the grammatical categories of phrases and words. The last three rules serve as "dictionary rules".
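Because the grammar is at the same time a parser, it can be run directly. As an illustration only, the following Python sketch mimics how a Prolog system would execute such rules: generators stand in for backtracking, and the list of unconsumed words plays the role of a difference list. The rule encoding and the names `GRAMMAR_1`, `parse` and `accepts` are hypothetical and do not reproduce the authors' ODRA-Prolog implementation; grammatical categories are ignored, as in the text.

```python
from typing import Dict, Iterator, List, Tuple

# A symbol is ("nt", name) for a nonterminal or ("t", word) for an anchored terminal.
Symbol = Tuple[str, str]
Grammar = Dict[str, List[List[Symbol]]]

# Hypothetical encoding of the grammar for sentence (1).
GRAMMAR_1: Grammar = {
    "S": [[("nt", "INF"), ("nt", "V"), ("nt", "ADVP")]],
    "INF": [[("t", "PRACOWAĆ")]],
    "V": [[("t", "BYŁO")]],
    "ADVP": [[("t", "BARDZO"), ("t", "PRZYJEMNIE")]],
}


def parse(grammar: Grammar, symbols: List[Symbol], words: List[str]) -> Iterator[List[str]]:
    """Yield each unprocessed remainder of `words` after matching `symbols` in order."""
    if not symbols:
        yield words
        return
    (kind, value), rest = symbols[0], symbols[1:]
    if kind == "t":
        # Anchored terminal: it must be the very next word of the input.
        if words and words[0] == value:
            yield from parse(grammar, rest, words[1:])
    else:
        # Nonterminal: try every right-hand side in turn (backtracking).
        for rhs in grammar[value]:
            for remainder in parse(grammar, rhs, words):
                yield from parse(grammar, rest, remainder)


def accepts(grammar: Grammar, sentence: str) -> bool:
    """A sentence is accepted when some parse of S consumes all of its words."""
    return any(r == [] for r in parse(grammar, [("nt", "S")], sentence.split()))


print(accepts(GRAMMAR_1, "PRACOWAĆ BYŁO BARDZO PRZYJEMNIE"))  # True: sentence (1)
print(accepts(GRAMMAR_1, "BARDZO PRZYJEMNIE BYŁO PRACOWAĆ"))  # False: sentence (2)
```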

This grammar does not, however, account for many correct Polish sentences, such as:

(2) BARDZO PRZYJEMNIE BYŁO PRACOWAĆ

(3) BYŁO BARDZO PRZYJEMNIE PRACOWAĆ

To make the grammar accept these sentences we should, for example, add two rules:

%S == %ADVP %V %INF.

%S == %V %ADVP %INF.

One third of the possible permutations of the words BYŁO, BARDZO, PRACOWAĆ, PRZYJEMNIE constitute admissible Polish sentences (although some of them are stylistically marked). The complete grammar would then need 21 rules, including the dictionary rules. Such a solution is obviously clumsy and unsatisfactory.
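The clumsiness can be checked mechanically. The Python sketch below reuses a hypothetical anchored-terminal parser, adds the two extra S rules, and counts how many of the 4! = 24 permutations of the four words the grammar accepts. Each S rule licenses exactly one word order here, so covering the eight admissible orders (one third of 24) requires still more rules. The encoding and names are assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, Iterator, List, Tuple

Symbol = Tuple[str, str]  # ("nt", name) or ("t", anchored word)
Grammar = Dict[str, List[List[Symbol]]]

INF, V, ADVP = ("nt", "INF"), ("nt", "V"), ("nt", "ADVP")

# Hypothetical encoding: the original S rule plus the two rules added for (2) and (3).
GRAMMAR_3: Grammar = {
    "S": [[INF, V, ADVP], [ADVP, V, INF], [V, ADVP, INF]],
    "INF": [[("t", "PRACOWAĆ")]],
    "V": [[("t", "BYŁO")]],
    "ADVP": [[("t", "BARDZO"), ("t", "PRZYJEMNIE")]],
}


def parse(g: Grammar, symbols: List[Symbol], words: List[str]) -> Iterator[List[str]]:
    """Anchored-only parsing: yield the remainder after matching `symbols`."""
    if not symbols:
        yield words
        return
    (kind, value), rest = symbols[0], symbols[1:]
    if kind == "t":
        if words and words[0] == value:
            yield from parse(g, rest, words[1:])
    else:
        for rhs in g[value]:
            for remainder in parse(g, rhs, words):
                yield from parse(g, rest, remainder)


def accepts(g: Grammar, words: List[str]) -> bool:
    return any(r == [] for r in parse(g, [("nt", "S")], words))


WORDS = ["BYŁO", "BARDZO", "PRACOWAĆ", "PRZYJEMNIE"]
orders = [list(p) for p in permutations(WORDS)]
accepted = [" ".join(o) for o in orders if accepts(GRAMMAR_3, o)]
print(len(orders), len(accepted))  # 24 3
```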

Our first proposal consists in allowing two kinds of terminal symbols: anchored terminals, retrieved at the current position of a given sentence (available in metamorphosis grammars [2] and prefixed by ~ in our example), and floating terminals, retrieved anywhere in the unprocessed part of a sentence (we shall prefix them by @).
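The difference between the two kinds of terminals can be sketched as two retrieval functions over the unprocessed part of the input. This Python sketch assumes that retrieving a floating terminal simply deletes it from the unprocessed words and leaves the others in their original order; the function names are hypothetical. Every matching occurrence is offered in turn, so a later failure can backtrack to the next one, as Prolog would.

```python
from typing import Iterator, List


def take_anchored(word: str, unprocessed: List[str]) -> Iterator[List[str]]:
    """Anchored terminal (~): match only at the current, i.e. first unprocessed, position."""
    if unprocessed and unprocessed[0] == word:
        yield unprocessed[1:]


def take_floating(word: str, unprocessed: List[str]) -> Iterator[List[str]]:
    """Floating terminal (@): match any occurrence in the unprocessed part and remove it."""
    for i, candidate in enumerate(unprocessed):
        if candidate == word:
            yield unprocessed[:i] + unprocessed[i + 1:]


words = ["BARDZO", "PRZYJEMNIE", "BYŁO", "PRACOWAĆ"]  # sentence (2)
print(list(take_anchored("BYŁO", words)))  # []
print(list(take_floating("BYŁO", words)))  # [['BARDZO', 'PRZYJEMNIE', 'PRACOWAĆ']]
```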

The easiest and most concise way of expressing a grammar for the sentences mentioned above consists in replacing every anchored terminal by a floating terminal. This is, however, not satisfactory, because such a grammar also accepts deviant (syntactically or stylistically) sequences, e.g.

(4) BYŁO BARDZO PRACOWAĆ PRZYJEMNIE

(5) PRZYJEMNIE PRACOWAĆ BARDZO BYŁO

By using both anchored and floating terminals we can define the following grammar:


%S == %INF %V %ADVP.
%INF == @PRACOWAĆ.
%V == @BYŁO.
%ADVP == ~BARDZO @PRZYJEMNIE.

This grammar accepts only half of the incorrect sequences but, in a usual trade-off, it rejects some correct Polish sentences.
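These claims can be tested by brute force over the 24 permutations. The Python sketch below rests on an interpretation assumed here rather than stated in the text: the "current position" for an anchored terminal is the first word not yet consumed, after floating terminals have been removed. Under that reading, the all-floating grammar accepts every permutation, while the mixed grammar accepts exactly the orders in which BARDZO precedes PRZYJEMNIE, so it accepts (4) but rejects (5). The encoding and names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterator, List, Tuple

# ("nt", name), ("anc", word) for ~word, or ("flo", word) for @word.
Symbol = Tuple[str, str]
Grammar = Dict[str, List[List[Symbol]]]


def parse(g: Grammar, symbols: List[Symbol], words: List[str]) -> Iterator[List[str]]:
    if not symbols:
        yield words
        return
    (kind, value), rest = symbols[0], symbols[1:]
    if kind == "anc":
        # Anchored: must be the first word not yet consumed.
        if words and words[0] == value:
            yield from parse(g, rest, words[1:])
    elif kind == "flo":
        # Floating: any unconsumed occurrence, removed from the input.
        for i, w in enumerate(words):
            if w == value:
                yield from parse(g, rest, words[:i] + words[i + 1:])
    else:
        for rhs in g[value]:
            for remainder in parse(g, rhs, words):
                yield from parse(g, rest, remainder)


def accepts(g: Grammar, words: List[str]) -> bool:
    return any(r == [] for r in parse(g, [("nt", "S")], words))


def make_grammar(inf: str, v: str, bardzo: str, przyjemnie: str) -> Grammar:
    """S == INF V ADVP, with the kind of each terminal chosen by the caller."""
    return {
        "S": [[("nt", "INF"), ("nt", "V"), ("nt", "ADVP")]],
        "INF": [[(inf, "PRACOWAĆ")]],
        "V": [[(v, "BYŁO")]],
        "ADVP": [[(bardzo, "BARDZO"), (przyjemnie, "PRZYJEMNIE")]],
    }


ALL_FLOATING = make_grammar("flo", "flo", "flo", "flo")
MIXED = make_grammar("flo", "flo", "anc", "flo")  # @PRACOWAĆ, @BYŁO, ~BARDZO @PRZYJEMNIE

orders = [list(p) for p in permutations(["BYŁO", "BARDZO", "PRACOWAĆ", "PRZYJEMNIE"])]
print(sum(accepts(ALL_FLOATING, o) for o in orders))  # 24
print(sum(accepts(MIXED, o) for o in orders))         # 12
print(accepts(MIXED, "BYŁO BARDZO PRACOWAĆ PRZYJEMNIE".split()))  # True: deviant (4)
print(accepts(MIXED, "PRZYJEMNIE PRACOWAĆ BARDZO BYŁO".split()))  # False: deviant (5)
```

Under the same reading, the correct sentences (1), (2) and (3) are all accepted by the mixed grammar.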

It seems that only a grammar with numerous specific rules can satisfy the strong requirement of accepting exactly those sequences which are considered correct.

The formalism is, however, quite appropriate for describing, e.g., the syntax of some noun phrases in Polish or of syntactically unbound modifiers.

Introducing floating terminals into the Marseille-originated Prolog interpreter requires only minor alterations of the bootstrap. The facility has already been made standard in the Prolog version for the ODRA 1305 (ICL 1900 compatible), which is distributed in Poland.

To illustrate deficiencies of the proposed mechanism in parsing certain kinds of free word-order constructions, we shall consider the following Polish sentences:

(6) TRZEBA BY CZEGOŚ WIĘCEJ
    TRZEBA 'is needed' [present, impersonal]; BY [conditional formative]; CZEGOŚ 'something' [genitive]; WIĘCEJ 'more'

(7) CZEGOŚ BY WIĘCEJ TRZEBA

"Something more would be needed."

The sentences (6), (7) consist of the impersonal conditional verb-like phrase TRZEBA BY and the noun phrase CZEGOŚ WIĘCEJ. The words CZEGOŚ and WIĘCEJ may occupy any position, but the order of TRZEBA and BY is restricted: if BY precedes TRZEBA, then BY must not be the first word of the sentence; otherwise, BY must immediately follow TRZEBA.

Therefore, in order to write a concise grammar accepting all correct Polish sentences built of the words TRZEBA, BY, WIĘCEJ, CZEGOŚ, we must introduce more selective information concerning the order of words. We supply selected terminals and nonterminals with control items restricting their scope of floating. If a symbol has no such item, it inherits the restrictions of the left-hand nonterminal (in particular, there may be no restrictions at all).

For example, such restrictions could be:

a terminal should be the last (the first) one;

a terminal must follow (immediately follow) the most recently retrieved terminal.
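One way to make such restrictions precise is to state them over word positions in the original sentence. The following Python sketch is a hypothetical rendering, not the authors' control-item syntax: each restriction is a predicate on the candidate's position, the position of the most recently retrieved terminal (None if there is none yet), and the sentence length. Treating NEXT and AFTER as unsatisfiable when nothing has been retrieved yet is an assumption.

```python
from typing import Callable, Dict, Optional

# (candidate position, position of the most recently retrieved terminal or None,
#  sentence length) -> is the candidate allowed here?
Restriction = Callable[[int, Optional[int], int], bool]

RESTRICTIONS: Dict[str, Restriction] = {
    # the terminal should be the first word of the sentence
    "FIRST": lambda pos, last, n: pos == 0,
    # the terminal should be the last word of the sentence
    "LAST": lambda pos, last, n: pos == n - 1,
    # the terminal must follow the most recently retrieved terminal
    "AFTER": lambda pos, last, n: last is not None and pos > last,
    # the terminal must immediately follow the most recently retrieved terminal
    "NEXT": lambda pos, last, n: last is not None and pos == last + 1,
}


def allowed(item: Optional[str], pos: int, last: Optional[int], n: int) -> bool:
    """`item` is the effective (own or inherited) control item; None means unrestricted."""
    return item is None or RESTRICTIONS[item](pos, last, n)


# Sentence (6), TRZEBA BY CZEGOŚ WIĘCEJ, with TRZEBA (position 0) retrieved last:
print(allowed("NEXT", 1, 0, 4))  # True: BY at position 1 is adjacent to TRZEBA
print(allowed("NEXT", 2, 0, 4))  # False
```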

Coming back to our example, we should specify:

either BY immediately follows a verb, or BY must not be the first word and must precede a verb.

We can now write a grammar accepting the sentences (6), (7). The grammar is as follows (variable parameters are prefixed by asterisks, control items are separated by commas):

%S(*TENSE, *MOOD) ==
    %VPIMPERS(*TENSE, *MOOD).
%VPIMPERS(*TENSE, *MOOD) ==
    %VIMPERS(*TENSE, *MOOD, *SYNTREQ)
    %REQ(*SYNTREQ).
%VIMPERS(*TENSE, COND, *SYNTREQ) ==
    %VERB(IMPERS, *TENSE, *SYNTREQ)
    @BY, NEXT.
%VIMPERS(*TENSE, COND, *SYNTREQ) ==
    @BY, NOT.FIRST
    %VERB(IMPERS, *TENSE, *SYNTREQ), AFTER.
%VERB(IMPERS, PRESENT, NP(GEN)) ==
    @TRZEBA.
%REQ(NP(*CASE)) == %NP(*CASE).
%NP(*CASE) == %NPRON(*CASE) %MOD.
%NPRON(GEN) == @CZEGOŚ.
%MOD == @WIĘCEJ.

In order to make the example clear, we use only the categories relevant to the sentences under discussion. We omit, for instance, the number and gender of a noun phrase; the parameter *SYNTREQ expresses a single syntactic requirement (in general a verb can have more than one requirement; for details, see Szpakowicz [5]). The rule for NP is also greatly simplified. From the point of view of the description of Polish syntax, the grammar presented above is, in fact, unsophisticated and fragmentary. It is sufficient, however, to illustrate some of the linguistic phenomena mentioned earlier.
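To show how the control items NEXT, NOT.FIRST and AFTER interact, here is a Python sketch of the grammar above with its parameters (*TENSE, *MOOD, *SYNTREQ, *CASE) dropped. All terminals are treated as floating, and a symbol without its own control item inherits the one attached to the nonterminal it was expanded from, as described earlier. Positions refer to the original sentence. The encoding, the names and the choice to make AFTER and NEXT fail when nothing has been retrieved yet are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# A symbol is (kind, name, control item or None); "nt" = nonterminal, "flo" = floating terminal.
Symbol = Tuple[str, str, Optional[str]]
# Parser state: unprocessed (position, word) pairs and the position of the last retrieved terminal.
State = Tuple[Tuple[Tuple[int, str], ...], Optional[int]]

GRAMMAR: Dict[str, List[List[Symbol]]] = {
    "S": [[("nt", "VIMPERS", None), ("nt", "REQ", None)]],
    "VIMPERS": [
        [("nt", "VERB", None), ("flo", "BY", "NEXT")],         # %VERB @BY, NEXT
        [("flo", "BY", "NOT.FIRST"), ("nt", "VERB", "AFTER")],  # @BY, NOT.FIRST %VERB, AFTER
    ],
    "VERB": [[("flo", "TRZEBA", None)]],
    "REQ": [[("nt", "NP", None)]],
    "NP": [[("nt", "NPRON", None), ("nt", "MOD", None)]],
    "NPRON": [[("flo", "CZEGOŚ", None)]],
    "MOD": [[("flo", "WIĘCEJ", None)]],
}


def allowed(item: Optional[str], pos: int, last: Optional[int]) -> bool:
    """Check a candidate position against an effective control item."""
    if item is None:
        return True
    if item == "NEXT":
        return last is not None and pos == last + 1
    if item == "AFTER":
        return last is not None and pos > last
    if item == "NOT.FIRST":
        return pos != 0
    raise ValueError(f"unknown control item {item!r}")


def parse(symbols: List[Symbol], inherited: Optional[str], state: State) -> Iterator[State]:
    """Yield every state reachable by matching `symbols` from `state`."""
    if not symbols:
        yield state
        return
    (kind, name, item), rest = symbols[0], symbols[1:]
    # A symbol without its own control item inherits the left-hand side's one.
    control = item if item is not None else inherited
    unprocessed, last = state
    if kind == "flo":
        for i, (pos, word) in enumerate(unprocessed):
            if word == name and allowed(control, pos, last):
                remaining = unprocessed[:i] + unprocessed[i + 1:]
                yield from parse(rest, inherited, (remaining, pos))
    else:
        for rhs in GRAMMAR[name]:
            for after_nt in parse(rhs, control, state):
                yield from parse(rest, inherited, after_nt)


def accepts(sentence: str) -> bool:
    start: State = (tuple(enumerate(sentence.split())), None)
    return any(not unprocessed for unprocessed, _ in parse([("nt", "S", None)], None, start))


print(accepts("TRZEBA BY CZEGOŚ WIĘCEJ"))  # True: (6)
print(accepts("CZEGOŚ BY WIĘCEJ TRZEBA"))  # True: (7)
print(accepts("BY TRZEBA CZEGOŚ WIĘCEJ"))  # False: BY is first
print(accepts("TRZEBA CZEGOŚ BY WIĘCEJ"))  # False: BY follows TRZEBA but is not adjacent
```

The two rejected orders are exactly those excluded by the constraint on BY stated for sentences (6) and (7).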

An experimental version of ODRA-Prolog accepts metamorphosis grammar rules with control items (syntactically, these are just Prolog terms). The inventory of word-order restrictions has yet to be established by research on word order in Polish.

Thus, for the time being, the interpretation of the control items is implemented in an ad hoc manner.

A formal description of the syntax of a natural language of the free word-order type, such as Polish and the other Slavonic languages, requires, however, solving some additional technical and linguistic problems.

We now want to present those problems which we consider the most important.

In some cases the occurrence of a word-form depends on particular properties of the word which immediately precedes it (usually it is the phonetic shape of the preceding word which influences the choice of the proper word-form). For example, the agglutinative present tense form of the verb BYĆ in the second person singular masculine can be realized either by Ś or by EŚ. The forms Ś and EŚ are written jointly with the preceding syntactic item, but on the level of syntactic description they are clearly distinguishable.

Let us illustrate this problem by the following sentences:

(8) NAROBIŁ+EŚ ŁADNEGO KŁOPOTU
    NAROBIŁ 'to cause' [sg, masc]; +EŚ [2p, sg, masc]; ŁADNEGO 'cute', here: 'big' [sg, masc, gen]; KŁOPOTU 'trouble' [sg, masc, gen]

(9) ŁADNEGO+Ś KŁOPOTU NAROBIŁ

"You've caused quite a lot of trouble."

The very simple grammar presented below accepts these two sentences, but it also accepts some incorrect sequences, because the rules do not express the dependency phenomena mentioned above.

%S == %PP(*GENDER, *NUMBER, NP(GEN))
    %VPT(*GENDER, *NUMBER, *PERSON, *X)
    %NP(*NUMBER2, *GENDER2, GEN).
%VPT(MASC, SING, 2P, VOW) == @Ś.
%VPT(MASC, SING, 2P, CON) == @EŚ.
%PP(MASC, SING, NP(GEN)) == @NAROBIŁ.
%NP(SING, MASC, GEN) ==
    @ŁADNEGO @KŁOPOTU, AFTER.

(VPT is the abbreviated present tense form of the verb BYĆ; VOW and CON mean "used after a vowel" and "used after a consonant".)

So far we do not see a simple and satisfactory way of relating the parameter *X of %VPT to the other words and phrases. Provisionally, the agreement of the agglutinative forms of the verb BYĆ with the corresponding words may be resolved during dictionary lookup in the pre-parsing phase.
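Such a pre-parsing step could be sketched as follows. The Python code splits a jointly written word into a stem and the agglutinative form of BYĆ, accepting Ś only after a vowel (VOW) and EŚ only after a consonant (CON). It consults a small lexicon so that a word such as CZEGOŚ, whose final Ś is not the verb form, is left intact. The lexicon, the vowel set and the names `split_clitic` and `pre_parse` are assumptions made for illustration.

```python
from typing import List, Optional

VOWELS = set("AĄEĘIOÓUY")

# Hypothetical mini-lexicon containing only the words of the examples.
LEXICON = {"NAROBIŁ", "ŁADNEGO", "KŁOPOTU", "CZEGOŚ"}


def split_clitic(token: str) -> Optional[List[str]]:
    """Return [token], [stem, clitic], or None if no analysis satisfies VOW/CON."""
    if token in LEXICON:
        return [token]
    # EŚ is used after a consonant, Ś after a vowel.
    for clitic, needs_vowel in (("EŚ", False), ("Ś", True)):
        stem = token[: -len(clitic)]
        if token.endswith(clitic) and stem in LEXICON:
            if (stem[-1] in VOWELS) == needs_vowel:
                return [stem, clitic]
    return None


def pre_parse(sentence: str) -> Optional[List[str]]:
    """Split every token; reject the sentence if any token has no valid analysis."""
    result: List[str] = []
    for token in sentence.split():
        parts = split_clitic(token)
        if parts is None:
            return None
        result.extend(parts)
    return result


print(pre_parse("NAROBIŁEŚ ŁADNEGO KŁOPOTU"))  # ['NAROBIŁ', 'EŚ', 'ŁADNEGO', 'KŁOPOTU']
print(pre_parse("ŁADNEGOŚ KŁOPOTU NAROBIŁ"))   # ['ŁADNEGO', 'Ś', 'KŁOPOTU', 'NAROBIŁ']
print(pre_parse("NAROBIŁŚ ŁADNEGO KŁOPOTU"))   # None: Ś after a consonant
```

The output tokens could then be fed to the grammar above, with the clitic's VOW or CON feature fixed during lookup rather than left to the parser.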

Other, purely linguistic problems concern the influence of free word order on the agreement in gender between the verb phrase and a compound noun phrase.

For example, a verb phrase placed after the noun phrase agrees in gender with the last constituent of the noun phrase, as in:

(10) JAN LUB MARIA PRZYSZŁA
     'John' 'or' 'Mary' 'came' [fem]

Similarly, the gender of a verb phrase placed before the noun phrase may agree with the first constituent of the noun phrase, for example:

(11) PRZYSZEDŁ JAN LUB MARIA
     PRZYSZEDŁ 'came' [masc]

Only recently has this difficult problem become the subject of partial research. A formal syntax description of written Polish sentences with neutral word order is available [5, 6]; it accepts practically all nonelliptical declarative and negative sentences, as well as the majority of interrogative sentences. Nevertheless, we can propose only a provisional solution to this problem.

Another complicated question is the discontinuity of the phrases which constitute the sentence, for example the interpenetration of the verb phrase and the noun phrase:

(12) NOWĄ DAŁ JAN KSIĄŻKĘ MARII
     NOWĄ 'new' [acc]; DAŁ 'gave'; JAN 'John' [nom]; KSIĄŻKĘ 'book' [acc]; MARII 'Mary' [dat]
     [NP]: NOWĄ ... KSIĄŻKĘ; [VP]: DAŁ JAN ... MARII

"It is a new book that John gave to Mary."


Therefore, the control information should allow the search for missing constituents of a phrase even far off its main component. On the other hand, it should protect against "borrowing" an inappropriate constituent from a quite different phrase, e.g. from a subordinate clause.

It is now clearly visible that parsing free word-order languages is really different from the syntactic analysis of, say, English. Although the presented modifications of metamorphosis grammars do not solve all the problems discussed above, they provide a useful instrument for further experimental studies.

Finally, we want to emphasize that we are aware of the semantic and pragmatic functions of free word order, which are studied e.g. by Sgall [4] and Szwedek [7]. We believe, however, that from the methodological point of view it is justified to prescind from them in the syntax description.

A reader interested in the impact of word order on the semantico-pragmatic level may wish to consult Bień [1].

References

[1] Bień J.S. Multiple Environments Model of Natural Language [in Polish, unpublished Ph.D. thesis], 1977.

[2] Colmerauer A. Metamorphosis Grammars. In Bolc L. (ed.) Natural Language Communication with Computers, Lecture Notes in Computer Science 63, 1978.

[3] Pereira F., Warren D.H.D. Definite Clause Grammars Compared with Augmented Transition Networks. Dept. of AI Report 58, University of Edinburgh, 1978.

[4] Sgall P., Hajičová E., Benešová E. Topic, Focus and Generative Semantics. Kronberg Taunus: Scriptor Verlag GmbH, 1973.

[5] Szpakowicz S. Automatic Syntactic Analysis of Polish Written Utterances [in Polish, unpublished Ph.D. thesis], 1978.

[6] Szpakowicz S. Syntactic Analysis of Written Polish. In Bolc L. (ed.) Natural Language Communication with Computers, Lecture Notes in Computer Science 63, 1978.

[7] Szwedek A. Word Order, Sentence Stress and Reference in English and Polish. Edmonton: Linguistic Research Inc., 1976.
