PARSING FREE WORD ORDER LANGUAGES IN PROLOG
Janusz Stanisław Bień +, Krystyna Laus-Mączyńska ++, Stanisław Szpakowicz +
+ Institute of Informatics, Warsaw University, Warsaw, Poland
++ Institute for Scientific, Technical and Economic Information, Warsaw, Poland
The Prolog programming language allows the user to write powerful parsers in the form of metamorphosis grammars. However, metamorphosis grammars, as defined by Colmerauer [2], have to specify strictly the order of terminal and nonterminal symbols. A modification of Prolog has been implemented which allows "floating terminals" to be included in a metamorphosis grammar, together with some information that makes it possible to control the search for such a terminal in the unprocessed part of the input. The modification is illustrated by several examples from the Polish language, and some open questions are discussed.
Metamorphosis grammars [2, 3] are a convenient tool for the formal description of the syntax of natural languages. Their convenience is due to their straightforward relation to the programming language Prolog. A metamorphosis grammar is an ordinary part of a Prolog program. It defines a language as well as a parser for it.
We suggest here modifications of the way metamorphosis grammars are handled in Prolog which allow these grammars to analyse constructions whose components have no strictly specified order.
Let us consider an example. The following sentence in Polish:
(1) PRACOWAĆ   BYŁO      BARDZO   PRZYJEMNIE
    'to work'  'it was'  'very'   'nice'
    "It was very nice to work."
is accepted by the metamorphosis grammar given below (nonterminals prefixed by %, terminals by ~, == stands for an arrow):
%S == %INF %V %ADVP.
%INF == ~PRACOWAC.
%V == ~BYLO.
%ADVP == ~BARDZO ~PRZYJEMNIE.
In order to simplify the example we neglect the grammatical categories of phrases and words. The last three rules serve as "dictionary rules".
This grammar does not, however, account for many correct Polish sentences, such as:
(2) BARDZO PRZYJEMNIE BYŁO PRACOWAĆ
(3) BYŁO BARDZO PRZYJEMNIE PRACOWAĆ
To make the grammar accept these sentences we should, for example, add two rules:
%S == %ADVP %V %INF.
%S == %V %ADVP %INF.
One-third of the possible permutations of the words BYŁO, BARDZO, PRACOWAĆ, PRZYJEMNIE constitute admissible Polish sentences (although sometimes stylistically marked). The complete grammar should then have 21 rules, including dictionary rules. Such a solution is obviously clumsy and not satisfactory.
Our first proposal consists in allowing two kinds of terminal symbols: anchored terminals, retrieved in the current position of a given sentence (available in metamorphosis grammars [2] and prefixed by ~ in our example), and floating terminals, retrieved anywhere in the unprocessed part of a sentence (we shall prefix them by @).
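The difference between the two kinds of retrieval can be pictured with a minimal Python sketch. It is not the ODRA Prolog implementation: the function names `take_anchored` and `take_floating` are hypothetical, and the unprocessed part of the sentence is assumed to be a tuple of words from which a retrieved word is simply removed.

```python
def take_anchored(word, rest):
    """Anchored terminal: succeed only if `word` is first in the unprocessed input.

    Returns a list of possible remainders (empty list means failure).
    """
    if rest and rest[0] == word:
        return [rest[1:]]
    return []


def take_floating(word, rest):
    """Floating terminal: remove `word` from any position of the unprocessed input.

    Returns one remainder per occurrence of `word` (empty list means failure).
    """
    return [rest[:i] + rest[i + 1:] for i, w in enumerate(rest) if w == word]


sentence = ("BYLO", "BARDZO", "PRZYJEMNIE", "PRACOWAC")
print(take_anchored("PRACOWAC", sentence))  # fails: PRACOWAC is not first
print(take_floating("PRACOWAC", sentence))  # succeeds: the word is found at the end
```

Returning a list of remainders rather than a single one mirrors Prolog backtracking: each element is one alternative the parser could continue from.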
The easiest and most concise way of expressing a grammar for the sentences mentioned above consists in replacing every anchored terminal by a floating terminal. It is, however, not satisfactory because such a grammar also accepts deviant (syntactically or stylistically) sequences, e.g.
(4) BYŁO BARDZO PRACOWAĆ PRZYJEMNIE
(5) PRZYJEMNIE PRACOWAĆ BARDZO BYŁO
By using both the anchored terminals and the floating terminals we can define the following grammar:
%S == %INF %V %ADVP.
%INF == @PRACOWAC.
%V == @BYLO.
%ADVP == ~BARDZO @PRZYJEMNIE.
The grammar accepts only half of the incorrect sequences, but (a usual trade-off) it rejects some correct Polish sentences.
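The behaviour of such a mixed grammar can be explored with a small Python sketch that tries it on every permutation of the four words. It rests on assumptions not stated in the text: the unprocessed input is a tuple from which retrieved words are removed, an anchored terminal must be the first remaining word, and BARDZO is taken to be the only anchored terminal in the grammar above. The helper names (`anchored`, `floating`, `seq`, `accepts`) are hypothetical.

```python
from itertools import permutations


def anchored(word):
    """Parser for an anchored terminal: `word` must be first in the remainder."""
    def parse(rest):
        return [rest[1:]] if rest[:1] == (word,) else []
    return parse


def floating(word):
    """Parser for a floating terminal: `word` may be anywhere in the remainder."""
    def parse(rest):
        return [rest[:i] + rest[i + 1:] for i, w in enumerate(rest) if w == word]
    return parse


def seq(*parsers):
    """Apply parsers left to right, keeping every alternative remainder."""
    def parse(rest):
        states = [rest]
        for p in parsers:
            states = [out for s in states for out in p(s)]
        return states
    return parse


def accepts(grammar, words):
    """A sentence is accepted if some derivation consumes all of its words."""
    return () in grammar(tuple(words))


# S == INF V ADVP, with BARDZO anchored and the other terminals floating.
INF = floating("PRACOWAC")
V = floating("BYLO")
ADVP = seq(anchored("BARDZO"), floating("PRZYJEMNIE"))
S = seq(INF, V, ADVP)

WORDS = ("PRACOWAC", "BYLO", "BARDZO", "PRZYJEMNIE")
accepted = [p for p in permutations(WORDS) if accepts(S, p)]
print(len(accepted), "of 24 permutations accepted")
for p in accepted:
    print(" ".join(p))
```

Under these assumptions the sketch accepts exactly the orders in which BARDZO precedes PRZYJEMNIE; whether the modified interpreter treats the current position in the same way is not specified here.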
It seems that only a grammar with numerous specific rules can satisfy the strong requirement of accepting those and only those sequences which are considered correct.
The formalism is, however, quite appropriate for describing, e.g., the syntax of some noun phrases in Polish or syntactically unbound modifiers.
Introducing the floating terminals into the Marseille-originated Prolog interpreter requires only minor alterations of the bootstrap. The facility has already been made standard in the Prolog version for ODRA 1305 (ICL 1900 compatible), which is distributed in Poland.
To illustrate deficiencies of the proposed mechanism in parsing certain kinds of free word-order constructions we shall consider the following Polish sentences:
(6) TRZEBA         BY             CZEGOŚ        WIĘCEJ
    'is needed'                   'something'   'more'
    [present,      [conditional   [genitive]
    impersonal]    formative]
(7) CZEGOŚ BY WIĘCEJ TRZEBA
    "Something more would be needed."
The sentences (6), (7) consist of the impersonal conditional verb-like phrase TRZEBA BY and the noun phrase CZEGOŚ WIĘCEJ. The words CZEGOŚ and WIĘCEJ may occupy any position, but the order of TRZEBA and BY is restricted. If BY precedes TRZEBA then BY must not be the first word of a sentence; otherwise, BY must be adjacent to TRZEBA.
Therefore, in order to make a concise grammar accepting all correct Polish sentences built of the words TRZEBA, BY, WIĘCEJ, CZEGOŚ, we must introduce more selective information concerning the order of words. We supply selected terminals and nonterminals with control items restricting their scopes of floating. The lack of such an item means that the restrictions are inherited from the left-hand nonterminal (in particular, there may be no restrictions).
For example, such restrictions could be:
    a terminal should be the last (the first),
    a terminal must follow (immediately follow) the recently retrieved terminal.
Coming back to our example, we should specify:
    either BY follows a verb immediately,
    or BY must not be the first and must precede a verb.
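The two alternatives just stated can be checked directly on word sequences. The following Python sketch does only that: it is a plain test of the two conditions, not a model of control items, and the function name `by_position_ok` is hypothetical. Words are written without diacritics.

```python
from itertools import permutations


def by_position_ok(words, verb="TRZEBA", particle="BY"):
    """Check the restriction on BY relative to the verb.

    Either BY immediately follows the verb, or BY precedes the verb
    and is not the first word of the sentence.
    """
    v = words.index(verb)
    b = words.index(particle)
    follows_immediately = b == v + 1
    precedes_not_first = 0 < b < v
    return follows_immediately or precedes_not_first


WORDS = ["TRZEBA", "BY", "CZEGOS", "WIECEJ"]
admitted = [p for p in permutations(WORDS) if by_position_ok(p)]
for p in admitted:
    print(" ".join(p))
```

Sentences (6) and (7) both pass this check, while any order beginning with BY followed later by TRZEBA does not.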
We can now write the grammar accepting the sentences (6), (7). The grammar is as follows (variable parameters prefixed by asterisks, control items separated by commas).
%S(*TENSE, *MOOD) ==
    %VPIMPERS(*TENSE, *MOOD).
%VPIMPERS(*TENSE, *MOOD) ==
    %VIMPERS(*TENSE, *MOOD, *SYNTREQ)
    %REQ(*SYNTREQ).
%VIMPERS(*TENSE, COND, *SYNTREQ) ==
    %VERB(IMPERS, *TENSE, *SYNTREQ)
    @BY, NEXT.
%VIMPERS(*TENSE, COND, *SYNTREQ) ==
    @BY, NOT.FIRST
    %VERB(IMPERS, *TENSE, *SYNTREQ), AFTER.
%VERB(IMPERS, PRESENT, NP(GEN))
    == @TRZEBA.
%REQ(NP(*CASE)) == %NP(*CASE).
%NP(*CASE) == %NPRON(*CASE) %MOD.
%NPRON(GEN) == @CZEGOS.
%MOD == @WIECEJ.
In order to make the example clear we use only the categories relevant for the sentences under discussion. We omit, for instance, the number and gender of a noun phrase; the parameter *SYNTREQ expresses a single syntactic requirement (in general a verb can have more than one requirement; for details, see Szpakowicz [5]). The rule for NP is also very simplified. From the point of view of the description of Polish syntax the grammar presented above is, in fact, unsophisticated and fragmentary. It is sufficient, however, to illustrate some linguistic phenomena mentioned earlier.
An experimental version of the ODRA-Prolog accepts the metamorphosis grammar rules with control items (syntactically just Prolog terms). The inventory of the word order restrictions has yet to be established by the research on word order in Polish. Thus, for the time being, the interpretation of the control items is implemented in an ad hoc manner.
A formal description of the syntax of a natural language of free word-order type, as for example Polish and other Slavonic languages, requires, however, some additional technical and linguistic problems to be solved.
We want to present now those problems which we find to be the most important.
In some cases the occurrence of a word-form depends on particular properties of the word which immediately precedes it (usually it is the phonetic shape of the preceding word which influences the choice of the proper word-form). For example, the agglutinative present tense form of the verb BYĆ in second person, singular, masculine can be realized either by Ś or by EŚ. The forms Ś, EŚ are written jointly with the preceding syntactic item, but on the level of syntactic description they are clearly distinguishable.
Let us illustrate this problem by the following sentences:
(8) NAROBIŁ      + EŚ        ŁADNEGO       KŁOPOTU
    'to cause'               'cute'        'trouble'
                             here: 'big'
    [sg, masc]   [2p, sg,    [sg, masc,    [sg, masc,
                 masc]       gen]          gen]
(9) ŁADNEGO + Ś KŁOPOTU NAROBIŁ
    "You've caused quite a lot of trouble."
The very simple grammar presented below accepts these two sentences, but it also accepts some incorrect sequences because the rules do not express the dependency phenomena mentioned above.
%S == %PP(*GENDER, *NUMBER, NP(GEN))
    %VPT(*GENDER, *NUMBER, *PERSON, *X)
    %NP(*NUMBER2, *GENDER2, GEN).
%VPT(MASC, SING, 2P, VOW) == @S.
%VPT(MASC, SING, 2P, CON) == @ES.
%PP(MASC, SING, NP(GEN)) == @NAROBIL.
%NP(SING, MASC, GEN) ==
    @LADNEGO @KLOPOTU, AFTER.
(VPT - the abbreviated present tense form of the verb BYĆ; VOW and CON mean "used after a vowel" and "used after a consonant").
So far we do not see a simple and satisfactory way of relating the parameter *X of %VPT to the other words and phrases. Provisionally, the agreement of the agglutinative forms of the verb BYĆ with the corresponding words may be resolved during dictionary lookup in the pre-parsing phase.
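One way to picture such a pre-parsing step is the following Python sketch. It is only an illustration under loud assumptions: the last letter of the host word stands in for its phonetic shape, no dictionary is consulted (so ordinary words ending in S would be split wrongly), and the names `vpt_class` and `split_agglutinative` are hypothetical. Words are written without diacritics, as in the grammar rules.

```python
# Polish vowel letters; spelling is used here as a rough proxy for pronunciation.
POLISH_VOWELS = set("AĄEĘIOÓUY")


def vpt_class(preceding_word):
    """Return 'VOW' or 'CON' according to the last letter of the preceding word."""
    return "VOW" if preceding_word[-1].upper() in POLISH_VOWELS else "CON"


def split_agglutinative(token):
    """Split a token such as 'NAROBILES' into (host, ending, class).

    ES is accepted only after a consonant and S only after a vowel;
    None is returned when neither ending fits.
    """
    for ending, cls in (("ES", "CON"), ("S", "VOW")):
        host = token[: -len(ending)]
        if token.endswith(ending) and host and vpt_class(host) == cls:
            return host, ending, cls
    return None


for token in ("NAROBILES", "LADNEGOS", "KLOPOTU"):
    print(token, "->", split_agglutinative(token))
```

In a fuller treatment the split would be confirmed by dictionary lookup of the host word, and the class found here would supply the value of the fourth parameter of %VPT.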
The other purely linguistic problems are related to the influence of the free word-order on accommodating the verb phrase to the gender of a compound noun phrase. For example, the verb phrases in the aposition agree in gender with the last constituent of the noun phrase, as in:
(10) JAN      LUB    MARIA    PRZYSZŁA
     'John'   'or'   'Mary'   'came'
                              [fem]
Similarly, the gender of the verb phrase in the postposition may agree with the first constituent of the noun phrase, for example:
(11) PRZYSZEDŁ JAN LUB MARIA
     'came'