Corpus Linguistics and the Lexicon

ACTA UNIVERSITATIS LODZIENSIS FOLIA LINGUISTICA 36, 1997

PART I. LEXICON

Barbara Lewandowska-Tomaszczyk

CORPUS LINGUISTICS AND THE LEXICON*

1. INTRODUCTION

An attempt is made in this study to present the place and function of computers in the lexicological analysis of natural language and its lexicographic applications. Issues examined in this paper are connected with the acquisition of lexical knowledge from linguistic corpus data, reusability of the lexical knowledge in monolingual and multilingual lexicographic tasks as well as possible implications of such methodologies for the analysis of human language lexis.

2. CORPUS LINGUISTICS

Corpus Linguistics and, more precisely, Computational Corpus Linguistics is a relatively new development in the study of language, rapidly developing in the eighties (cf. the first corpus and its description by Kučera and Francis 1967, cf. also Makkai 1980, Meijs 1987, Sinclair 1991). The primary task of Corpus Linguistics is the gathering and storing (originally in book format, at present in electronic form) of large quantities of authentic language data, spoken and written. The concept of a corpus does not cover any arbitrary collection of language data. A corpus, in the sense used here, is, as G. Leech [1991: 11] put it, a collection of machine-readable linguistic data “designed or required for a particular ‘representative’ function”. These ‘databanks’, as they are sometimes called, provide linguists with the real materials against which they can test their hypotheses.

* This research has been partly covered by the European Community Program of Cooperation in Science and Technology between Western and Eastern European Countries Nr. 8453 as well as the TEMPUS grant Nr. 2087.

Corpora of written language are more numerous. However, the work on speech output points to the need for corpora of spoken language. The authentic spoken data are more difficult to collect, and an additional problem is orthographic or phonetic transcription, which is a very time-consuming enterprise. Attempts at automatic speech analysis and transcription are much less developed than methods of written text handling. At the same time, such large quantities of linguistic data can generate questions and issues which could never have been asked had such quantities of material not been collected and analysed. The management of such databanks, i.e. the access and retrieval of lexical information in digital form, is made possible by computer software of the tagging, parsing and concordancing type. Different computer programs are used to generate lexica, dictionaries and thesauri of the acquired lexical knowledge.
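The concordancing type of software mentioned above can be illustrated with a minimal keyword-in-context (KWIC) sketch. It is not the software used in the project; the function names `tokenize` and `concordance`, the simple regular-expression tokenizer and the sample text are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Split text into word tokens (letters, with internal hyphens or apostrophes)."""
    return re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text)

def concordance(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = tokens[max(0, i - window):i]     # up to `window` tokens before
            right = tokens[i + 1:i + 1 + window]    # up to `window` tokens after
            hits.append((left, token, right))
    return hits

if __name__ == "__main__":
    sample = "The corpus is large. A corpus, in this sense, is machine-readable."
    for left, key, right in concordance(tokenize(sample), "corpus", window=2):
        print(f"{' '.join(left):>12} [{key}] {' '.join(right)}")
```

A real concordancer would work on an indexed corpus rather than a token list held in memory, but the output format, a keyword centred between its left and right contexts, is the same.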

The linguistic corpus can be treated as a significant lexical resource, an embodiment of a token ‘native speaker’ with a cumulative competence of all and each native speaker-member of a given linguistic community. And yet, it should be borne in mind that such ‘surface’ phenomena, classically attributed to ‘performance’ rather than ‘competence’, as false starts, clumsy syntax, abductive lexical uses, often patently wrong, etc., are also there in the corpus, and even though they may be treated as symptomatic of the synchronic language variability or of future linguistic developments, it is precisely for the analyst to decide what their current status is.

3. THE LEXICON

The lexicon used to be treated either as “an appendix of the grammar” [Bloomfield 1933] or as a depository of syntactic irregularities [Chomsky 1965]. With the rise of the first semantically based models (e.g. Generative Semantics) and cognitively oriented approaches to language at present [Frame Semantics - Fillmore 1977, Cognitive Grammar - Lakoff and Johnson 1980, Langacker 1991, Conceptual Semantics - Jackendoff 1983, 1992], the place and role of the lexicon in linguistic models have radically changed. In the place of modular components incorporating syntax, phonology, semantics in autonomous compartments, cognitive grammatical models at present view those ‘levels’ as a continuum rather than modules, uniting lexicon, morphology and syntax, each associated with a phonological and semantic structure. Semantics is treated as a separate component feeding syntax in Chomsky’s models, while in the cognitive models it is equated with conceptualizations and encompasses different kinds of human experience immersed in the recognized social, physical, and linguistic context [cf. Langacker 1991: 2]. The semantic structure of a linguistic unit (lexical, phrasal, sentential, etc.) is characterized relative to cognitive domains [frames], understood as structural conceptualizations of experience. This fact alone eliminates the feasibility of the inter-level linguistic distinctions as well as that of a strict dichotomy between the linguistic and the encyclopaedic knowledge. I would be prepared to defend a hypothesis that it is precisely the large language corpora that provide a tool to extract the knowledge of the lexis in its entirety in the context of the lexical frames. Such corpus-based lexical knowledge parallels the concept of the lexicon in its cognitive linguistic format.

4. LEXICAL ACQUISITION

The extraction of lexical information, termed also the acquisition of the lexicon, is based on extracting lexical knowledge from corpora of texts as well as from machine-readable dictionaries (MRDs). Extracting full lexical information from a large corpus manually could turn out to be a lifetime job. Therefore there is an urgent and continually growing need to handle the search automatically. Corpora of English texts are quite numerous and grow rapidly (ICAME, Helsinki, Longman, Lancaster-Lund, Lund-London, Oslo-Bergen, etc.). The situation concerning other languages, including Polish, is much worse. In Poland, some newspaper publishers (e.g. Gazeta Wyborcza) are ready to share with researchers their linguistic resources in electronic form. Other possibilities include scanning techniques - the OMNIPAGE package at our disposal is being used for building monolingual corpora (Polish and English) as well as bilingual (translated texts) and parallel corpora (authentic texts covering the same domain in both languages). On the other hand, there exist in Poland a few centers which contribute to the domain of Language Technology. Activities represented there pertain to different topics in Language Technology, such as computational lexicography, speech generation and recognition, text understanding as well as expert systems for knowledge representation (for a more exhaustive list cf. Vetulani 1994).

The automatic acquisition of bilingual lexical knowledge (cf. examples below) from bilingual corpora is only in statu nascendi at present. The available software, even though quite effective at the sentence-alignment level, uses very little linguistic sophistication. The programs are based mostly on character lengths and/or item distribution in the sentence, assisted by the ‘anchoring’ techniques via proper names, numbers, or other fixed features in the texts. Lexical alignment is the next step in this process, currently under investigation (Lancaster UCREL team). The acquisition of lexical knowledge from non-translated parallel corpora, centered around similar domains, on the other hand, is only a matter of theorizing at present. Software for bilingual sentence-alignment (tested at different centers at present) combined with concordancing programs may prove very useful not only for lexicography but also for CALL and, in particular, for the training of translators/interpreters and for translation practice. Below are presented two pairs of sentences from the English-Polish bilingual corpus in the Department of English at the University of Łódź, aligned at UCREL, University of Lancaster (A. McEnery and M. Oakes):

(1) sub d = 2 ---g

Properly read and interpreted these statements give the reader a complete, synoptical picture of the firm’s operations and results in quantified form.

---g

Sprawozdania, właściwie czytane i interpretowane dają czytelnikowi kompletny, poglądowy obraz działalności firmy i jej wyników w wyrażeniu liczbowym.

con d = 232 ---g

But I don’t think there’s anything wrong with the school, particularly, I’ve seen better and I’ve seen worse.

---g

Ale nie wydaje mi się, żeby ze szkołą było coś szczególnie nie w porządku. Widziałem w życiu lepsze i gorsze.

Symbols used:

sub - one-to-one sentence substitution
con - contraction [two sentences in one language corresponding to one sentence in the other one]
d - distance

Interpretation:

a lower d-score signifies a more confident alignment; d depends on:
a) difference in length in characters
b) likelihood of alignment type.
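How a distance score can depend on both factors (a) and (b) may be sketched as follows. This is a simplified illustration in the spirit of the length-based method of Gale and Church (1993), not the UCREL aligner itself: the function name `length_distance`, the prior probabilities and the variance constant are assumed values, and the scores it yields are not meant to reproduce the d values quoted above.

```python
import math

# Assumed prior probabilities of the two alignment types (illustrative values).
ALIGNMENT_PRIORS = {"sub": 0.89, "con": 0.089}

def length_distance(len_a, len_b, align_type, ratio=1.0, variance=6.8):
    """Distance for aligning len_a characters with len_b characters.

    Lower values mean a more confident alignment. `ratio` is the assumed
    expected number of target characters per source character, `variance`
    the assumed variance of that figure per source character.
    """
    if len_a <= 0 or len_b <= 0:
        raise ValueError("lengths must be positive")
    # (a) difference in length, normalised by the expected spread
    delta = (len_b - ratio * len_a) / math.sqrt(variance * len_a)
    # two-tailed probability of a deviation at least this large (standard normal)
    tail = max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300)
    # (b) likelihood of the alignment type enters through its prior
    return -100.0 * math.log(tail * ALIGNMENT_PRIORS[align_type])

if __name__ == "__main__":
    print(round(length_distance(100, 100, "sub"), 1))
    print(round(length_distance(100, 160, "sub"), 1))
    print(round(length_distance(100, 100, "con"), 1))
```

For a contraction, `len_a` or `len_b` would be the summed length of the two sentences on the side that has two. An aligner would then choose, by dynamic programming, the sequence of alignment types that minimises the total distance over the whole text.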

5. REUSABILITY OF LEXICAL RESOURCES

As has been mentioned before, Lexical Databases (LDBs) and Lexical Knowledge Bases (LKBs) are products of lexical extraction from machine-readable corpora (i.e. texts and dictionaries) and can serve, in turn, a number of functions for both human and machine natural language processing tasks such as verb frame acquisition, virtual lexica building, etc. This can improve the lexical acquisition process again and further enhance the LDB/LKB in the reusability cycle. To meet the requirements of reusability, lexical resources have to be assigned standardized mark-up, more specifically in lemmatization, part-of-speech (grammatical) tagging, and syntactic, semantic, and discourse parsing. To meet these conditions and facilitate the interchange of corpus data, a team of specialists grouped around the Text Encoding Initiative (TEI), originated in 1987 and sponsored by the Association for Computers and the Humanities, the Association for Literary and Linguistic Computing and the Association for Computational Linguistics, is working on the production of a uniform system of guidelines for text encoding based on SGML (Standard Generalized Markup Language). Terms such as ‘tagging’ or ‘parsing’ are partly a misnomer. What they involve is in fact all that is pertinent to linguistic analysis, from sound to meaning. The approaches to these tasks center around two different methods, the first one based on conceptual analysis, the other one utilizing statistical methodology.

5.1. Cognitive models in NL processing

Probabilistic approaches in NL processing have been preceded by methods based on cognitive models of knowledge representation. Rooted in psychological findings of spreading activation networks [Anderson 1977], they too aim at capturing syntactic, semantic and discourse structures of natural language by means of graph diagrams of Augmented Transition Networks. The networks are composed of nodes representing states and arcs representing relations. The problem with cognitive modelling is that such parsers (frequently written in PROLOG) incorporate predicates based on truth-conditional semantics. While useful in certain computer tasks, truth-conditional semantics does not cover all aspects of natural language meaning and, as the cognitively oriented linguists would argue, is not what natural language semantics is about at all. No wonder then that for the classical truth-conditional frameworks cognitive semantics is a notorious problem. McEnery [1993: 109] notices in his Computational Linguistics in connection with that: “One of the problems with prototype semantics is that it is not always easy to specify what attributes an object is composed of, let alone enumerate the range values that attribute may take with respect to the object”. The ‘problem’ McEnery points at here is exactly what prototype semantics is about. No wonder then that new tools have to be looked for in order to progress in natural language processing.

One of them, in the implementation stage, is an attempt for a natural language processing system to be based entirely on cognitive grammar principles. Its author, K. Holmqvist [1993], proposes a valence accommodation methodology to capture natural language comprehension. This approach, ambitious as it is, is in the prototype phase for the time being.

The ideal, hardly attainable at present, which would guarantee unproblematic reusability of data, would however be an entirely ‘theory neutral’ acquisition of the lexicon. The first approximation to this ideal might be approaches based on statistical probability techniques.

5.2. Statistical methods in parsing

Collecting corpora is only one side of the coin. Another, equally important one, as we have shown above, is to build computer programs that, first of all, tag the corpus sentences with part-of-speech labels and then syntactically analyse (parse) these sentences. The problem here is that practically each sentence in a corpus, if analysed in a context-free environment, can be proved ambiguous not only with respect to strict syntactic marking, but also with respect to reference-fixing deictic elements. An automatic parser is not only to cover every possible structure of the sentence, but also to be able to choose from among them the most probable parsing in the particular context. In fact, then, the computer program is expected to be able to perform tasks left unaccounted for in many linguistic theories.
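The idea of choosing the most probable label on the basis of manually annotated material can be illustrated with a deliberately crude sketch: a tagger that assigns each word the tag it received most often in a hand-tagged training set. The function names, the toy training data and the default tag are assumptions made for illustration; a probabilistic tagger of the kind discussed in this section would typically also take the neighbouring tags into account, which this sketch does not.

```python
from collections import Counter, defaultdict

# Toy hand-tagged training material (invented for illustration).
TOY_TREEBANK = [
    [("the", "AT"), ("print", "NN1"), ("queue", "NN1")],
    [("they", "PPHS2"), ("print", "VV0"), ("files", "NN2")],
    [("a", "AT1"), ("print", "NN1")],
]

def train_unigram_tagger(tagged_sentences):
    """Map each word form to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_words(words, model, default_tag="NN1"):
    """Tag each word with its most frequent training tag; unknown words get default_tag."""
    return [(word, model.get(word.lower(), default_tag)) for word in words]

if __name__ == "__main__":
    model = train_unigram_tagger(TOY_TREEBANK)
    print(tag_words(["The", "print", "setup"], model))
```

Because "print" is tagged as a noun twice and as a verb once in the toy data, the sketch always labels it NN1, whatever the context. This is exactly the kind of ambiguity that context-sensitive statistical methods are meant to resolve.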

For such practical applications of the computer in a domain where the analysis provided by the experts is not, or perhaps cannot be, fully axiomatized, it is the statistical methods that prove to be most promising. Fully automatic methods of statistically based parsing are underway in a few computer centers in Europe and the United States, but the results have not been published yet. Other methods, involving human-assisted parsing, rely on linguistic rules proposed by the analysts and their application based on a statistical algorithm [cf. McEnery 1993]. The grammar provided by the linguist is tested against the computer data and corrected (‘debugged’) accordingly. As a result of processing bilingual corpora in future, one could aim at building a Computational Contrastive Grammar of the languages concerned, which could be reused in the tasks of text generation, e.g. for machine translation. In order to apply this method, the program has to be trained on a set of manually-parsed sentences (usually around one million words), referred to as a treebank in the computational linguistics terminology. This treebank or skeleton parsing (cf. IBM/Lancaster group) is usually complemented by grammatical tagging, the corpus annotation technique of primary use in lexicographic practice. There are numerous tagsets reported in the computational linguistic literature; the one that is relatively widely spread, however, is the CLAWS Tagset (Constituent-Likelihood Automatic Word-Tagging System, versions one, two, and four, cf. Black et al. 1993), referred to also as the Lancaster Tagset. The reported success rate of the CLAWS System reaches 94%. The examples of tagged and parsed sentences are drawn from the corpora of the UCREL group [Eyes and Leech 1993: 55]:

An example of a treebank text with appropriate grammatical tags (linked, by underline symbols, to each word and punctuation mark) is drawn from the Canadian Hansard Corpus:

(2) May_VM I_PPIS1 say_VVI ,_, Mr._NNSB1 Speaker_NNS1 ,_, that_CST I_PPIS1 have_VH0 sent_VVN a_AT1 copy_NN1 of_IO this_DD1 to_II the_AT chairman_NNS1 of_IO the_AT committee_NNJ and_CC to_II the_AT two_MC ministers_NNS2 involved_VVN ._.

Tag symbols explained:

VM - modal auxiliary verb
PPIS1 - personal pronoun, first person, subjective, singular
VVI - general lexical verb, infinitive
NNSB1 - noun, preceding singular noun of style or title, abbreviatory
NNS1 - noun of style, singular
CST - that as a conjunction
VH0 - base form of have
VVN - past participle of lexical verb
AT1 - singular article
NN1 - singular common noun
IO - of as preposition
DD1 - singular determiner
II - general preposition
AT - article, neutral for number
NNJ - organization noun, neutral for number
CC - coordinating conjunction
MC - cardinal number, neutral for number
NNS2 - plural noun of style
. - punctuation tag, full stop

[There are over 150 tags used in the CLAWS2a Tagset by the Lancaster/IBM Group]
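Because each word and punctuation mark in example (2) is joined to its tag by an underline symbol, tagged text of this kind can be read mechanically. The sketch below, with the hypothetical helper names `parse_tagged` and `tag_counts`, splits such a line into word-tag pairs and counts the tags, a typical first step towards frequency information for lexicographic use.

```python
from collections import Counter

def parse_tagged(line):
    """Split a line of word_TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST underline so that a word containing one stays whole.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs

def tag_counts(pairs):
    """Count how often each tag occurs."""
    return Counter(tag for _, tag in pairs)

if __name__ == "__main__":
    fragment = "May_VM I_PPIS1 say_VVI ,_, Mr._NNSB1 Speaker_NNS1 ,_,"
    pairs = parse_tagged(fragment)
    print(pairs)
    print(tag_counts(pairs).most_common(1))
```

The sketch assumes that tokens are separated by spaces and that every token contains an underline; text that departs from this layout would need cleaning first.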

The examples of grammatical tagging are extracted from the Computer Manuals treebank of skeleton-parsed sentences:

(3) [N Files_NN2 N] [V [V& come_VV0 [P into_II [N the_AT print_NN1 queue_NN1 N] P] V&] and_CC [V+ either_LE [V& [V& match_VV0 [N [G a_AT1 printer_NN1 's_$ G] setup_NN1 N] V&] (_( [V+ get_VV0 [Tn printed_VVN Tn] V+] )_) V&] or_CC [V+ [V& do_VD0 not_XX match_VVI V&] (_( [V+ wait_VV0 V+] )_) V+] V+] V] ._.

Additional symbols: constituent labels in the UCREL Parsing Scheme

N - Noun phrase
V - Verb phrase
& - Coordination, initial conjunct
+ - Coordination, non-initial conjunct
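The labelled bracketing of example (3) encodes a tree: an opening bracket followed by a label starts a constituent, and the same label followed by a closing bracket ends it. A minimal sketch of reading such a string into a nested structure is given below. The function names are hypothetical, and the sketch assumes that brackets, labels and word_TAG tokens are separated by spaces and that labels never contain an underline.

```python
def parse_skeleton(text):
    """Parse a labelled bracketing into nested (label, children) nodes.

    Leaves are (word, tag) pairs; constituents are (label, [children]).
    Returns the list of top-level nodes.
    """
    root = ("ROOT", [])
    stack = [root]
    for token in text.split():
        if "_" in token:                       # leaf: word_TAG
            word, _, tag = token.rpartition("_")
            stack[-1][1].append((word, tag))
        elif token.startswith("["):            # opening bracket with label
            node = (token[1:], [])
            stack[-1][1].append(node)
            stack.append(node)
        elif token.endswith("]"):              # closing bracket with label
            label = token[:-1]
            if len(stack) == 1 or stack[-1][0] != label:
                raise ValueError(f"unmatched closing label: {token}")
            stack.pop()
        else:
            raise ValueError(f"unexpected token: {token}")
    if len(stack) != 1:
        raise ValueError(f"unclosed constituent: {stack[-1][0]}")
    return root[1]

def leaves(node):
    """Return the words dominated by a node, in order."""
    label_or_word, rest = node
    if isinstance(rest, list):                 # constituent
        return [w for child in rest for w in leaves(child)]
    return [label_or_word]                     # leaf

if __name__ == "__main__":
    tree = parse_skeleton("[V come_VV0 [P into_II [N the_AT print_NN1 queue_NN1 N] P] V]")
    print(tree[0][0], leaves(tree[0]))
```

Checking that every closing label matches the most recently opened one also makes the sketch usable as a simple consistency check on hand-corrected treebank material.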

On top of grammatical part-of-speech and syntactic tags, attempts are being made to mark the text with semantic and discourse labels (G. Leech). These techniques can bring about the refinement of the crude ‘physical’ tools for language analysis and introduce a more subtle methodology which can constrain the analysis to the level required for a number of computational applications such as automatic sense extraction, automatic abstracting, etc.

6. CONCLUSIONS

1. Computerized techniques of linguistic access and retrieval make it possible for the linguist to obtain a large spectrum of linguistic data in a relatively short time.

Lexical Knowledge Bases and their subdomains, kept on-line and constantly updated, may be reused for different linguistic tasks (also bi- and multilingual).

Large linguistic corpora and MRDs provide data for automatic lexical data and knowledge acquisition.

2. Computerized language corpora, efficiently managed and assisted by automatic alignment software, can be used for a number of tasks. In lexicography, CALL and translation, they provide: full lexical knowledge including frequencies and contextual modifications; collocations, associations, exploitations of conceptual and syntactic patterns (microframe); full information on pragmatically-sensitive use (macroframe); information on similarities and contrasts in meaning.

3. Lexical semantic tagging supports the part-of-speech and grammatical analysis and leads to automatic analysis of senses and its numerous applications such as automatic abstracting.

4. Statistically based techniques in automatic annotation uncover the non-discrete nature of lexical senses and their inseparability from their knowledge frames.

REFERENCES

Anderson, J. (1977) “Induction of augmented transition networks”. Cognitive Science 1: 125-157.

Black, E., Garside, R., Leech, G. (eds) (1993) Statistically-driven computer grammars of English: the IBM/Lancaster approach [Language and Computers: Studies in Practical Linguistics, No. 8, ed. J. Aarts and W. Meijs]. Amsterdam: Rodopi.

Fillmore, C. (1977) “Topics in lexical semantics”. In R. W. Cole (ed.), Current issues in linguistic theory. Bloomington: Indiana University Press, 76-138.

Gale, W. A., Church, K. W. (1993) “A Program for Aligning Sentences in Bilingual Corpora”. Computational Linguistics 19, 1.

Holmqvist, K. (1993) Implementing Cognitive Semantics. Lund: Department of Cognitive Science, Lund University.

Jackendoff, R. (1983) Semantics and cognition. Cambridge, Mass.: MIT Press.

Jackendoff, R. (1992) “What is a concept?” In Lehrer, A. and E. F. Kittay (eds), Frames, fields and contrasts: New essays in semantic and lexical organization. Hillsdale, N.J.: Lawrence Erlbaum, 191-208.

Kučera, H. and W. N. Francis (1967) Computational analysis of present-day American English. Providence, R.I.: Brown University Press.

Lakoff, G. and M. Johnson (1980) Metaphors we live by. Chicago: University of Chicago Press.

Langacker, R. (1987) Foundations of Cognitive Grammar, Vol. 1. Stanford: Stanford University Press.

Langacker, R. (1991) Foundations of Cognitive Grammar, Vol. 2. Stanford: Stanford University Press.

Leech, G. (1991) “The state of the art in corpus linguistics”. In Aijmer, K. and B. Altenberg (eds), English Corpus Linguistics (Studies in honour of Jan Svartvik). Harlow: Longman, 8-29.

Makkai, A. (1980) “Theoretical and practical aspects of an associative lexicon for 20th century English”. In L. Zgusta (ed.), Theory and method in lexicography: Western and Non-Western Perspectives. Columbia, S. Carolina: Hornbeam Press, 125-46.

McEnery, A. M. (1992) Computational Linguistics. Sigma Press.

Meijs, W. (ed.) (1987) Corpus Linguistics and Beyond (Proceedings of the Seventh International Conference on English Language Research on Computerized Corpora). Amsterdam: Rodopi.

Sinclair, J. (1991) Corpus, concordance, collocation. Oxford: Oxford University Press.

Vetulani, Z. (1991) “Polish activity in the domain of Language Technology (LT)”. Paper presented at the meeting Language and Technology. Awareness Days in Luxembourg for Central and Eastern Europe, 13-14 January, Luxembourg.

Barbara Lewandowska-Tomaszczyk

JĘZYKOZNAWSTWO KORPUSOWE A LEKSYKA [Corpus Linguistics and the Lexicon]

The author analyses the place and functions of language corpora in the lexicographic analysis of language and in its lexicographic applications. The issues examined concern the acquisition of lexical knowledge from linguistic corpus data, the repeated use of this knowledge in monolingual and multilingual lexicographic tasks, and the possible implications of such methodologies for the analysis of natural language vocabulary. The paper addresses the automatic analysis of linguistic corpus data and presents examples of it based on English material.