ACTA UNIVERSITATIS LODZIENSIS, FOLIA LINGUISTICA 36, 1997

PART I. LEXICON

Barbara Lewandowska-Tomaszczyk

CORPUS LINGUISTICS AND THE LEXICON*

1. INTRODUCTION

This study attempts to present the place and function of computers in the lexicological analysis of natural language and in its lexicographic applications. The issues examined in this paper concern the acquisition of lexical knowledge from linguistic corpus data, the reusability of that lexical knowledge in monolingual and multilingual lexicographic tasks, and the possible implications of such methodologies for the analysis of the lexis of human language.

2. CORPUS LINGUISTICS

Corpus Linguistics, and more precisely Computational Corpus Linguistics, is a relatively new development in the study of language, which grew rapidly in the eighties (cf. the first corpus and its description by Kučera and Francis 1967; cf. also Makkai 1980, Meijs 1987, Sinclair 1991). The primary task of Corpus Linguistics is gathering and storing large quantities of authentic language data, spoken and written, originally in book format and at present in electronic form. The concept of a corpus does not extend to just any arbitrary collection of language data. A corpus, in the sense used here, is, as G. Leech [1991: 11] put it, a collection of machine-readable linguistic data "designed or required for a particular 'representative' function". These 'data banks', as they are sometimes called, provide linguists with the real material against which they can test their hypotheses.

* This research has been partly funded by the European Community Programme of Cooperation in Science and Technology between Western and Eastern European Countries, No. 8453, as well as by TEMPUS grant No. 2087.

Corpora of written language are more numerous. However, work on speech output points to the need for corpora of spoken language. Authentic spoken data are more difficult to collect, and an additional problem is their orthographic or phonetic transcription, which is a very time-consuming enterprise. Attempts at automatic speech analysis and transcription are much less developed than methods of handling written text. At the same time, such large quantities of linguistic data can generate questions and issues that could never have been raised had such quantities of material not been collected and analysed. The management of such data banks, i.e. the access to and retrieval of lexical information in digital form, is made possible by computer software of the tagging, parsing and concordancing type. Different computer programs are then used to generate lexica, dictionaries and thesauri from the acquired lexical knowledge.
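Concordancing, one of the access tools just mentioned, is easy to illustrate. The sketch below is a minimal key-word-in-context (KWIC) concordancer in Python, assuming plain running text and case-insensitive whole-word matching; the function name `kwic` and its `width` parameter are illustrative choices, not those of any existing concordancing package.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one concordance line per occurrence of `keyword`.

    The keyword is bracketed and aligned in a fixed column, with up to
    `width` characters of left and right context.
    """
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()].strip()
        right = flat[match.end():match.end() + width].strip()
        # right-align the left context so every keyword starts in the same column
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    sample = "A corpus is a collection of texts. Each corpus is designed for a function."
    for line in kwic(sample, "corpus", width=20):
        print(line)
```

Aligning the keyword in one column is what lets the analyst scan left and right contexts of many occurrences at once.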

The linguistic corpus can be treated as a significant lexical resource, an embodiment of a token 'native speaker' with the cumulative competence of each and every native speaker-member of a given linguistic community. And yet it should be borne in mind that 'surface' phenomena classically attributed to 'performance' rather than 'competence', such as false starts, clumsy syntax and abductive, often patently wrong, lexical uses, are also present in the corpus. Even though they may be treated as symptomatic of synchronic language variability or of future linguistic developments, it is for the analyst to decide what their current status is.

3. THE LEXICON

The lexicon used to be treated either as "an appendix of the grammar" [Bloomfield 1933] or as a repository of syntactic irregularities [Chomsky 1965]. With the rise of the first semantically based models (e.g. Generative Semantics) and of the present-day cognitively oriented approaches to language [Frame Semantics - Fillmore 1977; Cognitive Grammar - Lakoff and Johnson 1980, Langacker 1991; Conceptual Semantics - Jackendoff 1983, 1992], the place and role of the lexicon in linguistic models have changed radically. Instead of modular components that place syntax, phonology and semantics in autonomous compartments, present-day cognitive grammatical models view these 'levels' as a continuum rather than as modules, uniting lexicon, morphology and syntax, each associated with a phonological and a semantic structure. In Chomsky's models semantics is treated as a separate component feeding syntax, while in cognitive models it is equated with conceptualization and encompasses different kinds of human experience immersed in the recognized social, physical and linguistic context [cf. Langacker 1991: 2]. The semantic structure of a linguistic unit (lexical, phrasal, sentential, etc.) is characterized relative to cognitive domains [frames], understood as structural conceptualizations of experience. This fact alone rules out strict inter-level linguistic distinctions as well as a strict dichotomy between linguistic and encyclopaedic knowledge. I would be prepared to defend the hypothesis that it is precisely large language corpora that provide a tool to extract knowledge of the lexis in its entirety, in the context of lexical frames. Such corpus-based lexical knowledge parallels the concept of the lexicon in its cognitive linguistic format.

4. LEXICAL A CQ U ISITIO N

The extraction of lexical information, also termed the acquisition of the lexicon, is based on extracting lexical knowledge from text corpora as well as from machine-readable dictionaries (MRDs). Extracting full lexical information from a large corpus manually could turn out to be a lifetime job. There is therefore an urgent and continually growing need to handle the search automatically. Corpora of English texts are quite numerous and growing rapidly (ICAME, Helsinki, Longman, Lancaster-Lund, Lund-London, Oslo-Bergen, etc.). The situation concerning other languages, including Polish, is much worse. In Poland, some newspaper publishers (e.g. Gazeta Wyborcza) are ready to share their linguistic resources in electronic form with researchers. Other possibilities include scanning techniques: the OMNIPAGE package at our disposal is being used for building monolingual corpora (Polish and English) as well as bilingual corpora (translated texts) and parallel corpora (authentic texts covering the same domain in both languages). On the other hand, there are a few centres in Poland which contribute to the domain of Language Technology. Their activities pertain to different topics in Language Technology, such as computational lexicography, speech generation and recognition, text understanding, and expert systems for knowledge representation (for a more exhaustive list cf. Vetulani 1994).
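As a first, very modest step towards automatic lexical acquisition, a word-form frequency list and a type/token ratio can be extracted from raw text. The Python sketch below assumes a simple tokenization (runs of letters and digits, optionally with one inner apostrophe, lower-cased) and does no lemmatization, so inflected forms such as Polish "kota" and "kot" are counted as separate entries; the function names are hypothetical.

```python
import re
from collections import Counter

# A token is a run of word characters, optionally with one inner apostrophe (e.g. "don't").
# In Python 3, \w is Unicode-aware, so Polish letters such as ą, ł and ż are kept.
TOKEN = re.compile(r"\w+(?:'\w+)?")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text.lower())


def frequency_list(text: str, top: int = 10) -> list[tuple[str, int]]:
    """The `top` most frequent word forms with their counts (no lemmatization)."""
    return Counter(tokenize(text)).most_common(top)


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by running words (tokens)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    print(frequency_list("Ala ma kota. Kot ma Alę, a Ala ma psa.", 3))
```

A real acquisition pipeline would add lemmatization and tagging before counting, which is exactly where the tools discussed in the following sections come in.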

The automatic acquisition of bilingual lexical knowledge (cf. examples below) from bilingual corpora is only in statu nascendi at present. The available software, even though quite effective at the sentence-alignment level, uses very little linguistic sophistication. The programs are based mostly on character lengths and/or item distribution in the sentence, assisted by 'anchoring' techniques via proper names, numbers or other fixed features in the texts. Lexical alignment, the next step in this process, is currently under investigation (Lancaster UCREL team). The acquisition of lexical knowledge from non-translated parallel corpora centred around similar domains, on the other hand, is only a matter of theorizing at present. Software for bilingual sentence alignment (currently being tested at different centres), combined with concordancing programs, may prove very useful not only for lexicography but also for CALL (computer-assisted language learning), and in particular for the training of translators and interpreters and for translation practice. Below are two pairs of sentences from the English-Polish bilingual corpus of the Department of English, University of Łódź, aligned at UCREL, University of Lancaster (A. McEnery and M. Oakes):

(1) sub, d = 2

Properly read and interpreted these statements give the reader a complete, synoptical picture of the firm's operations and results in quantified form.

Sprawozdania, właściwie czytane i interpretowane dają czytelnikowi kompletny, poglądowy obraz działalności firmy i jej wyników w wyrażeniu liczbowym.

con, d = 232

But I don't think there's anything wrong with the school, particularly, I've seen better and I've seen worse.

Ale nie wydaje mi się, żeby ze szkołą było coś szczególnie nie w porządku. Widziałem w życiu lepsze i gorsze.

Symbols used:

sub - one-to-one sentence substitution

con - contraction (two sentences in one language corresponding to one sentence in the other)

d - distance

Interpretation: a lower d-score signifies a more confident alignment; d depends on:

a) the difference in length in characters;
b) the likelihood of the alignment type.
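The d-score combines a length difference in characters with the prior likelihood of an alignment type, which is the idea behind the length-based aligner of Gale and Church (1993), listed in the references. The Python sketch below assumes that model: character lengths, an expected length ratio c = 1, a variance s² = 6.8 per character, and the bead priors as they are commonly reimplemented from Gale and Church. It is not the UCREL program, whose exact scoring is not given here, so its costs are not expected to reproduce the d values printed above; the names `align` and `bead_cost` are hypothetical.

```python
import math

# Assumed prior probability of each alignment type ("bead"): (source sentences, target sentences).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
C = 1.0   # expected target characters per source character (assumption)
S2 = 6.8  # variance of that ratio, per character (assumption)


def bead_cost(len_src: int, len_tgt: int, kind: tuple[int, int]) -> float:
    """Distance -100 * log(P(kind) * P(delta)); a lower value is a more confident match."""
    mean = (len_src + len_tgt / C) / 2
    delta = 0.0 if mean == 0 else (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # two-tailed normal probability of a length difference at least this large
    p_delta = max(math.erfc(abs(delta) / math.sqrt(2)), 1e-300)
    return -100 * math.log(PRIORS[kind] * p_delta)


def align(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int], float]]:
    """Dynamic-programming alignment; returns (source indices, target indices, cost) beads."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in PRIORS:
                if di > i or dj > j or cost[i - di][j - dj] == inf:
                    continue
                ls = sum(len(s) for s in src[i - di:i])
                lt = sum(len(t) for t in tgt[j - dj:j])
                total = cost[i - di][j - dj] + bead_cost(ls, lt, (di, dj))
                if total < cost[i][j]:
                    cost[i][j], back[i][j] = total, (di, dj)
    beads = []
    i, j = n, m
    while i > 0 or j > 0:  # follow back-pointers from the end to the start
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j)),
                      cost[i][j] - cost[i - di][j - dj]))
        i, j = i - di, j - dj
    return beads[::-1]
```

Given the two English sentences and the three Polish sentences of example (1) as input, this sketch pairs the first English sentence with the first Polish one (a 1-1 bead, corresponding to "sub") and the second English sentence with the two remaining Polish sentences (a 1-2 bead, corresponding to "con"), the latter at a much higher cost, mirroring the relative confidence of the two d-scores.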


5. REUSABILITY OF LEXICAL RESOURCES

As has been mentioned before, Lexical Databases (LDBs) and Lexical Knowledge Bases (LKBs) are products of lexical extraction from machine-readable corpora (i.e. texts and dictionaries) and can in turn serve a number of functions in both human and machine natural language processing tasks, such as verb frame acquisition, building virtual lexica, etc. This can improve the lexical acquisition process again and further enhance the LDB/LKB in the reusability cycle. To meet the requirements of reusability of lexical resources, standardized mark-up has to be assigned, more specifically in lemmatization, part-of-speech (grammatical) tagging, and syntactic, semantic and discourse parsing. To meet these conditions and facilitate the interchange of corpus data, a team of specialists grouped around the Text Encoding Initiative (TEI), which originated in 1987 and is sponsored by the Association for Computers and the Humanities, the Association for Literary and Linguistic Computing and the Association for Computational Linguistics, is working on a uniform system of guidelines for text encoding based on SGML (Standard Generalized Markup Language). Terms such as 'tagging' or 'parsing' are partly a misnomer: what they in fact involve is everything pertinent to linguistic analysis, from sound to meaning. The approaches to these tasks centre around two different methods, the first based on conceptual analysis, the other utilizing statistical methodology.
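Standardized mark-up is what allows one group's annotation to be reused by another. As an illustration only, the Python sketch below wraps (word, tag) pairs in SGML/XML-style `<s>` and `<w>` elements. The element names are loosely modelled on TEI's sentence and word elements, but the `n` and `pos` attributes, the function name `to_markup` and the overall layout are assumptions of this sketch, not quotations from the TEI Guidelines.

```python
from xml.sax.saxutils import escape, quoteattr


def to_markup(sentence: list[tuple[str, str]], n: int) -> str:
    """Encode one tagged sentence as <s n="..."><w pos="...">word</w> ...</s>."""
    words = " ".join(
        f"<w pos={quoteattr(tag)}>{escape(word)}</w>"  # escape &, < and > in the word itself
        for word, tag in sentence
    )
    return f"<s n={quoteattr(str(n))}>{words}</s>"


if __name__ == "__main__":
    print(to_markup([("a", "AT1"), ("copy", "NN1"), ("of", "IO"), ("this", "DD1")], 1))
```

Because each tag sits in an attribute rather than in a tool-specific convention such as a trailing underscore, any SGML- or XML-aware program could, for instance, select all NN1 words without knowing how the original tagger formatted its output.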

5.1. Cognitive models in NL processing

Probabilistic approaches in NL processing were preceded by methods based on cognitive models of knowledge representation. Rooted in psychological findings on spreading activation networks [Anderson 1977], these methods, too, aim at capturing the syntactic, semantic and discourse structures of natural language by means of the graph diagrams of Augmented Transition Networks. The networks are composed of nodes representing states and arcs representing relations. The problem with cognitive modelling is that such parsers (frequently written in PROLOG) incorporate predicates based on truth-conditional semantics. While useful in certain computer tasks, truth-conditional semantics does not cover all aspects of natural language meaning and, as cognitively oriented linguists would argue, is not what natural language semantics is about at all. No wonder, then, that for the classical truth-conditional frameworks cognitive semantics is a notorious problem. McEnery [1993: 109] notes in this connection in his Computational Linguistics: "One of the problems with prototype semantics is that it is not always easy to specify what attributes an object is composed of, let alone enumerate the range values that attribute may take with respect to the object". The 'problem' McEnery points to here is exactly what prototype semantics is about. No wonder, then, that new tools have to be sought in order to make progress in natural language processing.
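The state-and-arc organization of the transition networks described above can be sketched briefly. The Python code below is a recursive transition network recogniser: each network has states, CAT arcs that consume one word of a given category, and PUSH arcs that call another network. The registers and tests that make a network 'augmented' are deliberately omitted, and the toy lexicon and networks are hypothetical, so this is a simplified relative of an ATN rather than a full one.

```python
# Hypothetical toy lexicon: word -> possible categories
LEXICON = {
    "the": {"DET"}, "a": {"DET"}, "large": {"ADJ"}, "with": {"P"},
    "linguist": {"N"}, "corpus": {"N"}, "data": {"N"},
    "tests": {"V", "N"}, "reads": {"V"},
}

# Each network starts in state "q0"; arcs are (label, next state).
# A label naming another network is a PUSH arc, any other label is a CAT arc.
NETWORKS = {
    "S": {"final": {"q2"}, "arcs": {"q0": [("NP", "q1")], "q1": [("VP", "q2")]}},
    "NP": {"final": {"q2", "q3"},
           "arcs": {"q0": [("DET", "q1")],
                    "q1": [("ADJ", "q1"), ("N", "q2")],
                    "q2": [("PP", "q3")]}},
    "VP": {"final": {"q1", "q2"}, "arcs": {"q0": [("V", "q1")], "q1": [("NP", "q2")]}},
    "PP": {"final": {"q2"}, "arcs": {"q0": [("P", "q1")], "q1": [("NP", "q2")]}},
}


def traverse(net, state, words, pos):
    """Yield every input position at which `net` can finish, starting from `state` at `pos`."""
    if state in NETWORKS[net]["final"]:
        yield pos
    for label, target in NETWORKS[net]["arcs"].get(state, []):
        if label in NETWORKS:  # PUSH: recognise a sub-network, then carry on where it ended
            for end in traverse(label, "q0", words, pos):
                yield from traverse(net, target, words, end)
        elif pos < len(words) and label in LEXICON.get(words[pos], set()):  # CAT: consume a word
            yield from traverse(net, target, words, pos + 1)


def recognise(sentence):
    """True if the whole sentence can be traversed as an S network."""
    words = sentence.lower().split()
    return any(end == len(words) for end in traverse("S", "q0", words, 0))


if __name__ == "__main__":
    print(recognise("the large corpus reads a linguist with the data"))
```

Backtracking over alternative arcs is what lets the recogniser cope with the two readings of "tests"; in a full ATN, registers attached to the arcs would additionally build structure and check agreement.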

One of them, now at the implementation stage, is an attempt to base a natural language processing system entirely on the principles of cognitive grammar. Its author, K. Holmqvist [1993], proposes a valence accommodation methodology to capture natural language comprehension. Ambitious as it is, this approach is for the time being in the prototype phase.

The ideal, hardly attainable at present, which would guarantee unproblematic reusability of data, would however be an entirely 'theory-neutral' acquisition of the lexicon. A first approximation to this ideal might be approaches based on statistical probability techniques.

5.2. Statistical methods in parsing

Collecting corpora is only one side of the coin. The other, equally important one, as we have shown above, is to build computer programs that first tag corpus sentences with part-of-speech labels and then analyse (parse) these sentences syntactically. The problem here is that practically every sentence in a corpus, if analysed in a context-free environment, can be shown to be ambiguous, not only with respect to strict syntactic marking but also with respect to the reference of deictic elements. An automatic parser should not only cover every possible structure of the sentence but also be able to choose from among them the most probable parse in the particular context. In fact, then, the computer program is expected to perform tasks left unaccounted for in many linguistic theories.
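Choosing the most probable analysis can be illustrated with a probabilistic context-free grammar (PCFG), one statistical technique among several and not necessarily the one used in the projects discussed here. In the Python sketch below the rules and their probabilities are invented for the familiar attachment ambiguity of "I saw the man with a telescope": each candidate tree is scored by the product of its rule probabilities (summed as logarithms), and the highest-scoring tree is selected. A plain PCFG ignores the wider context the paragraph calls for, so the sketch shows only the selection step.

```python
import math

# Hypothetical toy PCFG: the probabilities of the rules for each left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V", "NP", "PP")): 0.3,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("Pron",)): 0.3,
    ("PP", ("P", "NP")): 1.0,
}


def log_prob(tree: tuple) -> float:
    """Log probability of a tree written as (label, child, ...), with leaves as (tag, word)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 0.0  # word-given-tag probabilities are left out of this sketch
    rhs = tuple(child[0] for child in children)
    return math.log(RULES[(label, rhs)]) + sum(log_prob(child) for child in children)


def most_probable(parses: list[tuple]) -> tuple:
    """Pick the candidate parse with the highest probability."""
    return max(parses, key=log_prob)


# The prepositional phrase attached to the verb phrase ...
verb_attach = ("S", ("NP", ("Pron", "I")),
               ("VP", ("V", "saw"), ("NP", ("Det", "the"), ("N", "man")),
                ("PP", ("P", "with"), ("NP", ("Det", "a"), ("N", "telescope")))))
# ... or to the noun phrase.
noun_attach = ("S", ("NP", ("Pron", "I")),
               ("VP", ("V", "saw"),
                ("NP", ("NP", ("Det", "the"), ("N", "man")),
                 ("PP", ("P", "with"), ("NP", ("Det", "a"), ("N", "telescope"))))))
```

With these invented numbers the verb-attachment tree scores 0.0225 against 0.0105 for the noun-attachment tree, so `most_probable` picks the former; different training data would of course yield different probabilities.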

For such practical applications of computers, in a domain where the analysis provided by experts is not, or perhaps cannot be, fully axiomatized, it is statistical methods that prove most promising. Fully automatic methods of statistically based parsing are under way in a few computer centres in Europe and the United States, but the results have not been published yet. Other methods, those of human-assisted parsing, involve linguistic rules proposed by the analysts and their application on the basis of a statistical algorithm [cf. McEnery 1993]. The grammar provided by the linguist is tested against the computer data and corrected ('debugged') accordingly. By processing bilingual corpora, one could in future aim at building a Computational Contrastive Grammar of the languages concerned, which could be reused in text generation tasks, e.g. for machine translation. In order to apply this method, the program has to be trained on a set of manually parsed sentences (usually around one million words), referred to as a treebank in computational linguistics terminology. This treebank or skeleton parsing (cf. the IBM/Lancaster group) is usually complemented by grammatical tagging, the corpus annotation technique of primary use in lexicographic practice. Numerous tagsets are reported in the computational linguistics literature; a relatively widespread one, however, is the CLAWS tagset (Constituent-Likelihood Automatic Word-tagging System, versions one, two and four; cf. Black et al. 1993), also referred to as the Lancaster tagset. The reported success rate of the CLAWS system reaches 94%. The following examples of tagged and parsed sentences are drawn from the corpora of the UCREL group [Eyes and Leech 1993: 55].
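CLAWS is generally described as a probabilistic tagger that weighs each candidate tag of a word against the tags of its neighbours. The Python sketch below shows that general idea as a bigram hidden Markov model decoded with the Viterbi algorithm. It is not the CLAWS program: the tag names are borrowed from the CLAWS tagset only for readability, and all probabilities are invented.

```python
import math

# Hypothetical toy bigram model using a few CLAWS-style tag names.
TAGS = ["AT", "NN1", "VV0"]
TRANS = {  # P(tag | previous tag); "<s>" marks the start of the sentence
    "<s>": {"AT": 0.6, "NN1": 0.3, "VV0": 0.1},
    "AT": {"AT": 0.01, "NN1": 0.95, "VV0": 0.04},
    "NN1": {"AT": 0.1, "NN1": 0.3, "VV0": 0.6},
    "VV0": {"AT": 0.6, "NN1": 0.3, "VV0": 0.1},
}
EMIT = {  # P(word | tag)
    "AT": {"the": 0.7, "a": 0.3},
    "NN1": {"file": 0.4, "queue": 0.3, "print": 0.3},
    "VV0": {"print": 0.6, "queue": 0.2, "file": 0.2},
}


def viterbi(words: list[str]) -> list[str]:
    """Return the most probable tag sequence for `words` under the bigram model."""
    best = {"<s>": (0.0, [])}  # tag -> (log probability, best tag path ending in that tag)
    for word in words:
        new = {}
        for tag in TAGS:
            emit = EMIT[tag].get(word, 0.0)
            if emit == 0.0:
                continue  # this tag cannot produce the word
            new[tag] = max(
                (logp + math.log(TRANS[prev][tag] * emit), path + [tag])
                for prev, (logp, path) in best.items()
            )
        if not new:
            raise ValueError(f"unknown word: {word!r}")  # no smoothing in this sketch
        best = new
    return max(best.values())[1]


if __name__ == "__main__":
    print(viterbi(["print", "the", "file"]))
```

With these numbers the ambiguous word "print" is tagged as a verb (VV0) at the start of "print the file" but as a noun (NN1) after "the", which is the kind of contextual disambiguation a statistical tagger is trained to perform.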

An example of a treebank text with the appropriate grammatical tags (linked by underscores to each word and punctuation mark) is drawn from the Canadian Hansard Corpus:

(2) May_VM I_PPIS1 say_VVI ,_, Mr._NNSB1 Speaker_NNS1 ,_, that_CST I_PPIS1 have_VH0 sent_VVN a_AT1 copy_NN1 of_IO this_DD1 to_II the_AT chairman_NNS1 of_IO the_AT committee_NNJ and_CC to_II the_AT two_MC ministers_NNS2 involved_VVN ._.

Tag symbols explained:

VM - modal auxiliary verb
PPIS1 - personal pronoun, first person, subjective, singular
VVI - infinitive of general lexical verb
NNSB1 - preceding singular noun of style or title, abbreviated
NNS1 - singular noun of style
CST - that as a conjunction
VH0 - base form of have
VVN - past participle of lexical verb
AT1 - singular article
NN1 - singular common noun
IO - of as a preposition
DD1 - singular determiner
II - general preposition
AT - article, neutral for number
NNJ - organization noun, neutral for number
CC - coordinating conjunction
MC - cardinal number, neutral for number
NNS2 - plural noun of style
. - punctuation tag, full stop

[There are over 150 tags in the CLAWS2a tagset used by the Lancaster/IBM group]
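In the tagged text of example (2) each word carries its tag after an underscore. A small reader for this "word_TAG" format is enough to turn such text into pairs that can be counted, searched or re-encoded. The Python sketch below assumes that tokens are separated by whitespace and that the tag follows the last underscore in each token; the function names are hypothetical.

```python
from collections import Counter


def read_tagged(line: str) -> list[tuple[str, str]]:
    """Split 'word_TAG word_TAG ...' into (word, tag) pairs at the last underscore."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


def tag_counts(line: str) -> Counter:
    """Frequency of each tag in one tagged line."""
    return Counter(tag for _, tag in read_tagged(line))


if __name__ == "__main__":
    print(read_tagged("May_VM I_PPIS1 say_VVI ,_, Mr._NNSB1 Speaker_NNS1"))
```

Splitting at the last underscore keeps punctuation tokens such as ",_," and "._." intact, since both the word and the tag are then the punctuation mark itself.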

The following example of a grammatically tagged, skeleton-parsed sentence is drawn from the Computer Manuals treebank:

(3) [N Files_NN2 N] [V [V& come_VV0 [P into_II [N the_AT print_NN1 queue_NN1 N] P] V&] and_CC [V+ either_LE [V& [V& match_VV0 [N [G a_AT1 printer_NN1 's_$ G] setup_NN1 N] V&] (_( [V+ get_VV0 [Tn printed_VVN Tn] V+] )_) V&] or_CC [V+ [V& do_VD0 not_XX match_VVI V&] (_( [V+ wait_VV0 V+] )_) V+] V+] V] ._.

Additional symbols (constituent labels of the UCREL Parsing Scheme):

N - noun phrase
V - verb phrase
& - coordination, initial conjunct
+ - coordination, non-initial conjunct
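The skeleton parse in example (3) is written as labelled brackets, with the constituent label repeated on the closing bracket. Assuming that brackets are separated from words by spaces, as in the example, and that every word token has the form word_TAG, a stack is enough to rebuild the tree. The Python sketch below is illustrative, with a hypothetical function name, and it checks that each closing label matches its opening one.

```python
def parse_skeleton(text: str) -> list:
    """Parse labelled-bracket skeleton parses into nested tuples.

    A constituent becomes (label, [children]); a word becomes (word, tag).
    """
    stack = [("ROOT", [])]
    for token in text.split():
        if token.startswith("[") and len(token) > 1:  # opening bracket, e.g. "[V&"
            stack.append((token[1:], []))
        elif token.endswith("]") and "_" not in token:  # closing bracket, e.g. "V&]"
            if len(stack) == 1:
                raise ValueError(f"unmatched {token!r}")
            label, children = stack.pop()
            if token[:-1] != label:
                raise ValueError(f"{token!r} does not close [{label}")
            stack[-1][1].append((label, children))
        else:  # word_TAG, split at the last underscore
            word, _, tag = token.rpartition("_")
            stack[-1][1].append((word, tag))
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return stack[0][1]


if __name__ == "__main__":
    print(parse_skeleton("[N the_AT print_NN1 queue_NN1 N]"))
```

Once the brackets are read into a tree, counts of constituent types and their expansions, the raw material for training a statistical parser on a treebank, can be collected by a simple recursive walk.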

On top of grammatical (part-of-speech) and syntactic tags, attempts are being made to mark the text with semantic and discourse labels (G. Leech). These techniques can refine the crude 'physical' tools for language analysis and introduce a more subtle methodology, which can constrain the analysis to the level required by a number of computational applications, such as automatic sense extraction, automatic abstracting, etc.

6. CONCLUSIONS

1. Computerized techniques of linguistic access and retrieval make it possible for the linguist to obtain a large spectrum of linguistic data in a relatively short time.

Lexical Knowledge Bases and their subdomains, kept on-line and constantly updated, may be reused for different linguistic tasks (also bilingual and multilingual ones).

Large linguistic corpora and MRDs provide data for the automatic acquisition of lexical data and knowledge.

2. Computerized language corpora, efficiently managed and assisted by automatic alignment software, can be used for a number of tasks. In lexicography, CALL and translation they provide: full lexical knowledge, including frequencies and contextual modifications; collocations, associations and exploitations of conceptual and syntactic patterns (microframe); full information on pragmatically sensitive use (macroframe); and information on similarities and contrasts in meaning.

3. Lexical semantic tagging supports part-of-speech and grammatical analysis and leads to the automatic analysis of senses, with its numerous applications such as automatic abstracting.

4. Statistically based techniques in automatic annotation uncover the non-discrete nature of lexical senses and their inseparability from their knowledge frames.

REFERENCES

Anderson, J. (1977), "Induction of augmented transition networks". Cognitive Science 1: 125-157.

Black, E., Garside, R., Leech, G. (eds) (1993), Statistically-Driven Computer Grammars of English: The IBM/Lancaster Approach [Language and Computers: Studies in Practical Linguistics, No. 8, ed. J. Aarts and W. Meijs]. Amsterdam: Rodopi.

Fillmore, C. (1977), "Topics in lexical semantics". In R. W. Cole (ed.), Current Issues in Linguistic Theory. Bloomington: Indiana University Press, 76-138.

Gale, W. A., Church, K. W. (1993), "A program for aligning sentences in bilingual corpora". Computational Linguistics 19, 1.

Holmqvist, K. (1993), Implementing Cognitive Semantics. Lund: Department of Cognitive Science, Lund University.

Jackendoff, R. (1983), Semantics and Cognition. Cambridge, Mass.: MIT Press.

Jackendoff, R. (1992), "What is a concept?". In Lehrer, A. and E. F. Kittay (eds), Frames, Fields and Contrasts: New Essays in Semantic and Lexical Organization. Hillsdale, N.J.: Lawrence Erlbaum, 191-208.

Kučera, H. and W. N. Francis (1967), Computational Analysis of Present-Day American English. Providence, R.I.: Brown University Press.

Lakoff, G. and M. Johnson (1980), Metaphors We Live By. Chicago: University of Chicago Press.

Langacker, R. (1987), Foundations of Cognitive Grammar, Vol. 1. Stanford: Stanford University Press.

Langacker, R. (1991), Foundations of Cognitive Grammar, Vol. 2. Stanford: Stanford University Press.

Leech, G. (1991), "The state of the art in corpus linguistics". In Aijmer, K. and B. Altenberg (eds), English Corpus Linguistics: Studies in Honour of Jan Svartvik. Harlow: Longman, 8-29.

Makkai, A. (1980), "Theoretical and practical aspects of an associative lexicon for 20th-century English". In L. Zgusta (ed.), Theory and Method in Lexicography: Western and Non-Western Perspectives. Columbia, S.C.: Hornbeam Press, 125-146.

McEnery, A. M. (1992), Computational Linguistics. Sigma Press.

Meijs, W. (ed.) (1987), Corpus Linguistics and Beyond (Proceedings of the Seventh International Conference on English Language Research on Computerized Corpora). Amsterdam: Rodopi.

Sinclair, J. (1991), Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Vetulani, Z. (1991), "Polish activity in the domain of Language Technology (LT)". Paper presented at the meeting Language and Technology: Awareness Days in Luxembourg for Central and Eastern Europe, 13-14 January, Luxembourg.


Barbara Lewandowska-Tomaszczyk

CORPUS LINGUISTICS AND THE LEXICON

The author analyses the place and function of language corpora in the lexicographic analysis of language and in its lexicographic applications. The issues examined concern the acquisition of lexical knowledge from linguistic corpus data, the reuse of this knowledge in monolingual and multilingual lexicographic tasks, and the possible implications of such methodologies for the analysis of the vocabulary of natural language. The paper discusses the automatic analysis of linguistic corpus data and presents examples based on English-language material.
