Academic year: 2021


Describing Linde dictionary of Polish for retro-digitisation purposes

Joanna Bilińska

University of Warsaw

Dictionary

• author: Samuel Bogumił Linde

• 6 volumes

• 1807-1814 (1st ed.), 1854-1861 (2nd ed.)

• the first monolingual dictionary of Polish

• both descriptive and normative

• multilingual: Polish lemmas with translations into German, Slavic languages and other languages

• excellent international reception

• influenced the lexicography of other languages

• used by historians, librarians, lexicographers, linguists

Figure 1: Sample page

• mainly alphabetical with nested entries: derivatives, diminutives, etc. also appear within an entry → the word order is not strict

• alphabetical order differs from the contemporary one

• Polish diacritical marks ignored when ordering lemmas

• almost no explanation of the abbreviations

• entry words are often split across two lines by hyphenation → the retro-digitised version requires an index (cf. [2])
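The diacritic-insensitive ordering can be sketched as a sort key. This is a minimal illustration, not the dictionary's full collation: it only folds Polish diacritics onto their base letters, and the function name `linde_sort_key` is hypothetical. Any other differences from contemporary alphabetical order are not modelled.

```python
import unicodedata

# Assumption: "ł" has no Unicode decomposition, so it is folded by hand.
_EXTRA_FOLDS = str.maketrans({"ł": "l", "Ł": "L"})


def linde_sort_key(lemma: str) -> str:
    """Return a sort key that ignores Polish diacritical marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", lemma.translate(_EXTRA_FOLDS))
    # Drop combining marks (acute, ogonek, dot above) left by the decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


lemmas = ["łaska", "las", "lać", "łąka", "lada"]
print(sorted(lemmas, key=linde_sort_key))
# → ['lać', 'lada', 'łąka', 'las', 'łaska']
```

With this key, "łąka" sorts as if it were spelled "laka", so it lands between "lada" and "las" rather than after all words beginning with "l".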

Structure

• ca. 5000 two-column pages

• ca. 5400 characters per page

• > 90 languages/dialects (cf. [4], [3])

• as a corpus: about 7 million tokens (from the corpus retro-digitisation, cf. [7])

• many scripts: Latin, Cyrillic (two kinds), Fraktur, Hebrew, Greek

Microstructure

Figure 2: Beginning of a sample entry

• lemma,

• (orthographic variants),

• grammar information,

• foreign-language translations,

• senses,

• quotations,

• other senses, including phraseology,

• metaphoric senses,

• derivatives with their own description (subentries),

• other derivatives (mainly prefixed) as cross-references without description.
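The microstructure listed above could be modelled as a nested record. The sketch below is one possible shape, not the schema used in [4]; all class and field names are assumptions, and the example values are placeholders rather than text from the dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sense:
    """One sense with its supporting quotations (hypothetical structure)."""
    definition: str
    quotations: list[str] = field(default_factory=list)
    metaphoric: bool = False  # True for metaphoric senses


@dataclass
class Entry:
    """One dictionary entry; subentries reuse the same structure."""
    lemma: str
    variants: list[str] = field(default_factory=list)            # orthographic variants
    grammar: str = ""                                            # grammar information
    translations: dict[str, list[str]] = field(default_factory=dict)  # language tag -> forms
    senses: list[Sense] = field(default_factory=list)
    subentries: list[Entry] = field(default_factory=list)        # described derivatives
    cross_references: list[str] = field(default_factory=list)    # undescribed derivatives


example = Entry(
    lemma="hasło",  # placeholder headword, not a transcription
    grammar="<grammar information>",
    translations={"<language tag>": ["<translation>"]},
    senses=[Sense(definition="<first sense>", quotations=["<quotation>"])],
    subentries=[Entry(lemma="<derivative>")],
    cross_references=["<prefixed derivative>"],
)
print(example.subentries[0].lemma)
```

Making subentries full `Entry` objects reflects the nesting described above, where derivatives with a description behave like small entries of their own.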

Punctuation and typography analysis: an excerpt (cf. [4])

• dot (.): ends larger parts of the entry (e.g. grammar information, foreign-language translations) and marks abbreviations;

• semicolon (;): ends smaller parts of the entry (e.g. the translations into one language);

• brackets: ( ) for the author's comments; [ ] for comments by the editors of the 2nd ed.;

• dash (—): starts morphology information, divides senses, sometimes metaphorical senses;

• section mark (§): metaphorical senses;

• asterisk (*): unused or outdated lemmas and neologisms;

• two asterisks (**): poetical lemmas;

• double oblique hyphen: precedes some comments.
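As an illustration of how such marks might be read mechanically, the sketch below interprets the asterisk marks on a headword. It assumes the asterisks directly precede the lemma in the transcription, which the list above does not state; the function name and labels are hypothetical.

```python
def split_lemma_mark(headword: str) -> tuple[str, str]:
    """Separate leading asterisks from a headword and name the mark.

    Assumption: asterisks are transcribed immediately before the lemma.
    """
    stripped = headword.lstrip("*")
    stars = len(headword) - len(stripped)
    labels = {
        0: "unmarked",
        1: "unused, outdated or neologism",
        2: "poetical",
    }
    return stripped.strip(), labels.get(stars, "unknown mark")


print(split_lemma_mark("**LEMMA"))
# → ('LEMMA', 'poetical')
```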

Foreign-language parts detection algorithm

For Latin script the algorithm looks as follows (cf. [4]):

• text in italics (= abbreviation),
  – capital letter at the beginning,
  – dot at the end,

• foreign example (normal font),

• semicolon,

• dot after the whole run of foreign-language examples
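The steps above can be sketched on plain text, under the assumption that italics survive in the transcription as `<i>…</i>` markup (a hypothetical encoding; the source does not specify one). The sample string uses the tag "Hung." from this poster plus a made-up tag and placeholder words; it is not a transcription from the dictionary.

```python
import re

# Hypothetical markup: an italic language abbreviation ending in a dot,
# followed by the foreign example in normal font.
SEGMENT = re.compile(r"<i>(?P<tag>[^<]+\.)</i>\s*(?P<example>.+)")


def parse_translations(block: str) -> list[tuple[str, str]]:
    """Split a foreign-language part into (language tag, example) pairs."""
    body = block.strip()
    if body.endswith("."):  # dot closing the whole foreign-language part
        body = body[:-1]
    pairs = []
    for segment in body.split(";"):  # semicolon closes one language
        match = SEGMENT.fullmatch(segment.strip())
        # The abbreviation must start with a capital letter.
        if match and match.group("tag")[0].isupper():
            pairs.append((match.group("tag"), match.group("example").strip()))
    return pairs


sample = "<i>Hung.</i> alfa; <i>Abc.</i> beta, gamma."
print(parse_translations(sample))
# → [('Hung.', 'alfa'), ('Abc.', 'beta, gamma')]
```

This simple split would misfire on examples that themselves contain semicolons or abbreviations, so a real implementation would need more context than this sketch uses.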

However, there are also foreign language parts in other scripts:

Figure 3: Various scripts
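For parts in other scripts, one possible first pass is to classify characters by their Unicode names. The sketch below is an assumption about how this could be done, not the method of [4]. Fraktur is normally encoded with ordinary Latin code points, so telling it apart (and telling apart the two kinds of Cyrillic) would need typographic information from the scan rather than the characters alone. The sample words are everyday words for "dog", not taken from the dictionary.

```python
import unicodedata
from collections import Counter
from typing import Optional

SCRIPTS = ("LATIN", "CYRILLIC", "GREEK", "HEBREW")


def script_of(ch: str) -> Optional[str]:
    """Return the script of a letter based on its Unicode name, or None for non-letters."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    for script in SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"


def scripts_in(text: str) -> Counter:
    """Count letters per script in a piece of text."""
    counts: Counter = Counter()
    for ch in text:
        script = script_of(ch)
        if script is not None:
            counts[script] += 1
    return counts


print(scripts_in("pies собака σκύλος כלב"))
```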

Abbreviations analysis

Figure 4: Expanding abbreviations

One language or dialect often has many tags, e.g. Weg., Węg., Hung., Hungar., Hng., Hg., Ung., Ungar., węgiersk. for Hungarian (cf. [3]).

Figure 5: Expanding language tags
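Expanding language tags amounts to a many-to-one lookup. The sketch below uses only the Hungarian variants listed above; the table and function names are hypothetical, and a full table would cover the more than 90 languages and dialects catalogued in [3].

```python
# Variant tags for Hungarian as listed on this poster.
HUNGARIAN_TAGS = [
    "Weg.", "Węg.", "Hung.", "Hungar.", "Hng.", "Hg.", "Ung.", "Ungar.", "węgiersk.",
]

# Hypothetical lookup table: every variant maps to one normalised language name.
TAG_TO_LANGUAGE = {tag: "Hungarian" for tag in HUNGARIAN_TAGS}


def expand_language_tag(tag: str):
    """Return the normalised language name for a tag, or None if the tag is unknown."""
    return TAG_TO_LANGUAGE.get(tag.strip())


print(expand_language_tag("Ung."))  # → Hungarian
print(expand_language_tag("Xyz."))  # → None
```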

References

[1] Bień, J.S. (2011) Efficient search in hidden text of large DjVu documents. In: Advanced Language Technologies for Digital Libraries. Lecture Notes in Computer Science (Theoretical Computer Science and General Issues) (6699). Springer, pp. 1-14. http://bc.klf.uw.edu.pl/177/

[2] Bień, J.S., Elektroniczny indeks haseł do słownika Lindego (druga wersja wstępna), https://bitbucket.org/jsbien/ilindecsv/wiki/Home.md

[3] Bilińska, J., (2016), Dialekty i języki obce w Słowniku języka polskiego Samuela Bogumiła Lindego – zestawienie na podstawie wydania drugiego, "Prace Filologiczne", LXVIII, p. 27-42.

[4] Bilińska, J., Analiza i leksykograficzny opis struktury słownika Lindego na potrzeby dygitalizacji, unpublished Ph.D. thesis, Warszawa 2013, http://bc.klf.uw.edu.pl/347/.

[5] Bilińska, J., Sostavlenie perečnâ sokraščenij nazvanij âzykov v ramkah proekta digitalizacii «Slovarâ pol’skogo âzyka» S. B. Linde [w:] Informacionnye tehnologii i pismennoe nasledie. El’Manuscript-2012. IV naučnaâ konferenciâ, Petrozavodsk, Iżewsk 2012, p. 34-43, http://bc.klf.uw.edu.pl/301/.

[6] Linde, S.B., Słownik języka polskiego, 2nd ed., Lwów 1854-1860, http://kpbc.umk.pl/publication/8173.

[7] Linde, S.B., Słownik języka polskiego, corpus digitisation with search engine, https://szukajwslownikach.uw.edu.pl/en/slownik-lindego-nowy/.
