
Database of Balto-Finnic languages

INTAS III

INTERNATIONAL ASSOCIATION FOR THE PROMOTION OF CO-OPERATION WITH SCIENTISTS FROM THE INDEPENDENT STATES OF THE FORMER SOVIET UNION 

THE CONSTRUCTION OF A FULL-TEXT DATABASE ON BALTO-FINNIC LANGUAGES AND RUSSIAN DIALECTS IN NORTHWEST RUSSIA

Saint-Petersburg University Petrozavodsk University
Groningen University Joensuu University

INTAS Project 99-01164 Final report  2003

 

Contents:

Title page with contents of the report and page numbers

Summary

1. Research

2. Management

3. Finances

4. Role and Impact of INTAS

5. Recommendations to INTAS

6. Annexes

Annex 1: Report SPB

Annex 2: Report PSU

Annex 3: Report JOE

Annex 4: Report GRO

Annex 5: Precedent tagged corpus

Annex 6: Samples of toponymic dictionary

Annexes 7-9: The three published books

Annex 10: Frisian video film on Ingrian Finns with contents


Co-ordinator: Dr. Tjeerd de Graaf

Groningen University – Department of Linguistics

PO Box 716

NL - 9700 AS Groningen

telephone: +31-50-3635982

fax: +31-50-3636855

e-mail: tdegraaf(a)fryske-akademy.nl

  

Summary of the original work plan

The project was prepared in co-operation between St. Petersburg University (Russia), Groningen University (The Netherlands), Petrozavodsk University (Karelia, Russia), and Joensuu University (Finland).

The main objectives of the project have been:

  1. to reveal and analyze various types of written sources for disappearing Balto-Finnic languages and archaic  Russian dialects in Russia's Northwest area;
  2. to seek and record unparalleled modern text samples for these languages;
  3. to construct dictionaries and full-text databases on Balto-Finnic languages and archaic Northwestern Russian dialects;
  4. to compile both conventional paper and electronic dictionaries.

At the end of the first year (12 months) the main stock of material on Balto-Finnic and northwestern archaic Russian dialectal text samples has been collected and partly systematized and analyzed.

By the end of the second year (24 months), the material on Balto-Finnic and northwestern archaic Russian dialects is to be systematized, and formats for the computer dictionary and toponymic databases are to be prepared for implementation.

By the end of the third year (36 months), the databases on Balto-Finnic and northwestern archaic Russian dialects are to be ready for use in pilot studies concerning the search, retrieval and verification of data and the organization of the database.

 

Results

The goals mentioned in the summary of the work plan have been achieved. Some of the most important results are as follows.

Various types of written sources for Balto-Finnic languages and archaic Russian dialects in selected places of Russia's north-western area have been analysed. During special field expeditions to these areas, relevant speech samples were recorded. About 23 hours of recordings representing the speech of 25 informants were prepared for the project, and coherent fragments of the recorded texts were transcribed. The total text amounts to more than 400 000 characters, or 10 author's sheets, which equals 270 standard book pages. The material has been supplied with comments and editorial notes. Two «Yazyk i Narod» (Language and ethnos) monographs have been published by the Saint-Petersburg University Publishing House, and one monograph, «Speech samples of Northern and Middle Ingria», by Joensuu University.

In Saint-Petersburg, Joensuu and Petrozavodsk, large archives of dialectal speech records have been examined and analysed for evidence of contacts between various Balto-Finnic and archaic north-western Russian dialects. The recordings chosen for the full-text database on Balto-Finnic languages and Russian dialects in Northwest Russia represent various types of oral communication. A unified transcription system has been implemented by means of a specially developed computer font, which unites the coding tables for the phonemic representation of Vepsian, Ingrian and Karelian and for the phonetic representation of Ingermanland-Finnish. This encoding is also applied to the phonemic non-Cyrillic representation of Russian dialectal texts. The font is used for a unified representation of texts both in the database and in the publication “Yazyk i Narod” (Language and Ethnos: Texts and comments).

The contractors proceeded with the linguistic analysis of dialectal Balto-Finnic and archaic Russian texts. Some of the texts were supplied with grammatical tags. Several formal procedures were developed for the automatic morpheme segmentation of dialect texts in Balto-Finnic languages (Vepsian, Karelian, Izhorian, Ingermanland-Finnish) written in phonemic transcription. The results of segmentation were compared with the precedent corpora, so the procedures may be used on a wider scale.

Several valuable observations made during the linguistic analysis and verification of archival text recordings and the systematization of toponymic vocabulary were published in the monograph “Yazyk i Narod” (Language and Ethnos: Social-linguistic situation in Northwest Russia), issued by the Saint-Petersburg University Publishing House. This book presents a complex picture of the bilingual situation in the area of close contact between Russians and Balto-Finns, as well as of interference and code switching in the dialectal speech of inhabitants of northwest Russia.

 

Impact of the Results

The resulting database will be used for scientific purposes (the study of language variety in Russia and of language contact) and for the development of methods for language teaching in a bicultural environment. It provides the best opportunity for investigating ethnic and cultural processes in the contact zone of ancient Slavonic and Balto-Finnic languages. Preserving endangered indigenous languages and traditional knowledge is a very urgent task, in particular in the Russian Federation.

The results of this project will contribute substantially to the study of ethnolinguistics and to the description of some endangered languages and dialects. Several of these languages are now well documented, and should the last speaker of a language pass away, these databases will be very valuable for the study of the cultural heritage of that ethnic group. The work is also important for new initiatives of UNESCO and other organisations in the field of endangered languages and cultures.

 

KEY REFERENCES 

  1. Yazyk i Narod (Language and ethnos: Texts and comments). Edited by A. Gerd, T. de Graaf, M. Savijärvi. Saint-Petersburg, 2002. 206 p. All Contractors participated in this monograph, presenting collected Balto-Finnic and north-western archaic Russian dialectal text samples, supplied with descriptions of the chosen settlements and dialectal characteristics.
  2. Yazyk i Narod (Language and ethnos: Social-linguistic situation in Northwest Russia). Edited by A. Gerd, T. de Graaf, M. Savijärvi. Saint-Petersburg, 2003. 136 p. - Published in Russian. - All Contractors participated in this monograph, presenting a scholarly survey of linguistic and toponymic processes in the area of contact between Balto-Finns and Russians in Northwest Russia.
  3. Kokko O., Savijärvi M. & I. Ennevvanhaa. Pohjois- ja Keski-Inkerin kieltä ja kohtaloita (Speech samples of Northern and Middle Ingria) – Studia Carelica Humanistica 18. Joensuu, 2003. 175 p.
  4. Gerd A., Azarova I., Nikolaev I. Construction of the full-text database of languages and dialects of Northwest Russia / Proceedings of the International Conference on Corpora Linguistics. Saint-Petersburg, March 5-7, 2002. Saint-Petersburg, 2002. - Published in Russian.
  5. De Graaf T. The Use of Archives and Fieldwork for the Study of the Endangered Languages of Russia / International Conference on Language Resources. Las Palmas, 26-31 May, 2002.
  6. De Graaf T. Voices of Tundra and Taiga: Endangered Languages in Russia on the Internet // Conference Handbook on Endangered Languages, Kyoto, November 2002. P. 57-79.
  7. Karlova O. The anthroponymic base of the geographic names in the village of Voknavolok / Rural Russia: the past and the future. Proceedings of the 8th Russian scientific and practical conference in Orel. December, 2001. Issue 2. M., 2001. - Published in Russian.
  8. Kokko O. The Variation of local case of present-day Ingrian-Finnish. Congressus nonus internationalis fenno-ugristarum 7-13.8.2000. Pars II. Tartu, 2000. P. 120.
  9. Mullonen I. "Upper reaches" hydronyms of the Onega / Linguistic & cultural problems of tolerance. Proceedings of the First International conference in Ekaterinburg. October, 24-26. Ekaterinburg, 2001. P. 366-368. - Published in Russian.
  10. Nikolaev I. The Construction of a Full-Text Database on Balto-Finnic Languages and Russian Dialects in Northwest-Russia / Linguistic Perspectives on Endangered Languages: A SKY Symposium. Helsinki, August 29 - September 1, 2001. Helsinki, 2001.

 

2. MANAGEMENT

2.1. Meetings and visits

Mr A. Gerd (SPB) – 2000, April – Joensuu (Finland) – discussion of co-operation between the SPB and JOE teams during the Project.

Mr T. de Graaf (GRO) – Mrs I. Mullonen (PSU) – Mr A. Gerd, Mrs I. Azarova, Mr I. Nikolaev (SPB) – 2000, June – Saint-Petersburg – a meeting to discuss the work programme, financial problems and the detailed programme of work for the second half of 2000.

Mr A. Gerd (SPB) – 2000, July – Petrozavodsk – discussion of the co-ordination of work during the Project and of problems in representing Balto-Finnic languages.

Mr T. de Graaf (GRO) – 2000, October – Saint-Petersburg – discussion of the intermediate work results.

Mr A. Gerd (SPB) – 2000, November – Joensuu (Finland) – discussion of the representation and amount of Balto-Finnic texts to be presented at the end of the first 12 months; special consideration of a possible joint publication (book) on the problems of sociolinguistic research in Joensuu and St-Petersburg; Mr A. Gerd looked through the collection of Ingermanland-Finnish records at Joensuu University.

Mrs I. Mullonen (PSU) – 2001, January – St-Petersburg – special consideration of computer usage in the process of text preparation.

Mr I. Nikolaev (SPB) – 2001, January – University of Joensuu, Finland – consultations with Finnish experts on problems connected with the execution of the Project.

Mr A. Gerd (SPB) – 2001, January – Groningen – discussion of problems connected with the Project; Amsterdam and Leiden – search in libraries for recent literature on language contacts and sociolinguistics.

Mr T. de Graaf (GRO) – 2001, February – St-Petersburg – consultations with the co-ordinator of the Project on problems connected with the execution of the Project.

Mr A. Gerd (SPB) – Mrs M. Savijärvi, Mr I. Savijärvi (JOE) – May, 2001 – Joensuu (Finland) – discussion of results of the first year work on the Project with the JOE team.

Mrs I. Mullonen (PSU) – Mr A. Gerd, Mrs I. Azarova, Mr I. Nikolaev (SPB) – June, 2001 – Saint-Petersburg (Russia) – discussion of the working programme for the second year of the Project.

Mr T. de Graaf (GRO) – Mrs I. Mullonen (PSU) – Mr A. Gerd, Mrs I. Azarova (SPB) – June, 2001 – Petrozavodsk (Russia) – a meeting on the detailed working programme for the second year of the Project.

Mr I. Nikolaev (SPB) – August, 2001 – Helsinki (Finland) – Linguistic Perspectives on Endangered Languages: A SKY Symposium.

Mrs I. Mullonen (PSU) – Mr A. Gerd, Mrs I. Azarova, Mr I. Nikolaev (SPB) – October, 2001 – Saint-Petersburg (Russia) – discussion of the guidelines for the final version of the coding system of the Balto-Finnic text representation.

Mr A. Gerd (SPB) – Mrs M. Savijärvi, Mr I. Savijärvi (JOE) – October, 2001 – Joensuu (Finland) – discussion of the guidelines for the final coding system of the Balto-Finnic text representation.

Mr A. Gerd (SPB) – Mrs M. Savijärvi, Mr I. Savijärvi (JOE) – January, 2002 – Joensuu (Finland) – a meeting for the first proof-reading session of Ingermanland-Finnish texts for the monograph.

Mrs I. Mullonen (PSU) – Mr A. Gerd (SPB) – January, 2002 – Saint-Petersburg (Russia) – a meeting for the first proof-reading session of Vepsian and Karelian texts for the monograph. Mrs I. Mullonen worked for 5 days at the Russian State Archive (Saint-Petersburg), collecting ancient toponymy of the Onega region according to the hand-written maps of the XIXth century.

Mr I. Nikolaev (SPB) – January, 2002 – participated in the Winter school on Corpora Linguistics in Leiden (Netherlands).

Mr T. de Graaf (GRO) – Mr A. Gerd, Mrs I. Azarova, Mr I. Nikolaev (SPB) – April, 2002 – Saint-Petersburg (Russia) – discussion of the results obtained under the working programme for the second year of the Project.

Mr I. Nikolaev (SPB) – April, 2002 – Petrozavodsk (Russia) – Bubrikh Readings.

Mr A. Gerd (SPB) – Mrs M. Savijärvi, Mr I. Savijärvi (JOE) – April, 2002 – Joensuu (Finland) – a meeting for the final proof-reading session of Ingermanland-Finnish texts for the monograph.

Mr A. Gerd (SPB) – 2002, June – Greifswald, Rostock, Berlin (Germany) – seminars & lectures on sociolinguistics.

Mr T. de Graaf (GRO) – Mrs M. Savijärvi, Mr I. Savijärvi (JOE) – Mr I. Nikolaev (SPB) – 2002, August – Joensuu (Finland) – conference visit and consultations with the Finnish project partners.

Mr A. Gerd (SPB) – 2002, October – Estonia – interviews with Ingrian-Finns.

Mr A. Gerd (SPB) – Mrs M. Savijärvi, Mr I. Savijärvi (JOE) – 2002, November – Joensuu (Finland) – consultations on the progress of the project.

Mrs I. Mullonen (PSU) – 2002, November – Moscow – Vneshtorgbank.

Mr T. de Graaf (GRO) – 2002, November – Kyoto – report on the project activities during the IVth International Conference on Endangered Languages.

Mr. T. de Graaf (GRO)– Mr A. Gerd, Mrs I. Azarova, Mr I. Nikolaev (SPB) – 2003, January – St-Petersburg – consultations on the progress of the project.

Mrs I. Mullonen (PSU) – 2003, January – Moscow – Vneshtorgbank.

Mr A. Gerd (SPB) – Mrs M. Savijärvi, Mr I. Savijärvi (JOE) – March, 2003 – Joensuu (Finland) – consultations concerning the final report of the project.

Mr. T. de Graaf (GRO) – 2003, March – Paris – report on Some Endangered Languages in Europe and Siberia at the International Expert Meeting on the UNESCO Program Safeguarding of Endangered Languages.

Mrs O. Karlova (PSU) – 2003, March – Joshkar-Ola – PhD committee meeting.

Mr. T. de Graaf – 2003, 1 April – 6 July – work as guest professor at St-Petersburg University, evaluation of the project results and planning of the follow-up.

 

 
Visits           Number of scientists    Number of person days
West ==> East             7                       78
East ==> West            13                       30
West ==> West             3                        8
East ==> East            12                       19

 

2.2. Collaboration

Co-operation among the Contractors was intensive and became closer and more productive in the course of the Project. All problems were discussed jointly.

 
Intensity of Collaboration    high    rather high    rather low    low
West <=> East                  +
West <=> West                              +
East <=> East                  +

 

2.3. Time Schedule

In the research programme we followed the planned time schedule of the work programme, which helped us to organise the work. Comparing the actual and proposed schedules, the research basically followed the timetable, though some tasks were completed in a shorter time and some required more time (cf. the time schedule in section 1.1 with the one given in the work programme).

For each of the Balto-Finnic languages (Vepsian, Ingrian, Karelian, Ingermanland-Finnish) and archaic northwestern Russian dialects the following tasks were planned.

  • T1 (Seeking archive materials…) was completed in a shorter time (months 1-9) by the SPB and PSU teams, but took more time (months 1-18) for the JOE team.
  • T2 (Field expeditions…) required additional time for the PSU team (months 3-4, 15-16, 27-28).
  • T3 (Interpretation and transcription…) proved very laborious, but the planned period was sufficient (months 8-24).
  • T4 (Linguistic analysis…) proved complex, with subtasks of its own, so it lasted longer (months 13-36).
  • T5 (Differentiating words, morphemes…) proved more complex, as it included experiments (months 10-28).
  • T6 (Choosing the notational system…) was very short-term and was completed in the planned period (months 13-16).
  • T7 (Elaborating computer formats…) was rather short-term and was completed in roughly the planned period (months 19-27).
  • T8 (Developing fonts…) was completed in a short period and earlier than planned (months 18-22).
  • T9 (Text proof-reading…publications) was divided into two subtasks: the first lasted to the end of the second year and the second to the end of the project (months 19-24, 30-36).
  • T10 (Developing full-text database resources to retrieve information…) received a very short time in the plan (months 22-36).
  • T11 (…proof-reading of the toponymic data) was very short-term and was completed in the planned period (months 25-33).
  • T12 (Developing a computer dictionary database, final report and publication) was completed in the planned period (months 31-36).

 

2.4. Problems encountered

We encountered one minor problem: the transfer of funds was delayed, which caused some difficulties with the payment of our expenditures and the organisation of field expeditions.

 
Problems encountered            major    minor    none    not applicable
Co-operation of team members                       +
Transfer of funds                          +
Telecommunication                                  +
Transfer of goods                                               +
Other

2.5. Actions required

We do not see any action that INTAS should take to solve our problems.

  

2.6 Manpower invested

A rough estimate of the effort required for the completion of the project is 10 person-years. It should be stressed that without the INTAS grant it would have been impossible to achieve these results, so most of the results are due to INTAS funding.

 

 

3. FINANCES (in EURO)                         

For each Contractor the spending of the grant from INTAS can be specified as follows, taking into account the various cost categories.

 

 
Cost categories and totals per Contractor (EUR):

#  Contractor      Labour cost  Overheads  Travel & subsistence  Consumables  Equipment  Other costs  Total (EUR)
1  GRO                    1000       1050                  4515          670                    1805         9040
2  JOE                                 71                   605          517                    1800         2993
3  SPB                   21600        380                  7268          306       1130         1106        31790
4  PSU                   10800        665                  2234          975       1536                     16210
   Subtotal (EUR)        33400       2166                 14622         2468       2666         4711        60033
 

In general the spending has been in accordance with the one foreseen in the Work Programme.
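
The row and column totals of the cost table can be cross-checked mechanically. The following is only a minimal sketch in Python: the figures are transcribed from the table, blank cells are assumed to mean zero, and the names `SPENDING` and `check_table` are illustrative and not part of the project.

```python
# Figures transcribed from the cost table in section 3 (EUR).
# Assumption: blank cells in the report are treated as 0.
CATEGORIES = ["labour", "overheads", "travel", "consumables", "equipment", "other"]

SPENDING = {
    "GRO": [1000, 1050, 4515, 670, 0, 1805],
    "JOE": [0, 71, 605, 517, 0, 1800],
    "SPB": [21600, 380, 7268, 306, 1130, 1106],
    "PSU": [10800, 665, 2234, 975, 1536, 0],
}

REPORTED_ROW_TOTALS = {"GRO": 9040, "JOE": 2993, "SPB": 31790, "PSU": 16210}
REPORTED_SUBTOTALS = [33400, 2166, 14622, 2468, 2666, 4711]
REPORTED_GRAND_TOTAL = 60033


def check_table():
    """Return a list of mismatches between computed and reported totals."""
    problems = []
    # Each contractor's row must add up to its reported total.
    for name, row in SPENDING.items():
        if sum(row) != REPORTED_ROW_TOTALS[name]:
            problems.append(("row", name, sum(row)))
    # Each cost category must add up to its reported subtotal.
    column_sums = [sum(col) for col in zip(*SPENDING.values())]
    for category, computed, reported in zip(CATEGORIES, column_sums, REPORTED_SUBTOTALS):
        if computed != reported:
            problems.append(("column", category, computed))
    # The grand total must match the sum of all cells.
    if sum(column_sums) != REPORTED_GRAND_TOTAL:
        problems.append(("grand total", "all", sum(column_sums)))
    return problems


if __name__ == "__main__":
    print(check_table() or "all totals consistent")
```

An empty list from `check_table()` means that every row total, category subtotal and the grand total agree with the individual cells.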

LABOUR COSTS were salaries paid to the NIS participants and an assistantship for a young scholar in Groningen who assisted the project co-ordinator.

Consumables covered writing materials, diskettes, cassette tapes and printer cartridges.

Other costs covered the typographic costs of the monograph with collected texts, the printing of the monograph with a collection of papers, and a monograph published in Joensuu.

EQUIPMENT for the PSU team included a computer (Pentium-III) with a printer, and a recorder for field expeditions.

EQUIPMENT for the SPB team included a computer (Pentium-III) with a printer (HP 2200).

The INTAS grant has been spent in full. We hope to receive the final instalment of the grant (6000 euro) after approval of the final report; it will then be allocated to the NIS participants as follows:

SALARIES LAST SIX MONTHS                    4800 euro

OVERHEADS AND CONSUMABLES           1200 euro

 

3.2 Other funding

There was no other funding for the project, except for the salaries paid to the participants by their universities and part of the overhead costs (for heating of working facilities, computer use, etc.).

 

4. ROLE AND IMPACT OF INTAS

This grant was very important for starting and carrying out the project, which would most probably not have been started or carried out without INTAS funding.

We should like to state that the achievements of the project are very important for creating new international contacts, attracting additional funds for all research groups involved, and helping the NIS scientists. Moreover, the project brought us additional prestige, which resulted in various invitations to important conferences and involvement in UNESCO initiatives in the field of endangered languages.

The follow-up of the project is very important, as it might lead to the stabilisation of ethnic relations in the areas concerned and will contribute substantially to the study of ethnolinguistics. We really hope to continue this co-operation in the future with a similar group of contractors. In the international UNESCO expert group on Endangered Languages (where we are represented), initiatives will be taken to set up and support similar project activities in the future.

 

5. RECOMMENDATIONS TO INTAS

In general the INTAS support provides an excellent way of stimulating joint projects between NIS and INTAS research groups, and we think that the organisation should go on in the same way. The following suggestions can be made for a change in the INTAS policy:

  • try to stimulate more joint projects in the field of humanities;
  • consider the possibility of awarding larger individual grants for labour in those parts of the NIS countries where the cost of living has increased considerably in recent years.