Database of Balto-Finnic languages
INTAS III
INTERNATIONAL ASSOCIATION FOR THE PROMOTION OF CO-OPERATION WITH SCIENTISTS FROM THE INDEPENDENT STATES OF THE FORMER SOVIET UNION
THE CONSTRUCTION OF A FULL-TEXT DATABASE ON BALTO-FINNIC LANGUAGES AND RUSSIAN DIALECTS IN NORTHWEST RUSSIA
Saint-Petersburg University
Petrozavodsk University
Groningen University
Joensuu University
INTAS Project 99-01164 Final report 2003
Contents:
Title page with contents of the report
Summary
1. Research
2. Management
3. Finances
4. Role and Impact of INTAS
5. Recommendations to INTAS
6. Annexes
Annex 1: Report SPB
Annex 2: Report PSU
Annex 3: Report JOE
Annex 4: Report GRO
Annex 5: Precedent tagged corpus
Annex 6: Samples of toponymic dictionary
Annexes 7-9: The three published books
Annex 10: Frisian video film on Ingrian Finns, with contents
Co-ordinator: Dr. Tjeerd de Graaf
Groningen University Department of Linguistics
PO Box 716
NL - 9700 AS Groningen
telephone: +31-50-3635982
fax: +31-50-3636855
e-mail: tdegraaf(a)fryske-akademy.nl
Summary of the original work plan
The project was prepared in co-operation between St. Petersburg University (Russia), Groningen University (The Netherlands), Petrozavodsk University (Karelia, Russia) and Joensuu University (Finland).
The main objectives of the project have been:
- to identify and analyze various types of written sources for disappearing Balto-Finnic languages and archaic Russian dialects in Northwest Russia;
- to seek and record unique modern text samples for these languages;
- to construct dictionaries and full-text databases on Balto-Finnic languages and archaic Northwestern Russian dialects;
- to compile both conventional paper and electronic dictionaries.
By the end of the first year (12 months), the main stock of material on Balto-Finnic and northwestern archaic Russian dialectal text samples is to be collected and partly systematized and analyzed.
By the end of the second year (24 months), the material on Balto-Finnic and northwestern archaic Russian dialects is to be systematized, and formats for the computer dictionary and toponymic databases are to be prepared for implementation.
By the end of the third year (36 months), the databases on Balto-Finnic and northwestern archaic Russian dialects are to be ready for use in pilot studies on the search, retrieval and verification of data and on the organization of the database.
Results
The goals mentioned in the summary of the work plan have been achieved. Some of the most important results are as follows.
Various types of written sources for Balto-Finnic languages and archaic Russian dialects in selected places in Northwest Russia have been analysed. During special field expeditions to these areas, relevant speech samples were recorded. About 23 hours of recordings representing the speech of 25 informants were prepared for the project, and coherent fragments of these recordings were transcribed. The total text amounts to more than 400,000 characters, or 10 author's sheets (40,000 characters each), corresponding to 270 standard book pages. The material has been supplied with comments and editorial notes. Two monographs entitled «Yazyk i Narod» (Language and Ethnos) have been published by the Saint-Petersburg University Publishing House, and one monograph, «Speech samples of Northern and Middle Ingria», by Joensuu University.
In Saint-Petersburg, Joensuu and Petrozavodsk, large archives of dialectal speech recordings have been examined and analysed for evidence of contact between various Balto-Finnic and archaic north-western Russian dialects. The recordings chosen for the full-text database on Balto-Finnic languages and Russian dialects in Northwest Russia represent various types of oral communication. A unified transcription system has been implemented by means of a special computer font, intended to unify the coding tables for the phonemic representation of Vepsian, Ingrian and Karelian and for the phonetic representation of Ingermanland-Finnish. The same encoding is also applied to the phonemic, non-Cyrillic representation of Russian dialectal texts. The font is used for a unified representation of texts both in the database and in the publications in «Yazyk i Narod» (Language and Ethnos: Texts and Comments).
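The report does not document the code table of the project's special font. As a hedged illustration of the same goal (one consistent coding for every transcribed text), the following minimal Python sketch assumes Unicode with combining diacritics instead of a custom font, and a hypothetical input convention in which `^` typed after a letter marks a caron and `"` marks a diaeresis. Unicode NFC normalization then gives precomposed and decomposed input the same coding, so a database search for `č` matches however the character was typed. The name `encode` and the modifier table are assumptions for illustration.

```python
import unicodedata

# Hypothetical input convention for transcribers (an assumption for this
# sketch, not the project's actual font code table): a modifier typed after
# a letter is turned into the corresponding Unicode combining mark.
MODIFIER_TO_COMBINING = {
    "^": "\u030C",  # COMBINING CARON, e.g. c^ -> č
    '"': "\u0308",  # COMBINING DIAERESIS, e.g. a" -> ä
}


def encode(text: str) -> str:
    """Map modifier notation to combining marks, then normalize to NFC.

    NFC composes base + mark into one precomposed code point where Unicode
    defines one; other sequences remain base + combining mark. A modifier
    that does not follow a letter is kept as a literal character.
    """
    out: list[str] = []
    for ch in text:
        if ch in MODIFIER_TO_COMBINING and out and out[-1].isalpha():
            out.append(MODIFIER_TO_COMBINING[ch])
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    # Decomposed input (c + combining caron) and precomposed input (č)
    # receive the same coding, so they compare equal in a search.
    print(encode("c\u030c") == encode("\u010d"))  # prints True
    print(encode('c^a"j'))  # prints čäj
```

In practice the modifier characters would have to be reserved for this purpose, since an ordinary closing quotation mark after a letter would otherwise be read as a diaeresis.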
The contractors proceeded with the linguistic analysis of dialectal Balto-Finnic and archaic Russian texts. Some of the texts were annotated with grammatical tags. Several formal procedures were developed for the automatic morpheme segmentation of dialect texts in Balto-Finnic languages (Vepsian, Karelian, Izhorian, Ingermanland-Finnish) written in phonemic transcription. The results of the segmentation were compared with the precedent tagged corpora, and on this basis the procedures can be applied on a wider scale.
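The report does not describe how the segmentation procedures work. The following minimal Python sketch is therefore only an illustration under stated assumptions: a greedy longest-suffix-stripping segmenter with a hypothetical suffix list (`SUFFIXES`, Finnish-like forms chosen for illustration), evaluated by morpheme-boundary precision and recall against a hand-segmented reference such as a precedent tagged corpus. The names `segment`, `boundaries` and `evaluate` are hypothetical and do not represent the project's procedures.

```python
from typing import Iterable

# Hypothetical suffix inventory (Finnish-like, for illustration only); the
# report does not list the suffixes or rules the project actually used.
SUFFIXES = ["ssa", "sta", "lla", "lta", "n", "t"]


def segment(word: str, suffixes: Iterable[str] = SUFFIXES, min_stem: int = 2) -> list[str]:
    """Greedy suffix stripping: repeatedly remove the longest matching suffix
    from the right, as long as at least `min_stem` characters remain."""
    ordered = sorted(suffixes, key=len, reverse=True)
    stem = word
    morphs: list[str] = []
    changed = True
    while changed:
        changed = False
        for suf in ordered:
            if stem.endswith(suf) and len(stem) - len(suf) >= min_stem:
                morphs.insert(0, suf)
                stem = stem[: -len(suf)]
                changed = True
                break
    return [stem] + morphs


def boundaries(morphs: list[str]) -> set[int]:
    """Character offsets of the internal morpheme boundaries of one word."""
    cuts: set[int] = set()
    pos = 0
    for m in morphs[:-1]:
        pos += len(m)
        cuts.add(pos)
    return cuts


def evaluate(predicted: list[list[str]], gold: list[list[str]]) -> tuple[float, float]:
    """Boundary precision and recall of predicted segmentations against a
    hand-segmented reference (the role played by a precedent corpus)."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        p, g = boundaries(pred), boundaries(ref)
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    words = ["talossa", "kissan"]
    gold = [["talo", "ssa"], ["kissa", "n"]]
    predicted = [segment(w) for w in words]
    print(predicted)  # [['talo', 'ssa'], ['ki', 'ssa', 'n']]
    print(evaluate(predicted, gold))  # precision 2/3, recall 1.0
```

In the example run, `kissan` is over-segmented as `ki-ssa-n` because the stem itself ends in a string that looks like a suffix; the extra boundary lowers precision to 2/3. Comparisons of this kind against a reference corpus show where a procedure needs refinement before it is applied on a wider scale.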
Several valuable observations made during the linguistic analysis and verification of archival text recordings and the systematization of toponymic vocabulary were published in the monograph «Yazyk i Narod» (Language and Ethnos: Sociolinguistic Situation in Northwest Russia), issued by the Saint-Petersburg University Publishing House. This book presents a complex picture of the bilingual situation in the area of close contact between Russians and Balto-Finns, as well as of interference and code switching in the dialectal speech of the inhabitants of Northwest Russia.
Impact of the Results
The resulting database will be used for scientific purposes (the study of language variety in Russia and of language contact) and for the development of methods for language teaching in a bicultural environment. It provides an excellent opportunity for investigating ethnic and cultural processes in the contact zone of ancient Slavonic and Balto-Finnic languages. Preserving endangered indigenous languages and traditional knowledge is an urgent task, particularly in the Russian Federation.
The results of this project will contribute substantially to the study of ethnolinguistics and to the description of several endangered languages and dialects. Several of these languages are now well documented; should the last speaker of a language pass away, these databases will be very valuable for the study of the cultural heritage of that ethnic group. The work is also important for new initiatives of UNESCO and other organisations in the field of endangered languages and cultures.
KEY REFERENCES
- Yazyk i Narod (Language and Ethnos: Texts and Comments). Edited by A. Gerd, T. de Graaf, M. Savijärvi. Saint-Petersburg, 2002. 206 p. All Contractors contributed to this monograph, which presents the collected Balto-Finnic and north-western archaic Russian dialectal text samples, with descriptions of the selected settlements and their dialectal characteristics.
- Yazyk i Narod (Language and Ethnos: Sociolinguistic Situation in Northwest Russia). Edited by A. Gerd, T. de Graaf, M. Savijärvi. Saint-Petersburg, 2003. 136 p. Published in Russian. All Contractors contributed to this monograph, which presents a scholarly survey of linguistic and toponymic processes in the area of contact between Balto-Finns and Russians in Northwest Russia.
- Kokko O., Savijärvi M. & I. Ennevvanhaa. Pohjois- ja Keski-Inkerin kieltä ja kohtaloita (Speech samples of Northern and Middle Ingria). Studia Carelica Humanistica 18. Joensuu, 2003. 175 p.
- Gerd A., Azarova I., Nikolaev I. Construction of the full-text database of languages and dialects of Northwest Russia. Proceedings of the International Conference on Corpora Linguistics, Saint-Petersburg, March 5-7, 2002. Saint-Petersburg, 2002. Published in Russian.
- De Graaf T. The Use of Archives and Fieldwork for the Study of the Endangered Languages of Russia. International Conference on Language Resources, Las Palmas, 26-31 May 2002.
- De Graaf T. Voices of Tundra and Taiga: Endangered Languages in Russia on the Internet. Conference Handbook on Endangered Languages, Kyoto, November 2002. P. 57-79.
- Karlova O. The anthroponymic base of the geographic names in the village of Voknavolok. Rural Russia: the Past and the Future. Proceedings of the 8th Russian scientific and practical conference in Orel, December 2001. Issue 2. Moscow, 2001. Published in Russian.
- Kokko O. The Variation of local case of present-day Ingrian-Finnish. Congressus Nonus Internationalis Fenno-Ugristarum, 7-13.8.2000. Pars II. Tartu, 2000. P. 120.
- Mullonen I. "Upper reaches" hydronyms of the Onega. Linguistic & Cultural Problems of Tolerance. Proceedings of the First International Conference in Ekaterinburg, October 24-26. Ekaterinburg, 2001. P. 366-368. Published in Russian.
- Nikolaev I. The Construction of a Full-Text Database on Balto-Finnic Languages and Russian Dialects in Northwest-Russia. Linguistic Perspectives on Endangered Languages: A SKY Symposium, Helsinki, August 29 - September 1, 2001. Helsinki, 2001.
2. MANAGEMENT
2.1. Meetings and visits
Mr A. Gerd (SPB) 2000, April Joensuu (Finland) discussion of co-operation between the SPB and JOE teams during the Project.
Mr T. de Graaf (GRO) Mrs I. Mullonen (PSU) Mr A. Gerd, Mrs I. Azarova, Mr I. Nikolaev (SPB) 2000, June Saint-Petersburg a meeting to discuss the work programme, financial problems and the detailed programme of work for the second half of 2000.
Mr A. Gerd (SPB) 2000, July Petrozavodsk discussion of the co-ordination of work during the Project and of problems in representing the Balto-Finnic languages.
Mr T. de Graaf (GRO) 2000, October Saint-Petersburg discussion of the intermediate work results.
Mr A. Gerd (SPB) 2000, November Joensuu (Finland) discussion of the representation and amount of Balto-Finnic texts to be presented at the end of the first 12 months; special consideration of a possible joint publication (book) on the problems of sociolinguistic research in Joensuu and St-Petersburg; Mr A. Gerd examined the collection of Ingermanland-Finnish recordings at Joensuu University.
Mrs I. Mullonen (PSU) 2001, January St-Petersburg special consideration of computer usage in the process of text preparation.
Mr I. Nikolaev (SPB) 2001, January University of Joensuu, Finland consultations with Finnish experts on problems connected with the execution of the Project.
Mr A. Gerd (SPB) 2001, January Groningen discussion of problems connected with the Project; Amsterdam & Leiden search in libraries for recent literature on language contacts and sociolinguistics.
Mr T. de Graaf (GRO) 2001, February St-Petersburg consultations with the Project co-ordinator on problems connected with the execution of the Project.
Mr A. Gerd (SPB) Mrs M. Savijärvi, Mr I. Savijärvi (JOE) May, 2001 Joensuu (Finland) discussion of results of the first year work on the Project with the JOE team.
Mrs I. Mullonen (PSU) Mr A. Gerd, Mrs I. Azarova, Mr I. Nikolaev (SPB) June, 2001 Saint-Petersburg (Russia) discussion of the working programme for the second year of the Project.
Mr T. de Graaf (GRO) Mrs I. Mullonen (PSU) Mr A. Gerd, Mrs I. Azarova (SPB) June, 2001 Petrozavodsk (Russia) a meeting on the detailed working programme for the second year of the Project.
Mr I. Nikolaev (SPB) August, 2001 Helsinki (Finland) Linguistic Perspectives on Endangered Languages: A SKY Symposium.
Mrs I. Mullonen (PSU) Mr A. Gerd, Mrs I. Azarova, Mr I. Nikolaev (SPB) October, 2001, Saint-Petersburg (Russia) discussion of the guidelines for the final version of the coding system of the Balto-Finnic text representation.
Mr A. Gerd (SPB) Mrs M. Savijärvi, Mr I. Savijärvi (JOE) October, 2001 Joensuu (Finland) discussion of the guidelines for the final coding system of the Balto-Finnic text representation.
Mr A. Gerd (SPB) Mrs M. Savijärvi, Mr I. Savijärvi (JOE) January, 2002 Joensuu (Finland) a meeting for the first proof-reading session of Ingermanland-Finnish texts for the monograph.
Mrs I. Mullonen (PSU) Mr A. Gerd (SPB) January, 2002 Saint-Petersburg (Russia) a meeting for the first proof-reading session of Vepsian and Karelian texts for the monograph. Mrs I. Mullonen worked for 5 days at the Russian State Archive (Saint-Petersburg), collecting historical toponymy of the Onega region from hand-written maps of the 19th century.
Mr I. Nikolaev (SPB) January, 2002 participated in the Winter school on Corpora Linguistics in Leiden (Netherlands).
Mr T. de Graaf (GRO) Mr A. Gerd, Mrs I. Azarova, Mr I. Nikolaev (SPB) April, 2002 Saint-Petersburg (Russia) discussion of the results achieved under the working programme for the second year of the Project.
Mr I. Nikolaev (SPB) April, 2002 Petrozavodsk (Russia) Bubrikh Readings.
Mr A. Gerd (SPB) Mrs M. Savijärvi, Mr I. Savijärvi (JOE) April, 2002 Joensuu (Finland) a meeting for the final proof-reading session of Ingermanland-Finnish texts for the monograph.
Mr A. Gerd (SPB) 2002, June Greifswald, Rostock, Berlin (Germany) seminars & lectures on sociolinguistics.
Mr. T. de Graaf (GRO) Mrs M. Savijärvi, Mr I. Savijärvi (JOE) Mr I. Nikolaev (SPB) 2002, August Joensuu (Finland) conference visit and consultations with the Finnish project partners.
Mr A. Gerd (SPB) 2002, October Estonia interviews with Ingrian-Finns.
Mr A. Gerd (SPB) Mrs M. Savijärvi, Mr I. Savijärvi (JOE) 2002, November Joensuu (Finland) consultations on the progress of the project.
Mrs I. Mullonen (PSU) 2002, November Moscow Vneshtorgbank.
Mr. T. de Graaf (GRO) 2002, November Kyoto report on the project activities at the IVth International Conference on Endangered Languages.
Mr. T. de Graaf (GRO) Mr A. Gerd, Mrs I. Azarova, Mr I. Nikolaev (SPB) 2003, January St-Petersburg consultations on the progress of the project.
Mrs I. Mullonen (PSU) 2003, January Moscow Vneshtorgbank.
Mr A. Gerd (SPB) Mrs M. Savijärvi, Mr I. Savijärvi (JOE) March, 2003 Joensuu (Finland) consultations concerning the final report of the project.
Mr. T. de Graaf (GRO) 2003, March Paris report on Some Endangered Languages in Europe and Siberia at the International Expert Meeting on the UNESCO Programme Safeguarding of Endangered Languages.
Mrs O. Karlova (PSU) 2003, March Yoshkar-Ola PhD committee meeting.
Mr. T. de Graaf (GRO) 2003, 1 April - 6 July work as guest professor at St-Petersburg University, evaluation of the project results and planning of the follow-up.
Visits | Number of scientists | Number of person days |
West ==> East | 7 | 78 |
East ==> West | 13 | 30 |
West ==> West | 3 | 8 |
East ==> East | 12 | 19 |
2.2. Collaboration
Co-operation among the Contractors was intensive and became closer and more productive over the course of the Project. All problems were discussed in close co-operation.
Intensity of Collaboration | high | rather high | rather low | low |
West <=> East | + | | | |
West <=> West | | + | | |
East <=> East | + | | | |
2.3. Time Schedule
In the research programme we followed the time schedule of the work programme, which helped us to organise the work. Comparing the actual and proposed schedules, the research basically kept to the timetable, although some tasks were completed in less time and some required more (compare the time schedule in section 1.1 with the one given in the work programme).
For each of the Balto-Finnic languages (Vepsian, Ingrian, Karelian, Ingermanland-Finnish) and for the archaic northwestern Russian dialects, the following tasks were planned.
- T1 (Seeking archive materials) was completed in less time (months 1-9) by the SPB and PSU teams, but took longer (months 1-18) for the JOE team.
- T2 (Field expeditions) required additional time for the PSU team (months 3-4, 15-16, 27-28).
- T3 (Interpretation and transcription) proved very laborious, but the planned period was sufficient (months 8-24).
- T4 (Linguistic analysis) proved complex, with subtasks of its own, so it lasted longer (months 13-36).
- T5 (Differentiating words and morphemes) proved more complex than expected and included experiments (months 10-28).
- T6 (Choosing the notational system) was very short-term and completed within the planned period (months 13-16).
- T7 (Elaborating computer formats) was rather short-term and completed roughly within the planned period (months 19-27).
- T8 (Developing fonts) was completed in a short period and earlier than planned (months 18-22).
- T9 (Text proof-reading and publications) was divided into two subtasks: the first lasted until the end of the second year and the second until the end of the project (months 19-24, 30-36).
- T10 (Developing full-text database resources to retrieve information) was allotted very little time in the plan; it was carried out in months 22-36.
- T11 (Proof-reading of the toponymic data) was very short-term and completed within the planned period (months 25-33).
- T12 (Developing a computer dictionary database, final report and publication) was completed within the planned period (months 31-36).
2.4. Problems encountered
We encountered a minor problem with the transfer of funds, which arrived with a delay, and we also met some problems with the payment of our expenditures and the organisation of field expeditions.
Problems encountered | major | minor | None | not applicable |
Co-operation of team Members | | | + | |
Transfer of funds | | + | | |
Telecommunication | | | + | |
Transfer of goods | | | | + |
Other |
2.5. Actions required
We do not see any action INTAS needs to take to solve our problems.
2.6 Manpower invested
A rough estimate of the effort required to complete the project is 10 person-years. It should be stressed that without the INTAS grant it would have been impossible to achieve these results; most of the results are due to INTAS funding.
3. FINANCES (in EURO)
For each Contractor the spending of the grant from INTAS can be specified as follows, taking into account the various cost categories.
# | Contractor | Labour Cost | Overheads | Travel & Subsistence | Consumables | Equipment | Other Costs | Total (Euro) |
1 | GRO | 1000 | 1050 | 4515 | 670 | | 1805 | 9040 |
2 | JOE | | 71 | 605 | 517 | | 1800 | 2993 |
3 | SPB | 21600 | 380 | 7268 | 306 | 1130 | 1106 | 31790 |
4 | PSU | 10800 | 665 | 2234 | 975 | 1536 | | 16210 |
Subtotal | | 33400 | 2166 | 14622 | 2468 | 2666 | 4711 | 60033 |
In general, the spending has been in accordance with the budget foreseen in the Work Programme.
LABOUR COSTS were salaries paid to the NIS participants and an assistantship for a young scholar in Groningen who assisted the project co-ordinator.
CONSUMABLES were spent on writing materials, diskettes, cassette tapes and printer cartridges.
OTHER COSTS were spent on the typesetting of the monograph with the collected texts, the printing of the monograph with a collection of papers, and the monograph published in Joensuu.
EQUIPMENT for the PSU team included a computer (Pentium-III) with a printer, and a recorder for field expeditions.
EQUIPMENT for the SPB team included a computer (Pentium-III) with a printer (HP 2200).
The INTAS grant has been spent in full. We hope to receive the final instalment of the grant (6000 euro) after the approval of the final report; it will then be allocated to the NIS participants as follows:
SALARIES LAST SIX MONTHS 4800 euro
OVERHEADS AND CONSUMABLES 1200 euro
3.2 Other funding
There was no other funding for the project, except for the salaries paid to the participants by their universities and part of the overhead costs (heating of working facilities, computer use etc.).
4. ROLE AND IMPACT OF INTAS
This grant was very important for starting and carrying out the project. Without funding from INTAS, the project would most likely not have been started or carried out.
We would like to stress that the project has been very important for creating new international contacts, providing additional funds for all research groups involved and supporting the NIS scientists. Moreover, we gained additional prestige, which resulted in various invitations to important conferences and involvement in UNESCO initiatives in the field of endangered languages.
The follow-up of the project is very important, as it might lead to the stabilisation of ethnic relations in the areas concerned and will contribute substantially to the study of ethnolinguistics. We very much hope to continue this co-operation in the future with a similar group of contractors. In the international UNESCO expert group on Endangered Languages, in which we are represented, initiatives will be taken to set up and support similar project activities in the future.
5. RECOMMENDATIONS TO INTAS
In general, INTAS support provides an excellent way of stimulating joint projects between research groups from the NIS and from INTAS member states, and we think that the organisation should continue in the same way. The following suggestions can be made for changes in INTAS policy:
- try to stimulate more joint projects in the field of the humanities;
- consider the possibility of larger individual labour grants in those parts of the NIS where the cost of living has increased considerably in recent years.