4.3. Corpora
A regionally representative corpus of TİD was sponsored by the Ministry of Family and Social Policies. The corpus consists of 104 hours of video recordings, of which approximately 20 hours had been annotated according to its 2015 description [Socio-historical Background - 4.1].
The Boğaziçi University Sign Language Corpus, started in 2012, is still under construction. The first set of recordings was made between 2012 and 2015 under the umbrella of the TİDBİL Project (TÜBİTAK 111K314) and the SignGram Project (COST Action IS1006). By 2015, the corpus contained video recordings of TİD signers performing structured and semi-structured tasks, and some of these recordings had been annotated/glossed using the annotation tool ELAN. The 55 participants who contributed to the corpus are native or near-native signers, and most of them are from İstanbul. Since April 2016, further recordings and ELAN annotations have been added with the support of the SIGN-HUB Project (Horizon 2020 693349). By the end of May 2020, there were approximately 23 hours of glossed/annotated video recordings.
Another language resource, TiDLaR, which consists of an online dictionary and a Turkish-TİD parallel corpus treebank, was developed through a collaboration between linguists at Boğaziçi University and computer engineers at Istanbul Technical University. For a description of the online dictionary, see [Socio-historical Background - 4.2]. The parallel corpus comprises 420 annotated Turkish-TİD utterance pairs. These utterances are based on 306 sentences from first-grade elementary school coursebooks published and distributed by the Ministry of Education. See Eryiğit et al. (2016) and Eryiğit et al. (2019) for a detailed description of the annotation schemes of the signs and the treebank. The development of this language resource was funded by TÜBİTAK (project no. 114E263).