CE-DOHS presents texts written in Portuguese by authors of different ethnicities born in Brazil between 1586 and 1986, along with a small collection of texts by authors born in Portugal between 1450 and 1850.
Currently there are almost 50 collections, totaling over 1 million words (the goal is to reach 5 million words), available on the web through different forms of access.
The entire text base is the result of research on locating and editing documentary sources, found in dozens of public and private archives in Brazil and abroad, as well as speech recordings. This work has been carried out since 1993 by researchers and by graduate and Scientific Initiation scholarship students of the Department of Letters and Arts (DLA) of the State University of Feira de Santana (UEFS). Some material was also provided by other projects through partnerships with other higher education institutions, in some cases under formal cooperation agreements.
The CE-DOHS logo is the unique, exclusive and standardized graphic form that conveys the basic identifying mark of the Electronic Corpus platform for the history of Brazilian Portuguese. It should perform the following functions: identify CE-DOHS visual messages immediately, uniquely and strikingly; and unify and integrate CE-DOHS visual messages, consolidating the platform's visual identity.
The logo was created in 2010, at the request of the Coordination, by the artist Juraci Dória, who took as inspiration the Bahian hinterlands, the project's coverage area.
Regardless of how it is used, the CE-DOHS logo must always be displayed in full. Stylizations, additions, deletions, or variations of any kind in its forms are therefore prohibited.
To cite texts from the platform, use the following format:
EDITOR'S SURNAME, given name (check the metadata). Title. In: Carneiro, Zenaide de Oliveira Novais; Lacerda, Mariana Fagundes de Oliveira (Org.). Projeto CE-DOHS: Plataforma Corpus Eletrônico para a história do Português Brasileiro (Fapesb/CNPq). URL: http://www.uefs.br/cedohs. Acesso em: day, month and year.
Reference call (Edital Referência). Project: CE-DOHS: Corpus Eletrônico de Documentos Históricos do Sertão (5566/2010/Consepe: 202/2010).
Coordination: Zenaide de Oliveira Novais Carneiro (UEFS/Fapesb/CNPq) and Mariana Fagundes de Oliveira Lacerda (UEFS/Fapesb).
Postdoctoral research project: Proposta de um sistema de edições eletrônicas e seus aspectos tecnológicos: a construção de um Piloto de Corpus Eletrônico a ser implantado na UEFS para o estudo da língua portuguesa no semiárido baiano (séculos XVII-XXI) (FAPESB 1658/2009).
A língua portuguesa no tempo e no espaço: contato linguístico, gramáticas em competição e mudança paramétrica (FAPESP 12/06078-9, 1/10/2012 - 30/9/2017).
Partnership: Unicamp-UEFS.
Vozes do Sertão em Dados: história, povos e formação do português brasileiro
Period: July 2009 to July 2011. Call: Edital MCT/CNPq 02/2009 - Ciências Humanas, Sociais e Sociais Aplicadas. Funder: CNPq. Process number: 401433/2009-9/Consepe 102/2009.
Coordination: Zenaide de Oliveira Novais Carneiro (1st, 2nd, 3rd, 4th and 5th phases) and Mariana Fagundes de Oliveira Lacerda (2nd, 3rd, 4th and 5th phases).
Project: Reconstrução da Língua Portuguesa no Interior da Bahia: aspectos sócio-históricos e linguísticos (FAPESB 001/2012/Consepe: 150/2012) (internal call).
Coordination: Mariana Fagundes de Oliveira Lacerda (UEFS/FAPESB).
A Língua Portuguesa no Semiárido baiano (Phases I, II, III and IV)
Partnership in phase III with the Vertentes/UFBA project.
Contribuições para a Constituição de um Banco de Textos e de um Banco de Dados para o Estudo da História do Português do Brasil, do Século XVII ao XX (affiliated with PHPB).
Vice-coordination: Zenaide de Oliveira Novais Carneiro and Norma Lucia Fernandes de Almeida.
The CE-DOHS: Electronic Corpus of Sertão Historical Documents project has been part, since 2012, of the Portuguese Language Studies Center (NELP) of the Department of Letters and Arts (DLA) of Feira de Santana State University (UEFS).
NELP works with three agendas: the formation of Portuguese-language text banks, the socio-historical study of Portuguese, and the linguistic study of Portuguese. CE-DOHS stands out by offering an electronic bank of over one million words for studying the history of Brazilian Portuguese. It does so through a technological partnership with the Corpus Histórico do Português Tycho Brahe project, based at the State University of Campinas and coordinated by Professor Charlotte Galves, and in partnership with the National Project for the History of Brazilian Portuguese (PHPB). Building such a database, according to Bacelar do Nascimento (2004, p. 1),
[...] essentially favors a descriptive linguistics, strongly supported by new technologies, and allows us to take as the starting point of description the analysis of a significant amount of authentic data, similar to what is done in other scientific fields. The use of corpora allows empirically based linguistic descriptions and promotes the discussion of solidly grounded theoretical questions.
The CE-DOHS project was created in 2012 with funding from the Bahia State Research Support Foundation (FAPESB) and is organized in two phases: phase 1 covers documents from the 18th to the 20th century, and phase 2 covers documents from the 16th and 17th centuries. It brings together the philological and computational fields by promoting the XML editing of texts previously edited in traditional form, according to semi-diplomatic editing criteria, by the researchers of the Vozes do Sertão em Dados project (created in 2009) and by CE-DOHS researchers. The team has always sought to diversify the bank with texts representative of popular (especially) and educated varieties of Brazilian Portuguese.
The first phase of the project aimed to compose a text bank covering 1750 to 2000, representative of the historical period of Brazilian Portuguese characterized by localized multilingualism. It allows the study of the history of educated, semi-educated and popular Brazilian Portuguese in this context. This phase has resulted in several papers published by the team (cf. the Lattes CVs of team participants). These are the phase 1 subprojects:
The second phase, which is being implemented as of 2019, goes further back to a time when multilingualism in Brazil was widespread (1500-1750) (MATTOS E SILVA, 2004; LUCCHESI, 2017). It aims to study the gestation of Brazilian Portuguese. This phase faces a scarcity of sources: texts written by people born in Brazil are rare, especially texts by Indigenous and Black authors, ethnic groups who did not have access to schooling (sources for studying the language of the dominant groups are more plentiful). The project nonetheless has small but significant collections from the period, which will soon be available on the Platform.
The methodology is designed to control socio-historical aspects and draws on the Theory of Linguistic Variation (WLH, 2006 [1968]; LABOV, 2008 [1972]; 1982; 1994; 2001a; 2001b; WLH, 1986), applied to written texts within so-called Socio-Historical Linguistics (MATTOS E SILVA, 2008). The causes that affect the process of change are also considered from the standpoint of diachronic linguistics within Chomskyan Generative Grammar (CHOMSKY, 1986). These causes include contact between languages, both typologically similar and typologically distinct: contact with Indigenous languages, both closely related and genetically diverse (ARYON, 1986; 1993), and, mainly, contact with sub-Saharan African languages (7,000 languages, between 1676 and 1700, mainly from the Niger-Congo family (CASTRO, 2002)). By 1780 Brazil had received more than one million two hundred thousand enslaved people, who acquired Portuguese as L2 and transmitted it to their descendants as L1 (BAXTER, 1985; LUCCHESI & BAXTER, 2009; LUCCHESI, 2009).
For XML (electronic) editing, the project uses eDictor, developed by Paixão de Sousa, Kepler and Faria (2010), a text editor designed especially for philological work and automatic linguistic analysis.
According to Shepherd et al. (2012, p. 11),
The idea of gathering collections of natural texts in order to submit them to linguistic analysis goes back to the work of the American structuralists Harris (1951) and Fries (1952). With the Brown Corpus (Francis and Kucera, 1954), the first electronic corpus compiled for this purpose would emerge. Although this corpus is still widely used today, at the time there were virtually no texts written on computers, which were huge and expensive machines that occupied whole rooms, and computer programs took hours and even days to run.
The CE-DOHS bank adds to the electronic corpora built fundamentally for linguistic analysis. Building such databases on platforms is valuable work for language studies in general and, in the case of CE-DOHS, given the socio-historical questions that underpinned its constitution, for studies of the formation of Brazilian Portuguese in the field of historical linguistics.
The editions that make up the electronic corpora have philological rigor, which is captured in full in the XML editing carried out with the eDictor tool (PAIXÃO DE SOUSA, 2004; TRIPPEL E PAIXÃO DE SOUSA, 2006; PAIXÃO DE SOUSA, 2007; PAIXÃO DE SOUSA E KEPLER, 2007; PAIXÃO DE SOUSA, KEPLER E FARIA, 2010). The tool offers facsimile, semi-diplomatic, modernized and technical versions (the last for parser input), as well as derivative products such as the edition lexicon.
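The idea of keeping several versions of a text in one XML file can be illustrated with a minimal sketch. The element names used here (`w`, `o`, `e`) and the `t="mod"` attribute are hypothetical, chosen only for illustration; they are not taken from eDictor's actual schema, which this text does not describe. Each word stores its original spelling and, where an editor intervened, a modernized form, so both a conservative and a modernized reading can be generated from the same source.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (NOT eDictor's real schema): each <w> holds the
# original spelling in <o> and, optionally, a modernized form in <e t="mod">.
SAMPLE = """<text>
  <w><o>escreuo</o><e t="mod">escrevo</e></w>
  <w><o>esta</o></w>
  <w><o>carta</o></w>
</text>"""


def render(xml_source, version):
    """Return the text as a string in the requested version.

    version == "modernized" uses the editor's modernized form when present;
    any other value keeps the original spelling of every word.
    """
    root = ET.fromstring(xml_source)
    words = []
    for w in root.iter("w"):
        original = w.findtext("o")
        edited = w.findtext("e[@t='mod']")  # None when no edit was recorded
        if version == "modernized" and edited is not None:
            words.append(edited)
        else:
            words.append(original)
    return " ".join(words)


def edition_lexicon(xml_source):
    """List (original, modernized) pairs for every edited word."""
    root = ET.fromstring(xml_source)
    pairs = []
    for w in root.iter("w"):
        edited = w.findtext("e[@t='mod']")
        if edited is not None:
            pairs.append((w.findtext("o"), edited))
    return pairs
```

Under these assumptions, `render(SAMPLE, "semi-diplomatic")` keeps the spelling "escreuo", `render(SAMPLE, "modernized")` yields "escrevo", and `edition_lexicon(SAMPLE)` lists the single edited pair.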
CE-DOHS, in its annotated version, allows automatic and reliable linguistic searches using technologies from cutting-edge projects such as the Tycho Brahe Parsed Corpus of Historical Portuguese and the Penn-Helsinki Parsed Corpora. Access is possible through automatic search tools such as CorpusSearch.
The corpus comprises philological editions of documents dating from 1500 onward, written by people born in many different Brazilian cities. The documents are controlled for their origin, reliability, production context, place and date of writing, addressee, and purpose.
For each writer there is a biographical sheet with information on place of birth/nationality, schooling, type of language acquisition, date of birth, parentage and profession. Ethnicity is also controlled: Portuguese of different social origins, Indigenous people, Mamelucos, Africans, Mestizos and Pardos.
This information can be retrieved from fact sheets and summary tables, or automatically by accessing the metadata through the E-corp tool. The bank allows researchers to assemble corpora according to their interests.
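Assembling a corpus from metadata can be pictured as a filter over a catalog of documents. The sketch below is not E-corp, whose interface this text does not describe; the field names, the `assemble` function and the sample records are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical metadata record; field names are assumptions."""
    doc_id: str
    author_birthplace: str
    year_written: int
    word_count: int


# Invented sample records, for illustration only (not real CE-DOHS data).
CATALOG = [
    Document("doc-001", "Feira de Santana", 1808, 420),
    Document("doc-002", "Salvador", 1895, 310),
    Document("doc-003", "Feira de Santana", 1921, 515),
]


def assemble(catalog, birthplace=None, start=None, end=None):
    """Select documents matching the given criteria; None means 'any'."""
    selected = []
    for doc in catalog:
        if birthplace is not None and doc.author_birthplace != birthplace:
            continue
        if start is not None and doc.year_written < start:
            continue
        if end is not None and doc.year_written > end:
            continue
        selected.append(doc)
    return selected


def total_words(docs):
    """Size of the assembled sub-corpus in words."""
    return sum(doc.word_count for doc in docs)
```

For example, `assemble(CATALOG, birthplace="Feira de Santana", start=1900)` selects only `doc-003`, and `total_words` reports the size of whatever selection was made.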
The agenda of traditional philological editions is carried out under the Project for the History of Brazilian Portuguese (PHPB), in joint work that encompasses several Brazilian universities and builds the National PHPB Corpus. CE-DOHS operates specifically on the Corpora Bahia platform (PHPB-BA), coordinated by professors Zenaide de Oliveira Novais Carneiro and Mariana Fagundes de Oliveira Lacerda.
The Platform invests in computational tools: it creates tools such as E-corp and collaborates in the implementation of tools such as eDictor. Through partnerships, it also uses tools from projects such as Tycho Brahe.