About


CE-DOHS presents texts written in Portuguese by authors of different ethnicities born in Brazil between 1586 and 1986. It also includes a small collection of texts by authors born in Portugal between 1450 and 1850.

There are currently almost 50 collections, adding up to over 1 million words (the goal is to reach 5 million words), available on the web through different forms of access.

The entire text base is the result of research on the prospection and editing of documentary sources, found in dozens of public and private archives in Brazil and abroad, and of speech recordings. This work has been carried out since 1993 by researchers and scholarship students from the graduate and scientific initiation programs of the Department of Letters and Arts (DLA) of Feira de Santana State University (UEFS). Some material was also provided by other projects, through partnerships with other higher education institutions, in some cases under formal cooperation agreements.

Logo


The CE-DOHS logo is the unique, exclusive and standardized graphic form that conveys the basic identifying mark of the Corpus Eletrônico platform for the history of Brazilian Portuguese. It should perform the following functions: identify CE-DOHS visual messages immediately, uniquely and strikingly; and unify and integrate CE-DOHS visual messages, consolidating its visual identity.

The logo was created in 2010, at the request of the Coordination, by the artist Juraci Dória, who took as inspiration the Bahian hinterlands, the project's coverage area.

It should be noted that, regardless of how it is used, the CE-DOHS logo must always be displayed in full; stylizations, additions, deletions, or variations of any kind in its form are therefore prohibited.

Citation

Use the following formatting:

EDITOR'S SURNAME, given name (check the metadata). Title. In: Carneiro, Zenaide de Oliveira Novais; Lacerda, Mariana Fagundes de Oliveira (Org.). Projeto CE-DOHS: Plataforma Corpus Eletrônico para a história do Português Brasileiro (Fapesb/CNPq). URL: http://www.uefs.br/cedohs. Accessed on: day, month and year.

Previous Projects

2010-2018

Reference call. Project: CE-DOHS: Corpus Eletrônico de Documentos Históricos do Sertão (5566/2010/Consepe: 202/2010).


Coordination: Zenaide de Oliveira Novais Carneiro (UEFS/Fapesb/CNPq) and Mariana Fagundes de Oliveira Lacerda (UEFS/Fapesb)


2009-2018

Postdoctoral research project: Proposta de um sistema de edições eletrônicas e seus aspectos tecnológicos: a construção de um Piloto de Corpus Eletrônico a ser implantado na UEFS para o estudo da língua portuguesa no semiárido baiano (séculos XVII-XXI) (FAPESB 1658/2009).

Author: Zenaide de Oliveira Novais Carneiro (UEFS/Fapesb/CNPq)

2012-2017

A língua portuguesa no tempo e no espaço: contato linguístico, gramáticas em competição e mudança paramétrica (FAPESP 12/06078-9, 1/10/2012 - 30/9/2017).

Partnership: Unicamp-UEFS

2009-2011

Vozes do Sertão em Dados: história, povos e formação do português brasileiro
Period: July 2009 to July 2011
Call: Edital MCT/CNPq 02/2009 - Ciências Humanas, Sociais e Sociais Aplicadas
Funder: CNPq
Process number: 401433/2009-9/Consepe 102/2009

Coordination: Zenaide de Oliveira Novais Carneiro (phases 1, 2, 3, 4 and 5) and Mariana Fagundes de Oliveira Lacerda (phases 2, 3, 4 and 5).


2012-2016

Project: Reconstrução da Língua Portuguesa no Interior da Bahia: aspectos sócio-históricos e linguísticos (FAPESB 001/2012/Consepe: 150/2012). (Internal call)

Coordination: Mariana Fagundes de Oliveira Lacerda (UEFS/FAPESB).


1993-2016

A Língua Portuguesa no Semiárido baiano (Phases I, II, III and IV)

Partnership in phase III with the Vertentes/UFBA project
Coordination: Norma Lucia Fernandes de Almeida and Zenaide de Oliveira Novais Carneiro

1997-2000

Contribuições para a Constituição de um Banco de Textos e de um Banco de Dados para o Estudo da História do Português do Brasil, do Século XVII ao XX (affiliated with the PHPB).
Vice-coordination: Zenaide de Oliveira Novais Carneiro and Norma Lucia Fernandes de Almeida

History


The CE-DOHS: Electronic Corpus of Sertão Historical Documents project has been part, since 2012, of the Portuguese Language Studies Center (NELP) of the Department of Letters and Arts (DLA) of Feira de Santana State University (UEFS).

NELP works with three agendas: the formation of a Portuguese-language text bank, the socio-historical study of Portuguese, and the linguistic study of Portuguese. CE-DOHS stands out by offering an electronic bank of over one million words for the study of the history of Brazilian Portuguese. It does so through a technological partnership with the Corpus Histórico do Português Tycho Brahe project of the State University of Campinas, coordinated by Professor Charlotte Galves, and in partnership with the National Project for the History of Brazilian Portuguese (PHPB). Building a database of this kind, according to Bacelar do Nascimento (2004, p. 1),

[...] essentially favors a descriptive linguistics, strongly supported by new technologies, and allows us to take as the starting point of the description the analysis of a significant amount of authentic data, similar to what is done in other scientific fields. The use of corpora allows empirically based linguistic descriptions to be produced and promotes the discussion of solidly grounded theoretical questions.

Created in 2012 with funding from the Bahia State Research Support Foundation (FAPESB), the CE-DOHS project is organized in two phases: phase 1 covers documents from the 18th to the 20th century, and phase 2 covers documents from the 16th and 17th centuries. The project brings together the philological and computational fields by promoting the editing, in XML, of texts that were first edited traditionally, according to semi-diplomatic editing criteria, by the researchers of the project Vozes do Sertão em Dados, created in 2009, and by CE-DOHS researchers. These researchers have always sought to diversify the bank with texts representative of the popular varieties, especially, and of the cultured varieties of Brazilian Portuguese.

The first phase of the project aimed to compile a text bank covering 1750 to 2000, representative of the historical period of Brazilian Portuguese characterized by localized multilingualism; it allows the study of the history of cultured, semi-cultured and popular Brazilian Portuguese in this context. This phase resulted in several papers published by the team (cf. the Lattes CVs of team participants). The phase 1 subprojects are:

  • Development of computational tools for the construction and use of CE-DOHS.
  • Application of linguistic and web-semantic annotation techniques in CE-DOHS (partnership with USP).
  • Collections of cultured, semi-cultured and popular Brazilian Portuguese letters (19th and 20th centuries).
  • Letters written by candid hands: the case of the unskilled (20th century).
  • Oral corpora of cultured and popular Brazilian Portuguese (20th century).

The second phase, which is being implemented in 2019, goes further back to a period when multilingualism in Brazil was widespread (1500-1750) (MATTOS E SILVA, 2004; LUCCHESI, 2017). It aims to study the gestation of Brazilian Portuguese. This phase faces a scarcity of sources: texts written by groups born in Brazil are rare, especially those by Indigenous and Black people, ethnic groups that did not have access to schooling (sources for the study of the dominant linguistic variety are more plentiful). The project nevertheless holds small collections that are significant for the period, which will soon be available on the Platform.

The methodology controls for socio-historical aspects and draws on the theory of linguistic variation (WLH, 2006 [1968]; LABOV, 2008 [1972]; 1982; 1994; 2001a; 2001b; WLH, 1986), applied to written texts within so-called Socio-Historical Linguistics (MATTOS E SILVA, 2008). The causes that affect the process of change are also considered from the standpoint of diachronic linguistics in Chomskyan Generative Grammar (CHOMSKY, 1986). Among these causes is contact between languages, both typologically similar and typologically distinct: contact with indigenous languages, closely related and also genetically diverse (ARYON, 1986; 1993), and contact with African languages, mainly sub-Saharan (7,000 languages, in between 1676-1700, mainly from the Niger-Congo family (CASTRO, 2002)). Until 1780 Brazil received more than one million two hundred thousand enslaved people, who were in the process of acquiring Portuguese as an L2 and transmitting it to their descendants as an L1 (BAXTER, 1985; LUCCHESI & BAXTER, 2009; LUCCHESI, 2009; CASTRO, 2002).

The XML (electronic) editing is done with eDictor, developed by Paixão de Sousa, Kepler and Faria (2010), a text editor designed especially for philological work and automatic linguistic analysis.

According to Shepherd et al. (2012, p. 11),

The idea of gathering collections of natural texts in order to submit them to linguistic analysis goes back to the work of the American structuralists Harris (1951) and Fries (1952). With the Brown Corpus (Francis and Kucera, 1954), the first electronic corpus compiled for this purpose would emerge. Although this corpus is still widely used today, at the time there were virtually no texts written on computers; computers were huge and expensive machines that occupied whole rooms, and computer programs took hours and even days to run.

The CE-DOHS bank adds to the electronic corpora built fundamentally for linguistic analysis. Building such databases on platforms is valuable work for language studies in general. In the case of CE-DOHS, given the socio-historical questions that underpinned its constitution, it is especially valuable for studies of the formation of Brazilian Portuguese in the field of historical linguistics.

Social Function


CE-DOHS holds rich and extensive material that offers the scientific community different research possibilities; for the history of Brazilian Portuguese (especially of Portuguese within Bahia), it is an extremely significant corpus. Basic Education professionals find in it a wealth of data to be explored in the classroom when addressing phenomena of language variation and change, among others. Students of the Professional Master's in Letters (PROFLETRAS) at Feira de Santana State University have used the bank in discussing their research topics with students.

Activities

Digital Editions / XML and automatic generation of separate facsimile editions

The editions that make up the electronic corpora are philologically rigorous, and this rigor is captured in full in the XML editing done with the eDictor tool (PAIXÃO DE SOUSA, 2004; TRIPPEL E PAIXÃO DE SOUSA, 2006; PAIXÃO DE SOUSA, 2007; PAIXÃO SOUSA E KEPLER, 2007; SOUSA, KLEPER, PAIXÃO DE SOUZA E FARIA, 2010). The tool offers facsimile, semi-diplomatic, modernized and technical (for parser input) versions, as well as derivative products such as the edition's lexicon.
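As a rough illustration of how a single XML edition can yield more than one version of a text, the sketch below stores an original and a modernized form for each word and renders either layer. The element names (`text`, `w`, `orig`, `mod`) and the sample words are invented for this example; they are assumptions and do not reproduce eDictor's actual XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names are invented for illustration and
# do not reproduce eDictor's actual XML schema.
SAMPLE_XML = """
<text>
  <w><orig>hũa</orig><mod>uma</mod></w>
  <w><orig>carta</orig></w>
  <w><orig>escripta</orig><mod>escrita</mod></w>
</text>
"""


def render(root, layer):
    """Join the words of one edition layer ('orig' or 'mod').

    A word without a form in the requested layer falls back to its
    original form.
    """
    words = []
    for w in root.iter("w"):
        form = w.find(layer)
        if form is None:
            form = w.find("orig")
        words.append(form.text)
    return " ".join(words)


root = ET.fromstring(SAMPLE_XML)
print(render(root, "orig"))  # hũa carta escripta
print(render(root, "mod"))   # uma carta escrita
```

Keeping both forms in one file means the semi-diplomatic reading is never overwritten by the modernized one; each version is simply a different view of the same source.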

Annotated (parsed) Corpus

CE-DOHS, in its annotated version, allows automatic and reliable linguistic searches using the technologies of cutting-edge projects such as the Tycho Brahe Parsed Corpus of Historical Portuguese and the Penn-Helsinki Parsed Corpora. The corpus can be queried with automatic search tools such as CorpusSearch.

Semi-Diplomatic Editions

Philological editions of documents dating from 1500 onward, written by people born in many different Brazilian cities. The documents are controlled for origin, reliability, production context, place and date of writing, addressee, and purpose.

Biographical sheets are provided for the writers, with information on place of birth/nationality, schooling, type of language acquisition, place and date of birth, parentage and profession. Ethnicity is also controlled: Portuguese of different social origins, Indigenous people, Mamelucos, Africans, Mestizos and Pardos.

This information can be retrieved from the biographical sheets and summary tables, or automatically, by accessing the metadata through the E-corp tool. The bank allows researchers to assemble corpora according to their own interests.
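The idea of assembling a corpus from writer metadata can be sketched in a few lines of Python. This is not E-corp's interface, which is not described here: the record layout, the field names, the `select_documents` helper and the sample data are all hypothetical and serve only to show the kind of filtering involved.

```python
from dataclasses import dataclass


# Hypothetical record layout: field names are illustrative only and are
# not taken from E-corp or from the CE-DOHS metadata schema.
@dataclass(frozen=True)
class WriterRecord:
    doc_id: str
    birthplace: str
    birth_year: int
    schooling: str
    ethnicity: str


def select_documents(records, *, born_from=None, born_to=None, **equals):
    """Return the records whose metadata match every given criterion."""
    selected = []
    for rec in records:
        if born_from is not None and rec.birth_year < born_from:
            continue
        if born_to is not None and rec.birth_year > born_to:
            continue
        if all(getattr(rec, field) == value for field, value in equals.items()):
            selected.append(rec)
    return selected


# Invented sample data, for illustration only.
SAMPLE = [
    WriterRecord("doc-001", "Feira de Santana", 1890, "primary", "pardo"),
    WriterRecord("doc-002", "Salvador", 1720, "none", "indigenous"),
    WriterRecord("doc-003", "Feira de Santana", 1935, "secondary", "pardo"),
]

subcorpus = select_documents(SAMPLE, born_from=1850, birthplace="Feira de Santana")
print([rec.doc_id for rec in subcorpus])  # ['doc-001', 'doc-003']
```

A researcher interested in, say, writers with little schooling could pass `schooling="none"` instead; the same pattern extends to any field recorded in the biographical sheets.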

The agenda of traditional philological editions is carried out under the Project for the History of Brazilian Portuguese (PHPB), a joint effort involving several Brazilian universities (National PHPB Corpus). CE-DOHS operates specifically on the Bahia corpora platform (PHPB-BA), coordinated by professors Zenaide de Oliveira Novais Carneiro and Mariana Fagundes de Oliveira Lacerda.

Development of computational tools for corpora

The Platform invests in computational tools. It creates tools such as E-corp and collaborates in the implementation of tools such as eDictor. Through partnerships, it also uses tools from projects such as Tycho Brahe.

Subprojects

Phase 1 (1750-2000) - Nearing completion
Subproject: Preenchendo lacunas: acervos de cartas de português brasileiro culto, semiculto e popular no século XX: cartas marienses e cartas da Família Tuy Batista.
Coordination: Patrícia Brito and Priscila Tuy Batista.

Subproject: Cartas escritas por mãos cândidas: o caso dos inábeis: inserção nos metadados de todo o acervo CE-DOHS.
Coordination: Huda da Silva Santiago (UEFS).
Consultancy: Afrânio Barbosa.

Subproject: Novas amostras de fala: Áreas especiais indígenas, comunidades quilombolas, áreas urbanas e rurais de pouco escolarizados, já gravados até o ano 2000.
Coordination: Norma Lucia Fernandes; Silvana Araujo; Mariana Fagundes de Oliveira Lacerda; Norma da Silva Lopes; Rejane Cristine Santana Cunha; Daiane Lemos.

Phase 2 (1500-1750) - In progress
Subproject: Um corpus para os seiscentos (a partir de 1617). Documentos escritos por brasileiros: família Vieira Ravasco e outros contemporâneos.
Coordination: Lara Cardoso.

Subproject: Um corpus raro: escrito por indígenas integrados, mamelucos, pretos, pardos e brancos pobres (anos finais do século XVII).
Coordination: Lara Cardoso and Thaysy Ribeiro.

Subproject: Documentos da Feira do Capuame (1729-1830).
Coordination: Elaine Brandão.

Subproject: Cartas e Atas produzidos por homens bons da Câmara de Salvador, a partir do século XVII.
Coordination: Williane Silva Coroa.

Subproject: Recuando ao Século XVIII: documentos privados do Sobrado do Brejo Seco (1755-1910).
Coordination: Adilson Silva de Jesus, Elaine Brandão Santos, Rui Marcos Moura, Wellington de Jesus.
Consultancy: Emília Helena Portella Monteiro de Souza; Mariana Fagundes de Oliveira Lacerda; Zenaide de Oliveira Novais Carneiro.

Subproject: Inserção de níveis de inabilidade e habilidade nos metadados do Corpus CE-DOHS (Etapas 1 e 2).
Coordination: Huda da Silva Santiago.
Consultancy: Afrânio Gonçalves Barbosa.
Collaborators: Elane Santos, Gutemberg Barbosa, Janaína Mascarenhas, Lorena Rosa, Maiara Lemos, Marinalda Freitas, Rosana Brito.

Electronic Part

Subproject: Elaboração de ferramentas computacionais (E-Corp e outras), para construção e uso do CE-DOHS.
Coordination: Igor Leal (Unicamp).
Consultancy: Pablo Faria.

Subproject: Aplicação de técnicas de anotação linguística e web-semântica no CE-DOHS.
Coordination: Priscila Tuy.
Consultancy: Maria Clara Paixão de Sousa.

Subproject: Anotação morfológica e sintática de acervos do CE-DOHS: parceria com o Corpus Histórico do Português Tycho Brahe (UNICAMP).
Coordination: Shirley Guedes.

Subproject: Revisão de Metadados: polarização sociolinguística; separação por normas; níveis de escolaridade; normas, capital/interior; diferenciação diatópico-diacrônica e por gêneros textuais.
Coordination: Lorena Rosa, Rosana Brito, Shirley Guedes, Maiara Lemos.
Consultancy: Mariana Fagundes de Oliveira Lacerda, Zenaide de Oliveira Novais Carneiro.