This section of the website is devoted to the Gothic Bible and minor fragments.
The TEI text and database are based on the work of Wilhelm Streitberg:
Die gotische Bibel: Herausgegeben von Wilhelm Streitberg. (Germanische Bibliothek, 2. Abteilung, 3. Band)
- 1. Teil: Der gotische Text und seine griechische Vorlage. Mit Einleitung, Lesarten und Quellennachweisen sowie den kleineren Denkmälern als Anhang. Heidelberg: Carl Winter, 1919.
- 2. Teil: Gotisch-griechisch-deutsches Wörterbuch. Heidelberg: Carl Winter, 1910.
To avoid possible copyright issues, we used the 1919 edition. The Speyer fragment, which was discovered in 1970, is cited from Piergiuseppe Scardigli's Nachtrag zum ersten Band in the latest edition (2000), which is basically a reprint – differences between the readings in the 1919 and 2000 editions are listed below. The latest edition (ISBN: 3-8253-0745-X and 3-8253-0746-8) can be ordered from Universitätsverlag WINTER (copies of earlier editions can sometimes be found in catalogues of antiquarian books like Zentrales Verzeichnis Antiquarischer Bücher or Abebooks).
Though it is not error-free, Streitberg's work is generally considered the standard edition: cf. James Marchand, WEMSK: “This is the standard edition and has superseded all others. Its ‘reconstruction’ of the Greek Vorlage is seriously flawed by his dependance on von Soden, and he is given to conjectures, but this is the text you must use.” Note that Magnús Snædal's “Concordance to Biblical Gothic” (Reykjavík 1998) includes a new text edition with numerous emendations and corrections to Streitberg's readings; unfortunately, his book is currently out of print. Some of the minor fragments at Christian Petersen's website are based on Snædal's readings.
The text was digitized in 1997 by Robert Tannert, David Landau and Tom De Herdt and has been thoroughly checked and proofread. In 2002, the reliability of the transcription was assessed by an automated collation with another electronic edition, provided by the TITUS project. We wrote a simple script that aligned the texts and compared them on a byte-by-byte basis, reporting about 800 differences. These were individually compared to printed copies of Streitberg's 1919 and 1965 editions and corrected where necessary. Since the two texts were digitized independently, using different methods (scanning vs. typing), the chance that the same error occurs at the same location is relatively small, and a comparison should reveal most remaining errors on both sides.
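The collation step can be illustrated with a short sketch. This is not the 2002 script, which is not reproduced here; it is a minimal Python approximation that uses the standard library's difflib and compares characters rather than bytes. The function name `collate` and the two sample strings are assumptions made for the example.

```python
from difflib import SequenceMatcher


def collate(text_a, text_b):
    """Align two transcriptions and report every span where they differ.

    Returns a list of (operation, offset_in_a, span_from_a, span_from_b).
    """
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    differences = []
    for op, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if op != "equal":
            differences.append(
                (op, a_start, text_a[a_start:a_end], text_b[b_start:b_end])
            )
    return differences


# Two hypothetical transcriptions of the same passage (sample readings only).
typed = "siponjam sainaim"
scanned = "siponjam seinaim"

for op, offset, ours, theirs in collate(typed, scanned):
    print(f"{op} at offset {offset}: {ours!r} vs {theirs!r}")
```

Each reported difference still has to be checked against the printed edition, since the alignment only shows where the two files disagree, not which one is right.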
The current edition is an accurate transcription of Streitberg's 1919 text; only a very small number of obvious printing errors have been corrected. Most of them are mentioned in De Tollenaere 1976, others in Streitberg's Berichtigungen (note that some of the typos still figure in the 2000 edition). Corrections are not marked in the text but listed in the front matter of the TEI edition – there's a copy of the list below.
The actual text is encoded in XML and follows the guidelines of the Text Encoding Initiative (TEI P4 DTD). We refer to the extensive TEI header for detailed information. The file contains a complete transcription of the Gothic text in Streitberg's edition, i.e. it does not include the preface, introduction, Greek text, commentary or information on the minor fragments, nor is it intended to reproduce the appearance of the original work. The project's primary goal is a linguistically annotated text based on Streitberg's readings, not a reproduction of his book. There are, however, plans to digitize Streitberg's critical apparatus, possibly with some annotations and references to newer readings.
Conversion to TEI has proven to be surprisingly complicated, mainly because there are at least three overlapping layers of annotation: the logical structure or so-called ‘canonical reference system’ (in this case: books, chapters, verses), the linguistic structure (sentences, clauses, words, morphemes; part-of-speech tags, lemmatization, parsing) and textual criticism (unclear or missing text, conjectures by Streitberg and his predecessors, variations between different witnesses). Ideally, one would like to go back to the source and read the manuscripts as well, adding yet another level of markup (leaf, side, line; hand(s), corrections made by the scribe(s)). It would be very hard to combine all those levels of information in one manageable and reasonably elegant data structure. Biblical verses do not always correspond to sentences, unclear readings often cross word boundaries (strictly speaking, Gothic doesn't even have word boundaries, since the manuscripts are written in scriptio continua), page and line breaks can occur virtually anywhere, etc. The different layers form intersecting hierarchies that just don't fit into a single XML document tree. This very common problem – along with some standard solutions, each of which comes with its own tradeoffs – is described in chapter 31 of the TEI Guidelines: Multiple Hierarchies; see also Renear/Mylonas/Durand 1996, Durusau/O'Donnell 2001 among others.
We tried to avoid the problem by “[breaking up] what might be considered a single element into multiple smaller elements, in order to make it fit within the hierarchy” [TEI P4, 31.3], in other words converting overlapping sequences like “i{n him}inam” to “i{n} {him}inam”. Since Streitberg uses italics to mark up unclear spans of text, fragmentation is not visible and does not result in any loss of information.
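The fragmentation of spans can be sketched in a few lines. The curly-brace notation is the one used in the illustration above, not the actual TEI markup, and the function name `fragment_spans` is an assumption; the sketch only shows the idea of splitting a marked span at every word boundary it crosses.

```python
import re


def fragment_spans(text):
    """Split every {...} span at whitespace so that no span crosses a word boundary."""

    def split_span(match):
        # re.split with a capturing group keeps the separators:
        # even indices are word fragments, odd indices are whitespace runs.
        pieces = re.split(r"(\s+)", match.group(1))
        rebuilt = []
        for index, piece in enumerate(pieces):
            if index % 2:            # whitespace between words: keep unmarked
                rebuilt.append(piece)
            elif piece:              # non-empty fragment: wrap it on its own
                rebuilt.append("{" + piece + "}")
        return "".join(rebuilt)

    return re.sub(r"\{([^{}]*)\}", split_span, text)


print(fragment_spans("i{n him}inam"))
```

Because each fragment now lies inside a single word, the marked spans can be nested under word elements without overlapping them.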
At this point, the TEI document does not yet contain POS-tags or lexical information. The linguistic annotations are stored in a relational database that is implicitly linked to the TEI text by means of corresponding numeric identifiers. The database contains a digital dictionary based on Streitberg 1910, a table of tokens in the running text and a morphosyntactic tagset; every token has been automatically linked to one or more lemma/POS pairs. Ultimately, the linguistic analyses and interpretations will be incorporated in the TEI document, but as long as not every token has been manually verified or disambiguated, the use of a relational database offers practical and technical advantages over TEI/XML (automatic handling of referential integrity, performance, user-friendly interface).
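A minimal sketch of such a database layout, using Python's built-in sqlite3 module, might look as follows. The actual schema is not documented here, so all table and column names are hypothetical, and the headwords “is” and “itan” are merely illustrative labels for the pronoun and verb readings of the form ita.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce referential integrity
conn.executescript("""
    CREATE TABLE lemma (id INTEGER PRIMARY KEY, headword TEXT NOT NULL);
    CREATE TABLE tag   (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    -- token.id stands for the numeric identifier shared with the TEI text
    CREATE TABLE token (id INTEGER PRIMARY KEY, form TEXT NOT NULL);
    CREATE TABLE link (
        token_id INTEGER NOT NULL REFERENCES token(id),
        lemma_id INTEGER NOT NULL REFERENCES lemma(id),
        tag_id   INTEGER NOT NULL REFERENCES tag(id),
        PRIMARY KEY (token_id, lemma_id, tag_id)
    );
""")
conn.executemany("INSERT INTO lemma VALUES (?, ?)", [(1, "is"), (2, "itan")])
conn.executemany("INSERT INTO tag VALUES (?, ?)", [(1, "pronoun"), (2, "verb")])
conn.execute("INSERT INTO token VALUES (1, 'ita')")
# Automatic tagging links the token to every possible lemma/POS pair.
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [(1, 1, 1), (1, 2, 2)])


def ambiguous_tokens(connection):
    """Return (token id, form, number of analyses) for tokens with several analyses."""
    return connection.execute(
        """
        SELECT token.id, token.form, COUNT(*)
        FROM token JOIN link ON link.token_id = token.id
        GROUP BY token.id, token.form
        HAVING COUNT(*) > 1
        ORDER BY token.id
        """
    ).fetchall()


before = ambiguous_tokens(conn)

# Manual disambiguation: delete the analysis judged incorrect in context
# (here, arbitrarily, the verb reading).
conn.execute("DELETE FROM link WHERE token_id = 1 AND lemma_id = 2")
after = ambiguous_tokens(conn)

# A link to a non-existent token is rejected by the database itself.
try:
    conn.execute("INSERT INTO link VALUES (99, 1, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The last step shows the kind of integrity check that a relational database provides without extra code and that a plain XML file does not.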
Since Gothic is a non-productive language with few extant texts, the text has been tagged by generating paradigms for every entry in the dictionary rather than writing a transducer, in other words by building a lookup table of possible forms (±3600 lemmata yield ±250000 inflected forms). Though entirely based on morphological features of isolated words, this ‘naïve’ method worked reasonably well, mainly due to the relatively low degree of syncretism in Gothic inflectional morphology. About 58% of the tokens in the Gothic Bible could be unambiguously linked to one lemma and one POS-tag. Most of the remaining tokens could be lemmatized, but were morphologically ambiguous (e.g. nominative and accusative of neuter nouns). A small number of forms turned out to be lexically ambiguous as well: the word ita for instance can be a neuter pronoun (‘it’) or a verb (‘I eat’), the same applies to the very frequent form im: ‘(to) them’ or ‘I am’. Obviously, the correct analysis can only be determined by looking at the context. Given the small size of the corpus (±67400 tokens), we decided to disambiguate manually rather than writing a full-fledged statistical or rule-based parser. Ambiguous forms are marked in orange; the color should gradually disappear as incorrect analyses are deleted from the link table in the database. The dictionary entries are based on Streitberg's Gotisch-griechisch-deutsches Wörterbuch (1910).
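The lookup-table approach can be sketched as follows. The paradigms are a tiny hand-written sample rather than the real generated lexicon: the sunus forms follow the u-stem suffixes quoted in this section, while the headwords and tag labels for ita and im are illustrative assumptions, as are the names `build_lookup` and `classify`.

```python
from collections import defaultdict

# (lemma, part of speech) -> {tag: set of forms}; sample data only.
PARADIGMS = {
    ("sunus", "noun"): {"NS": {"sunus"}, "AS": {"sunu"}, "VS": {"sunau", "sunu"}},
    ("is", "pronoun"): {"neuter 'it'": {"ita"}, "'(to) them'": {"im"}},
    ("itan", "verb"): {"'I eat'": {"ita"}},
    ("wisan", "verb"): {"'I am'": {"im"}},
}


def build_lookup(paradigms):
    """Invert the paradigms into a table: form -> set of (lemma, pos, tag)."""
    lookup = defaultdict(set)
    for (lemma, pos), cells in paradigms.items():
        for tag, forms in cells.items():
            for form in forms:
                lookup[form].add((lemma, pos, tag))
    return lookup


def classify(form, lookup):
    """Say how ambiguous a form is, judged on the isolated word only."""
    analyses = lookup.get(form, set())
    if not analyses:
        return "unknown"
    entries = {(lemma, pos) for lemma, pos, _ in analyses}
    if len(entries) > 1:
        return "lexically ambiguous"
    if len(analyses) > 1:
        return "morphologically ambiguous"
    return "unambiguous"


LOOKUP = build_lookup(PARADIGMS)
for token in ["sunus", "sunu", "ita", "im"]:
    print(token, "->", classify(token, LOOKUP))
```

Tokens in the last two categories are the ones that need a decision based on context.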
In order to generate the Gothic lexicon, we developed an XML application for the formal description of inflectional morphology (working name: Gomorph, named after a C++ prototype that hardcoded Gothic morphology). The syntax is conceptually similar to MathML, a standard mathematical markup language defined by the W3C, and can theoretically be used for any inflected language. The model is based on inheritance: morphological classes can be derived from other classes, adding new rules or overriding rules defined in the parent class, e.g. ‘noun’ > ‘a-stems’ > ‘ja-stems’ > ‘Mja’ for the class of short masculine ja-stems in Gothic (see Daelemans, Gazdar & De Smedt 1992 for an interesting overview of inheritance in Natural Language Processing). The actual classes are defined by expressions involving parameters (e.g. Lemma), variables (e.g. Root, Suffix), functions (e.g. Umlaut) and two operators, ‘concatenation’ and ‘union’ (basically, each expression defines a regular language, without using Kleene star). There is only one data type: a set of strings, which makes it easier to handle spelling variations or alternative forms. Functions are defined using simple regular expression substitutions that operate on each element of a string set. Finally, each expression has a specified range, i.e. applies to a given subset of the entire paradigm (which allows us for instance to apply a function ‘Ablaut()’ to a variable ‘Root’ in the preterite only). Here are a few examples taken from the definition of masculine u-stems in Gothic, written using pseudo-code:
    parameter Lemma = "sunus"                        [i.e. the default value]
    function GetRoot(): replace /us$/ with ""        [i.e. strip final -us]
    function Phonology(): ...                        [normally inherited]
    variable Root(*) = GetRoot(Lemma)
    variable Form(*) = Phonology(Root • Suffix)
    variable Suffix(NS) = {"us"}
    variable Suffix(AS) = {"u"}
    ...
    variable Suffix(VS) = {"au", "u"}
    ...
... and using the Gomorph DTD (somewhat simplifying):
<class name="Mu" description="Masculine u-stems" inherits="_uStems">
  <parameters>
    <parameter name="Lemma" default="sunus"/>
  </parameters>
  <functions>
    <function name="DeriveRoot">
      <rgx pattern="us$" replace=""/>
    </function>
    <!-- function Phonology inherited from parent class -->
  </functions>
  <paradigm>
    <!-- variables Form and Root would normally be inherited from the parent class but are included here for illustration: -->
    <variable name="Form">
      <assign range="*">
        <apply-function name="Phonology">
          <concatenation>
            <var name="Root"/>
            <var name="Suffix"/>
          </concatenation>
        </apply-function>
      </assign>
    </variable>
    <variable name="Root">
      <assign range="*">
        <apply-function name="DeriveRoot">
          <param name="Lemma"/>
        </apply-function>
      </assign>
    </variable>
    <variable name="Suffix">
      <assign>
        <list>
          <literal value="us"/>
          <literal value="u"/>
          <literal value="au"/>
          <literal value="aus"/>
          <literal value="au|u" type="expression"/>
          <literal value="jus"/>
          <literal value="uns"/>
          <literal value="um"/>
          <literal value="iwe"/>
          <null/>
        </list>
      </assign>
    </variable>
  </paradigm>
</class>
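To make the model more concrete, here is a rough Python sketch of how such a class definition could be evaluated. It is not the Visual Basic prototype and does not read the Gomorph XML. It only mimics the ideas described above: every value is a set of strings, concatenation combines two sets, union is ordinary set union, functions are regular expression substitutions, and a child class inherits rules from its parent. The class names `UStems` and `Mu`, the range labels NS, AS and VS, and the identity placeholder for Phonology (whose rules are not given here) are assumptions.

```python
import re
from itertools import product


def concatenation(left, right):
    """Concatenate every string in `left` with every string in `right`."""
    return {a + b for a, b in product(left, right)}


def substitute(pattern, replacement, strings):
    """Apply a regular expression substitution to each element of a string set."""
    return {re.sub(pattern, replacement, s) for s in strings}


class UStems:
    """Hypothetical parent class: defines Root and Form for every range (*)."""

    suffixes = {}  # range label -> set of suffix strings, filled in by subclasses

    def __init__(self, lemma):
        self.lemma = {lemma}  # a parameter is a one-element string set

    def derive_root(self, strings):
        raise NotImplementedError  # each subclass knows how to strip its ending

    def phonology(self, strings):
        # Placeholder assumption: the real phonological rules are not shown here.
        return strings

    def root(self, cell):
        return self.derive_root(self.lemma)

    def form(self, cell):
        return self.phonology(concatenation(self.root(cell), self.suffixes[cell]))


class Mu(UStems):
    """Masculine u-stems: inherits Root and Form, supplies the suffixes."""

    # {"au"} | {"u"} is the union operator: two alternative vocative forms.
    suffixes = {"NS": {"us"}, "AS": {"u"}, "VS": {"au"} | {"u"}}

    def derive_root(self, strings):
        return substitute(r"us$", "", strings)  # strip final -us


sunus = Mu("sunus")
for cell in ["NS", "AS", "VS"]:
    print(cell, sorted(sunus.form(cell)))
```

Because every value is a set, alternative forms such as the two vocatives need no special handling: the concatenation simply yields one result per combination.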
The XML notation is rather verbose, but offers many advantages: readily available parsers, validation, editors with syntax highlighting and ‘intellisense’, conversion to other formats using XSL transformations (as a matter of fact, since XSLT is said to be Turing complete, it should be possible to write a transformation that compiles the XML specification or actually builds paradigms). In our current implementation (a 100% functional prototype written in Visual Basic), the XML specification is directly interpreted by a program that generates paradigms, based on parameters supplied by the user or stored in a database. A more interesting approach would be to compile the specification, for instance by translating the morphological classes to Java classes or .NET code.
Our specification of Gothic inflectional morphology can be downloaded (Gothic.xml and Gothic.ent – you need both files, in the same directory) or browsed online in XHTML 1.0 format, generated from the XML source using this stylesheet.
Unfortunately, there has been a lot of duplicate effort. As far as we know, the Gothic Bible (or more precisely, Streitberg's edition, with or without corrections and minor fragments based on other sources) has been digitized at least five times independently, using technology that ranges from punched cards to TEI P4:
The original computer corpus for this work was punched in 1962 at the IBM Research Center in Yorktown Heights, New York, under the direction of Philip H. Smith, Jr. [...] The text was later updated according to the fifth edition of Streitberg (1965) and expanded at the Leiden Institute for Netherlandic Lexicology to include a new version of the Skeireins together with all available biblical and non-biblical texts in early Gothic. [In January 2003, Mr. De Tollenaere wrote me that the resulting tape might still be available at the Institute for Dutch Lexicology (INL) in Leiden.]
Computer texts of Gothic have also been prepared in the past few years by William Estabrook and James W. Marchand. Estabrook has produced a word-index and a reverse word-list, Marchand a grammatical concordance. None of these, however, has yet been published.
“I have on my ‘little’ 1984 issue AT the entire Greek New Testament, the King James Bible, and the Gothic Bible, with plenty of room left over for the software to interrogate them” and “I entered this text on punched cards in 1960, with a grammatical analysis of each word” (my emphasis; cited from Christian Petersen: Gotica Minora, SYLLABUS-Verlag, Hanau 2002).
I worked on Gothic many years ago and scanned the New Testament texts intending to design a tagged corpus.
Streitberg 1919, Berichtigungen: “S. 13 M 8,14 ist die Überlieferung beizubehalten: jah qimands Iesus in garda Paitraus jah gasaƕ swaihron is ligandein in heitom, vgl. E. A. Kock Kontinentalgermanische Streifzüge (Lund und Leipzig 1919) S. 1. Die Intonation bestätigt die Ursprünglichkeit der überlieferten Fassung.”
Missing period in the 1919 edition (corrected in the 2000 edition).
De Tollenaere 1976: “manuscript and Streitberg's first [1908] edition”.
Streitberg 1919, Berichtigungen: “S. 33 J 6,22 lies siponjam seinaim statt sainaim”.
De Tollenaere 1976: “manuscript”.
De Tollenaere 1976: “manuscript”.
De Tollenaere 1976: “manuscript”.
De Tollenaere 1976: “manuscript”. Streitberg's glossary has the correct form fotubaurd (Streitberg 1910, p. 36).
De Tollenaere 1976: “manuscript”.
Streitberg 1919, Anhang: “K 7,5. þaþroþ|þan, nicht þaþroh|þan.”; Streitberg 1919, Berichtigungen: “S. 255 K 7,5 lies þaþroþ-þan statt þaþroh-þan”.
Incorrect verse separation. De Tollenaere 1976: “cf. B and verse separation in the Greek text”.
Streitberg 1919, Berichtigungen: “S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.”
Streitberg 1919, Berichtigungen: “S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.”
Streitberg 1919, Berichtigungen: “Ebenso ist S. 445 Tit 1,5 die eckige Klammer bei in þize zu tilgen: der Wortlaut von B wird durch die Intonation als ursprünglich erwiesen.”
De Tollenaere 1976: “manuscript”.
De Tollenaere 1976.
De Tollenaere 1976.
1919: lisand[a]
2000: lisanda
1919: utusiddjedun
2000: ut usiddjedun
Misprint or deliberate correction? If it was a deliberate correction, it seems inconsistent: cf. the verb innatgaggan (e.g. Luke 7:45 [CA]: innatiddja).
1919: laisarja nih
2000: laisarja, nih
1919: innakundai is
2000: innakundai is.
Period is missing in the 1919 edition. Corrected in this edition.
1919: sainaim
2000: seinaim
Streitberg 1919, Berichtigungen: “S. 33 J 6,22 lies siponjam seinaim statt sainaim”.
1919: Iairusaulwmi[a]m
2000: Iairusaulwmiam
1919: <jah> gasat ana ina
2000: gasat ana ina
1919: friaþwa[i]
2000: friaþwai
1919: praitoria<un>
2000: praitoria
1919: widuwo <swe> jere
2000: widuwo jere
1919: Herodes.
2000: Herodes,
1919: þai[ei]
2000: þaiei
1919: <afar>daga
2000: daga
1919: habaiu <þo> du ustiuhan
2000: habaiu du ustiuhan
1919: ubuhwopida
2000: ubuƕopida
De Tollenaere 1976: “manuscript and Streitberg's second edition”.
1919: faur<a>gaggandans
2000: faurgaggandans
Streitberg 1919, Anhang: “L 18,39. faurgaggandans CA, fauragaggandans GL. Dieses ist intonationsgemäß und entspricht der Lesart προάγοντες; faurgaggandans könnte durch παράγοντες beeinflußt oder durch faurgaggandein· διαπορευομένου (V. 36) hervorgerufen sein”.
1919: psalmo<no>
2000: psalmo
Streitberg 1919, Anhang: “L 20,42. psalmono für psalmo CA wird durch die Intonation gefordert. Die got. Flexion des Fremdworts ist wie so häufig vom Dativ Sg. ausgegangen, vgl. Akk. Sg. psalmon K 14,26”.
1919: [jah fralailotun]
2000: jah fralailotun
1919: Barteimai[a]us <sa> blinda
2000: Barteimaiaus blinda
1919: faur[a]hah
2000: faurahah
1919: ni galaubidedun.
2000: ni galaubidedun
1919: gadikis
2000: gadigis
Streitberg 1919 & 2000, apparatus: “gadikis] A deutlich Br., für gadigis”.
1919: jah <sa> galaubjands
2000: jah sa galaubjands
1919: briggan
2000: briggau
De Tollenaere 1976: “Streitberg's first and second edition”.
1919: þaþroh-þan
2000: þaþroþ þan
Streitberg 1919, Anhang: “K 7,5. þaþroþ|þan, nicht þaþroh|þan.” 2000 corrects the error, but hyphen is missing.
1919: galewiþs was, nam hlaif
2000: galewiþs was. nam hlaif
1919: iþ þan ufkunna
2000: <iþ> þan ufkunna
1919: dauþaize;
2000: dauþaize:
Probably due to poor facsimile reproduction in the 2000 edition.
1919: dauhtrum
2000: dauhtram
De Tollenaere 1976: “Streitberg's second edition”.
1919: aikkles<jon>
2000: aikklesjon
1919: anainsokun
2000: ana insokun
De Tollenaere 1976: “Streitberg's second edition”.
1919: jan-ni
2000: jan ni
De Tollenaere 1976: “Streitberg's first and second edition”.
1919: ubila(na)
2000: ubila
1919: praizbwtairei<n>s
2000: praizbwtaireis
1919: ni man<n>hun lagjais
2000: niman<n>hun nlagjais
De Tollenaere 1976: “Streitberg's first and second edition”.
1919: þo[ei]
2000: þoei
Streitberg 1919, Berichtigungen: “S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.”
1919: þo[(ei)]
2000: þo(ei)
Streitberg 1919, Berichtigungen: “S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.”
1919: witands
2000: witāds
De Tollenaere 1976: “expanded in Streitberg's first and second edition”.
1919: usgildiþ
2000: us gildiþ
Due to missing soft hyphen at end of line.
1919: [in þize]
2000: in þize
Streitberg 1919, Berichtigungen: “Ebenso ist S. 445 Tit 1,5 die eckige Klammer bei in þize zu tilgen: der Wortlaut von B wird durch die Intonation als ursprünglich erwiesen.”
1919: ·n· dage
2000: ·n dage
1919: und þatei urrinnai sunno
2000: und þatei urrinnai, sunno
1919: <Az>gadis
2000: <Az->gadis
Error due to hyphenation in 1919.