Introduction

1. Overview

This section of the website is devoted to the Gothic Bible and minor fragments.

There is a database where you can:
- browse the tagged text with selected interlinear translations
- click on words to obtain lexical and morphosyntactic information
- search the text using regular expressions
- browse a dictionary

The section also provides:
- a digital facsimile of Streitberg's Gotisches Elementarbuch (1920)
- a digital facsimile of Streitberg's Gotisch-Griechisch-Deutsches Wörterbuch (1910) with a preliminary text transcription
- a TEI edition with conversions to HTML and plain text
- a formal model of Gothic inflectional morphology
- information on the manuscripts and a few pointers to other sites

2. The source text

The TEI text and database are based on the work of Wilhelm Streitberg:

Die gotische Bibel: Herausgegeben von Wilhelm Streitberg. (Germanische Bibliothek, 2. Abteilung, 3. Band)

1. Teil: Der gotische Text und seine griechische Vorlage. Mit Einleitung, Lesarten und Quellennachweisen sowie den kleineren Denkmälern als Anhang. Heidelberg: Carl Winter, 1919.

2. Teil: Gotisch-griechisch-deutsches Wörterbuch. Heidelberg: Carl Winter, 1910.

To avoid possible copyright issues, we used the 1919 edition. The Speyer fragment, which was discovered in 1970, is cited from Piergiuseppe Scardigli's Nachtrag zum ersten Band in the latest edition (2000), which is essentially a reprint; differences between the readings of the 1919 and 2000 editions are listed in Appendix B below. The latest edition (ISBN: 3-8253-0745-X and 3-8253-0746-8) can be ordered from Universitätsverlag WINTER; copies of earlier editions can sometimes be found in catalogues of antiquarian books such as Zentrales Verzeichnis Antiquarischer Bücher or Abebooks.

Though it is not error-free, Streitberg's work is generally considered the standard edition: cf. James Marchand, WEMSK: “This is the standard edition and has superseded all others. Its ‘reconstruction’ of the Greek Vorlage is seriously flawed by his dependence on von Soden, and he is given to conjectures, but this is the text you must use.” Note that Magnús Snædal's “Concordance to Biblical Gothic” (Reykjavík 1998) includes a new text edition with numerous emendations and corrections to Streitberg's readings; unfortunately, his book is currently out of print. Some of the minor fragments at Christian Petersen's website are based on Snædal's readings.

The text was digitized in 1997 by Robert Tannert, David Landau and Tom De Herdt and has been thoroughly checked and proofread. In 2002, the reliability of the transcription was assessed by an automated collation with another electronic edition, provided by the TITUS project. We wrote a simple script that aligned the texts and compared them on a byte-by-byte basis, reporting about 800 differences. These were individually compared to printed copies of Streitberg's 1919 and 1965 editions, and corrected where necessary. Since the two texts were digitized independently, using different methods (scanning vs. typing), the chance that the same error occurs at the same location in both is relatively small, so a comparison should reveal most remaining errors on both sides.
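The collation step can be sketched roughly as follows. This is only an illustration built on Python's standard difflib module, not the script that was actually used: it works on characters rather than bytes, and the function name `collate` and the two sample strings (one with a deliberately introduced typing error) are assumptions made for the example.

```python
from difflib import SequenceMatcher


def collate(text_a: str, text_b: str) -> list[tuple[int, str, str]]:
    """Align two transcriptions and return (offset in text_a, reading_a, reading_b)
    for every span where they differ."""
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append((a_start, text_a[a_start:a_end], text_b[b_start:b_end]))
    return differences


# Hypothetical sample: the second transcription contains a typing error
ours = "jah birodeins mikila was"
theirs = "jah birodeins mikala was"
report = collate(ours, theirs)
for offset, reading_a, reading_b in report:
    print(f"offset {offset}: {reading_a!r} vs {reading_b!r}")
```

Each reported difference still has to be checked against the printed edition, since the alignment alone cannot tell which of the two readings is the faulty one.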

The current edition is an accurate transcription of Streitberg's 1919 text; only a very small number of obvious printing errors have been corrected. Most of them are mentioned in De Tollenaere 1976, others in Streitberg's Berichtigungen (note that some of the typos still appear in the 2000 edition). Corrections are not marked in the text but listed in the front matter of the TEI edition; a copy of the list is given in Appendix A below.

3. TEI Encoding

The actual text is encoded in XML and follows the guidelines of the Text Encoding Initiative (TEI P4 DTD). See the extensive TEI header for detailed information. The file contains a complete transcription of the Gothic text in Streitberg's edition and nothing else: it does not include the preface, introduction, Greek text, commentary or information on the minor fragments, nor is it intended to reproduce the appearance of the original work. The project's primary goal is a linguistically annotated text based on Streitberg's readings, not a reproduction of his book. There are, however, plans to digitize Streitberg's critical apparatus, possibly with some annotations and references to newer readings.

Conversion to TEI has proven to be surprisingly complicated, mainly because there are at least three overlapping layers of annotation: the logical structure or so-called ‘canonical reference system’ (in this case: books, chapters, verses), the linguistic structure (sentences, clauses, words, morphemes; part-of-speech tags, lemmatization, parsing) and textual criticism (unclear or missing text, conjectures by Streitberg and his predecessors, variations between different witnesses). Ideally, one would like to go back to the source and read the manuscripts as well, adding yet another level of markup (leaf, side, line; hand(s), corrections made by the scribe(s)). It would be very hard to combine all those levels of information in one manageable and reasonably elegant data structure. Biblical verses do not always correspond to sentences, unclear readings often cross word boundaries (strictly speaking, Gothic doesn't even have word boundaries, since the manuscripts are written in scriptio continua), page and line breaks can occur virtually anywhere, etc. The different layers form intersecting hierarchies that do not fit into a single XML document tree. This very common problem, along with some standard solutions that each trade one set of advantages for another, is described in chapter 31 of the TEI Guidelines (Multiple Hierarchies); see also Renear/Mylonas/Durand 1996 and Durusau/O'Donnell 2001, among others.

We tried to avoid the problem by:

- Focusing on Streitberg's edition rather than a transcription of the manuscripts. A lemmatized and POS-tagged edition of a 6th-century copy of a 4th-century text will inevitably require a certain amount of abstraction and reconstruction. Given that, it probably makes sense to start from Streitberg's semi-critical edition rather than a strictly diplomatic edition like Uppström's work or the text files of the Codex Argenteus prepared by David Landau. The ideal, of course, would be to present them side by side, providing both philological accuracy and linguistic abstraction without forcing both levels into one single encoding.
- Using the parallel segmentation method to record all readings in extenso, even when they are identical (cf. Birnbaum 1999). This mirrors Streitberg's decision to transcribe the full text of all witnesses. Variations have been marked up with a generic segmentation element.
- Making sure that Streitberg's additions and deletions do not cross word boundaries, i.e. that they either consist of one or more whole tokens (e.g. John 7:12 CA: “jah birodeins mikila <bi ina> was”) or are entirely contained within one token (e.g. Timothy I 3:4 A: “ufhausjan[jan]dona”). As mentioned above, unclear text often does cross word boundaries. We solved that problem by applying the so-called fragmentation technique, “[breaking up] what might be considered a single element into multiple smaller elements, in order to make it fit within the hierarchy” [TEI P4, 31.3], in other words by converting overlapping sequences like “i{n him}inam” to “i{n} {him}inam”. Since Streitberg uses italics to mark up unclear spans of text, fragmentation is not visible and does not result in any loss of information.
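The fragmentation of unclear spans can be illustrated with a few lines of Python. This is a minimal sketch, assuming that unclear text is written between curly braces as in the example “i{n him}inam” and that tokens are separated by whitespace; the function name `fragment` is hypothetical and the code is not taken from the actual conversion tools.

```python
def fragment(text: str) -> str:
    """Split every {...} span at whitespace so that no span crosses a token boundary."""
    out = []
    inside = False  # True while we are within an unclear span
    for ch in text:
        if ch == "{":
            inside = True
            out.append(ch)
        elif ch == "}":
            inside = False
            out.append(ch)
        elif ch.isspace() and inside:
            # Close the span before the boundary and reopen it after it
            out.append("}" + ch + "{")
        else:
            out.append(ch)
    # Drop empty spans that arise when a span starts or ends with whitespace
    return "".join(out).replace("{}", "")


print(fragment("i{n him}inam"))  # i{n} {him}inam
```

Spans that already lie within a single token are left untouched, so the function can safely be applied to the whole text.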

4. Linguistic annotations

At this point, the TEI document does not yet contain POS-tags or lexical information. The linguistic annotations are stored in a relational database that is implicitly linked to the TEI text by means of corresponding numeric identifiers. The database contains a digital dictionary based on Streitberg 1910, a table of tokens in the running text and a morphosyntactic tagset; every token has been automatically linked to one or more lemma/POS pairs. Ultimately, the linguistic analyses and interpretations will be incorporated in the TEI document, but as long as not every token has been manually verified or disambiguated, the use of a relational database offers practical and technical advantages over TEI/XML (automatic handling of referential integrity, performance, user-friendly interface).
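The relationship between tokens, dictionary entries and analyses might look roughly like this. The sketch uses SQLite through Python's standard sqlite3 module; all table and column names, the identifier 101 and the tag labels are assumptions made for the example and do not describe the actual database schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # let SQLite enforce referential integrity
conn.executescript("""
    CREATE TABLE lemma (id INTEGER PRIMARY KEY, form TEXT NOT NULL, pos TEXT NOT NULL);
    -- token.id is assumed to equal the numeric identifier of the word in the TEI text
    CREATE TABLE token (id INTEGER PRIMARY KEY, form TEXT NOT NULL);
    -- link table: one row per candidate analysis of a token
    CREATE TABLE token_lemma (
        token_id INTEGER NOT NULL REFERENCES token(id),
        lemma_id INTEGER NOT NULL REFERENCES lemma(id),
        tag TEXT NOT NULL,
        PRIMARY KEY (token_id, lemma_id, tag)
    );
""")

# Illustrative data: one occurrence of "im" with two candidate analyses
conn.executemany("INSERT INTO lemma VALUES (?, ?, ?)",
                 [(1, "is", "pronoun"), (2, "wisan", "verb")])
conn.execute("INSERT INTO token VALUES (?, ?)", (101, "im"))
conn.executemany("INSERT INTO token_lemma VALUES (?, ?, ?)",
                 [(101, 1, "dative plural"), (101, 2, "1st person singular present")])


def ambiguous_tokens(db):
    """Return (token id, form, number of analyses) for tokens with more than one analysis."""
    rows = db.execute("""
        SELECT token.id, token.form, COUNT(*)
        FROM token JOIN token_lemma ON token_lemma.token_id = token.id
        GROUP BY token.id, token.form
        HAVING COUNT(*) > 1
        ORDER BY token.id
    """)
    return rows.fetchall()


before = ambiguous_tokens(conn)
# Manual disambiguation: delete the analysis rejected for this occurrence
conn.execute("DELETE FROM token_lemma WHERE token_id = 101 AND lemma_id = 1")
after = ambiguous_tokens(conn)
```

With foreign keys enabled, a link to a non-existent token or lemma is rejected by the database itself, which is the kind of referential integrity that would have to be checked by hand in a plain XML file.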

Since Gothic is no longer a productive language and few texts are extant, the text has been tagged by generating paradigms for every entry in the dictionary rather than by writing a transducer, in other words by building a lookup table of possible forms (±3600 lemmata yield ±250000 inflected forms). Though entirely based on morphological features of isolated words, this ‘naïve’ method worked reasonably well, mainly due to the relatively low degree of syncretism in Gothic inflectional morphology. About 58% of the tokens in the Gothic Bible could be unambiguously linked to one lemma and one POS-tag. Most of the remaining tokens could be lemmatized, but were morphologically ambiguous (e.g. nominative and accusative of neuter nouns). A small number of forms turned out to be lexically ambiguous as well: the word ita, for instance, can be a neuter pronoun (‘it’) or a verb (‘I eat’), and the same applies to the very frequent form im (‘(to) them’ or ‘I am’). Obviously, the correct analysis can only be determined by looking at the context. Given the small size of the corpus (±67400 tokens), we decided to disambiguate manually rather than write a full-fledged statistical or rule-based parser. Ambiguous forms are marked in orange; the color should gradually disappear as incorrect analyses are deleted from the link table in the database. The dictionary entries are based on Streitberg's Gotisch-griechisch-deutsches Wörterbuch (1910).
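The lookup-table approach can be sketched as follows. The example is deliberately tiny: it covers a single lemma and only the three singular endings of masculine u-stems quoted elsewhere on this page (NS, AS, VS), and the names `build_lookup` and `analyses` are hypothetical. It is meant to show the principle, not the actual tagging program.

```python
from collections import defaultdict

# Three singular slots of masculine u-stems; each slot holds a set of possible endings
U_STEM_SUFFIXES = {"NS": {"us"}, "AS": {"u"}, "VS": {"au", "u"}}


def build_lookup(lemmata):
    """Generate every form of every lemma and index the (lemma, tag) pairs by form."""
    table = defaultdict(set)
    for lemma in lemmata:
        root = lemma[:-2] if lemma.endswith("us") else lemma  # strip final -us
        for tag, suffixes in U_STEM_SUFFIXES.items():
            for suffix in suffixes:
                table[root + suffix].add((lemma, tag))
    return table


def analyses(table, token):
    """Return all candidate analyses of a token; more than one means it is ambiguous."""
    return sorted(table.get(token, set()))


table = build_lookup(["sunus"])
print(analyses(table, "sunus"))  # [('sunus', 'NS')]
print(analyses(table, "sunu"))   # [('sunus', 'AS'), ('sunus', 'VS')]
```

A form such as sunu comes back with two analyses, which corresponds to the morphological ambiguity described above; tokens of this kind are the ones left for manual disambiguation.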

In order to generate the Gothic lexicon, we developed an XML application for the formal description of inflectional morphology (working name: Gomorph, named after a C++ prototype that hardcoded Gothic morphology). The syntax is conceptually similar to MathML, a standard mathematical markup language defined by the W3C, and can theoretically be used for any inflected language. The model is based on inheritance: morphological classes can be derived from other classes, adding new rules or overriding rules defined in the parent class, e.g. ‘noun’ > ‘a-stems’ > ‘ja-stems’ > ‘Mja’ for the class of short masculine ja-stems in Gothic (see Daelemans, Gazdar & De Smedt 1992 for an interesting overview of inheritance in Natural Language Processing). The actual classes are defined by expressions involving parameters (e.g. Lemma), variables (e.g. Root, Suffix), functions (e.g. Umlaut) and two operators, ‘concatenation’ and ‘union’ (basically, each expression defines a regular language, without using Kleene star). There is only one data type: a set of strings, which makes it easier to handle spelling variations or alternative forms. Functions are defined using simple regular expression substitutions that operate on each element of a string set. Finally, each expression has a specified range, i.e. applies to a given subset of the entire paradigm (which allows us for instance to apply a function ‘Ablaut()’ to a variable ‘Root’ in the preterite only). Here are a few examples taken from the definition of masculine u-stems in Gothic, written using pseudo-code:

	parameter Lemma = "sunus" [i.e. the default value]
	function DeriveRoot(): replace /us$/ with "" [i.e. strip final -us]
	function Phonology(): ... [normally inherited]
	variable Root(*) = DeriveRoot(Lemma)
	variable Form(*) = Phonology(Root • Suffix)
	variable Suffix(NS) = {"us"}
	variable Suffix(AS) = {"u"}
	...
	variable Suffix(VS) = {"au", "u"}
	...

... and using the Gomorph DTD (somewhat simplifying):

	<class name="Mu" description="Masculine u-stems" inherits="_uStems">
	  <parameters>
	    <parameter name="Lemma" default="sunus"/>
	  </parameters>
	  <functions>
	    <function name="DeriveRoot">
	      <rgx pattern="us$" replace=""/>
	    </function>
	    <!-- function Phonology inherited from parent class -->
	  </functions>
	  <paradigm>
	  <!-- variables Form and Root would normally be inherited from the parent class
	          but are included here for illustration: -->
	    <variable name="Form">
	      <assign range="*">
	        <apply-function name="Phonology">
	          <concatenation>
	            <var name="Root"/>
	            <var name="Suffix"/>
	          </concatenation>
	        </apply-function>
	      </assign>
	    </variable>
	    <variable name="Root">
	      <assign range="*">
	        <apply-function name="DeriveRoot">
	          <param name="Lemma"/>
	        </apply-function>
	      </assign>
	    </variable>
	    <variable name="Suffix">
	      <assign>
	        <list>
	          <literal value="us"/>
	          <literal value="u"/>
	          <literal value="au"/>
	          <literal value="aus"/>
	          <literal value="au|u" type="expression"/>
	          <literal value="jus"/>
	          <literal value="uns"/>
	          <literal value="um"/>
	          <literal value="iwe"/>
	          <null/>
	        </list>
	      </assign>
	    </variable>
	  </paradigm>
	</class>

The XML notation is rather verbose, but offers many advantages: readily available parsers, validation, editors with syntax highlighting and ‘intellisense’, and conversion to other formats using XSL transformations (as a matter of fact, since XSLT is said to be Turing complete, it should be possible to write a transformation that compiles the XML specification or actually builds paradigms). In our current implementation (a fully functional prototype written in Visual Basic), the XML specification is directly interpreted by a program that generates paradigms, based on parameters supplied by the user or stored in a database. A more interesting approach would be to compile the specification, for instance by translating the morphological classes to Java classes or .NET code.
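To give an idea of what a translation of the morphological classes into ordinary program classes could look like, here is a minimal sketch in Python. It models the single data type (a set of strings), the two operators and a function defined as a regular expression substitution. The Python class and helper names are assumptions made for the example, the Phonology function is replaced by an identity placeholder because its definition is not given here, and only three suffix slots are filled in.

```python
import re


def concat(left: set[str], right: set[str]) -> set[str]:
    """Concatenation operator: every string of `left` followed by every string of `right`."""
    return {a + b for a in left for b in right}


def union(*sets: set[str]) -> set[str]:
    """Union operator: alternative forms are simply collected in one set."""
    return set().union(*sets)


def rgx(pattern: str, replacement: str):
    """Build a function that applies a regex substitution to each element of a string set."""
    def apply(strings: set[str]) -> set[str]:
        return {re.sub(pattern, replacement, s) for s in strings}
    return apply


class UStems:
    """Hypothetical parent class: defines Root and Form for all u-stems."""
    suffixes: dict[str, set[str]] = {}
    derive_root = staticmethod(rgx(r"us$", ""))
    phonology = staticmethod(lambda strings: strings)  # identity placeholder

    def __init__(self, lemma: str):
        self.lemma = {lemma}

    def form(self, slot: str) -> set[str]:
        root = self.derive_root(self.lemma)
        return self.phonology(concat(root, self.suffixes[slot]))


class Mu(UStems):
    """Masculine u-stems: inherits everything and only supplies the suffixes."""
    suffixes = {"NS": {"us"}, "AS": {"u"}, "VS": union({"au"}, {"u"})}


print(sorted(Mu("sunus").form("VS")))  # ['sunau', 'sunu']
```

Because every value is a set of strings, a slot with alternative endings needs no special treatment: the concatenation simply yields more than one form.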

Our specification of Gothic inflectional morphology can be downloaded (Gothic.xml and Gothic.ent – you need both files, in the same directory) or browsed online in XHTML 1.0 format, generated from the XML source using this stylesheet.

5. On duplicate efforts

Unfortunately, there has been a lot of duplicate effort. As far as we know, the Gothic Bible (or more precisely, Streitberg's edition, with or without corrections and minor fragments based on other sources) has been digitized at least five times independently, using technology that ranges from punched cards to TEI P4:

- The introduction to De Tollenaere & Jones 1976 states: “The original computer corpus for this work was punched in 1962 at the IBM Research Center in Yorktown Heights, New York, under the direction of Philip H. Smith, Jr. [...] The text was later updated according to the fifth edition of Streitberg (1965) and expanded at the Leiden Institute for Netherlandic Lexicology to include a new version of the Skeireins together with all available biblical and non-biblical texts in early Gothic.” [In January 2003, Mr. De Tollenaere wrote to me that the resulting tape might still be available at the Institute for Dutch Lexicology (INL) in Leiden.]
- A footnote to the same introduction mentions: “Computer texts of Gothic have also been prepared in the past few years by William Estabrook and James W. Marchand. Estabrook has produced a word-index and a reverse word-list, Marchand a grammatical concordance. None of these, however, has yet been published.”
- See Marchand 1987: “I have on my ‘little’ 1984 issue AT the entire Greek New Testament, the King James Bible, and the Gothic Bible, with plenty of room left over for the software to interrogate them and I entered this text on punched cards in 1960, with a grammatical analysis of each word” (my emphasis; cited from Christian Petersen: Gotica Minora, SYLLABUS-Verlag, Hanau 2002).
- Wolfgang Griepentrog digitized the text for the TITUS project in 1986-1988.
- Ljuba Veselinova (Dept. of Linguistics, Stockholm University) scanned the Gospels [email 2003-11-09: “I worked on Gothic many years ago and scanned the New Testament texts intending to design a tagged corpus.”].
- The text available here was digitized by R. Tannert, D. Landau and myself in 1997. Unfortunately, I was not aware of the other versions at that time.
- Magnús Snædal's Concordance to Biblical Gothic (1998) is probably based on an electronic edition as well, but since the book is currently out of print, we have no information on how the concordance was built.

Appendix A: deviations from Streitberg's 1919 edition

[1] Matthew 8:14 (CA) : [jah] gasaƕ → jah gasaƕ, <jah> in heitom → in heitom: Streitberg 1919, Berichtigungen: “S. 13 M 8,14 ist die Überlieferung beizubehalten: jah qimands Iesus in garda Paitraus jah gasaƕ swaihron is ligandein in heitom, vgl. E. A. Kock Kontinentalgermanische Streifzüge (Lund und Leipzig 1919) S. 1. Die Intonation bestätigt die Ursprünglichkeit der überlieferten Fassung.”
[2] Matthew 10:36 (CA) : innakundai is → innakundai is.: Missing period in the 1919 edition (corrected in the 2000 edition).
[3] John 6:22 (CA) : þatai → þatei: De Tollenaere 1976: “manuscript and Streitberg's first [1908] edition”.
[4] John 6:22 (CA) : sainaim → seinaim: Streitberg 1919, Berichtigungen: “S. 33 J 6,22 lies siponjam seinaim statt sainaim”.
[5] Luke 3:31 (CA) : sanaus → sunaus: De Tollenaere 1976: “manuscript”.
[6] Luke 9:32 (CA) : geseƕun → gaseƕun: De Tollenaere 1976: “manuscript”.
[7] Mark 4:39 (CA) : da → du: De Tollenaere 1976: “manuscript”.
[8] Mark 12:36 (CA) : fotaubaurd → fotubaurd: De Tollenaere 1976: “manuscript”. Streitberg's glossary has the correct form fotubaurd (Streitberg 1910, p. 36).
[9] Romans 9:27 (A) : Iaraelis → Israelis: De Tollenaere 1976: “manuscript”.
[10] Corinthians I 7:5 (A) : þaþroh~þan → þaþroþ~þan: Streitberg 1919, Anhang: “K 7,5. þaþroþ|þan, nicht þaþroh|þan.”; Streitberg 1919, Berichtigungen: “S. 255 K 7,5 lies þaþroþ-þan statt þaþroh-þan”.
[11] Colossians 1:21 (A) : waurstwam ubilaim, 22 iþ nu gafriþodai → waurstwam ubilaim, iþ nu gafriþodai: Incorrect verse separation. De Tollenaere 1976: “cf. B and verse separation in the Greek text”.
[12] Timothy I 5:25 (A) : þo[ei] → þoei: Streitberg 1919, Berichtigungen: “S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.”
[13] Timothy I 5:25 (B) : þo[(ei)] → þo(ei): Streitberg 1919, Berichtigungen: “S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.”
[14] Titus 1:5 (B) : [in þize] → in þize: Streitberg 1919, Berichtigungen: “Ebenso ist S. 445 Tit 1,5 die eckige Klammer bei in þize zu tilgen: der Wortlaut von B wird durch die Intonation als ursprünglich erwiesen.”
[15] Philemon 1:14 (A) : sawswe → swaswe: De Tollenaere 1976: “manuscript”.
[16] Skeireins 5:2 (E) : anþaranuhþan → anþaranuh þan, (a)nþaranuhþan → (a)nþaranuh þan: De Tollenaere 1976.
[17] Skeireins 6:6 (E) : sumanuhþan → sumanuh þan, sumanuhþan → sumanuh þan: De Tollenaere 1976.

Appendix B: differences between Streitberg 1919 and 2000

[1] Matthew 7:16 (CA) , different interpretation:

1919: lisand[a]
2000: lisanda

[2] Matthew 9:32 (CA) , different segmentation:

1919: utusiddjedun
2000: ut usiddjedun

Misprint or deliberate correction? If it was a deliberate correction, it seems inconsistent: cf. the verb innatgaggan (e.g. Luke 7:45 [CA]: innatiddja).

[3] Matthew 10:24 (CA) , different punctuation:

1919: laisarja nih
2000: laisarja, nih

[4] Matthew 10:36 (CA) , different punctuation:

1919: innakundai is
2000: innakundai is.

The period is missing in the 1919 edition; it has been corrected in this edition.

[5] John 6:22 (CA) , correction in 2000:

1919: sainaim
2000: seinaim

Streitberg 1919, Berichtigungen: “S. 33 J 6,22 lies siponjam seinaim statt sainaim”.

[6] John 11:18 (CA) , different interpretation:

1919: Iairusaulwmi[a]m
2000: Iairusaulwmiam

[7] John 12:14 (CA) , different interpretation:

1919: <jah> gasat ana ina
2000: gasat ana ina

[8] John 15:13 (CA) , different interpretation:

1919: friaþwa[i]
2000: friaþwai

[9] John 18:28 (CA) , different interpretation:

1919: praitoria<un>
2000: praitoria

[10] Luke 2:37 (CA) , different interpretation:

1919: widuwo <swe> jere
2000: widuwo jere

[11] Luke 3:19 (CA) , different punctuation:

1919: Herodes.
2000: Herodes,

[12] Luke 8:14 (CA) , different interpretation:

1919: þai[ei]
2000: þaiei

[13] Luke 9:37 (CA) , different interpretation:

1919: <afar>daga
2000: daga

[14] Luke 14:28 (CA) , different interpretation:

1919: habaiu <þo> du ustiuhan
2000: habaiu du ustiuhan

[15] Luke 18:38 (CA) , misprint in 2000:

1919: ubuhwopida
2000: ubuƕopida

De Tollenaere 1976: “manuscript and Streitberg's second edition”.

[16] Luke 18:39 (CA) , different interpretation:

1919: faur<a>gaggandans
2000: faurgaggandans

Streitberg 1919, Anhang: “L 18,39. faurgaggandans CA, fauragaggandans GL. Dieses ist intonationsgemäß und entspricht der Lesart προάγοντες; faurgaggandans könnte durch παράγοντες beeinflußt oder durch faurgaggandein· διαπορευομένου (V. 36) hervorgerufen sein”.

[17] Luke 20:42 (CA) , different interpretation:

1919: psalmo<no>
2000: psalmo

Streitberg 1919, Anhang: “L 20,42. psalmono für psalmo CA wird durch die Intonation gefordert. Die got. Flexion des Fremdworts ist wie so häufig vom Dativ Sg. ausgegangen, vgl. Akk. Sg. psalmon K 14,26”.

[18] Mark 2:4 (CA) , different interpretation:

1919: [jah fralailotun]
2000: jah fralailotun

[19] Mark 10:46 (CA) , different interpretation(s):

1919: Barteimai[a]us <sa> blinda
2000: Barteimaiaus blinda

[20] Mark 15:38 (CA) , different interpretation:

1919: faur[a]hah
2000: faurahah

[21] Mark 16:11 (CA) , different punctuation:

1919: ni galaubidedun.
2000: ni galaubidedun

[22] Romans 9:20 (A) , inconsistent correction in 2000:

1919: gadikis
2000: gadigis

Streitberg 1919 & 2000, apparatus: “gadikis] A deutlich Br., für gadigis”.

[23] Romans 9:33 (A) , different interpretation:

1919: jah <sa> galaubjands
2000: jah sa galaubjands

[24] Romans 11:11 (A) , misprint in 2000:

1919: briggan
2000: briggau

De Tollenaere 1976: “Streitberg's first and second edition”.

[25] Corinthians I 7:5 (A) , (incomplete) correction:

1919: þaþroh-þan
2000: þaþroþ þan

Streitberg 1919, Anhang: “K 7,5. þaþroþ|þan, nicht þaþroh|þan.” The 2000 edition corrects the error, but the hyphen is missing.

[26] Corinthians I 11:23 (A) , different punctuation:

1919: galewiþs was, nam hlaif
2000: galewiþs was. nam hlaif

[27] Corinthians I 13:12 (A) , different reading:

1919: iþ þan ufkunna
2000: <iþ> þan ufkunna

[28] Corinthians I 15:21 (A) , different punctuation:

1919: dauþaize;
2000: dauþaize:

Probably due to poor facsimile reproduction in the 2000 edition.

[29] Corinthians II 6:18 (B) , misprint in 2000:

1919: dauhtrum
2000: dauhtram

De Tollenaere 1976: “Streitberg's second edition”.

[30] Ephesians 3:21 (A) , different interpretation:

1919: aikkles<jon>
2000: aikklesjon

[31] Galatians 2:6 (A) , different segmentation:

1919: anainsokun
2000: ana insokun

De Tollenaere 1976: “Streitberg's second edition”.

[32] Philippians 3:3 (A) , misprint in 2000:

1919: jan-ni
2000: jan ni

De Tollenaere 1976: “Streitberg's first and second edition”.

[33] Colossians 3:5 (B) , different reading:

1919: ubila(na)
2000: ubila

[34] Timothy I 4:14 (B) , different interpretation:

1919: praizbwtairei<n>s
2000: praizbwtaireis

[35] Timothy I 5:22 (B) , misprint in 2000:

1919: ni man<n>hun lagjais
2000: niman<n>hun nlagjais

De Tollenaere 1976: “Streitberg's first and second edition”.

[36] Timothy I 5:25 (A) , correction in 2000:

1919: þo[ei]
2000: þoei

Streitberg 1919, Berichtigungen: “S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.”

[37] Timothy I 5:25 (B) , correction in 2000:

1919: þo[(ei)]
2000: þo(ei)

Streitberg 1919, Berichtigungen: “S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.”

[38] Timothy I 6:4 (B) , inconsistent correction in 2000:

1919: witands
2000: witāds

De Tollenaere 1976: “expanded in Streitberg's first and second edition”.

[39] Timothy II 4:14 (A) , misprint in 2000:

1919: usgildiþ
2000: us gildiþ

Due to a missing soft hyphen at the end of a line.

[40] Titus 1:5 (B) , correction in 2000:

1919: [in þize]
2000: in þize

Streitberg 1919, Berichtigungen: “Ebenso ist S. 445 Tit 1,5 die eckige Klammer bei in þize zu tilgen: der Wortlaut von B wird durch die Intonation als ursprünglich erwiesen.”

[41] Nehemiah 6:15 (D) , misprint in 2000:

1919: ·n· dage
2000: ·n dage

[42] Nehemiah 7:3 (D) , different punctuation:

1919: und þatei urrinnai sunno
2000: und þatei urrinnai, sunno

[43] Nehemiah 7:17 (D) , misprint in 2000:

1919: <Az>gadis
2000: <Az->gadis

Error due to hyphenation in 1919.