Sunday, 13 December 2009

Building an integrated historical geographical information system, part 1

Historical source documents are increasingly becoming available on the Internet. They are a huge resource waiting to be exploited. The sources usually come in one of two formats. Either they are published as scans of printed book pages, with or without large chunks of unedited OCR text, or they are published as fully formatted electronic text that can be copied, edited and searched. Often little or no metadata about the source documents is available. I will mention a couple of examples. Digital libraries such as Google Books or Gallica hold huge numbers of scanned book pages with OCR text. By my very rough estimate, the accompanying OCR text is about 60 percent correct, so searching for a term in the text will sometimes give a correct result and sometimes not. Maybe OCR technology will perform better in the future, I don't know. Otherwise we have to depend on humans taking responsibility for manually processing and correcting the texts. An example of the latter is the project Chartae Burgundiae Medii Aevi at the University of Burgundy in Dijon, France, which aims to make all medieval charter editions of Burgundy available on the Internet as fully corrected electronic texts. This is done partly through the project's own digitization efforts, including online publication of previously unedited texts from manuscripts, and partly through correction of digitized texts found at Gallica and Google Books. Currently 25 editions are available, all downloadable in text or Word format. Other projects with high ambitions to publish fully corrected editions of historical source documents are the digital Monumenta Germaniae historica (dMGH), Regesta Imperii (RI), and Codice diplomatico della Lombardia medievale (secoli VIII - XII).

For scholarly work it is important to be able to cite or quote individual pages or source documents in editions. For electronic texts to become part of the scholarly research process, they must meet this requirement, and a number of projects are now beginning to do so. The dMGH lets you build URLs to individual pages throughout the entire body of editions in the Monumenta Germaniae historica, using the widely known abbreviations of the individual source editions. The following example is a URL to an individual page of the Gesta Dagoberti I., in: SS rer. Merov. 2, p. 396 ("S." means "Seite", German for page).

The next example demonstrates URLs built from the widely used sequential numbers of the royal diplomas in Die Urkunden der Karolinger 1 (royal diplomas issued by kings Pepin, Carloman and Charlemagne). The following two examples refer to diploma no. 165, a donation by Charlemagne to the monastery of Prüm (Rheinland-Pfalz), issued on 9 June 790 in Mainz (Mayence). Note the slightly different composition of the two URLs, which retrieve the same document by charter number and by page number respectively.

Regesta Imperii is a series of source summaries (regesta) that arrange all written evidence of the Frankish and German kings chronologically throughout the Middle Ages, with references to the evidence in source editions. The latest contribution is the first volume of source summaries for Charles the Bald, king of West Francia (840-877), Die Regesten Karls des Kahlen 840 (823) - 877, edited by Irmgard Fees and published in 2007. It covers not only evidence in diplomas but also evidence of the rulers' activities in narrative sources. Regesta Imperii is also important for listing the diplomas and other evidence of Emperor Louis the Pious, because the MGH edition of his diplomas is still missing (the draft of this edition was destroyed during WW2). The following source summary refers to Regesta Imperii no. 1005, a royal donation issued by Louis the Pious on 8 May 840 in Salz, concerning royal estates in modern Belgium.

Even in non-scholarly resources like Google Books it is possible to link to individual pages of a source edition. Google Books links by the page number of the printed original, as in this link to the charter issued by Haroin in Wissembourg in 742, Liber donationum, no. 1 on page 7, in: Traditiones possessionesque Wizenburgenses. Codices duo cum supplementis. Zeuss (ed.), Speyer 1842, where id is a distinct and persistent identifier of this source edition.
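To illustrate why such a link can be built from the identifier alone, here is a minimal Python sketch. It assumes the page-link convention `books?id=<id>&pg=PA<n>` seen in Google Books URLs, where "PA" marks an arabic-numbered page of the printed original; the function name `google_books_page_url` and the value `BOOK_ID` are hypothetical placeholders, not the real identifier of the Zeuss edition.

```python
from urllib.parse import urlencode

# Assumption: Google Books page links take the form books?id=<id>&pg=PA<n>,
# where "PA" marks an arabic-numbered page of the printed original.
GOOGLE_BOOKS_BASE = "https://books.google.com/books"


def google_books_page_url(book_id: str, page: int) -> str:
    """Build a link to a printed page from the book identifier alone (hypothetical helper)."""
    if page < 1:
        raise ValueError("printed page numbers start at 1")
    query = urlencode({"id": book_id, "pg": f"PA{page}"})
    return f"{GOOGLE_BOOKS_BASE}?{query}"


# "BOOK_ID" is a placeholder for the edition's persistent identifier.
print(google_books_page_url("BOOK_ID", 7))
```

The point of the sketch is that only two pieces of information are needed, the edition's identifier and the printed page number, both of which a scholar already has from a conventional citation.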

Unfortunately, Gallica only permits links to the sequential number of the scanned pages, which does not match the page numbering of the printed edition. If you wish to link to a certain page, you have to visit the actual web page and copy the link. In other words, unlike with Google Books, you can't construct the link from the book identifier alone.

My project Regnum Francorum Online aims to reference historical events from the Merovingian and Carolingian periods in time, in space and by agency. It builds a collection of metadata about the events, including links to individual source documents where they are available online, taking advantage of the possibility to link to individual source documents described in the examples above. Referencing in time means giving each event a numerical estimate of its date. PHP implements the concept of the Julian day count, and it is used here. Referencing in space means geo-referencing the places mentioned in an event to modern longitude and latitude, as well as to their administrative affiliation such as country, province and other territorial divisions, so that each place name is identified distinctly. Referencing by agency means identifying the individuals mentioned in an event as well-known historical persons or, where that is not possible, as individuals with a recognized name. Uncertainty in referencing must be taken into account.
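A minimal Python sketch of these three kinds of referencing might look as follows. The date conversion uses the standard integer arithmetic for the Julian calendar, the same computation that PHP's calendar extension exposes as `juliantojd()`. The classes `Place` and `Event`, their fields, and the choice to store dating uncertainty as an interval of Julian Day Numbers are hypothetical illustrations, not the actual schema of Regnum Francorum Online; the coordinates for Mainz are approximate.

```python
from dataclasses import dataclass, field


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date in the (proleptic) Julian calendar.

    Uses astronomical year numbering (1 BC = 0). Same arithmetic as PHP's
    juliantojd(), which takes its arguments as month, day, year instead.
    """
    a = (14 - month) // 12       # 1 for January and February, otherwise 0
    y = year + 4800 - a          # shift years so they stay positive and start in March
    m = month + 12 * a - 3       # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


@dataclass
class Place:
    """A place name resolved to modern coordinates and territorial units (hypothetical schema)."""
    name: str
    latitude: float
    longitude: float
    country: str
    province: str = ""


@dataclass
class Event:
    """An event referenced in time, space and by agency (hypothetical schema).

    Dating uncertainty is an interval of Julian Day Numbers: earliest == latest
    for an exact day, a wider interval for e.g. "some time in 742".
    """
    description: str
    earliest_jdn: int
    latest_jdn: int
    places: list[Place] = field(default_factory=list)
    persons: list[str] = field(default_factory=list)
    source_url: str = ""

    def __post_init__(self) -> None:
        if self.earliest_jdn > self.latest_jdn:
            raise ValueError("earliest_jdn must not be later than latest_jdn")

    def is_exact(self) -> bool:
        return self.earliest_jdn == self.latest_jdn

    def may_overlap(self, other: "Event") -> bool:
        """True if the two dating intervals share at least one day."""
        return self.earliest_jdn <= other.latest_jdn and other.earliest_jdn <= self.latest_jdn


# Diploma no. 165: Charlemagne, Mainz, 9 June 790 (coordinates approximate).
prum = Event(
    "Donation by Charlemagne to the monastery of Prüm",
    julian_to_jdn(790, 6, 9),
    julian_to_jdn(790, 6, 9),
    places=[Place("Mainz", 49.99, 8.25, "Germany", "Rheinland-Pfalz")],
    persons=["Charlemagne"],
)

# Haroin's charter, dated only to the year 742.
haroin = Event(
    "Charter issued by Haroin in Wissembourg",
    julian_to_jdn(742, 1, 1),
    julian_to_jdn(742, 12, 31),
    persons=["Haroin"],
)
```

Here `prum.is_exact()` is true, while Haroin's charter spans the whole year 742, and `julian_to_jdn(840, 5, 8)` gives 2027996 for the donation by Louis the Pious in Salz. Storing an interval rather than a single number is one simple way to keep uncertain dates comparable: two events can be tested for possible overlap without pretending to a precision the sources do not give.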