General principles of corpus management
Where to maintain my corpus? And how to keep track of versions?
Generally, a corpus is better maintained outside of InkVisitor, in a repository that allows systematic version control, such as a Git repository hosted on GitHub.
How to describe texts in the repository itself?
In CASTEMO workflows, the best place to store metadata is neither the full text itself (e.g., the document's TEI XML header, if you are using TEI XML) nor the repository, but the InkVisitor database, which keeps such metadata close to other research data and in a readily machine-operable form.
Nevertheless, it is good practice to outline the basics and document your choices and general approach to text correction in a README.md file (in Markdown formatting), placed in the repository (e.g., on GitHub) in the same folder as the full-text document it describes. It is a good idea to include licence information, information on the edition used, and your approach to text transformations and corrections.
An example of such a README.md file:
# License
All rights reserved. This digital text is intended exclusively for private research use in the DISSINET research group, and should not be shared or circulated.
# Edition used
The source for this digital text is an optically recognized and cleaned version (with editorial matter removed) of:
> Thouzellier, Christine, ed. *Une somme anti-cathare: Le Liber contra Manicheos de Durand de Huesca*. Spicilegium sacrum Lovaniense, 1964.
# Quotations
Quotations are preserved using the `<quote>` element. In the original edition, italics seem to be used only for biblical quotations; all italicized words were therefore transformed into spans marked specifically as biblical quotations: `<quote source="#Vulgate">` (citations from, e.g., Augustine were marked not by italics but by quotation marks). Two sources are referenced through the `<quote>` element:
* **Anonymous Cathar treatise:** Marked by `<quote source="#Cathar treatise">`. This separates quoted text from the polemicist's text.
* **Latin Bible:** Quotes are marked by `<quote source="#Vulgate">`.
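Markup like that described in the example README above makes quotations easy to process by machine. The following is a minimal Python sketch, not part of any CASTEMO or InkVisitor tooling, that groups the text of `<quote>` elements by their `source` attribute. The sample document, its placeholder wording, and the function name `quotes_by_source` are illustrative assumptions. The sample also omits the TEI namespace for brevity; a real TEI XML file declares `http://www.tei-c.org/ns/1.0`, so the element name would need to be namespace-qualified there.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample with placeholder wording (not taken from the edition).
SAMPLE = """<body>
  <p>Polemicist's own words.
     <quote source="#Cathar treatise">Placeholder treatise passage.</quote>
     More of the polemicist's words, citing
     <quote source="#Vulgate">placeholder biblical verse</quote>.</p>
  <p><quote source="#Vulgate">Second placeholder verse</quote></p>
</body>"""


def quotes_by_source(xml_text):
    """Return a dict mapping each `source` value to the list of quoted texts."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    # iter() walks the tree in document order and also finds nested quotes.
    for quote in root.iter("quote"):
        source = quote.get("source", "(no source)")
        # Join all text inside the element and normalize whitespace.
        text = " ".join("".join(quote.itertext()).split())
        grouped[source].append(text)
    return dict(grouped)


if __name__ == "__main__":
    for source, texts in quotes_by_source(SAMPLE).items():
        print(source, len(texts), texts)
```

Grouping by `source` in this way makes it possible, for example, to analyse the polemicist's own text separately from the quoted treatise and from biblical quotations.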