From texts to structured data: Building knowledge graphs through Computer-Assisted Semantic Text Modelling (CASTEMO)
David Zbíral, Robert L. J. Shaw, Petr Hanák, Tomáš Hampejs, and Adam Mertel
This book documents:
- a data collection workflow of Computer-Assisted Semantic Text Modelling (CASTEMO);
- the conceptual as well as technical structure of CASTEMO knowledge graphs;
- InkVisitor, the research environment implementing this workflow.
Cite this book:
Zbíral, David, Robert L. J. Shaw, Petr Hanák, Tomáš Hampejs, and Adam Mertel. 2025. From Texts to Structured Data: Building Knowledge Graphs through Computer-Assisted Semantic Text Modelling (CASTEMO). Brno: Masaryk University. https://docs.religionistika.phil.muni.cz/books/from-texts-to-structured-data-building-knowledge-graphs-through-computer-assisted-semantic-text-modelling-castemo.
Contact David Zbíral at david.zbiral@mail.muni.cz.
Acknowledgements
The CASTEMO data collection workflow and the InkVisitor research environment were developed by t...
List of abbreviations
Abbreviation: Meaning. A: Action type; B: Living Being; C: Concept; E: Even...
Why knowledge graphs?
Knowledge graphs are flexible data structures which store data as (1) nodes, and (2) ties between...
Entities
This chapter describes the different entity types of the CASTEMO data model, and their recommende...
Entities overview
CASTEMO recognizes 11 entity types (SPECTRABLOG – Statements, Persons, Events, Concepts, Territor...
Actions
Actions (or more fully, Action types) represent individual semantically disambiguated verbs. They...
Concepts
Concepts represent, alongside Action types, another generic entity type, which holds the data sem...
Attributes of entities
Every entity type has internal Attributes, which serve to characterize the entity. The InkVisi...
Statements
Structure and purpose: Statements model the syntactic structure and semantics of clauses. They ha...
Persons
JSON Structure: The general entity attributes are inherited from the base entity object, which is...
Groups
JSON Structure: The general entity attributes are inherited from the base entity object, which is...
Living Beings
JSON Structure: The general entity attributes are inherited from the base entity object, which is...
Objects
JSON Structure: The general entity attributes are inherited from the base entity object, which is...
Locations
JSON Structure: The general entity attributes are inherited from the base entity object, which is...
Events
JSON Structure: The general entity attributes are inherited from the base entity object, which is...
Territories
JSON Structure: The general entity attributes are inherited from the base entity object, which...
Resources
JSON Structure: The general entity attributes are inherited from the base entity object, which is...
Values
JSON Structure: The general entity attributes are inherited from the base entity object, which is...
Properties
This chapter explains a vital kind of relation in the CASTEMO data model: properties, which serve...
Relations
Some core semantic and ontological relations between entities are highlighted in the CASTEMO data...
Relations overview
Some core semantic and ontological relations between entities are highlighted in the CASTEMO dat...
Superclass (SCL)
Superclass (SCL) is a semantic relation which relates an Action to one or more Actions, or a Conc...
Superordinate Entity (SOE)
Superordinate Entity (SOE) is a Relation which connects a subordinate entity to an entity in whic...
Classification (CLA)
Classification (CLA) is a Relation between a specific PLOGESTRB entity and the class (Concept) to...
Identification (IDE)
Identification (IDE) serves to declare the identity between PLOGESTRB entities, both within an en...
Synonym (SYN)
The CASTEMO data model recommends a strong understanding of synonymy. For two lexemes to be relat...
Antonym (ANT)
The CASTEMO data model recommends a strong understanding of antonymy, i.e. one which to some degr...
Holonym (HOL)
The Holonym (HOL) Relation denotes the relation between a Concept representing a part of something to...
Property Reciprocal (PRR)
Property Reciprocal (PRR) is a Relation connecting two Concepts which can feature as a Property T...
Action/Event Equivalent (AEE)
The Action/Event Equivalent Relation (AEE) always connects one Action to one Concept, and it serv...
Implication (IMP)
Implication (IMP) is a Relation which connects an Action to one or more other Actions. It denotes...
Subject/Actant1 Reciprocal (SAR)
The Subject/Actant1 Reciprocal (SAR) Relation relates two Actions. It is a type of Implication, b...
Actant semantics: Subject Semantics (SUS), Actant 1 Semantics (A1S), and Actant 2 Semantics (A2S)
Subject Semantics (SUS), Actant 1 Semantics (A1S), and Actant 2 Semantics (A2S) are Relations eac...
Related (REL)
Related (REL) is the least specific Relation, which allows entities of any type to be related by way o...
References
Corpus in InkVisitor: Managing textual data and representing them in the CASTEMO ontology
The Annotator component of InkVisitor allows you to perform various operations with full-texts. ...
Corpus in the CASTEMO workflow
Constituting a corpus of full texts is a natural component of the CASTEMO workflow. Full-texts ar...
Represent textual versions
Basics: The full text you plan to annotate often has a quite complex history of versions. At...
Import a full-text document and start annotating
Before starting to annotate, you need to import a full text into InkVisitor, create a Resource repr...
Use Annotator
Annotator is a component of the InkVisitor software adapted to the annotation of full-texts. Unli...
Search and replace strings in full-text documents
The Annotator component in InkVisitor has built-in functions for: searching text; repla...
Add anchors using search
Adding an anchor to one search hit: In the highlight mode, it also allows you to add an anchor to the s...
Decide on the focus and extent of annotation
Any semantic annotation, however comprehensive, always has a purpose; that is, it is conne...
How best to collect CASTEMO data?
Describe your data collection choices
Every data collection campaign, even the most comprehensive CASTEMO annotation, necessarily makes...
"Same as above": Referencing information content in CASTEMO knowledge graphs
Referring to the content of another statement. Basic logic: Statements often make references to o...
Querying CASTEMO knowledge graphs
Now, time to get knowledge out of the knowledge graphs. This chapter categorizes some useful quer...
Finding inconsistent and invalid data
For various reasons, such as data import or bugs of some version o...
Querying CASTEMO knowledge graphs in Neo4j
Querying with relations
Get data of trial events and their participants (DISSINET-specific)
How do you match the person physically present at trial (in a deposition, the deponent)? In DISSINET data, the...
Data import to InkVisitor
InkVisitor installation on a server
This chapter is intended for your IT support. It describes how to deploy the InkVisitor applicati...