Grammars of Coercion
Towards a cross-corpora annotation model
This document describes the first digital steps taken by Working Group 1 “Grammars of Coercion” (hereafter WG1) towards a common, multilingual, cross-corpora annotation model for the semantic analysis of historical sources on coercion in labor. It is the result of four rounds of experiments with Catma (a semantic annotation and analysis tool) held online during the first semester of 2020. The purposes of this working paper are threefold: 1. to (help) keep track of the discussions and decisions that led to the design of our initial annotation model; 2. to serve as guidelines for members of WG1 committed to contributing to the project by annotating texts; 3. to serve as a basis for a future methodological article.
WG1 has adopted a bottom-up methodology, which consists in collecting structured information on coercion in labor from various historical contexts, with a primary focus on “texts” as a means to reconceptualize “coercion”. To avoid the Eurocentric bias of transposing “universal” modern categories onto pre- and early-modern contexts, WG1 tries to avoid analysis based on labels and nouns and instead focuses on the actions describing coercion in labor activities and relations. Inspired by the “verb-oriented” method of Maria Ågren et al., we propose an annotation model that takes “action-phrases” and their “actors” as the main focus of annotation, combined with the “entry-extraction-exit” phases of labor relations designed by Marcel van der Linden. As a whole, the annotation model is designed to isolate “action-phrases” from texts without losing related information about the actors involved in the actions. The texts chosen for this digital experiment are highly heterogeneous. They are provided by active members of the working group, based on the sources they know best. Such disparity raises issues, in particular regarding the comparability of the (con)texts and variations in “genres” and modes of speech. However, facing this crucial problem at an early stage of the project should also provide an opportunity to address it along the way.
In our annotation model, an “action phrase” is conceived as a coherent, continuous grammatical unit describing an “action”, whatever the grammatical structure of the language of the text. It can be a whole sentence, but also a segment of a sentence (when, for instance, a long sentence describes several actions). In this model, “action phrases” are not dissociated from their “actors”. The term “actors” is understood in a broad sense: “actors” can be individuals, groups, institutions, administrations, etc. who take part in or are concerned by the action, but also the related contextual information (such as time and space) that plays a role in the action. The relationship between the “action phrase” and its “actors” is maintained by assigning “roles” to the actors (e.g. a person who performs an action is the “actor” of the action, a group to whom something is done through an action is the “target” of the action, a place has a role in an action, etc.).
The annotation model is (at this stage of our experiment) composed of five tagsets, which might change as we refine the methodology and gain more experience with the practice of annotation:
- “action phrases”,
- “actors”,
- “contextual information”,
- “phase”,
- “semasiological naming”.
Besides “action phrases”, “actors” and “contextual information”, the “phase” tagset helps situate an “action phrase” in a chain of actions and within the life cycle of a labor relation (e.g. the action corresponds to the “entry” into a labor relation, takes place after the “exit” phase, or between an “exit” phase and a new “entry” phase, etc.). The “semasiological naming” tagset is still at a very early stage of conceptualization and elaboration. Its raison d’être is that, in our early annotation attempts, we felt the need to keep track of context-specific concepts, and of “emic labels” and “emic attributes” that are separated from the “action phrase” itself but implicitly related to it or to its actors. For example, an action can be judged “shameful” further on in the text; an actor can be given some qualification in another portion of the text where no action phrase related to labor and/or coercion is to be found; or an actor can be related to one “emic label” in the action phrase while being related to another label elsewhere in the text.
As for the organization of the tagsets, we decided to keep the five tagsets separate. Catma provides many ways to structure an annotation model, in particular the possibility of adding as many “subtag” levels as the user wishes. As the annotation model is still under development and might undergo major changes, it seems sensible to compartmentalize the tagsets so that changes can be made within one tagset without altering the others.
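The following Python sketch illustrates what keeping the tagsets compartmentalized can mean in data terms. Each tagset is an independent tree of tags and subtags, so restructuring one tree cannot affect the others. The `Tag` class and the sample subtags are hypothetical illustrations, not Catma's internal data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """A tag that may carry any number of nested subtag levels."""
    name: str
    subtags: list["Tag"] = field(default_factory=list)

    def add(self, name: str) -> "Tag":
        """Attach a new subtag one level below this tag and return it."""
        child = Tag(name)
        self.subtags.append(child)
        return child

    def paths(self, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
        """Return the path from the root to this tag and to every descendant."""
        here = prefix + (self.name,)
        result = [here]
        for child in self.subtags:
            result.extend(child.paths(here))
        return result


# The five tagsets are kept as separate, independent roots.
tagsets: dict[str, Tag] = {
    name: Tag(name)
    for name in ("action phrases", "actors", "contextual information",
                 "phase", "semasiological naming")
}

# Illustrative subtags only (hypothetical examples).
tagsets["action phrases"].add("to exchange")
tagsets["actors"].add("person")
tagsets["phase"].add("exit")

# Restructuring the "phase" tagset leaves every other tagset untouched.
tagsets["phase"].subtags.clear()
tagsets["phase"].add("entry")
tagsets["phase"].add("exit")

for root in tagsets.values():
    print(root.paths())
```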
Experimenting with Catma has been a decisive phase for the team in becoming familiar with semantic annotation methodology. It has been a crucial step toward building an annotation model and elaborating a methodological approach. Yet the decision to continue with the tool depends on a few important factors.
First, having access to a more stable Catma instance would be an improvement: all of us have experienced bugs and other annoying minor technical issues. On the “user-friendly” side, being able to create tags and subtags with predefined mandatory property and value fields would save a great deal of time, since we expect a very large collection of “action-phrase” subtags. Creating each of them will take time, all the more so if, for each new subtag, we have to add the same properties and value lists manually again (not to mention the risk of human error when doing so, which might undermine computational results in the end).
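To illustrate the time-saving feature described above, the sketch below stamps every new “action-phrase” subtag with the same set of mandatory properties and reports which ones are still empty. It is written in Python and is not a Catma feature. The property names are taken from the “Action 1” annotation table. The function names and the check that the tag starts with “to ” are assumptions made for the example.

```python
# Property names as they appear in the "Action 1" annotation table;
# the list is illustrative and would be curated by the working group.
ACTION_PHRASE_PROPERTIES = (
    "type", "lawfulness", "outcome", "intention", "justification",
    "occurrence", "target compliance", "incentive", "ID",
)


def new_action_subtag(verb: str) -> dict:
    """Create an action-phrase subtag pre-filled with every mandatory property."""
    if not verb.startswith("to "):
        raise ValueError(f"expected an infinitive such as 'to exchange', got {verb!r}")
    # Every property starts empty (None) so that nothing can be forgotten.
    return {"tag": verb, "properties": dict.fromkeys(ACTION_PHRASE_PROPERTIES)}


def missing_values(subtag: dict) -> list[str]:
    """List the mandatory properties that still have no value."""
    return [name for name, value in subtag["properties"].items() if value is None]


exchange = new_action_subtag("to exchange")
exchange["properties"].update(type="s.o. against s.o.", lawfulness="lawful")
print(missing_values(exchange))
```

Because the template is defined once, adding a property to it changes every subtag created afterwards, instead of requiring manual edits subtag by subtag.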
Second, we need to assess clearly whether we are expecting too much from the tool, and whether it can do what we expect it to do. Our annotation methodology rests on one important assumption, which requires further discussion: when the time comes for analysis, can we maintain the links between the tagged portions of text and the tags from different tagsets (links which we implicitly build by structuring the tagsets the way we do)?
Third, even though Catma is able to maintain links between the tags and the segments of text they are applied to, another problem will arise when an implicit link exists between an “action phrase” and information found outside the action phrase itself. This happens in the following situations:
- in the case of “semasiological” tagging, mentioned above;
- in the case where an actor is implicitly linked to an action phrase (e.g. when the identity of the actor of an action phrase is only implicit, or when the subject of a long sentence appears only once but is assigned to two or more actions, as in the fictitious sentence “the slave ran away, disappeared, and hired himself out as a mariner”).
Our solution to this issue would be to assign unique identifiers to actors and action phrases automatically. This would make it possible, for instance, to tag an actor as having a role in an action described in another part of the text through its unique identifier rather than by its position in the text; it would also allow us to relate an actor to an emic attribute present in another part of the text. However, this seems not only to involve a lot more work but also to be something Catma was not designed for (having Catma assign unique identifiers automatically would, again, save time, provided that two tags can share the same identifier, as when the exact same action is described more than once in a text). Unique identifiers used to relate tags within the same text should also be distinguished from the identifiers we assign in external reference tables (such as one recording all the biographical information on the individual actors mentioned in the texts).
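Here is a minimal Python sketch of the identifier-based approach, applied to the fictitious sentence quoted above. Annotations are located by character offsets, but roles are recorded as links between identifiers, so the single mention of “the slave” can be linked to all three action phrases. The `T` prefix for text-internal identifiers, the `BIO-` prefix for the biographical table, the action tags, and all class and function names are hypothetical choices made for illustration. Two annotations describing the same action would simply receive the same `uid`.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Annotation:
    """A tagged segment, located by character offsets and carrying an ID."""
    tagset: str
    tag: str
    start: int  # offset of the first character of the segment
    end: int    # offset just past the last character (exclusive)
    uid: str    # shared by every annotation referring to the same actor or action


class IdRegistry:
    """Issues text-internal identifiers such as T0001, T0002, ..."""

    def __init__(self, prefix: str = "T") -> None:
        self.prefix = prefix
        self._counter = count(1)

    def new(self) -> str:
        return f"{self.prefix}{next(self._counter):04d}"


text = "the slave ran away, disappeared, and hired himself out as a mariner"
ids = IdRegistry()
slave = ids.new()

annotations = [
    Annotation("actors", "person", 0, 9, slave),
    Annotation("action phrases", "to run away", 10, 18, ids.new()),
    Annotation("action phrases", "to disappear", 20, 31, ids.new()),
    Annotation("action phrases", "to hire oneself out", 37, len(text), ids.new()),
]

# Roles link IDs, not positions: the single mention of "the slave" is the
# actor of all three action phrases, although it lies outside all their segments.
roles = [(slave, "actor", a.uid) for a in annotations if a.tagset == "action phrases"]


def segments(uid: str) -> list[str]:
    """Return every text segment annotated with the given identifier."""
    return [text[a.start:a.end] for a in annotations if a.uid == uid]


# External reference tables use their own (hypothetical) "BIO-" namespace,
# so text-internal and biographical identifiers cannot be confused.
biographical_link = {slave: "BIO-0001"}

for actor_id, role, action_id in roles:
    print(actor_id, role, action_id, segments(action_id))
```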
Finally, we would have to maintain a relation between the work we do with Catma and the external tables or tools we use for other purposes. A general “biographical information” table of all the actors mentioned in the texts might serve many purposes, one of which could be to assess the representativeness of the “work” situations explored in the project. We also need a complete list of the “action phrase” tags and all their variants, properties and values in order to curate this central part of our project.
1. Action Phrases
An “action-phrase” is any sentence (or segment of a sentence) expressing views on, or describing actions related to, labor, labor relations, and coercion. Each “action-phrase” tag shall be reduced to a minimal expression using the infinitive form of an English verb (e.g. “to do”), chosen as the closest “translation” of the action described and corresponding as closely as possible to what the text actually says (e.g., “to perform labor” shall be used when the text says that an actor performs a kind of labor, not when the annotator interprets an action as a performance of labor). The first-level subtag (the infinitive form of an English verb) is not assigned any object (such as “s.th.”, or “labor” in the above-mentioned example). Further characterizations of the action will be provided through additional levels of subtags.
In all cases, the “action phrases” tagset will require regular curation, through small discussions or workshops, based on an updated list of the verb forms currently in use.
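To support this curation, an updated list of verb forms could be generated from the annotations themselves. The Python sketch below makes two hypothetical assumptions. First, each action-phrase annotation can be exported as a path of subtags, with the bare infinitive at the first level and further characterizations at deeper levels (for example “to perform” refined by “labor”). Second, the sample paths stand in for real annotations. Under those assumptions, the sketch counts how often each verb form is used.

```python
from collections import Counter, defaultdict

# Hypothetical annotations, each recorded as a path of subtags: the first
# level is the bare English infinitive, deeper levels characterize it further.
action_paths = [
    ("to exchange",),
    ("to expel",),
    ("to perform", "labor"),
    ("to exchange",),
    ("to refuse",),
]


def verb_form_report(paths: list[tuple[str, ...]]):
    """Count first-level verb forms and gather the refinements used under each."""
    counts = Counter(path[0] for path in paths)
    refinements = defaultdict(set)
    for path in paths:
        if len(path) > 1:
            refinements[path[0]].add(" / ".join(path[1:]))
    return counts, refinements


counts, refinements = verb_form_report(action_paths)
for verb, n in counts.most_common():
    # .get avoids creating empty entries in the defaultdict while printing.
    extra = ", ".join(sorted(refinements.get(verb, ()))) or "-"
    print(f"{verb:<12} {n:>3}  {extra}")
```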
2. Actors
The actors are all the persons, groups of persons and entities who participate in, are involved in, or are concerned by an action, whatever their role, and who are not “contextual actors” (such as time and space, for which there is a separate tagset). So far, we have developed few tags beyond the “persons” tag.
3. Phase
The “Entry-Extraction-Exit” model designed by Marcel van der Linden requires additional liminal and pre-/post- phases that might be mentioned in an action. Initially, some of us considered making the “phase” annotation a property of the action itself. However, we came to realize that an “action-phrase” can involve more than one “phase”. For the moment, we therefore only annotate “action-phrases” or segments thereof as “entry”, “exit”, “extraction”, etc. Properties and values might offer ways of refining the “phase” annotation, with regard either to the conditions of “entry”, “exit”, etc., or to sequences.
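The reason for annotating “phase” on text segments rather than as a single property can be illustrated with simple interval overlap. One action phrase can overlap several phase annotations, as with the exchange of servants in Example 1, which is both an exit and a new entry. The character offsets below are invented, and the Python names are illustrative only.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A labeled text segment given by character offsets (end is exclusive)."""
    label: str
    start: int
    end: int


def overlaps(a: Span, b: Span) -> bool:
    """True when the two half-open intervals share at least one character."""
    return a.start < b.end and b.start < a.end


def phases_of(action: Span, phases: list[Span]) -> list[str]:
    """Collect every phase label whose segment overlaps the action phrase."""
    return [p.label for p in phases if overlaps(action, p)]


# Hypothetical offsets for two action phrases from Example 1.
expel = Span("to expel", 60, 90)
exchange = Span("to exchange", 100, 150)

# The exchange segment carries two phase annotations at once, which a single
# "phase" property on the action could not express.
phase_annotations = [
    Span("exit", 100, 150),
    Span("entry", 100, 150),
]

print(phases_of(exchange, phase_annotations))  # ['exit', 'entry']
print(phases_of(expel, phase_annotations))     # []
```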
4. Contextual Information
This tagset covers all information on the time, space, and other context of the actions, including objects and prices. We see space and time as “actors” of an “action-phrase” (although in a different way from “persons” and “groups”), as they also play a “role” in an action.
5. Semasiological naming
To be developed…
6. Examples
Example 1
A sentence from a Xing’an huilan case stating (in a raw English translation followed by the original text):
Because her female servant Wu Yunzhu had an illicit sexual intercourse with Liu Chun, who cultivated her land, Mrs. Niu Hulu expelled Liu Chun, and exchanged Wu Yunzhu against a female servant of Tuo Xinbao’s.
This sentence:
- describes two actions (in blue, between brackets and underlined):
  - to expel the male farmer
  - to exchange the female servant
- both actions have the same main named actor (in green, between brackets):
  - Mrs. Niu “does” the two above-mentioned “actions” (ID 0001 in the table below)
- both actions have the same justification (in orange, between brackets):
  - because of illicit sexual intercourse
- each action has a different named target (in red):
  - the named male farmer (who is expelled, ID 0002 in the table below)
  - the named female servant (who is exchanged, ID 0003 in the table below)
- the second action has two “secondary actors” (in yellow, between parentheses):
  - the named person with whom the exchange is made (ID 0004)
  - the unnamed female servant for whom the first female servant is exchanged (ID 0005)
- there is one “emic label” (the term 使女, in maroon, between parentheses), used both to qualify the named female servant, who is exchanged because she had a relationship with the farmer, and to designate the unnamed female servant for whom she is exchanged.
  - One issue here is that the “emic label” present in the action, which is not a legal term but a household designation, is also implicitly related to a legal category mentioned earlier in the text (how can we establish this link?)
Annotations:
Action 1
Tagset | Tag | Subtag 1 | Subtag 2 | Properties | Values |
---|---|---|---|---|---|
Action phrase | “action phrase” | “to exchange” | | type | s.o. against s.o. |
| | | | | lawfulness | lawful |
| | | | | outcome | success |
| | | | | intention | willingly |
| | | | | justification | offence committed by target |
| | | | | occurrence | unique |
| | | | | target compliance | unrequired (n/a) |
| | | | | incentive | n/a |
| | | | | ID | … |
Actors | “person” | | | ID | 0001 |
| | | | | role | actor |
| | | | | Employer/med… | n/a |
| | | | | gender | female |
| | “person” | | | ID | 0002 |
| | | | | role | cause of action |
| | | | | Employer/med… | n/a |
| | | | | gender | male |
| | “person” | | | ID | 0003 |
| | | | | role | target |
| | | | | Employer/med… | n/a |
| | | | | gender | female |
| | “person” | | | ID | 0004 |
| | | | | role | beneficiary |
| | | | | Employer/med… | n/a |
| | | | | gender | male |
| | “person” | | | ID | 0005 |
| | | | | role | target |
| | | | | Employer/med… | n/a |
| | | | | gender | female |
Contextual information | none directly mentioned in the action (the region where the case took place is known from elsewhere in the text) | | | | |
Phase | “exit” (although the exchange of servants is, for each of them, both an exit and a new entry) | | | | |
Semasiological | Emic label | | | type | household role |
| | | | | label | 使女 |
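Once the table above is transcribed into a structured form, it can be queried, for example to list actors by role. The Python sketch below copies the values from the “Action 1” table. The truncated “Employer/med…” property and the elided action ID are left out rather than guessed. The dictionary layout is an assumption and not a Catma export format.

```python
from collections import defaultdict

# Values transcribed from the "Action 1" table (layout is hypothetical).
action_1 = {
    "tag": "to exchange",
    "properties": {
        "type": "s.o. against s.o.",
        "lawfulness": "lawful",
        "outcome": "success",
        "intention": "willingly",
        "justification": "offence committed by target",
        "occurrence": "unique",
        "target compliance": "unrequired (n/a)",
        "incentive": "n/a",
    },
    "actors": [
        {"ID": "0001", "role": "actor", "gender": "female"},
        {"ID": "0002", "role": "cause of action", "gender": "male"},
        {"ID": "0003", "role": "target", "gender": "female"},
        {"ID": "0004", "role": "beneficiary", "gender": "male"},
        {"ID": "0005", "role": "target", "gender": "female"},
    ],
}


def ids_by_role(action: dict) -> dict[str, list[str]]:
    """Group the actor IDs of one action phrase by the role they play in it."""
    grouped = defaultdict(list)
    for actor in action["actors"]:
        grouped[actor["role"]].append(actor["ID"])
    return dict(grouped)


print(ids_by_role(action_1))
# {'actor': ['0001'], 'cause of action': ['0002'], 'target': ['0003', '0005'], 'beneficiary': ['0004']}
```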
Action 2
Tagset | Tag | Subtag 1 | Subtag 2 | Properties | Values |
---|---|---|---|---|---|
Example 2
A short letter written by Margherita Datini to her husband Francesco Datini, from Prato to Florence, on 22 March 1395 (English translation by Carolyn James and Antonio Pagliaro).
Schiatta and his wife were here on Sunday. He came to get a baby [for his wife] to wet nurse, but he didn’t come to an agreement with anyone. He left word that they should come to speak to me for further information because he is one of our workers, but no one came to give me a reply. He has left it to me to search for a baby but I haven’t done so; and I won’t, because it would be wrong and shameful. The woman is old and the milk too abundant, despite the fact that she says she has very little. […]
This letter:
- describes three actions (in blue, between brackets and underlined):
  - to search for employment for s.o.
  - to ask s.o. for help in finding employment for s.o.
  - to refuse to help
- the actions have two main named actors (in green, between brackets):
  - Schiatta, Francesco’s worker, “does” the first two “actions” (ID 0001 in the table below)
  - Margherita, Francesco’s wife, “does” the third “action” (ID 0002 in the table below)
- the second and third actions are followed by a justification (in orange, between brackets):
  - Schiatta asks for help “because he was Francesco’s worker”
  - Margherita refuses to help “because it would be wrong and shameful”
- all actions have the same named target (in red):
  - Schiatta’s wife [nameless] (whose services as a wet nurse are offered and refused, ID 0003 in the table below)
- all actions have “secondary actors” (in yellow, between parentheses):
  - no one wants to employ Schiatta’s wife (unspecified, no ID)
  - no one contacts Margherita to employ Schiatta’s wife (unspecified, no ID)
  - Margherita reports all three actions to her husband, Francesco Datini (ID 0004)
- the third action is followed by emic labels and attributes (in maroon, between brackets) used to qualify Schiatta’s wife (ID 0003)