---
layout: default
title: Multi-Word Tokens
---
Previous discussion can be found here.
German
```
1-2 im _ _
1 in in PREP
2 dem der DET
```
Czech
```
4-5 abych _ _
4 aby aby SCONJ
5 bych být AUX
```
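In both examples the range line (`1-2`, `4-5`) is the surface token, and the numbered lines that follow are its component words. The Python sketch below groups CoNLL-U lines that way. It assumes the abbreviated four-column layout shown here (ID, FORM, LEMMA, UPOS), whitespace-separated, with no empty nodes (IDs such as `8.1`); the function name `group_multiword_tokens` is hypothetical.

```python
def group_multiword_tokens(lines):
    """Group abbreviated CoNLL-U lines (ID FORM LEMMA UPOS) by surface token.

    Returns a list of (surface_form, [(form, lemma, upos), ...]) pairs.
    A range line such as "4-5 abych _ _" opens a surface token; word lines
    whose IDs fall inside the range become its components. Any other word
    line is a surface token whose only component is itself.
    """
    groups = []
    span_end = 0        # last word ID covered by the open range
    span_words = None   # component list of the open range, or None
    for line in lines:
        cols = line.split()
        if not cols:
            continue  # skip blank lines
        tok_id, form = cols[0], cols[1]
        if "-" in tok_id:
            # Multi-word token range: remember where it ends.
            span_end = int(tok_id.split("-")[1])
            span_words = []
            groups.append((form, span_words))
            continue
        word = (form, cols[2], cols[3])
        if span_words is not None and int(tok_id) <= span_end:
            span_words.append(word)  # component of the open range
        else:
            span_words = None        # any open range is now closed
            groups.append((form, [word]))
    return groups


czech = ["4-5 abych _ _", "4 aby aby SCONJ", "5 bych být AUX"]
print(group_multiword_tokens(czech))
# [('abych', [('aby', 'aby', 'SCONJ'), ('bych', 'být', 'AUX')])]
```

Each option below is a different way of carrying this same surface/component pairing into LIF.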
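- Put the surface token and the component tokens in separate views
- Put them in the same view with different annotation types
- Put them in the same view with the same annotation type, marking the component tokens with the tokenType feature
- Use a single token that carries its components as features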
The surface token is annotated in one view and the component tokens in a second view, where each component token targets the surface token.
```json
{
  "text": {
    "@value": "im",
    "@language": "de"
  },
  "views": [
    {
      "id": "v1",
      "metadata": {
        "contains": {
          "http://vocab.lappsgrid.org/Token": {
            "type": "lumped"
          }
        }
      },
      "annotations": [
        {
          "@type": "Token",
          "id": "tk0",
          "start": 0,
          "end": 2
        }
      ]
    },
    {
      "id": "v2",
      "metadata": {
        "contains": {
          "http://vocab.lappsgrid.org/Token": {
            "type": "split"
          }
        }
      },
      "annotations": [
        {
          "@type": "Token",
          "id": "tk0",
          "features": {
            "targets": "v1:tk0"
          }
        },
        {
          "@type": "Token",
          "id": "tk1",
          "features": {
            "targets": "v1:tk0"
          }
        }
      ]
    }
  ]
}
```
Issues
- This complicates processing, because tools need to look in two (or more) views to reconcile all the information. A naive tool may end up reading the wrong token view.
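To make the issue concrete: a consumer of this layout cannot interpret the split view on its own. It has to index annotations from every view and follow the cross-view `targets` references (such as `v1:tk0`) back to the surface token. The Python sketch below does this. The function names are hypothetical; it assumes every target is a fully qualified `viewId:annotationId` string, and it reads `targets` from either `features` or the top level of the annotation.

```python
def index_annotations(container):
    """Map "viewId:annotationId" to every annotation in the container."""
    return {
        f'{view["id"]}:{ann["id"]}': ann
        for view in container["views"]
        for ann in view["annotations"]
    }


def get_targets(ann):
    """Return an annotation's targets as a list, whether they are stored
    under "features" or at the top level of the annotation."""
    targets = ann.get("features", {}).get("targets", ann.get("targets", []))
    return [targets] if isinstance(targets, str) else list(targets)


def components_by_surface(container, split_view_id):
    """Group the tokens of the split view under the surface token they target.

    Returns {surface_ref: (surface_text, [component_refs])}.
    """
    index = index_annotations(container)  # spans every view, not just one
    text = container["text"]["@value"]
    split_view = next(v for v in container["views"] if v["id"] == split_view_id)
    grouped = {}
    for ann in split_view["annotations"]:
        for ref in get_targets(ann):
            surface = index[ref]  # KeyError here means a dangling reference
            entry = grouped.setdefault(ref, (text[surface["start"]:surface["end"]], []))
            entry[1].append(f'{split_view_id}:{ann["id"]}')
    return grouped


doc = {
    "text": {"@value": "im", "@language": "de"},
    "views": [
        {"id": "v1", "annotations": [{"@type": "Token", "id": "tk0", "start": 0, "end": 2}]},
        {"id": "v2", "annotations": [
            {"@type": "Token", "id": "tk0", "features": {"targets": "v1:tk0"}},
            {"@type": "Token", "id": "tk1", "features": {"targets": "v1:tk0"}},
        ]},
    ],
}
print(components_by_surface(doc, "v2"))
# {'v1:tk0': ('im', ['v2:tk0', 'v2:tk1'])}
```

Note that both views reuse the local ID `tk0`, so the index must be keyed by the qualified reference; a tool that looks up bare IDs would conflate the two tokens.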
The surface token is annotated with http://vocab.lappsgrid.org/Token and the component tokens with http://vocab.lappsgrid.org/Word.
```json
{
  "text": {
    "@value": "im",
    "@language": "de"
  },
  "views": [
    {
      "id": "v1",
      "metadata": {
        "contains": {
          "http://vocab.lappsgrid.org/Token": {
            "type": "lumped"
          },
          "http://vocab.lappsgrid.org/Word": {
            "type": "lumped"
          }
        }
      },
      "annotations": [
        {
          "@type": "Token",
          "id": "tk0",
          "start": 0,
          "end": 2
        },
        {
          "@type": "Word",
          "id": "w0",
          "features": {
            "targets": "tk0",
            "position": "1"
          }
        },
        {
          "@type": "Word",
          "id": "w1",
          "features": {
            "targets": "tk0",
            "position": "2"
          }
        }
      ]
    }
  ]
}
```
Issues
- It is not clear how pos and lemma annotations should be attached to the Token.
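With this layout, a consumer that wants the components of a Token has to collect the Word annotations that target it and order them by `position`. A minimal Python sketch, assuming each Word has exactly one target given as a plain string (as in the example) and that `position` is a 1-based number stored as a string; `words_for_tokens` and `local_type` are hypothetical names. The Words are deliberately listed out of order in the sample data to exercise the sort.

```python
def local_type(ann):
    """Reduce a vocabulary URI to its short name, e.g.
    "http://vocab.lappsgrid.org/Word" -> "Word"; short names pass through."""
    return ann["@type"].rsplit("/", 1)[-1]


def words_for_tokens(view):
    """Return {token_id: [Word annotations]} sorted by the "position" feature."""
    by_token = {}
    for ann in view["annotations"]:
        if local_type(ann) != "Word":
            continue
        by_token.setdefault(ann["features"]["targets"], []).append(ann)
    for words in by_token.values():
        # "position" is a string in the example ("1", "2"), so compare as ints.
        words.sort(key=lambda w: int(w["features"]["position"]))
    return by_token


view = {
    "id": "v1",
    "annotations": [
        {"@type": "Token", "id": "tk0", "start": 0, "end": 2},
        {"@type": "Word", "id": "w1", "features": {"targets": "tk0", "position": "2"}},
        {"@type": "Word", "id": "w0", "features": {"targets": "tk0", "position": "1"}},
    ],
}
print({tok: [w["id"] for w in words] for tok, words in words_for_tokens(view).items()})
# {'tk0': ['w0', 'w1']}
```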
The surface token and component tokens are annotated with http://vocab.lappsgrid.org/Token and the component tokens have the tokenType feature set.
```json
[
  {
    "id": "tok4-5",
    "start": 177,
    "end": 182,
    "@type": "http://vocab.lappsgrid.org/Token",
    "features": {
      "word": "abych",
      "targets": [
        "mwt-4",
        "mwt-5"
      ]
    }
  },
  {
    "id": "mwt-4",
    "@type": "http://vocab.lappsgrid.org/Token",
    "features": {
      "word": "aby",
      "lemma": "aby",
      "pos": "SCONJ",
      "targets": [
        "tok4-5"
      ],
      "tokenType": "http://vocab.lappsgrid.org/ns/syntax/mwt"
    }
  },
  {
    "id": "mwt-5",
    "@type": "http://vocab.lappsgrid.org/Token",
    "features": {
      "word": "bych",
      "lemma": "b\u00fdt",
      "pos": "AUX",
      "targets": [
        "tok4-5"
      ],
      "tokenType": "http://vocab.lappsgrid.org/ns/syntax/mwt"
    }
  }
]
```
The surface token is annotated with http://vocab.lappsgrid.org/Token and the component tokens are features of the Token.
```json
{
  "id": "tok4-5",
  "start": 177,
  "end": 182,
  "@type": "http://vocab.lappsgrid.org/Token",
  "features": {
    "word": "abych",
    "components": [
      {
        "word": "aby",
        "lemma": "aby",
        "pos": "SCONJ"
      },
      {
        "word": "bych",
        "lemma": "b\u00fdt",
        "pos": "AUX"
      }
    ]
  }
}
```
Issues
- Information that should be an annotation in its own right becomes a feature of another annotation.
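One way to see this issue is to convert back: the nested components hold everything needed for the tokenType layout, but they only become addressable annotations once something unfolds them. The Python sketch below performs that unfolding. `expand_components` and its component ID scheme (`<surfaceId>-<n>`) are hypothetical, and it assumes every component is a flat dict of features such as `word`, `lemma`, and `pos`.

```python
TOKEN = "http://vocab.lappsgrid.org/Token"
MWT = "http://vocab.lappsgrid.org/ns/syntax/mwt"


def expand_components(token):
    """Unfold a Token with a "components" feature into the tokenType layout.

    Returns the surface token (its "targets" now listing the new component
    IDs) followed by one Token per component. The input is not modified.
    """
    feats = dict(token["features"])  # shallow copy, so pop() leaves the input intact
    components = feats.pop("components", [])
    if not components:
        return [token]
    # Hypothetical ID scheme: surface ID plus a 1-based component number.
    part_ids = [f'{token["id"]}-{n}' for n in range(1, len(components) + 1)]
    surface = dict(token, features={**feats, "targets": part_ids})
    parts = [
        {
            "id": pid,
            "@type": TOKEN,
            "features": {**comp, "targets": [token["id"]], "tokenType": MWT},
        }
        for pid, comp in zip(part_ids, components)
    ]
    return [surface] + parts


abych = {
    "id": "tok4-5", "start": 177, "end": 182, "@type": TOKEN,
    "features": {
        "word": "abych",
        "components": [
            {"word": "aby", "lemma": "aby", "pos": "SCONJ"},
            {"word": "bych", "lemma": "b\u00fdt", "pos": "AUX"},
        ],
    },
}
for ann in expand_components(abych):
    print(ann["id"], ann["features"].get("tokenType", "surface"))
```

The component Tokens get no `start`/`end`, matching the tokenType example, since the nested form records no offsets for the parts.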