From 101732c504f1272d0bbe27b064549eab9b70e239 Mon Sep 17 00:00:00 2001
From: Yan Wong
Date: Wed, 26 Nov 2025 17:34:37 +0000
Subject: [PATCH 1/5] Update glossary.md

Reference ARGs in the glossary
---
 docs/glossary.md | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/docs/glossary.md b/docs/glossary.md
index 3317c9a27b..f2148eb26b 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -36,11 +36,16 @@ tree
 (sec_data_model_definitions_tree_sequence)=
 
 tree sequence
-: A "succinct tree sequence" (or tree sequence, for brevity) is an efficient
-  encoding of a sequence of correlated trees, such as one encounters looking
-  at the gene trees along a genome. A tree sequence efficiently captures the
-  structure shared by adjacent trees, (essentially) storing only what differs
-  between them.
+: A "succinct tree sequence" (or tree sequence, for brevity) is an object
+  that stores the genetic ancestry and mutational history of a set of
+  aligned DNA sequences. The name reflects the idea that a common
+  way to treat genetic ancestry is as a sequence of correlated
+  "trees" along the genome; a tree sequence provides an efficient
+  way to store differences between these trees. Technically, ancestry
+  is encoded by linking *nodes* (genomes) via *edges*, forming a network
+  or graph. Graphs of this sort are sometimes known as ancestral
+  recombination graphs (ARGs), so that tree sequences provide a
+  flexible way to encode multiple types of ARG.
 
 (sec_data_model_definitions_node)=
 

From 64d5bea3fea92e48a7bf35679a58641029afefa8 Mon Sep 17 00:00:00 2001
From: Yan Wong
Date: Wed, 3 Dec 2025 14:45:04 +0000
Subject: [PATCH 2/5] Tidy and add more to the glossary

---
 docs/glossary.md | 44 ++++++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/docs/glossary.md b/docs/glossary.md
index f2148eb26b..926a0a21de 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -30,30 +30,36 @@ Here are some definitions of some key ideas encountered in this documentation.
 tree
 : A "gene tree", i.e., the genealogical tree describing how a collection of
   genomes (usually at the tips of the tree) are related to each other at some
-  chromosomal location. See {ref}`sec_nodes_or_individuals` for discussion
-  of what a "genome" is.
+  chromosomal {ref}`position ` or location.
+  As the trees may vary depending on this location, they are also known as "local
+  trees". See {ref}`sec_nodes_or_individuals` for discussion of what a "genome" is.
 
 (sec_data_model_definitions_tree_sequence)=
 
 tree sequence
 : A "succinct tree sequence" (or tree sequence, for brevity) is an object
   that stores the genetic ancestry and mutational history of a set of
-  aligned DNA sequences. The name reflects the idea that a common
+  aligned DNA sequences or genomes. The name reflects the idea that a common
   way to treat genetic ancestry is as a sequence of correlated
-  "trees" along the genome; a tree sequence provides an efficient
-  way to store differences between these trees. Technically, ancestry
-  is encoded by linking *nodes* (genomes) via *edges*, forming a network
-  or graph. Graphs of this sort are sometimes known as ancestral
-  recombination graphs (ARGs), so that tree sequences provide a
+  {ref}`trees ` at different chromosomal
+  {ref}`positions `.
+  Branches that are shared between these trees are efficiently stored as a
+  single {ref}`edge `, and adjacent trees
+  may differ by only a few such edges. These edges connect
+  {ref}`nodes ` (genomes) in
+  the tree sequence, forming a structure which is technically known as a
+  network or graph. Graphs of this sort are sometimes called ancestral
+  recombination graphs (ARGs), hence tree sequences provide a
   flexible way to encode multiple types of ARG.
 
 (sec_data_model_definitions_node)=
 
 node
-: Each branching point in each tree is associated with a particular genome
+: Any point in a tree can be associated with a particular genome
   in a particular ancestor, called a "node". Since each node represents a
-  specific genome it has a unique `time`, thought of as its birth time,
-  which determines the height of any branching points it is associated with.
+  specific genome it has a unique `time`, thought of as its birth time. Any
+  branching point in a tree much be associated with a node; that node's time
+  determines the height of the branching point.
   See {ref}`sec_nodes_or_individuals` for discussion of what a "node" is.
 
 (sec_data_model_definitions_individual)=
@@ -71,7 +77,7 @@ individual
 sample
 : The focal nodes of a tree sequence, usually thought of as those from which
   we have obtained data. The specification of these affects various
-  methods: (1) {meth}`TreeSequence.variants` and
+  methods: {meth}`TreeSequence.variants` and
   {meth}`TreeSequence.haplotypes` will output the genotypes of the samples,
   and {attr}`Tree.roots` only return roots ancestral to at least one
   sample.
@@ -86,13 +92,15 @@ edge
 : The topology of a tree sequence is defined by a set of **edges**. Each edge
   is a tuple `(left, right, parent, child)`, which records a parent-child
   relationship among a pair of nodes on the
-  on the half-open interval of chromosome `[left, right)`.
+  on the half-open interval `[left, right)` along the genome. The difference
+  between `left` and `right` is known as the "span" of the edge.
 
 (sec_data_model_definitions_site)=
 
 site
 : Tree sequences can define the mutational state of nodes as well as their
-  topological relationships. A **site** is thought of as some position along
+  topological relationships. A **site** is thought of as some
+  {ref}`position ` along
   the genome at which variation occurs. Each site is associated with a unique
   position and ancestral state.
 
@@ -119,6 +127,14 @@ migration
 population
 : A grouping of nodes, e.g., by sampling location.
 
+(sec_data_model_definitions_position)=
+
+position
+: A location along the genome, from 0 to the
+  {ref}`sequence length`. In `tskit`
+  positions are stored as floating-point numbers, although it is common to
+  restrict positions to occur at discrete integer locations.
+
 (sec_data_model_definitions_provenance)=
 
 provenance

From 19e66805a354828e18c4154b189b1efdf4197f2a Mon Sep 17 00:00:00 2001
From: Yan Wong
Date: Wed, 3 Dec 2025 16:08:56 +0000
Subject: [PATCH 3/5] Update docs/glossary.md

Co-authored-by: Gregor Gorjanc
---
 docs/glossary.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/glossary.md b/docs/glossary.md
index 926a0a21de..58771a7e32 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -58,7 +58,7 @@ node
 : Any point in a tree can be associated with a particular genome
   in a particular ancestor, called a "node". Since each node represents a
   specific genome it has a unique `time`, thought of as its birth time. Any
-  branching point in a tree much be associated with a node; that node's time
+  branching point in a tree must be associated with a node; that node's time
   determines the height of the branching point.
   See {ref}`sec_nodes_or_individuals` for discussion of what a "node" is.
 
From 27d6c305c77436d9447ba1294a8da3e44750fdae Mon Sep 17 00:00:00 2001
From: Yan Wong
Date: Thu, 4 Dec 2025 13:31:45 +0000
Subject: [PATCH 4/5] Update docs/glossary.md

Co-authored-by: Jerome Kelleher
---
 docs/glossary.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/glossary.md b/docs/glossary.md
index 58771a7e32..a9dca30641 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -47,7 +47,7 @@ tree sequence
   single {ref}`edge `, and adjacent trees
   may differ by only a few such edges. These edges connect
   {ref}`nodes ` (genomes) in
-  the tree sequence, forming a structure which is technically known as a
+  the tree sequence, forming a
   network or graph. Graphs of this sort are sometimes called ancestral
   recombination graphs (ARGs), hence tree sequences provide a
   flexible way to encode multiple types of ARG.

From bc42f9e147001585c3db3ac4edc96b30640e751f Mon Sep 17 00:00:00 2001
From: Yan Wong
Date: Thu, 4 Dec 2025 13:39:40 +0000
Subject: [PATCH 5/5] Remove reference to node heights

---
 docs/glossary.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/glossary.md b/docs/glossary.md
index a9dca30641..20cca9204f 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -57,10 +57,12 @@ tree sequence
 node
 : Any point in a tree can be associated with a particular genome
   in a particular ancestor, called a "node". Since each node represents a
-  specific genome it has a unique `time`, thought of as its birth time. Any
-  branching point in a tree must be associated with a node; that node's time
-  determines the height of the branching point.
-  See {ref}`sec_nodes_or_individuals` for discussion of what a "node" is.
+  specific genome it has a unique `time`, thought of as its birth time. Nodes
+  may or may not correspond to branching points, either in a local
+  {ref}`tree ` or in the whole graph.
+  However a branching point must always be associated with a node.
+  See {ref}`sec_nodes_or_individuals` for discussion of what a "node"
+  represents.
 
 (sec_data_model_definitions_individual)=
 
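
Illustration for the `edge` and `position` glossary entries touched by this series: a minimal Python sketch of how the half-open interval `[left, right)` and the "span" of an edge behave, and how a local tree at a given position follows from the edges that cover it. The `Edge` class, the `local_parents` helper, and the node IDs and coordinates are hypothetical stand-ins written for this note; they are not the `tskit` API and are not part of the patch.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical stand-in for a glossary-style edge (not the tskit class)."""

    left: float
    right: float
    parent: int
    child: int

    @property
    def span(self) -> float:
        # The span is the difference between `right` and `left`.
        return self.right - self.left

    def covers(self, position: float) -> bool:
        # Half-open interval: `left` is included, `right` is excluded.
        return self.left <= position < self.right


def local_parents(edges, position):
    """Return a child -> parent mapping for the local tree at `position`."""
    return {e.child: e.parent for e in edges if e.covers(position)}


# Illustrative data: a genome of length 10 with two local trees.
# Node 1 changes parent at position 4; every other branch is shared
# and is therefore stored once, as a single edge.
edges = [
    Edge(0.0, 10.0, parent=3, child=0),
    Edge(0.0, 4.0, parent=3, child=1),
    Edge(4.0, 10.0, parent=4, child=1),
    Edge(0.0, 10.0, parent=4, child=2),
    Edge(0.0, 10.0, parent=4, child=3),
]

for e in edges:
    print(e, "span =", e.span)

print(local_parents(edges, 2.0))  # local tree to the left of the breakpoint
print(local_parents(edges, 4.0))  # position 4.0 belongs to the right-hand tree
```

In the right-hand tree of this sketch, node 3 has a single child, which matches the statement in PATCH 5/5 that nodes may or may not correspond to branching points.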