astropenguin/typespecs

Typespecs


Data specifications by type hints

Overview

Typespecs is a lightweight Python library that leverages typing.Annotated to embed, extract, and manage metadata (such as units, categories, and descriptions) directly within your data structures. It keeps your code clean by binding specifications directly to your type hints. The extracted specifications are returned as a transparent subclass of pandas.DataFrame, making them directly compatible with the PyData ecosystem.

Installation

pip install typespecs

Basic Usage

You can attach metadata to your class fields using Annotated and the typespecs.Spec object. The Spec object acts as a read-only dictionary, ensuring your metadata remains immutable and safe from runtime modifications. Once your data structure is defined, use typespecs.from_annotated to parse the instance and extract both the actual data and its associated metadata into a DataFrame object.

from dataclasses import dataclass
from typespecs import ITSELF, Spec, from_annotated
from typing import Annotated as Ann, TypeVar


@dataclass
class Weather:
    temp: Ann[list[float], Spec(category="data", name="Temperature", units="K")]
    wind: Ann[list[float], Spec(category="data", name="Wind speed", units="m/s")]
    loc: Ann[str, Spec(category="info", name="Observed location")]


weather = Weather([273.15, 280.15], [5.0, 10.0], "Tokyo")
specs = from_annotated(weather)
print(specs)
      category              data               name           type units
temp      data  [273.15, 280.15]        Temperature    list[float]     K
wind      data       [5.0, 10.0]         Wind speed    list[float]   m/s
loc       info             Tokyo  Observed location  <class 'str'>  <NA>
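
For readers curious about the underlying mechanism, the following standard-library sketch shows how metadata attached with Annotated can be read back from a class. It is not the typespecs implementation: Meta and collect_metadata are hypothetical names introduced here for illustration, and Meta is only a stand-in for a metadata container such as Spec.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Meta:
    """Hypothetical stand-in for a metadata container (not typespecs.Spec)."""

    name: str
    units: str


@dataclass
class Weather:
    temp: Annotated[list[float], Meta(name="Temperature", units="K")]
    wind: Annotated[list[float], Meta(name="Wind speed", units="m/s")]


def collect_metadata(cls: type) -> dict[str, tuple]:
    """Map each annotated field name to the metadata attached to its hint."""
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(cls, include_extras=True)
    collected = {}
    for field, hint in hints.items():
        if get_origin(hint) is Annotated:
            # get_args returns (underlying type, *metadata objects).
            collected[field] = get_args(hint)[1:]
    return collected


metadata = collect_metadata(Weather)
```

Here metadata maps each field name to a tuple of the objects that were placed after the type inside Annotated, which is the raw material a library like typespecs can turn into table rows.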

Advanced Usage

Handling Sub-annotations

Typespecs simplifies working with nested types. You can create reusable type aliases that carry their own specifications. Furthermore, with the special typespecs.ITSELF object, the library captures the annotated subtype itself (e.g., float in list[float]) as a metadata value.

T = TypeVar("T")
Dtype = Ann[T, Spec(dtype=ITSELF)]


@dataclass
class Weather:
    temp: Ann[list[Dtype[float]], Spec(category="data", name="Temperature", units="K")]
    wind: Ann[list[Dtype[float]], Spec(category="data", name="Wind speed", units="m/s")]
    loc: Ann[str, Spec(category="info", name="Observed location")]


weather = Weather([273.15, 280.15], [5.0, 10.0], "Tokyo")
specs = from_annotated(weather)
print(specs)
      category              data            dtype               name           type units
temp      data  [273.15, 280.15]  <class 'float'>        Temperature    list[float]     K
wind      data       [5.0, 10.0]  <class 'float'>         Wind speed    list[float]   m/s
loc       info             Tokyo             <NA>  Observed location  <class 'str'>  <NA>
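
The alias pattern above relies on standard typing behavior: an Annotated alias over a TypeVar can be subscripted, and the metadata travels with it. The sketch below demonstrates this with the standard library only. The strings "dtype-marker" and "outer-marker" are placeholder metadata standing in for Spec objects, and how typespecs resolves ITSELF internally is not shown in this README, so this only illustrates that the underlying type is available next to the metadata.

```python
from typing import Annotated, TypeVar, get_args

T = TypeVar("T")

# A generic alias; "dtype-marker" is placeholder metadata standing in for
# something like Spec(dtype=ITSELF).
Tagged = Annotated[T, "dtype-marker"]

# Subscripting the alias substitutes T while keeping the metadata attached.
element_hint = Tagged[float]
underlying, *extras = get_args(element_hint)

# The alias can be nested inside another annotation, as in list[Dtype[float]].
outer_hint = Annotated[list[Tagged[float]], "outer-marker"]
inner_hint = get_args(get_args(outer_hint)[0])[0]
```

After running this, underlying is the float type and inner_hint is the same annotation as element_hint, recovered from inside the list.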

Handling Missing Values

By default, missing metadata values are filled with pandas.NA. You can override this behavior and specify a custom fallback value by using the default parameter in from_annotated.

specs = from_annotated(weather, default=None)
print(specs)
      category              data            dtype               name           type units
temp      data  [273.15, 280.15]  <class 'float'>        Temperature    list[float]     K
wind      data       [5.0, 10.0]  <class 'float'>         Wind speed    list[float]   m/s
loc       info             Tokyo             None  Observed location  <class 'str'>  None
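
The idea of a fallback value can be sketched in plain Python: collect the union of metadata keys across all fields, then fill any gap with the chosen default. This is an illustration of the concept rather than typespecs code. fill_missing is a hypothetical helper, and sorting the keys is an assumption made to mirror the alphabetical column order seen in the printed output above.

```python
def fill_missing(rows: dict[str, dict], default=None) -> dict[str, dict]:
    """Give every row the same sorted keys, filling gaps with `default`."""
    # Union of all keys that appear in any row, in alphabetical order.
    columns = sorted({key for row in rows.values() for key in row})
    return {
        index: {column: row.get(column, default) for column in columns}
        for index, row in rows.items()
    }


rows = {
    "temp": {"category": "data", "units": "K"},
    "loc": {"category": "info"},  # no units for a location
}
filled = fill_missing(rows, default=None)
```

In this sketch the loc row gains a units key whose value is the default, which parallels how the loc row shows None under units in the table above.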

Handling Full Specification

By default, typespecs merges metadata from nested annotations (e.g., the dtype attached to float in list[float]) into the row of the parent field. If you need to inspect the exact structural hierarchy of your annotations, set merge=False in from_annotated. This unpacks the tree: each nested annotation gets its own row, indexed by a path such as temp/0, so the parent collection and its elements are listed separately.

specs = from_annotated(weather, merge=False)
print(specs)
        category              data            dtype               name             type units
temp        data  [273.15, 280.15]             <NA>        Temperature      list[float]     K
temp/0      <NA>              <NA>  <class 'float'>               <NA>  <class 'float'>  <NA>
wind        data       [5.0, 10.0]             <NA>         Wind speed      list[float]   m/s
wind/0      <NA>              <NA>  <class 'float'>               <NA>  <class 'float'>  <NA>
loc         info             Tokyo             <NA>  Observed location    <class 'str'>  <NA>
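
The path-style index (temp, temp/0) suggests a recursive walk over the annotation tree. The sketch below shows one way such a walk could be written with the standard library. It is an assumption about the general technique, not the typespecs implementation: walk is a hypothetical function, and the strings "units=K" and "dtype" are placeholder metadata.

```python
from typing import Annotated, get_args, get_origin


def walk(hint, path: str):
    """Yield (path, underlying type, metadata) for a hint and its sub-hints."""
    metadata = ()
    if get_origin(hint) is Annotated:
        # Split the Annotated wrapper into its type and its metadata.
        hint, *metadata = get_args(hint)
    yield path, hint, tuple(metadata)
    # Recurse into type arguments, e.g. the float in list[float].
    for position, sub_hint in enumerate(get_args(hint)):
        yield from walk(sub_hint, f"{path}/{position}")


temp_hint = Annotated[list[Annotated[float, "dtype"]], "units=K"]
rows = list(walk(temp_hint, "temp"))
```

The walk produces one entry for temp carrying the outer metadata and one for temp/0 carrying the element metadata. Merging, as in the default behavior, would then amount to folding the child entries back into their parent.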