
Commit 8aaaaf0

jenniferjiangkells and Jennifer Jiang-Kells authored
Improve function io type checking (#38)
* Dynamically infer service func input type and return Card model instead of Dict
* Renamed DataGenerator -> CdsDataGenerator and output model -> GeneratedFhirData to make it clearer for users when importing
* Update tests
* Rename GeneratedFhirData -> CdsFhirData
* Update documentation
* Make save file name specific and shorter
* Add error checking to cds io
* Fix tests
* Added error check for non-Card return type in cds service function
* Added tests
* Rollback timestamp changes

---------

Co-authored-by: Jennifer Jiang-Kells <[email protected]>
1 parent f4fc383 commit 8aaaaf0

14 files changed, +203 -131 lines changed


docs/quickstart.md

Lines changed: 69 additions & 56 deletions
@@ -45,21 +45,24 @@ A client is a healthcare system object that requests information and processing
 
 A client is typically an EHR system, but we may also support other health objects in the future such as a CPOE (Computerized Physician Order Entry).
 
-We can mark a client by using the decorator `@hc.ehr`. You **must** declare a **workflow** for EHR clients, which informs the sandbox how your data will be formatted (See [Use Cases](usecases.md)).
+We can mark a client by using the decorator `@hc.ehr`. You must declare a particular **workflow** for the EHR client, which informs the sandbox how your data will be formatted (See [Use Cases](usecases.md)).
+
+Data returned from the client should be wrapped in a [Pydantic](https://docs.pydantic.dev/latest/) model depending on use case, e.g. `CdsFhirData`.
 
 You can optionally specify if you want more than 1 request generated with the `num` parameter.
 
 ```python
 import healthchain as hc
 from healthchain.use_cases import ClinicalDecisionSupport
+from healthchain.models import CdsFhirData
 
 @hc.sandbox
 class MyCoolSandbox(ClinicalDecisionSupport):
     def __init__(self) -> None:
         pass
 
     @hc.ehr(workflow="patient-view", num=10)
-    def load_data_in_client(self):
+    def load_data_in_client(self) -> CdsFhirData:
         # Do things here to load in your data
         pass

@@ -69,9 +72,9 @@ class MyCoolSandbox(ClinicalDecisionSupport):
 ### Data Generator
 Healthcare data is interoperable, but not composable - every deployment site will have different ways of configuring data and terminology. This matters when you develop applications that need to integrate into these systems, especially when you need to reliably extract data for your model to consume.
 
-The aim of the Data Generator is not to generate realistic data suitable for use cases such as patient population studies, but rather to generate data that is structurally compliant with what is expected of EHR configurations, and to be able to test and handle variations in this.
+The aim of the data generator is not to generate realistic data suitable for use cases such as patient population studies, but rather to generate data that is structurally compliant with what is expected of EHR configurations, and to be able to test and handle variations in this.
 
-For this reason the data generator is opinionated by use case and workflow. See [Use Cases](usecases.md).
+For this reason the data generator is opinionated by use case and workflow. See [Use Cases](usecases.md) for more information.
 
 !!! note
     We're aware we may not cover everyone's use cases, so if you have strong opinions about this, please [reach out](https://discord.gg/jG4UWCUh)!
@@ -80,43 +83,54 @@ On the synthetic data spectrum defined by [this UK ONS methodology working paper
 
 ![Synthetic data](assets/synthetic_data_ons.png)
 
-You can use the data generator within a Client function or on its own. The `.data` attribute contains a Pydantic class containing `context` and `resources`.
+You can use the data generator within a client function or on its own. The output of `.generate()` depends on the workflow. For CDS use cases, it returns a `CdsFhirData` model with the `prefetch` field populated with a [Bundle](https://www.hl7.org/fhir/bundle.html) of generated structural synthetic FHIR data.
 
 === "Within client"
     ```python
     import healthchain as hc
-    from healthchain.data_generator import DataGenerator
     from healthchain.use_cases import ClinicalDecisionSupport
+    from healthchain.models import CdsFhirData
+    from healthchain.data_generator import CdsDataGenerator
 
     @hc.sandbox
     class MyCoolSandbox(ClinicalDecisionSupport):
         def __init__(self) -> None:
-            self.data_generator = DataGenerator()
+            self.data_generator = CdsDataGenerator()
 
         @hc.ehr(workflow="patient-view")
-        def load_data_in_client(self):
-            self.data_generator.generate()
-            return self.data_generator.data
+        def load_data_in_client(self) -> CdsFhirData:
+            data = self.data_generator.generate()
+            return data
 
         @hc.api
-        def my_server(self, text):
+        def my_server(self, request) -> None:
             pass
     ```
 
 
 === "On its own"
     ```python
-    from healthchain.data_generator import DataGenerator
+    from healthchain.data_generator import CdsDataGenerator
     from healthchain.base import Workflow
 
     # Initialise data generator
-    data_generator = DataGenerator()
+    data_generator = CdsDataGenerator()
 
     # Generate FHIR resources for use case workflow
     data_generator.set_workflow(Workflow.encounter_discharge)
-    data_generator.generate()
-
-    print(data_generator.data.resources.model_dump(by_alias=True, exclude_unset=True))
+    data = data_generator.generate()
+
+    print(data.model_dump())
+
+    # {
+    #    "prefetch": {
+    #        "entry": [
+    #            {
+    #                "resource": ...
+    #            }
+    #        ]
+    #    }
+    # }
     ```
 
 <!-- You can pass in parameters in `contraint` argument to limit the general form of the FHIR resources you get back, but this feature is experimental. Arguments supported are:
@@ -147,10 +161,12 @@ data_generator.generate(free_text_csv="./dir/to/csv/file")
 
 
 ### Service API
-A service is typically an API of an external AI/NLP system that returns data to the client. This is where you define your application logic - it can be anything from a simple regex to a highly sophisticated LLM agentic workflow. The only constraint is that you have to return your data as a `Dict` that your workflow expects.
+A service is typically an API of an external AI/NLP system that returns data to the client. This is where you define your application logic - it can be anything from a simple regex to a highly sophisticated LLM agentic workflow.
 
 When you decorate a function with `@hc.api` in a sandbox, the function is mounted to a HL7-compliant service endpoint an EHR client can make requests to. This is usually a set of standardised API routes depending on the use case. HealthChain will start a [FastAPI](https://fastapi.tiangolo.com/) server with these APIs pre-defined for you.
 
+Your service function must accept and return models appropriate for your use case. Typically the service function should accept a `Request` model and return a use case specific model, such as a list of `Card` for CDS.
+
 If you are using a model that requires initialisation steps, we recommend you initialise this in your class `__init__`.
 
 === "Transformers"
@@ -161,35 +177,34 @@ If you are using a model that requires initialisation steps, we recommend you in
     import healthchain as hc
 
     from healthchain.use_cases import ClinicalDecisionSupport
-    from healthchain.data_generator import DataGenerator
+    from healthchain.data_generator import CdsDataGenerator
+    from healthchain.models import Card, CDSRequest, CdsFhirData
     from transformers import pipeline
 
-    from typing import Dict
+    from typing import List
 
     @hc.sandbox
     class MyCoolSandbox(ClinicalDecisionSupport):
         def __init__(self):
-            self.data_generator = DataGenerator()
+            self.data_generator = CdsDataGenerator()
             self.pipeline = pipeline('summarization')
 
         @hc.ehr(workflow="patient-view")
-        def load_data_in_client(self):
-            self.data_generator.generate()
-            return self.data_generator.data
+        def load_data_in_client(self) -> CdsFhirData:
+            data = self.data_generator.generate()
+            return data
 
         @hc.api
-        def my_service(self, text: str):
-            results = self.pipeline(text)
-            return {
-                "cards": [
-                    {
-                        "summary": "Patient summary",
-                        "indicator": "info",
-                        "source": {"label": "transformer"},
-                        "detail": results[0]['summary_text']
-                    }
-                ]
-            }
+        def my_service(self, request: CDSRequest) -> List[Card]:
+            results = self.pipeline(str(request.prefetch))
+            return [
+                Card(
+                    summary="Patient summary",
+                    indicator="info",
+                    source={"label": "transformers"},
+                    detail=results[0]['summary_text'],
+                )
+            ]
 
     if __name__ == "__main__":
         cds = MyCoolSandbox()
@@ -203,49 +218,47 @@ If you are using a model that requires initialisation steps, we recommend you in
     import healthchain as hc
 
     from healthchain.use_cases import ClinicalDecisionSupport
-    from healthchain.data_generator import DataGenerator
+    from healthchain.data_generator import CdsDataGenerator
+    from healthchain.models import Card, CdsFhirData, CDSRequest
 
     from langchain_openai import ChatOpenAI
     from langchain_core.prompts import PromptTemplate
     from langchain_core.output_parsers import StrOutputParser
 
-    from typing import Dict
+    from typing import List
 
     @hc.sandbox
     class MyCoolSandbox(ClinicalDecisionSupport):
         def __init__(self):
             self.chain = self._init_llm_chain()
-            self.data_generator = DataGenerator()
+            self.data_generator = CdsDataGenerator()
 
         def _init_llm_chain(self):
             prompt = PromptTemplate.from_template(
                 "Summarize the text below {text}"
-                )
+            )
             model = ChatOpenAI(model="gpt-4o")
             parser = StrOutputParser()
 
             chain = prompt | model | parser
-
             return chain
 
         @hc.ehr(workflow="patient-view")
-        def load_data_in_client(self):
-            self.data_generator.generate()
-            return self.data_generator.data
+        def load_data_in_client(self) -> CdsFhirData:
+            data = self.data_generator.generate()
+            return data
 
         @hc.api
-        def my_service(self, text: str) -> Dict:
-            result = self.chain.invoke(text)
-            return {
-                "cards": [
-                    {
-                        "summary": "Patient summary",
-                        "indicator": "info",
-                        "source": {"label": "openai"},
-                        "detail": result
-                    }
-                ]
-            }
+        def my_service(self, request: CDSRequest) -> List[Card]:
+            result = self.chain.invoke(str(request.prefetch))
+            return [
+                Card(
+                    summary="Patient summary",
+                    indicator="info",
+                    source={"label": "openai"},
+                    detail=result,
+                )
+            ]
 
     if __name__ == "__main__":
         cds = MyCoolSandbox()
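
The commit message mentions inferring the service function's input type and rejecting non-`Card` return values, but the code that performs those checks is not among the files shown here. A minimal sketch of how such a check could work with the standard library alone; `CDSRequest` and `Card` below are simplified stand-ins, and `infer_request_type` and `check_cards` are hypothetical helper names, not HealthChain's actual implementation:

```python
from dataclasses import dataclass
from typing import Any, Callable, List, get_type_hints


@dataclass
class CDSRequest:  # simplified stand-in for healthchain.models.CDSRequest
    prefetch: dict


@dataclass
class Card:  # simplified stand-in for healthchain.models.Card
    summary: str
    indicator: str


def infer_request_type(func: Callable[..., Any]) -> type:
    """Return the annotation of the first annotated parameter other than self."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # only input parameters are of interest here
    for name, annotation in hints.items():
        if name != "self":
            return annotation
    raise TypeError(f"{func.__name__} has no annotated input parameter")


def check_cards(result: Any) -> List[Card]:
    """Raise TypeError unless result is a list containing only Card objects."""
    if not isinstance(result, list) or not all(isinstance(c, Card) for c in result):
        raise TypeError(f"Expected a list of Card, got {type(result).__name__}")
    return result


def my_service(request: CDSRequest) -> List[Card]:
    return [Card(summary="Patient summary", indicator="info")]


# Build the request from the inferred type, call the service, validate the output.
request_type = infer_request_type(my_service)
cards = check_cards(my_service(request_type(prefetch={})))
```

Under these assumptions, a service that still returns the old `{"cards": [...]}` dictionary would fail in `check_cards` with a `TypeError` rather than being passed on to the client.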

healthchain/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -1,12 +1,12 @@
 import logging
 from .utils.logger import add_handlers
 from healthchain.decorators import ehr, api, sandbox
-from healthchain.data_generator.data_generator import DataGenerator
+from healthchain.data_generator.data_generator import CdsDataGenerator
 from healthchain.models.requests.cdsrequest import CDSRequest
 
 logger = logging.getLogger(__name__)
 add_handlers(logger)
 logger.setLevel(logging.INFO)
 
 # Export them at the top level
-__all__ = ["ehr", "api", "sandbox", "DataGenerator", "CDSRequest"]
+__all__ = ["ehr", "api", "sandbox", "CdsDataGenerator", "CDSRequest"]

healthchain/data_generator/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@
 from .procedure_generators import ProcedureGenerator
 from .medication_administration_generators import MedicationAdministrationGenerator
 from .medication_request_generators import MedicationRequestGenerator
-from .data_generator import DataGenerator
+from .data_generator import CdsDataGenerator
 
 __all__ = [
     "EncounterGenerator",
@@ -15,5 +15,5 @@
     "ProcedureGenerator",
     "MedicationAdministrationGenerator",
     "MedicationRequestGenerator",
-    "DataGenerator",
+    "CdsDataGenerator",
 ]

healthchain/data_generator/data_generator.py

Lines changed: 12 additions & 16 deletions
@@ -1,19 +1,20 @@
+import random
+import csv
+import logging
+
+from pydantic import BaseModel
 from typing import Callable, Optional
+from pathlib import Path
+
+from healthchain.base import Workflow
 from healthchain.fhir_resources.bundle_resources import BundleModel, Bundle_EntryModel
 from healthchain.data_generator.base_generators import generator_registry
 from healthchain.fhir_resources.document_reference_resources import (
     DocumentReferenceModel,
 )
 from healthchain.fhir_resources.general_purpose_resources import NarrativeModel
-from healthchain.base import Workflow
-from pydantic import BaseModel
-from pathlib import Path
-
+from healthchain.models.data.cdsfhirdata import CdsFhirData
 
-import random
-import csv
-
-import logging
 
 logger = logging.getLogger(__name__)

@@ -36,16 +37,11 @@
 # TODO: Some of the resources should be allowed to be multiplied
 
 
-class OutputDataModel(BaseModel):
-    context: dict = {}
-    resources: BundleModel
-
-
-class DataGenerator:
+class CdsDataGenerator:
     def __init__(self):
         self.registry = generator_registry
         self.mappings = workflow_mappings
-        self.data = []
+        self.data: CdsFhirData = None
 
     def fetch_generator(self, generator_name: str) -> Callable:
         return self.registry.get(generator_name)
@@ -78,7 +74,7 @@ def generate(
             )
         if parsed_free_text:
             results.append(Bundle_EntryModel(resource=random.choice(parsed_free_text)))
-        output = OutputDataModel(context={}, resources=BundleModel(entry=results))
+        output = CdsFhirData(prefetch=BundleModel(entry=results))
         self.data = output
         return output
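
`generate()` now returns a `CdsFhirData` whose `prefetch` holds a bundle of entries, and the quickstart in this commit shows the dumped shape as `prefetch` -> `entry` -> `resource`. A small sketch of walking that shape with plain dictionaries; the sample payload is made up, and the presence of a `resourceType` key on each dumped resource is an assumption based on standard FHIR JSON rather than something this diff shows:

```python
from collections import Counter

# Hypothetical dumped output following the prefetch/entry/resource shape.
dumped = {
    "prefetch": {
        "entry": [
            {"resource": {"resourceType": "Encounter"}},
            {"resource": {"resourceType": "Condition"}},
            {"resource": {"resourceType": "Condition"}},
        ]
    }
}


def count_resource_types(data: dict) -> Counter:
    """Count generated resources by type, tolerating missing keys."""
    entries = data.get("prefetch", {}).get("entry", [])
    return Counter(
        entry.get("resource", {}).get("resourceType", "unknown") for entry in entries
    )


counts = count_resource_types(dumped)
```

A check like this can be handy in tests to confirm that a given workflow produced the resource types a service expects.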

healthchain/decorators.py

Lines changed: 4 additions & 4 deletions
@@ -14,7 +14,7 @@
 from .base import BaseUseCase, Workflow, UseCaseType
 from .clients import EHRClient
 from .service.service import Service
-from .data_generator.data_generator import DataGenerator
+from .data_generator.data_generator import CdsDataGenerator
 from .utils.apimethod import APIMethod
 from .utils.urlbuilder import UrlBuilder

@@ -26,12 +26,12 @@
 
 def generate_filename(prefix: str, unique_id: str, index: int):
     timestamp = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")
-    filename = f"{timestamp}_sandbox_{unique_id}_{prefix}_{index}.json"
+    filename = f"{timestamp}_sandbox_{unique_id[:8]}_{prefix}_{index}.json"
     return filename
 
 
 def save_as_json(data, prefix, sandbox_id, index, save_dir):
-    save_name = generate_filename(prefix, sandbox_id, index)
+    save_name = generate_filename(prefix, str(sandbox_id), index)
     file_path = save_dir / save_name
     with open(file_path, "w") as outfile:
         json.dump(data, outfile, indent=4)
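
The two changes above work together: `generate_filename` keeps only the first eight characters of the ID, and `save_as_json` converts the ID to a string first. The conversion matters if the sandbox ID is a `uuid.UUID`, because a `UUID` object cannot be sliced. A small sketch of the effect, assuming the ID is a `uuid.UUID` (the diff does not show where it is created); the `now` parameter is added here only to make the example deterministic:

```python
import uuid
from datetime import datetime


def generate_filename(prefix: str, unique_id: str, index: int, now: datetime) -> str:
    # Same format string as in the diff; the timestamp is passed in
    # (an assumption for this sketch) instead of read from datetime.now().
    timestamp = now.strftime("%Y-%m-%d_%H:%M:%S")
    return f"{timestamp}_sandbox_{unique_id[:8]}_{prefix}_{index}.json"


sandbox_id = uuid.UUID("12345678-1234-5678-1234-567812345678")  # hypothetical ID
filename = generate_filename(
    "request", str(sandbox_id), 0, datetime(2024, 1, 2, 3, 4, 5)
)
print(filename)  # 2024-01-02_03:04:05_sandbox_12345678_request_0.json
```

The timestamp format contains colons, which are not permitted in file names on Windows, so the saved files may need a different separator there.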
@@ -131,7 +131,7 @@ def wrapper(self, *args: Any, **kwargs: Any) -> EHRClient:
             )
 
         # Set workflow in data generator if configured
-        data_generator_attributes = find_attributes_of_type(self, DataGenerator)
+        data_generator_attributes = find_attributes_of_type(self, CdsDataGenerator)
         for i in range(len(data_generator_attributes)):
             attribute_name = data_generator_attributes[i]
             try:

healthchain/models/__init__.py

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+from .requests.cdsrequest import CDSRequest
+from .responses.cdsresponse import Card
+from .responses.cdsresponse import CDSResponse
+from .responses.cdsdiscovery import CDSService
+from .data.cdsfhirdata import CdsFhirData
+
+__all__ = ["CDSRequest", "Card", "CDSResponse", "CDSService", "CdsFhirData"]
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+from pydantic import BaseModel, Field
+from typing import Dict
+
+from ...fhir_resources.bundle_resources import BundleModel
+
+
+class CdsFhirData(BaseModel):
+    context: Dict = Field(default={})
+    prefetch: BundleModel
+
+    def model_dump(self, *args, **kwargs):
+        kwargs.setdefault("exclude_unset", True)
+        kwargs.setdefault("exclude_none", True)
+        kwargs.setdefault("by_alias", True)
+
+        return super().model_dump(*args, **kwargs)
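
`CdsFhirData.model_dump` uses `kwargs.setdefault`, so the three options default to `True` while a caller can still pass `False` explicitly. The pattern can be sketched without Pydantic; `Base` and `Child` below are hypothetical stand-ins, and `dump` only echoes the options it receives instead of serialising anything:

```python
class Base:
    def dump(self, **kwargs) -> dict:
        # Stand-in for BaseModel.model_dump: report the options received.
        return dict(kwargs)


class Child(Base):
    def dump(self, **kwargs) -> dict:
        # setdefault fills in a key only when the caller did not supply it.
        kwargs.setdefault("exclude_none", True)
        kwargs.setdefault("by_alias", True)
        return super().dump(**kwargs)


defaults = Child().dump()
overridden = Child().dump(by_alias=False)
print(defaults)    # {'exclude_none': True, 'by_alias': True}
print(overridden)  # {'by_alias': False, 'exclude_none': True}
```

This is why the quickstart can call `data.model_dump()` with no arguments and still get aliased output with unset and `None` fields dropped.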
