Changes from all commits (19 commits):
451cd77
Add tutorials to .gitignore
RodriguesRBruno Apr 22, 2025
aca297e
Standardise notation to Data Owner
RodriguesRBruno Apr 22, 2025
7e1b561
Explicitly mention what is the workspace directory in Section 2, simi…
RodriguesRBruno Apr 22, 2025
2e51ebb
Mention venv in local server installation
RodriguesRBruno Apr 22, 2025
925a57f
Fix medperf_tutorial in .gitignore
RodriguesRBruno Apr 22, 2025
f5e4139
Conditional logout if already logged in for tutorial scripts
RodriguesRBruno Apr 22, 2025
fb2fffb
Single cleanup script. Also fixes pointing to wrong directory in cleanup
RodriguesRBruno Apr 22, 2025
936eeda
Merge branch 'main' of https://github.com/mlcommons/medperf into upda…
RodriguesRBruno Apr 23, 2025
e33a5f7
Explicitly mention cleanup refers to the local
RodriguesRBruno Apr 23, 2025
e286a4c
Close code block
RodriguesRBruno Apr 23, 2025
fa3cf2d
Fix final cd to properly call medperf_login script
RodriguesRBruno Apr 23, 2025
e7f6a0f
Change task name so it matches what is in the MLCube
RodriguesRBruno Apr 23, 2025
87d6055
Revert "Change task name so it matches what is in the MLCube"
RodriguesRBruno Apr 23, 2025
c6d77d1
Update model UID
RodriguesRBruno Apr 23, 2025
957fee3
Fix user for association approval simulation
RodriguesRBruno Apr 23, 2025
3cb878b
Update MLCube IDs
RodriguesRBruno Apr 23, 2025
0676462
Better formatting at the end of section 4
RodriguesRBruno Apr 23, 2025
b81e581
Update MLCube hashes to match DockerHub
RodriguesRBruno Apr 23, 2025
b8310dd
Revert "Update MLCube hashes to match DockerHub"
RodriguesRBruno Apr 23, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -154,3 +154,6 @@ server/keys
!examples/fl/mock_cert/project/ca/cert/root.crt
!flca/dev_assets/intermediate_ca.crt
!flca/dev_assets/root_ca.crt

+# Medperf Tutorials
+medperf_tutorial/
2 changes: 1 addition & 1 deletion README.md
@@ -29,7 +29,7 @@ Additionally, here you can see how others used MedPerf already: [https://scholar

## Pilot Studies ##

-MedPerf was also further utilized to support academic medical research on both public and private data through efforts across Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, University of Pennsylvania, Penn Medicine, University of Pennsylvania Health System, University of Strasbourg, Institute of Image-Guided Surgery (IHU Strasbourg), Fondazione Policlinico Universitario Agostino Gemelli IRCCS, University of California San Francisco, and other academic institutions. The figure below displays the data provider locations used in all pilot experiments. 🟢: Pilot 1 - Brain Tumor Segmentation Pilot Experiment; 🔴: Pilot 2 - Pancreas Segmentation Pilot Experiment. 🔵: Pilot 3 - Surgical Workflow Phase Recognition Pilot Experiment. Pilot 4 - Cloud Experiments, used data and processes from Pilot 1 and 2.
+MedPerf was also further utilized to support academic medical research on both public and private data through efforts across Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, University of Pennsylvania, Penn Medicine, University of Pennsylvania Health System, University of Strasbourg, Institute of Image-Guided Surgery (IHU Strasbourg), Fondazione Policlinico Universitario Agostino Gemelli IRCCS, University of California San Francisco, and other academic institutions. The figure below displays the data owner locations used in all pilot experiments. 🟢: Pilot 1 - Brain Tumor Segmentation Pilot Experiment; 🔴: Pilot 2 - Pancreas Segmentation Pilot Experiment. 🔵: Pilot 3 - Surgical Workflow Phase Recognition Pilot Experiment. Pilot 4 - Cloud Experiments used data and processes from Pilots 1 and 2.

![image](https://user-images.githubusercontent.com/25375373/163238058-6cf16f00-5238-4c80-8b58-d86f291a5bcf.png)

14 changes: 7 additions & 7 deletions docs/getting_started/benchmark_owner_demo.md
@@ -54,7 +54,7 @@ A demo dataset is a small reference dataset. It contains a few data records and

2. When a model owner wants to participate in the benchmark, the MedPerf client tests the compatibility of their model with the benchmark's data preparation cube and metrics cube. The test is run using the benchmark's demo dataset as input.

-For this tutorial, you are provided with a demo dataset for the chest X-ray classification workflow. The dataset can be found in your workspace folder under `demo_data`. It is a small dataset comprising two chest X-ray images and corresponding thoracic disease labels.
+For this tutorial, you are provided with a demo dataset for the chest X-ray classification workflow. The dataset can be found in your workspace folder (`medperf_tutorial`) under `demo_data`. It is a small dataset comprising two chest X-ray images and corresponding thoracic disease labels.

You can test the workflow now that you have the three MLCubes and the demo data. Testing the workflow before submitting any asset to the MedPerf server is usually recommended.
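
For reference, such a compatibility test is typically a single `medperf test run` invocation along the lines of the sketch below (the flag names and local MLCube paths are assumptions for illustration, not taken from this PR; check `medperf test run --help` for the exact interface):

```bash
# Hypothetical compatibility test run from the medperf_tutorial workspace;
# the flags and MLCube paths below are assumptions, not verbatim from the docs.
medperf test run \
  --demo_dataset_url "$DEMO_URL" \
  --demo_dataset_hash "$DEMO_HASH" \
  -p data_preparator/mlcube/mlcube.yaml \
  -m model_custom_cnn/mlcube/mlcube.yaml \
  -e metrics/mlcube/mlcube.yaml
```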

@@ -277,9 +277,9 @@ You need to keep at hand the following information:
```

- For this tutorial, the UIDs are as follows:
-- Data preparator UID: `1`
-- Reference model UID: `2`
-- Evaluator UID: `3`
+- Data preparator UID: `2`
+- Reference model UID: `3`
+- Evaluator UID: `4`

You can create and submit your benchmark using the following command:

@@ -288,9 +288,9 @@ medperf benchmark submit \
--name tutorial_bmk \
--description "MedPerf demo bmk" \
--demo-url "{{ demo_url }}" \
---data-preparation-mlcube 1 \
---reference-model-mlcube 2 \
---evaluator-mlcube 3 \
+--data-preparation-mlcube 2 \
+--reference-model-mlcube 3 \
+--evaluator-mlcube 4 \
--operational
```

18 changes: 11 additions & 7 deletions docs/getting_started/data_owner_demo.md
@@ -130,7 +130,11 @@ _For the sake of continuing the tutorial only_, run the following to simulate th
sh tutorials_scripts/simulate_data_association_approval.sh
```

-You can verify if your association request has been approved by running `medperf association ls -bd`.
+You can verify if your association request has been approved by running the following command:
+
+```bash
+medperf association ls -bd
+```

## 5. Execute the Benchmark
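
The command used to execute the benchmark is of the following general form (a sketch, assuming benchmark UID `1` and dataset UID `1` as used elsewhere in this tutorial):

```bash
# Run every model associated with benchmark 1 against prepared dataset 1
medperf benchmark run -b 1 -d 1
```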

@@ -151,8 +155,8 @@ After running the command, you will receive a summary of the executions. You wil
```text
model local result UID partial result from cache error
------- ------------------ ---------------- ------------ -------
-2 b1m2d1 False True
-4 b1m4d1 False False
+5 b1m5d1 False False
+3 b1m3d1 False True
Total number of models: 2
1 were skipped (already executed), of which 0 have partial results
0 failed
```

@@ -164,12 +168,12 @@ Total number of models: 2
This means that the benchmark has two models:

- A model that you already ran when you requested the association. This explains why it was skipped.
-- Another model that ran successfully. Its result generated UID is `b1m4d1`.
+- Another model that ran successfully. Its generated result UID is `b1m5d1`.

You can view the results by running the following command with the specific local result UID. For example:

```bash
-medperf result view b1m4d1
+medperf result view b1m5d1
```

For now, your results are only local. Next, you will learn how to submit the results.
Expand All @@ -179,10 +183,10 @@ For now, your results are only local. Next, you will learn how to submit the res
![Dataset Owner submits evaluation results](../tutorial_images/do-6-do-submits-eval-results.png){class="tutorial-sticky-image-content"}
After executing the benchmark, you will submit a result to the MedPerf server. To do so, you have to find the generated UID of the target result.

-As an example, you will be submitting the result of UID `b1m4d1`. To do this, run the following command:
+As an example, you will be submitting the result of UID `b1m5d1`. To do this, run the following command:

```bash
-medperf result submit --result b1m4d1
+medperf result submit --result b1m5d1
```

The information that is going to be submitted will be printed to the screen and you will be prompted to confirm that you want to submit.
4 changes: 2 additions & 2 deletions docs/getting_started/model_owner_demo.md
@@ -117,12 +117,12 @@ Benchmark workflows are run by Data Owners, who will get notified when a new mod
To initiate an association request, you need to collect the following information:

- The target benchmark ID, which is `1`
-- The server UID of your MLCube, which is `4`.
+- The server UID of your MLCube, which is `5`.

Run the following command to request associating your MLCube with the benchmark:

```bash
-medperf mlcube associate --benchmark 1 --model_uid 4
+medperf mlcube associate --benchmark 1 --model_uid 5
```

This command will first run the benchmark's workflow on your model to ensure your model is compatible with the benchmark workflow. Then, the association request information is printed on the screen, which includes an executive summary of the test mentioned. You will be prompted to confirm sending this information and initiating this association request.
2 changes: 1 addition & 1 deletion docs/getting_started/setup.md
@@ -10,7 +10,7 @@ If this is your first time using MedPerf, install the MedPerf client library as

For this tutorial, you should spawn a local MedPerf server for the MedPerf client to communicate with. Note that this server will be hosted on your `localhost` and not on the internet.

-1. Install the server requirements ensuring you are in MedPerf's root folder:
+1. Install the server requirements, ensuring you are in MedPerf's root folder. If a virtual environment was created when installing the MedPerf client (see [this section](installation.md#install-medperf)), make sure the same virtual environment is used when installing the local server dependencies:

```bash
pip install -r server/requirements.txt
```
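
For example, if the client was installed into a virtual environment, reusing it might look like the following sketch (the `.venv` path is an assumption):

```bash
# Reactivate the virtual environment used for the MedPerf client;
# ".venv" is a hypothetical name, use the path you actually created.
source .venv/bin/activate
pip install -r server/requirements.txt
```
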
19 changes: 3 additions & 16 deletions docs/getting_started/shared/cleanup.md
@@ -4,21 +4,8 @@ You have reached the end of the tutorial! If you are planning to rerun any of th

- To shut down the local MedPerf server: press `CTRL`+`C` in the terminal where the server is running.

-- To cleanup the downloaded files workspace (make sure you are in the MedPerf's root directory):
+- To clean up the downloaded workspace files, the local MedPerf server database, and the local test storage, run the following script (make sure you are in MedPerf's root directory):

```bash
-rm -fr medperf_tutorial
-```
-
-- To cleanup the local MedPerf server database: (make sure you are in the MedPerf's root directory)
-
-```bash
-cd server
-sh reset_db.sh
-```
-
-- To cleanup the test storage:
-
-```bash
-rm -fr ~/.medperf/localhost_8000
-```
+sh tutorials_scripts/tutorials_cleanup.sh
+```
4 changes: 2 additions & 2 deletions docs/roles.md
@@ -4,11 +4,11 @@ Here we introduce user roles at MedPerf. Depending on the objectives and expecta

## Benchmark Committee

-May include healthcare stakeholders (e.g., hospitals, clinicians, patient advocacy groups, payors, etc.), regulatory bodies, data providers and model owners wishing to drive the evaluation of AI models on real world data. While the *Benchmark Committee* does not have admin privileges on MedPerf, they have elevated permissions regarding benchmark assets (e.g., task, evaluation metrics, etc.) and policies (e.g., participation of model owners, data providers, anonymizations)
+May include healthcare stakeholders (e.g., hospitals, clinicians, patient advocacy groups, payors, etc.), regulatory bodies, data owners and model owners wishing to drive the evaluation of AI models on real-world data. While the *Benchmark Committee* does not have admin privileges on MedPerf, they have elevated permissions regarding benchmark assets (e.g., task, evaluation metrics, etc.) and policies (e.g., participation of model owners, data owners, anonymizations).

![](./images/benchmark_committee.png)

-## Data Providers
+## Data Owners

May include hospitals, medical practices, research organizations, and healthcare payors that own medical data, register medical data, and execute benchmarks.

4 changes: 2 additions & 2 deletions docs/what_is_medperf.md
@@ -4,7 +4,7 @@ td, th {
border: none!important;
}
</style>
-MedPerf is an open-source framework for benchmarking medical ML models. It uses *Federated Evaluation* a method in which medical ML models are securely distributed to multiple global facilities for evaluation prioritizing patient privacy to mitigate legal and regulatory risks. The goal of *Federated Evaluation* is to make it simple and reliable to share ML models with many data providers, evaluate those ML models against their data in controlled settings, then aggregate and analyze the findings.
+MedPerf is an open-source framework for benchmarking medical ML models. It uses *Federated Evaluation*, a method in which medical ML models are securely distributed to multiple global facilities for evaluation, prioritizing patient privacy to mitigate legal and regulatory risks. The goal of *Federated Evaluation* is to make it simple and reliable to share ML models with many data owners, evaluate those ML models against their data in controlled settings, then aggregate and analyze the findings.

The MedPerf approach empowers healthcare stakeholders through neutral governance to assess and verify the performance of ML models in an efficient and human-supervised process without sharing any patient data across facilities during the process.

@@ -19,7 +19,7 @@ The MedPerf approach empowers healthcare stakeholders through neutral governance

MedPerf aims to identify bias and generalizability issues of medical ML models by evaluating them on diverse medical data across the world. This process allows developers of medical ML to efficiently identify performance and reliability issues on their models while healthcare stakeholders (e.g., hospitals, practices, etc.) can validate such models against clinical efficacy.

-Importantly, MedPerf supports technology for **neutral governance** in order to enable **full trust** and **transparency** among participating parties (e.g., AI vendor, data provider, regulatory body, etc.). This is all encapsulated in the benchmark committee which is the overseeing body on a benchmark.
+Importantly, MedPerf supports technology for **neutral governance** in order to enable **full trust** and **transparency** among participating parties (e.g., AI vendor, data owner, regulatory body, etc.). This is all encapsulated in the benchmark committee, which is the overseeing body of a benchmark.

| ![benchmark_committee.gif](images/benchmark_committee.gif) |
|:--:|
14 changes: 7 additions & 7 deletions docs/workflow.md
@@ -4,7 +4,7 @@

<!-- ## Creating a User

-Currently, the MedPerf administration is the only one able to create users, controlling access to the system and permissions to own a benchmark. For example, if a hospital (Data Provider), a model owner, or a benchmark committee wants to have access to MedPerf, they need to contact the MedPerf administrator to add a user. -->
+Currently, the MedPerf administration is the only one able to create users, controlling access to the system and permissions to own a benchmark. For example, if a hospital (Data Owner), a model owner, or a benchmark committee wants to have access to MedPerf, they need to contact the MedPerf administrator to add a user. -->


<style>
@@ -14,7 +14,7 @@ td, th {
</style>


-A benchmark in MedPerf is a collection of assets that are developed by the benchmark committee that aims to evaluate medical ML on decentralized data providers.
+A benchmark in MedPerf is a collection of assets, developed by the benchmark committee, that aims to evaluate medical ML across decentralized data owners.

The process is simple yet effective, enabling scalability.

@@ -24,22 +24,22 @@ The benchmarking process starts with establishing a benchmark committee of healt

<!-- ## Step 2. Recruit Data and Model Owners

-The benchmark committee recruits Data Providers and Model Owners either by inviting trusted parties or by making an open call for participation. A higher number of dataset providers recruited can maximize diversity on a global scale. -->
+The benchmark committee recruits Data Owners and Model Owners either by inviting trusted parties or by making an open call for participation. A higher number of dataset providers recruited can maximize diversity on a global scale. -->

## Step 2. Register Benchmark

[MLCubes](mlcubes/mlcubes.md) are the building blocks of an experiment and are required in order to create a benchmark. Three MLCubes (Data Preparator MLCube, Reference Model MLCube, and Metrics MLCube) need to be submitted. After submitting the three MLCubes, along with a sample reference dataset, the Benchmark Committee is capable of creating a benchmark. Once the benchmark is submitted, the MedPerf admin must approve it before it can be seen by other users. Follow our [Hands-on Tutorial](getting_started/benchmark_owner_demo.md) for detailed step-by-step guidelines.
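
In CLI terms, registering each of the three MLCubes comes down to a command of roughly this shape (a sketch; the flag names and URL variables are assumptions, see the hands-on tutorial for the exact invocation):

```bash
# Hypothetical MLCube registration; flag names and URLs are assumptions.
medperf mlcube submit --name my-data-preparator \
  --mlcube-file "$MLCUBE_YAML_URL" \
  --parameters-file "$PARAMETERS_YAML_URL"
```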

## Step 3. Register Dataset

-Data Providers that want to be part of the benchmark can [register their own datasets, prepare them, and associate them](getting_started/data_owner_demo.md) with the benchmark. A dataset will be prepared using the benchmark's Data Preparator MLCube and the dataset's **metadata** is registered within the MedPerf server.
+Data Owners that want to be part of the benchmark can [register their own datasets, prepare them, and associate them](getting_started/data_owner_demo.md) with the benchmark. A dataset will be prepared using the benchmark's Data Preparator MLCube and the dataset's **metadata** is registered within the MedPerf server.

| ![flow_preparation.gif](images/flow_preparation_association_folders.PNG) |
|:--:|
| *Data Preparation* |


-The data provider then can request to participate in the benchmark with their dataset. Requesting the association will run the benchmark's reference workflow to assure the compatibility of the prepared dataset structure with the workflow. Once the association request is approved by the Benchmark Committee, then the dataset becomes a part of the benchmark.
+The data owner can then request to participate in the benchmark with their dataset. Requesting the association will run the benchmark's reference workflow to ensure the compatibility of the prepared dataset structure with the workflow. Once the association request is approved by the Benchmark Committee, the dataset becomes part of the benchmark.
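
On the Data Owner's side, this step maps onto commands of roughly the following shape (a sketch; the exact flags and the UID values `1` are assumptions, see the linked tutorial for the real flow):

```bash
# Hypothetical flow: register a dataset, prepare it with the benchmark's
# Data Preparator MLCube, then request association with benchmark 1.
# Flag names and UIDs are assumptions for illustration.
medperf dataset submit --benchmark 1 --data_path ./data --labels_path ./labels
medperf dataset prepare -d 1
medperf dataset associate -d 1 -b 1
```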

![](./images/dataset_preparation_association.png)

@@ -51,9 +51,9 @@ Once a benchmark is submitted by the Benchmark Committee, any user can [submit t

## Step 5. Execute Benchmark

-The Benchmark Committee may notify Data Providers that models are available for benchmarking. Data Providers can then [run the benchmark models](getting_started/data_owner_demo.md#5-execute-the-benchmark) locally on their data.
+The Benchmark Committee may notify Data Owners that models are available for benchmarking. Data Owners can then [run the benchmark models](getting_started/data_owner_demo.md#5-execute-the-benchmark) locally on their data.

-This procedure retrieves the model MLCubes associated with the benchmark and runs them on the indicated prepared dataset to generate predictions. The Metrics MLCube of the benchmark is then retrieved to evaluate the predictions. Once the evaluation results are generated, the data provider can [submit them](getting_started/data_owner_demo.md#6-submit-a-result) to the platform.
+This procedure retrieves the model MLCubes associated with the benchmark and runs them on the indicated prepared dataset to generate predictions. The Metrics MLCube of the benchmark is then retrieved to evaluate the predictions. Once the evaluation results are generated, the data owner can [submit them](getting_started/data_owner_demo.md#6-submit-a-result) to the platform.
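
In command form, this execute-and-submit flow corresponds to the sketch below (the result UID is hypothetical; compare the Data Owner tutorial):

```bash
medperf benchmark run -b 1 -d 1        # generate and evaluate predictions
medperf result submit --result b1m1d1  # submit one result (hypothetical UID)
```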

![](./images/execution_flow_folders.PNG)

13 changes: 13 additions & 0 deletions tutorials_scripts/medperf_login.sh
@@ -0,0 +1,13 @@
+LOGIN_EMAIL=$1
+AUTH_STATUS=$(medperf auth status)
+
+ALREADY_LOGGED_EMAIL=$(echo "$AUTH_STATUS" | grep -Eho "[[:graph:]]+@[[:graph:]]+")
+
+if [ ! -z "$ALREADY_LOGGED_EMAIL" ]
+then
+    echo "Logging out of currently logged-in e-mail $ALREADY_LOGGED_EMAIL"
+    medperf auth logout
+fi
+
+echo "Logging into email $LOGIN_EMAIL for this tutorial"
+medperf auth login -e "$LOGIN_EMAIL"
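
The tutorial scripts below invoke this helper with the account to switch to; standalone usage follows the same pattern (the e-mail here is hypothetical):

```bash
# Log the local MedPerf session into the given account,
# logging out of any currently active session first.
sh tutorials_scripts/medperf_login.sh testdataowner@example.com
```
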
4 changes: 3 additions & 1 deletion tutorials_scripts/setup_benchmark_tutorial.sh
@@ -1,4 +1,5 @@
# Create a workspace
+original_dir=$(echo $PWD)
mkdir -p medperf_tutorial
cd medperf_tutorial

@@ -26,4 +27,5 @@ sh download.sh
rm download.sh

## Login locally as benchmark owner
-medperf auth login -e [email protected]
+cd $original_dir
+sh tutorials_scripts/medperf_login.sh [email protected]
3 changes: 2 additions & 1 deletion tutorials_scripts/setup_data_tutorial.sh
@@ -15,4 +15,5 @@ tar -xf $filename
rm $filename

## Login locally as data owner
-medperf auth login -e [email protected]
+cd ..
+sh tutorials_scripts/medperf_login.sh [email protected]
4 changes: 3 additions & 1 deletion tutorials_scripts/setup_model_tutorial.sh
@@ -1,4 +1,5 @@
# Create a workspace
+original_dir=$(echo $PWD)
mkdir -p medperf_tutorial
cd medperf_tutorial

@@ -11,4 +12,5 @@ sh download.sh
rm download.sh

## Login locally as model owner
-medperf auth login -e [email protected]
+cd $original_dir
+sh tutorials_scripts/medperf_login.sh [email protected]
4 changes: 2 additions & 2 deletions tutorials_scripts/simulate_data_association_approval.sh
@@ -1,3 +1,3 @@
-medperf auth login -e [email protected]
+sh tutorials_scripts/medperf_login.sh [email protected]
medperf association approve -b 1 -d 1
-medperf auth login -e [email protected]
+sh tutorials_scripts/medperf_login.sh [email protected]
25 changes: 25 additions & 0 deletions tutorials_scripts/tutorials_cleanup.sh
@@ -0,0 +1,25 @@
+# This script should be run from the medperf root directory, i.e. the parent directory of the one containing this file:
+# sh tutorials_scripts/tutorials_cleanup.sh
+
+# Remove the medperf_tutorial directory created by the tutorials
+echo "Removing medperf_tutorial directory from tutorials..."
+rm -rf medperf_tutorial
+
+# Clean up the local server database
+echo "Resetting local server database..."
+cd server
+sh reset_db.sh
+
+# Clean up test storage
+echo "Removing local storage from tutorials..."
+for dir in ~/.medperf/*
+do
+    if [ -d "$dir" ]
+    then
+        rm -rf "$dir"/localhost_8000
+    fi
+done
+
+# Also delete the demo directory
+rm -rf ~/.medperf/demo