Changes from all commits (19 commits):
451cd77
Add tutorials to .gitignore
RodriguesRBruno Apr 22, 2025
aca297e
Standardise notation to Data Owner
RodriguesRBruno Apr 22, 2025
7e1b561
Explicitly mention what is the workspace directory in Section 2, simi…
RodriguesRBruno Apr 22, 2025
2e51ebb
Mention venv in local server installation
RodriguesRBruno Apr 22, 2025
925a57f
Fix medperf_tutorial in .gitignore
RodriguesRBruno Apr 22, 2025
f5e4139
Conditional logout if already logged in for tutorial scripts
RodriguesRBruno Apr 22, 2025
fb2fffb
Single cleanup script. Also fixes pointing to wrong directory in cleanup
RodriguesRBruno Apr 22, 2025
936eeda
Merge branch 'main' of https://github.com/mlcommons/medperf into upda…
RodriguesRBruno Apr 23, 2025
e33a5f7
Explicitly mention cleanup refers to the local
RodriguesRBruno Apr 23, 2025
e286a4c
Close code block
RodriguesRBruno Apr 23, 2025
fa3cf2d
Fix final cd to properly call medperf_login script
RodriguesRBruno Apr 23, 2025
e7f6a0f
Change task name so it matches what is in the MLCube
RodriguesRBruno Apr 23, 2025
87d6055
Revert "Change task name so it matches what is in the MLCube"
RodriguesRBruno Apr 23, 2025
c6d77d1
Update model UID
RodriguesRBruno Apr 23, 2025
957fee3
Fix user for association approval simulation
RodriguesRBruno Apr 23, 2025
3cb878b
Update MLCube IDs
RodriguesRBruno Apr 23, 2025
0676462
Better formatting at the end of section 4
RodriguesRBruno Apr 23, 2025
b81e581
Update MLCube hashes to match DockerHub
RodriguesRBruno Apr 23, 2025
b8310dd
Revert "Update MLCube hashes to match DockerHub"
RodriguesRBruno Apr 23, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -154,3 +154,6 @@ server/keys
!examples/fl/mock_cert/project/ca/cert/root.crt
!flca/dev_assets/intermediate_ca.crt
!flca/dev_assets/root_ca.crt

+# Medperf Tutorials
+medperf_tutorial/
2 changes: 1 addition & 1 deletion README.md
@@ -29,7 +29,7 @@ Additionally, here you can see how others used MedPerf already: [https://scholar

## Pilot Studies ##

-MedPerf was also further utilized to support academic medical research on both public and private data through efforts across Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, University of Pennsylvania, Penn Medicine, University of Pennsylvania Health System, University of Strasbourg, Institute of Image-Guided Surgery (IHU Strasbourg), Fondazione Policlinico Universitario Agostino Gemelli IRCCS, University of California San Francisco, and other academic institutions. The figure below displays the data provider locations used in all pilot experiments. 🟢: Pilot 1 - Brain Tumor Segmentation Pilot Experiment; 🔴: Pilot 2 - Pancreas Segmentation Pilot Experiment. 🔵: Pilot 3 - Surgical Workflow Phase Recognition Pilot Experiment. Pilot 4 - Cloud Experiments, used data and processes from Pilot 1 and 2.
+MedPerf was also further utilized to support academic medical research on both public and private data through efforts across Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, University of Pennsylvania, Penn Medicine, University of Pennsylvania Health System, University of Strasbourg, Institute of Image-Guided Surgery (IHU Strasbourg), Fondazione Policlinico Universitario Agostino Gemelli IRCCS, University of California San Francisco, and other academic institutions. The figure below displays the data owner locations used in all pilot experiments. 🟢: Pilot 1 - Brain Tumor Segmentation Pilot Experiment; 🔴: Pilot 2 - Pancreas Segmentation Pilot Experiment. 🔵: Pilot 3 - Surgical Workflow Phase Recognition Pilot Experiment. Pilot 4 - Cloud Experiments used data and processes from Pilots 1 and 2.

![image](https://user-images.githubusercontent.com/25375373/163238058-6cf16f00-5238-4c80-8b58-d86f291a5bcf.png)

14 changes: 7 additions & 7 deletions docs/getting_started/benchmark_owner_demo.md
@@ -54,7 +54,7 @@ A demo dataset is a small reference dataset. It contains a few data records and

2. When a model owner wants to participate in the benchmark, the MedPerf client tests the compatibility of their model with the benchmark's data preparation cube and metrics cube. The test is run using the benchmark's demo dataset as input.

-For this tutorial, you are provided with a demo dataset for the chest X-ray classification workflow. The dataset can be found in your workspace folder under `demo_data`. It is a small dataset comprising two chest X-ray images and corresponding thoracic disease labels.
+For this tutorial, you are provided with a demo dataset for the chest X-ray classification workflow. The dataset can be found in your workspace folder (`medperf_tutorial`) under `demo_data`. It is a small dataset comprising two chest X-ray images and corresponding thoracic disease labels.

You can test the workflow now that you have the three MLCubes and the demo data. Testing the workflow before submitting any asset to the MedPerf server is usually recommended.
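
For reference, such a compatibility test is typically a single `medperf test run` invocation along the lines of the sketch below (the flag names and local MLCube paths are assumptions for illustration, not taken from this PR; check `medperf test run --help` for the exact interface):

```bash
# Hypothetical compatibility test run from the medperf_tutorial workspace;
# the flags and MLCube paths below are assumptions, not verbatim from the docs.
medperf test run \
  --demo_dataset_url "$DEMO_URL" \
  --demo_dataset_hash "$DEMO_HASH" \
  -p data_preparator/mlcube/mlcube.yaml \
  -m model_custom_cnn/mlcube/mlcube.yaml \
  -e metrics/mlcube/mlcube.yaml
```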

@@ -277,9 +277,9 @@ You need to keep at hand the following information:
```

- For this tutorial, the UIDs are as follows:
-- Data preparator UID: `1`
-- Reference model UID: `2`
-- Evaluator UID: `3`
+- Data preparator UID: `2`
+- Reference model UID: `3`
+- Evaluator UID: `4`

You can create and submit your benchmark using the following command:

@@ -288,9 +288,9 @@ medperf benchmark submit \
--name tutorial_bmk \
--description "MedPerf demo bmk" \
--demo-url "{{ demo_url }}" \
---data-preparation-mlcube 1 \
---reference-model-mlcube 2 \
---evaluator-mlcube 3 \
+--data-preparation-mlcube 2 \
+--reference-model-mlcube 3 \
+--evaluator-mlcube 4 \
--operational
```

18 changes: 11 additions & 7 deletions docs/getting_started/data_owner_demo.md
@@ -130,7 +130,11 @@ _For the sake of continuing the tutorial only_, run the following to simulate th
sh tutorials_scripts/simulate_data_association_approval.sh
```

-You can verify if your association request has been approved by running `medperf association ls -bd`.
+You can verify if your association request has been approved by running the following command:
+
+```bash
+medperf association ls -bd
+```

## 5. Execute the Benchmark
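
The command used to execute the benchmark is of the following general form (a sketch, assuming benchmark UID `1` and dataset UID `1` as used elsewhere in this tutorial):

```bash
# Run every model associated with benchmark 1 against prepared dataset 1
medperf benchmark run -b 1 -d 1
```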

@@ -151,8 +155,8 @@ After running the command, you will receive a summary of the executions. You wil
```text
model local result UID partial result from cache error
------- ------------------ ---------------- ------------ -------
-2 b1m2d1 False True
-4 b1m4d1 False False
+5 b1m5d1 False False
+3 b1m3d1 False True
Total number of models: 2
1 were skipped (already executed), of which 0 have partial results
0 failed
```

@@ -164,12 +168,12 @@ Total number of models: 2
This means that the benchmark has two models:

- A model that you already ran when you requested the association. This explains why it was skipped.
-- Another model that ran successfully. Its result generated UID is `b1m4d1`.
+- Another model that ran successfully. Its generated result UID is `b1m5d1`.

You can view the results by running the following command with the specific local result UID. For example:

```bash
-medperf result view b1m4d1
+medperf result view b1m5d1
```

For now, your results are only local. Next, you will learn how to submit the results.
Expand All @@ -179,10 +183,10 @@ For now, your results are only local. Next, you will learn how to submit the res
![Dataset Owner submits evaluation results](../tutorial_images/do-6-do-submits-eval-results.png){class="tutorial-sticky-image-content"}
After executing the benchmark, you will submit a result to the MedPerf server. To do so, you have to find the generated UID of the target result.

-As an example, you will be submitting the result of UID `b1m4d1`. To do this, run the following command:
+As an example, you will be submitting the result of UID `b1m5d1`. To do this, run the following command:

```bash
-medperf result submit --result b1m4d1
+medperf result submit --result b1m5d1
```

The information that is going to be submitted will be printed to the screen and you will be prompted to confirm that you want to submit.
4 changes: 2 additions & 2 deletions docs/getting_started/model_owner_demo.md
@@ -117,12 +117,12 @@ Benchmark workflows are run by Data Owners, who will get notified when a new mod
To initiate an association request, you need to collect the following information:

- The target benchmark ID, which is `1`
-- The server UID of your MLCube, which is `4`.
+- The server UID of your MLCube, which is `5`.

Run the following command to request associating your MLCube with the benchmark:

```bash
-medperf mlcube associate --benchmark 1 --model_uid 4
+medperf mlcube associate --benchmark 1 --model_uid 5
```

This command will first run the benchmark's workflow on your model to ensure your model is compatible with the benchmark workflow. Then, the association request information is printed on the screen, which includes an executive summary of the test mentioned. You will be prompted to confirm sending this information and initiating this association request.
2 changes: 1 addition & 1 deletion docs/getting_started/setup.md
@@ -10,7 +10,7 @@ If this is your first time using MedPerf, install the MedPerf client library as

For this tutorial, you should spawn a local MedPerf server for the MedPerf client to communicate with. Note that this server will be hosted on your `localhost` and not on the internet.

-1. Install the server requirements ensuring you are in MedPerf's root folder:
+1. Install the server requirements, ensuring you are in MedPerf's root folder. If a virtual environment was created when installing the MedPerf client (see [this section](installation.md#install-medperf)), make sure the same virtual environment is used when installing the local server dependencies:

```bash
pip install -r server/requirements.txt
```
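
For example, if the client was installed into a virtual environment, reusing it might look like the following sketch (the `.venv` path is an assumption):

```bash
# Reactivate the virtual environment used for the MedPerf client;
# ".venv" is a hypothetical name, use the path you actually created.
source .venv/bin/activate
pip install -r server/requirements.txt
```
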
19 changes: 3 additions & 16 deletions docs/getting_started/shared/cleanup.md
@@ -4,21 +4,8 @@ You have reached the end of the tutorial! If you are planning to rerun any of th

- To shut down the local MedPerf server: press `CTRL`+`C` in the terminal where the server is running.

-- To cleanup the downloaded files workspace (make sure you are in the MedPerf's root directory):
+- To clean up the downloaded workspace files, the local MedPerf server database, and the local test storage, run the following script (make sure you are in MedPerf's root directory):

```bash
-rm -fr medperf_tutorial
-```
-
-- To cleanup the local MedPerf server database: (make sure you are in the MedPerf's root directory)
-
-```bash
-cd server
-sh reset_db.sh
-```
-
-- To cleanup the test storage:
-
-```bash
-rm -fr ~/.medperf/localhost_8000
-```
+sh tutorials_scripts/tutorials_cleanup.sh
+```
4 changes: 2 additions & 2 deletions docs/roles.md
@@ -4,11 +4,11 @@ Here we introduce user roles at MedPerf. Depending on the objectives and expecta

## Benchmark Committee

-May include healthcare stakeholders (e.g., hospitals, clinicians, patient advocacy groups, payors, etc.), regulatory bodies, data providers and model owners wishing to drive the evaluation of AI models on real world data. While the *Benchmark Committee* does not have admin privileges on MedPerf, they have elevated permissions regarding benchmark assets (e.g., task, evaluation metrics, etc.) and policies (e.g., participation of model owners, data providers, anonymizations)
+May include healthcare stakeholders (e.g., hospitals, clinicians, patient advocacy groups, payors, etc.), regulatory bodies, data owners and model owners wishing to drive the evaluation of AI models on real-world data. While the *Benchmark Committee* does not have admin privileges on MedPerf, they have elevated permissions regarding benchmark assets (e.g., task, evaluation metrics, etc.) and policies (e.g., participation of model owners, data owners, anonymizations).

![](./images/benchmark_committee.png)

-## Data Providers
+## Data Owners

May include hospitals, medical practices, research organizations, and healthcare payors that own medical data, register medical data, and execute benchmarks.

4 changes: 2 additions & 2 deletions docs/what_is_medperf.md
@@ -4,7 +4,7 @@ td, th {
border: none!important;
}
</style>
-MedPerf is an open-source framework for benchmarking medical ML models. It uses *Federated Evaluation* a method in which medical ML models are securely distributed to multiple global facilities for evaluation prioritizing patient privacy to mitigate legal and regulatory risks. The goal of *Federated Evaluation* is to make it simple and reliable to share ML models with many data providers, evaluate those ML models against their data in controlled settings, then aggregate and analyze the findings.
+MedPerf is an open-source framework for benchmarking medical ML models. It uses *Federated Evaluation*, a method in which medical ML models are securely distributed to multiple global facilities for evaluation, prioritizing patient privacy to mitigate legal and regulatory risks. The goal of *Federated Evaluation* is to make it simple and reliable to share ML models with many data owners, evaluate those ML models against their data in controlled settings, then aggregate and analyze the findings.

The MedPerf approach empowers healthcare stakeholders through neutral governance to assess and verify the performance of ML models in an efficient and human-supervised process without sharing any patient data across facilities during the process.

@@ -19,7 +19,7 @@ The MedPerf approach empowers healthcare stakeholders through neutral governance

MedPerf aims to identify bias and generalizability issues of medical ML models by evaluating them on diverse medical data across the world. This process allows developers of medical ML to efficiently identify performance and reliability issues on their models while healthcare stakeholders (e.g., hospitals, practices, etc.) can validate such models against clinical efficacy.

-Importantly, MedPerf supports technology for **neutral governance** in order to enable **full trust** and **transparency** among participating parties (e.g., AI vendor, data provider, regulatory body, etc.). This is all encapsulated in the benchmark committee which is the overseeing body on a benchmark.
+Importantly, MedPerf supports technology for **neutral governance** in order to enable **full trust** and **transparency** among participating parties (e.g., AI vendor, data owner, regulatory body, etc.). This is all encapsulated in the benchmark committee, which is the overseeing body of a benchmark.

| ![benchmark_committee.gif](images/benchmark_committee.gif) |
|:--:|
14 changes: 7 additions & 7 deletions docs/workflow.md
@@ -4,7 +4,7 @@

<!-- ## Creating a User

-Currently, the MedPerf administration is the only one able to create users, controlling access to the system and permissions to own a benchmark. For example, if a hospital (Data Provider), a model owner, or a benchmark committee wants to have access to MedPerf, they need to contact the MedPerf administrator to add a user. -->
+Currently, the MedPerf administration is the only one able to create users, controlling access to the system and permissions to own a benchmark. For example, if a hospital (Data Owner), a model owner, or a benchmark committee wants to have access to MedPerf, they need to contact the MedPerf administrator to add a user. -->


<style>
@@ -14,7 +14,7 @@ td, th {
</style>


-A benchmark in MedPerf is a collection of assets that are developed by the benchmark committee that aims to evaluate medical ML on decentralized data providers.
+A benchmark in MedPerf is a collection of assets, developed by the benchmark committee, that aims to evaluate medical ML across decentralized data owners.

The process is simple yet effective, enabling scalability.

@@ -24,22 +24,22 @@ The benchmarking process starts with establishing a benchmark committee of healt

<!-- ## Step 2. Recruit Data and Model Owners

-The benchmark committee recruits Data Providers and Model Owners either by inviting trusted parties or by making an open call for participation. A higher number of dataset providers recruited can maximize diversity on a global scale. -->
+The benchmark committee recruits Data Owners and Model Owners either by inviting trusted parties or by making an open call for participation. A higher number of dataset providers recruited can maximize diversity on a global scale. -->

## Step 2. Register Benchmark

[MLCubes](mlcubes/mlcubes.md) are the building blocks of an experiment and are required in order to create a benchmark. Three MLCubes (Data Preparator MLCube, Reference Model MLCube, and Metrics MLCube) need to be submitted. After submitting the three MLCubes, along with a sample reference dataset, the Benchmark Committee is capable of creating a benchmark. Once the benchmark is submitted, the MedPerf admin must approve it before it can be seen by other users. Follow our [Hands-on Tutorial](getting_started/benchmark_owner_demo.md) for detailed step-by-step guidelines.
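
In CLI terms, registering each of the three MLCubes comes down to a command of roughly this shape (a sketch; the flag names and URL variables are assumptions, see the hands-on tutorial for the exact invocation):

```bash
# Hypothetical MLCube registration; flag names and URLs are assumptions.
medperf mlcube submit --name my-data-preparator \
  --mlcube-file "$MLCUBE_YAML_URL" \
  --parameters-file "$PARAMETERS_YAML_URL"
```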

## Step 3. Register Dataset

-Data Providers that want to be part of the benchmark can [register their own datasets, prepare them, and associate them](getting_started/data_owner_demo.md) with the benchmark. A dataset will be prepared using the benchmark's Data Preparator MLCube and the dataset's **metadata** is registered within the MedPerf server.
+Data Owners that want to be part of the benchmark can [register their own datasets, prepare them, and associate them](getting_started/data_owner_demo.md) with the benchmark. A dataset will be prepared using the benchmark's Data Preparator MLCube and the dataset's **metadata** is registered within the MedPerf server.

| ![flow_preparation.gif](images/flow_preparation_association_folders.PNG) |
|:--:|
| *Data Preparation* |


-The data provider then can request to participate in the benchmark with their dataset. Requesting the association will run the benchmark's reference workflow to assure the compatibility of the prepared dataset structure with the workflow. Once the association request is approved by the Benchmark Committee, then the dataset becomes a part of the benchmark.
+The data owner can then request to participate in the benchmark with their dataset. Requesting the association will run the benchmark's reference workflow to ensure the compatibility of the prepared dataset structure with the workflow. Once the association request is approved by the Benchmark Committee, the dataset becomes part of the benchmark.
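
On the Data Owner's side, this step maps onto commands of roughly the following shape (a sketch; the exact flags and the UID values `1` are assumptions, see the linked tutorial for the real flow):

```bash
# Hypothetical flow: register a dataset, prepare it with the benchmark's
# Data Preparator MLCube, then request association with benchmark 1.
# Flag names and UIDs are assumptions for illustration.
medperf dataset submit --benchmark 1 --data_path ./data --labels_path ./labels
medperf dataset prepare -d 1
medperf dataset associate -d 1 -b 1
```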

![](./images/dataset_preparation_association.png)

@@ -51,9 +51,9 @@ Once a benchmark is submitted by the Benchmark Committee, any user can [submit t

## Step 5. Execute Benchmark

-The Benchmark Committee may notify Data Providers that models are available for benchmarking. Data Providers can then [run the benchmark models](getting_started/data_owner_demo.md#5-execute-the-benchmark) locally on their data.
+The Benchmark Committee may notify Data Owners that models are available for benchmarking. Data Owners can then [run the benchmark models](getting_started/data_owner_demo.md#5-execute-the-benchmark) locally on their data.

-This procedure retrieves the model MLCubes associated with the benchmark and runs them on the indicated prepared dataset to generate predictions. The Metrics MLCube of the benchmark is then retrieved to evaluate the predictions. Once the evaluation results are generated, the data provider can [submit them](getting_started/data_owner_demo.md#6-submit-a-result) to the platform.
+This procedure retrieves the model MLCubes associated with the benchmark and runs them on the indicated prepared dataset to generate predictions. The Metrics MLCube of the benchmark is then retrieved to evaluate the predictions. Once the evaluation results are generated, the data owner can [submit them](getting_started/data_owner_demo.md#6-submit-a-result) to the platform.
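
In command form, this execute-and-submit flow corresponds to the sketch below (the result UID is hypothetical; compare the Data Owner tutorial):

```bash
medperf benchmark run -b 1 -d 1        # generate and evaluate predictions
medperf result submit --result b1m1d1  # submit one result (hypothetical UID)
```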

![](./images/execution_flow_folders.PNG)

13 changes: 13 additions & 0 deletions tutorials_scripts/medperf_login.sh
@@ -0,0 +1,13 @@
+LOGIN_EMAIL=$1
+AUTH_STATUS=$(medperf auth status)
+
+ALREADY_LOGGED_EMAIL=$(echo "$AUTH_STATUS" | grep -Eho "[[:graph:]]+@[[:graph:]]+")
+
+if [ ! -z "$ALREADY_LOGGED_EMAIL" ]
+then
+    echo "Logging out of currently logged-in e-mail $ALREADY_LOGGED_EMAIL"
+    medperf auth logout
+fi
+
+echo "Logging into email $LOGIN_EMAIL for this tutorial"
+medperf auth login -e "$LOGIN_EMAIL"
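
The tutorial scripts below invoke this helper with the account to switch to; standalone usage follows the same pattern (the e-mail here is hypothetical):

```bash
# Log the local MedPerf session into the given account,
# logging out of any currently active session first.
sh tutorials_scripts/medperf_login.sh testdataowner@example.com
```
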
4 changes: 3 additions & 1 deletion tutorials_scripts/setup_benchmark_tutorial.sh
@@ -1,4 +1,5 @@
# Create a workspace
+original_dir=$(echo $PWD)
mkdir -p medperf_tutorial
cd medperf_tutorial

@@ -26,4 +27,5 @@ sh download.sh
rm download.sh

## Login locally as benchmark owner
-medperf auth login -e [email protected]
+cd $original_dir
+sh tutorials_scripts/medperf_login.sh [email protected]
3 changes: 2 additions & 1 deletion tutorials_scripts/setup_data_tutorial.sh
@@ -15,4 +15,5 @@ tar -xf $filename
rm $filename

## Login locally as data owner
-medperf auth login -e [email protected]
+cd ..
+sh tutorials_scripts/medperf_login.sh [email protected]
4 changes: 3 additions & 1 deletion tutorials_scripts/setup_model_tutorial.sh
@@ -1,4 +1,5 @@
# Create a workspace
+original_dir=$(echo $PWD)
mkdir -p medperf_tutorial
cd medperf_tutorial

@@ -11,4 +12,5 @@ sh download.sh
rm download.sh

## Login locally as model owner
-medperf auth login -e [email protected]
+cd $original_dir
+sh tutorials_scripts/medperf_login.sh [email protected]
4 changes: 2 additions & 2 deletions tutorials_scripts/simulate_data_association_approval.sh
@@ -1,3 +1,3 @@
-medperf auth login -e [email protected]
+sh tutorials_scripts/medperf_login.sh [email protected]
medperf association approve -b 1 -d 1
-medperf auth login -e [email protected]
+sh tutorials_scripts/medperf_login.sh [email protected]
25 changes: 25 additions & 0 deletions tutorials_scripts/tutorials_cleanup.sh
@@ -0,0 +1,25 @@
+# This script should be run from the medperf root directory, i.e. the parent directory of the one containing this file:
+# sh tutorials_scripts/tutorials_cleanup.sh
+
+# Remove the medperf_tutorial directory created by the tutorials
+echo "Removing medperf_tutorial directory from tutorials..."
+rm -rf medperf_tutorial
+
+# Clean up the local server database
+echo "Resetting local server database..."
+cd server
+sh reset_db.sh
+
+# Clean up test storage
+echo "Removing local storage from tutorials..."
+for dir in ~/.medperf/*
+do
+    if [ -d "$dir" ]
+    then
+        rm -rf "$dir"/localhost_8000
+    fi
+done
+
+# Also delete the demo directory
+rm -rf ~/.medperf/demo