- This repository contains the code used to evaluate which categorization approach performs best at categorizing student implementations.
- To assess categorization quality, an ideal categorization of the "shirt-size" task was crafted. It can be found in the "sample-data/shirt-size/optimal-categories" folder; a sketch of how a produced categorization can be compared against it is given at the end of this file.
- In the "approaches" folder, all the tested approaches are listed. They are separated by approach type, currently "jaccard", "llm", and "tsed"; a minimal sketch of the Jaccard idea is also given at the end of this file.
- For each approach, there is one file that performs the offline clustering experiments and one that performs the online clustering experiments.
- Within those files, the relevant environment variables are set, e.g. which task's data is used. Since evaluation is only possible for the "shirt-size" task (only this task has a ground-truth categorization), the task is currently set to "shirt-size" everywhere.
- Running these files performs the clustering and outputs the result (as well as the evaluation if the "shirt-size" task data was used).
- The results are persisted to a file in the "results" folder of the respective approach.
- Anonymized data from three old ACCESS tasks ("arithmetic-expression", "invert-dictionary", and "shirt-size") is contained in this repository. It can be found in the "sample-data" folder and can be used in the experiments.
- For each task, there is one .json file containing all submissions made by students, and one .json file containing only each student's first submission. A sketch of loading such a file is shown below.
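
The sample-data files can be read with the standard `json` module. The following is a minimal sketch only: the file name `all_submissions.json` and the per-submission fields are assumptions for illustration, so check the actual files in "sample-data" for the real names and structure.

```python
# Sketch of loading one task's submissions from the sample-data folder.
# NOTE: the file name and the expected JSON structure (a list of submission
# objects) are hypothetical; adapt them to the actual files in "sample-data".
import json
from pathlib import Path


def load_submissions(task: str, file_name: str = "all_submissions.json") -> list[dict]:
    """Load the submissions of a task from the sample-data folder."""
    path = Path("sample-data") / task / file_name
    with path.open(encoding="utf-8") as f:
        return json.load(f)


submissions = load_submissions("shirt-size")
print(f"Loaded {len(submissions)} submissions")
```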
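
To illustrate the general idea behind the "jaccard" approach type, here is a small, self-contained sketch of a Jaccard similarity between the token sets of two submissions. The tokenization and any thresholds used by the actual approach files may differ, and the real approaches additionally perform a clustering step on top of such similarities.

```python
# Sketch of a token-set Jaccard similarity between two submissions.
# The regex-based tokenization is only one possible choice.
import re


def token_set(source: str) -> set[str]:
    """Collect identifiers/keywords and single-character symbols from source code."""
    return set(re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_0-9]", source))


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets (1.0 means identical sets)."""
    tokens_a, tokens_b = token_set(a), token_set(b)
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


print(jaccard_similarity("def size(w): return 'S'", "def size(width): return 'M'"))
```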
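
Finally, as a rough illustration of how a produced categorization could be compared against the ground-truth categorization of the "shirt-size" task, the sketch below uses a standard clustering-agreement metric (the Adjusted Rand Index from scikit-learn). The `{submission_id: category}` dictionary format and the choice of metric are assumptions, not necessarily what the approach scripts actually report.

```python
# Sketch of comparing a produced categorization against a ground-truth one
# with the Adjusted Rand Index. The label format and metric are assumptions;
# the repository's actual evaluation may differ.
from sklearn.metrics import adjusted_rand_score


def evaluate(predicted: dict[str, str], ground_truth: dict[str, str]) -> float:
    """Return the Adjusted Rand Index between two labelings of the same submissions."""
    ids = sorted(predicted.keys() & ground_truth.keys())
    return adjusted_rand_score([ground_truth[i] for i in ids],
                               [predicted[i] for i in ids])


# Identical partitions (only the category names differ), so the score is 1.0.
print(evaluate({"s1": "loop", "s2": "loop", "s3": "if-chain"},
               {"s1": "A", "s2": "A", "s3": "B"}))
```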