Remaining tasks for this dataset:

- [x] Finish code to convert data to `.tfrecord`.
- [x] Finish code to handle sending and receiving data.
- [x] Review instruction validation code (possibly against an existing implementation).
- [x] Test accuracy and performance metrics.
- [x] Potentially move the accuracy calculation out of output processing (a similar suggestion was made for TinyMMLU).