Team members can access the running notes for meetings, which provide details of the project goals and decisions.
Data source: https://results2021.ref.ac.uk/ (accessed 2023-08-10)
Information on submission system data requirements: https://ref.ac.uk/guidance-and-criteria-on-submissions/guidance/submission-system-data-requirements/ (accessed 2023-08-30).
Local copy of the download page: Submission_system_data_requirements-REF2021.pdf (accessed 2023-08-10)
The Streamlit data explorer, hosted on Azure, is available at https://ref2021explorer.azurewebsites.net.
Follow these steps to set up the environment for this project:
- Install Python 3.x on your system if it is not already installed.
- Clone this project from GitHub.
- Navigate to the project root directory in your terminal or command prompt.
- Create a new virtual environment with:
    python3 -m venv venv
  This will create a new virtual environment named venv in the current directory.
- Activate the virtual environment with:
  On Windows:
    venv\Scripts\activate.bat
  On Unix/Linux/macOS:
    source venv/bin/activate
  This will activate the virtual environment and change your prompt to indicate that you are now working inside it.
- Install the project dependencies with:
    pip install -r requirements.txt
  This will install all of the required packages, at the versions listed in the requirements.txt file.
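As an optional sanity check after the steps above, the following minimal sketch reports whether the interpreter is running inside a virtual environment and which packages named in requirements.txt are not yet installed. The helper names (in_virtual_environment, requirement_names, missing_packages) are hypothetical and not part of this project, and the parser is an assumption that only handles simple lines such as name==version.

```python
import re
import sys
from importlib import metadata
from pathlib import Path


def in_virtual_environment() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def requirement_names(text: str) -> list[str]:
    """Extract package names from simple requirements.txt content (assumed format)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        # Keep everything before the first version specifier, extra, or marker.
        names.append(re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0])
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
    requirements = Path("requirements.txt")
    if requirements.exists():
        names = requirement_names(requirements.read_text())
        print("Missing packages:", missing_packages(names) or "none")
```

Run it from the project root with the virtual environment activated; an empty "Missing packages" result suggests the install step completed.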
The raw environment statements, supplied as PDFs, were converted to text with the pdftotext(1) tool from poppler-utils.
The package version used was poppler-utils 22.12.0.
The conversion was done on a Debian bookworm system on an x86_64 architecture.
The script is not dockerised, but it could be, based on the debian:bookworm-slim image, if required.
To convert the PDFs to *.txt files, run this script in the folder containing the PDFs:
#!/bin/sh
for i in *.pdf; do
    pdftotext -layout "$i"
done
The *.txt files are then copied into the data/processed/environment_statements folder.
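For readers who prefer to drive the conversion from Python, the shell loop and the copy step could be sketched roughly as follows. This is an illustration and not the script used for the project: it assumes pdftotext is on the PATH, and the function names and the injectable runner parameter are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf: Path) -> list[str]:
    """Build the same command the shell loop runs for one PDF."""
    # With no output argument, pdftotext writes <name>.txt next to the PDF.
    return ["pdftotext", "-layout", str(pdf)]


def convert_folder(source: Path, target: Path, runner=subprocess.run) -> list[Path]:
    """Convert every PDF in `source` and copy the resulting text files to `target`."""
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for pdf in sorted(source.glob("*.pdf")):
        # check=True raises CalledProcessError if pdftotext exits with an error.
        runner(pdftotext_command(pdf), check=True)
        text_file = pdf.with_suffix(".txt")
        copied.append(Path(shutil.copy2(text_file, target / text_file.name)))
    return copied
```

A call such as convert_folder(Path("pdfs"), Path("data/processed/environment_statements")) would then cover both steps, where pdfs is a placeholder for the folder holding the PDFs. The runner parameter exists only so the function can be exercised without poppler-utils installed.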