This project aims to process and analyze school data, primarily from data/school_data_final.json (which is a processed version of school_data_comprehensive.json and other sources).
- Data Cleaning (clean_data.py): Normalizes school names, attempts to determine school type (Public, Catholic, Independent/Private), and maps postcodes to suburb names using an external CSV.
- Geocoding (add_geocodes.py): Adds latitude and longitude coordinates to schools. It primarily uses a lookup table generated from Australian postcode data. If a school's location isn't found in the lookup, it falls back to using the Nominatim geocoding service.
- Manual Data Merging (merge_manual_data.py): Merges manually collected or corrected school data with the automatically processed data.
- Comprehensive Scraping (comprehensive_scraper.py): (Details to be added)
- Data Comparison (data_comparison.py): Compares the processed data against official sources and identifies discrepancies or missing schools.
- School Ranking (rank_schools.py): Loads combined academic metrics from data/downloads/combined_school_metrics.csv, preprocesses the data (handles missing values, standardizes names), normalizes scores, applies a weighted scoring mechanism to calculate an overall rank for each school, tracks year-over-year rank changes, and saves the output to data/processed_rankings.csv.
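
The lookup-then-fallback behaviour described for add_geocodes.py can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `geocode_school`, the record fields (`name`, `suburb`, `postcode`), and the shape of the lookup table are all assumptions, and the external geocoder is passed in as a plain callable so that no particular Nominatim client is implied.

```python
from typing import Callable, Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def geocode_school(
    school: dict,
    postcode_lookup: Dict[str, Coordinates],
    fallback: Callable[[str], Optional[Coordinates]],
) -> Optional[Coordinates]:
    """Return coordinates from the local lookup, else ask the fallback geocoder.

    Hypothetical helper: field names and lookup shape are assumed for illustration.
    """
    postcode = str(school.get("postcode", "")).strip()
    if postcode in postcode_lookup:
        return postcode_lookup[postcode]
    # Lookup miss: build a free-text query and defer to the external service.
    query = f"{school.get('name', '')}, {school.get('suburb', '')}, Australia"
    return fallback(query)


# Example with a stub standing in for a real geocoding call.
lookup = {"3000": (-37.8136, 144.9631)}
calls = []


def stub_geocoder(query: str) -> Optional[Coordinates]:
    calls.append(query)  # record what would have been sent to the service
    return None          # pretend the service found nothing


hit = geocode_school({"name": "A", "postcode": "3000"}, lookup, stub_geocoder)
miss = geocode_school(
    {"name": "B", "suburb": "Nowhere", "postcode": "9999"}, lookup, stub_geocoder
)
```

Passing the fallback in as a parameter keeps the lookup logic testable offline and makes it easy to add rate limiting around the external service.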
- Display school rankings, types, levels (Primary, Secondary, etc.), locations, scores (e.g., VCE), and distance from a searched suburb/postcode.
- Interactive map to show school locations.
- Filter schools by maximum rank (with unranked schools included at max setting), school type, school level, and maximum distance from a searched location.
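
As an illustration of the distance filter, the sketch below computes great-circle distance with the haversine formula and keeps only schools within a maximum radius. The dashboard's actual distance method is not stated here, so the haversine choice, the function names, and the `lat`/`lon` field names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(schools: list, origin: tuple, max_distance_km: float) -> list:
    """Keep schools whose coordinates lie within max_distance_km of origin."""
    lat0, lon0 = origin
    return [
        s for s in schools
        if haversine_km(lat0, lon0, s["lat"], s["lon"]) <= max_distance_km
    ]


# Illustrative records only; coordinates are approximate.
schools = [
    {"name": "Near School", "lat": -37.8200, "lon": 144.9700},
    {"name": "Far School", "lat": -38.1499, "lon": 144.3617},
]
melbourne_cbd = (-37.8136, 144.9631)
nearby = within_distance(schools, melbourne_cbd, max_distance_km=10)
```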
The Schoolify dashboard employs a differentiated ranking system for primary and secondary schools:
Secondary Schools:
- Ranked based on a Combined Weighted Average Score (CWAS) derived from key VCE performance metrics:
  - Median VCE Score
  - Percentage of VCE Scores 40+
  - VCE Completion Rate
  - Tertiary Application Rate
- Default weights for these metrics are 40%, 40%, 10%, and 10% respectively. Users can adjust these weights using the "Advanced Metrics" sliders.
- The CWAS for each school is then normalized (scaled from 0 to 1) across all ranked secondary schools. The final rank is based on this normalized score (higher is better).
- Rank Change: Not yet calculated for secondary schools. A calculation based on historical CWAS performance is a planned future enhancement; until then, rank change for secondary schools may show "N/A".
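
The secondary-school scoring steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the metric keys and function names are hypothetical, the weighted average is taken over the metric values as given (whether the dashboard rescales individual metrics before weighting is not specified here), and ties are not handled specially.

```python
# Default weights from the description above; the key names are assumed.
DEFAULT_WEIGHTS = {
    "median_vce_score": 0.40,
    "pct_scores_40_plus": 0.40,
    "vce_completion_rate": 0.10,
    "tertiary_application_rate": 0.10,
}


def cwas(metrics: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the metrics; dividing by the weight total
    means user-adjusted weights do not have to sum to 1."""
    total_weight = sum(weights.values())
    return sum(weights[k] * metrics[k] for k in weights) / total_weight


def rank_secondary(schools: dict, weights: dict = DEFAULT_WEIGHTS) -> dict:
    """Return {name: (normalized_score, rank)}, where rank 1 is the highest score."""
    raw = {name: cwas(m, weights) for name, m in schools.items()}
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    # Min-max scale to the 0-1 range; if all scores are equal, use 0.0 for every school.
    normalized = {n: (v - lo) / span if span else 0.0 for n, v in raw.items()}
    ordered = sorted(normalized, key=normalized.get, reverse=True)
    return {n: (normalized[n], i) for i, n in enumerate(ordered, start=1)}


# Made-up figures for three schools, purely for illustration.
example = {
    "School A": {"median_vce_score": 35, "pct_scores_40_plus": 20,
                 "vce_completion_rate": 100, "tertiary_application_rate": 90},
    "School B": {"median_vce_score": 30, "pct_scores_40_plus": 5,
                 "vce_completion_rate": 95, "tertiary_application_rate": 80},
    "School C": {"median_vce_score": 32, "pct_scores_40_plus": 10,
                 "vce_completion_rate": 98, "tertiary_application_rate": 85},
}
ranking = rank_secondary(example)
```

With these figures the raw scores are roughly 41.0, 31.5, and 35.1, so School A normalizes to 1.0, School B to 0.0, and School C to about 0.38.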
Primary Schools:
- Ranked based on their government-published overall ranks. These ranks are re-indexed specifically within the subset of primary schools (e.g., the primary school with the best government rank gets rank #1 among primaries).
- Rank Change: Calculated based on the difference in their government-published rank from the previous year to the current year (if historical data is available).
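
The primary-school re-indexing and rank change can be illustrated with the sketch below. The function names and input shapes are hypothetical, and the sign convention (positive means the school moved up) is an assumption, since the text only says the change is a difference between the two years.

```python
from typing import Dict, Optional


def reindex_primary_ranks(gov_ranks: Dict[str, int]) -> Dict[str, int]:
    """Map each primary school to its position within the primary subset.

    The school with the best (lowest) government rank gets 1. Ties keep
    their input order, which is a simplification.
    """
    ordered = sorted(gov_ranks, key=lambda name: gov_ranks[name])
    return {name: position for position, name in enumerate(ordered, start=1)}


def rank_change(current: Dict[str, int], previous: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Previous-year rank minus current rank; None when there is no prior-year rank."""
    return {
        name: (previous[name] - rank) if name in previous else None
        for name, rank in current.items()
    }


# Made-up government ranks for illustration.
current_gov = {"Primary 1": 12, "Primary 2": 3, "Primary 3": 40}
previous_gov = {"Primary 1": 15, "Primary 2": 2}

primary_ranks = reindex_primary_ranks(current_gov)
changes = rank_change(current_gov, previous_gov)
```

Here Primary 2 becomes rank 1 among primaries, Primary 1 improves by 3 places, Primary 2 drops by 1, and Primary 3 has no change value because no historical rank is available.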
Other School Levels:
- Schools categorized as 'Other' (e.g., Combined, Special) are generally ranked based on their overall_rank if available.
(Instructions will be added later)
(Contribution guidelines will be added later)
(License information will be added later)