diff --git a/statvar_imports/ipeds/ipeds_graduationrates_national/README.md b/statvar_imports/ipeds/ipeds_graduationrates_national/README.md new file mode 100644 index 0000000000..2c5ead33bd --- /dev/null +++ b/statvar_imports/ipeds/ipeds_graduationrates_national/README.md @@ -0,0 +1,73 @@ +# IPEDS GraduationRates National Dataset +## Overview +This dataset contains national-level graduation rate statistics for students who started as full-time, first-time (FTFT) degree- or certificate-seeking undergraduates. +Specifically, it provides graduation rates at three different time intervals: 100%, 150%, and 200% of the "normal time" to completion. +It captures these key metrics: +- 100% Graduation Rate: Students finishing within the standard program length +- 150% Graduation Rate: The standard reporting benchmark (e.g., 6 years for a Bachelor's) +- 200% Graduation Rate: The extended benchmark (e.g., 8 years for a Bachelor's) + +The cohort year in the data refers to the year in which a group of students first entered an institution or started a degree. For example, for cohort years 2018-2022, the data refers to the graduation rates in 2022 for the students who enrolled in 2018. + +Type of place: Country. +Years: 2009-2024 +## Data Source +**Source URL:** +https://nces.ed.gov/ipeds/search/ + +**Provenance Description:** +The data comes from the U.S. Department of Education, National Center for Education Statistics (NCES). Specifically, the data is drawn from the Integrated Postsecondary Education Data System (IPEDS), which is a comprehensive system of interrelated surveys that gathers institutional-level data from colleges, universities, and technical/vocational schools across the United States. + +## Refresh Type +Semi-Automatic Refresh + +To refresh the data, the import is semi-automated: a manual step downloads the data into a GCS path. 
+ +## Data Publish Frequency +Release Frequency = Annual +Provisional data is released during the early fall (Sep-Oct). + +## How To Download Input Data +To download the data, use the provided source link. The source link leads to the IPEDS Data Explorer, which is a search tool provided by NCES. Filter for the Graduation Rates data as follows: +- Go to the source link, which leads to the Data Explorer +- Under the 'Surveys' dropdown, select 'Graduation Rates 200% (GR200)' +- By default, the data will now be visible for the latest year +- To fetch data for specific years, or all years, select the data year/years from the 'Data Year' dropdown +- Once the table opens, from the page header, select the 'Excel' option, which downloads the data in the .xlsx format +- The downloaded data is now available for processing. +- Move the data to the path: **gs://unresolved_mcf/IPEDS/graduation_rates_national/input_files/** +- Process the data using the stat_var_processor script and the GCS bucket path for input, as shown in the section below. + +## Processing Instructions +To process the IPEDS Graduation Rate data and generate statistical variables, use the following command from the "data" directory: + +**For Data Run** +```bash +python ../../tools/statvar_importer/stat_var_processor.py \ + --input_data=gs://unresolved_mcf/IPEDS/graduation_rates_national/input_files/*.csv \ + --pv_map=graduation_rates_ipeds_pvmap.csv \ + --output_path=output/graduation_rates_ipeds_output \ + --config_file=graduation_rates_ipeds_metadata.csv \ + --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf +``` + +This generates the following output files: +- output csv +- output_stat_vars_schema.mcf +- output_stat_vars.mcf +- output.tmcf + +**For Data Quality Checks and validation** +Validation of the data is done using the `lint` command of the Java import tool. 
+ +```bash +java -jar datacommons-import-tool-0.1-jar-with-dependencies.jar lint graduation_rates_ipeds_output_stat_vars_schema.mcf graduation_rates_ipeds_output.csv graduation_rates_ipeds_output.tmcf graduation_rates_ipeds_output_stat_vars.mcf +``` + +This generates the following output files: +- report.json +- summary_report.csv +- summary_report.html + +The report files can be analysed to check for errors and warnings. +Further, linting is performed on the generated output files using the DataCommons import tool. diff --git a/statvar_imports/ipeds/ipeds_graduationrates_national/graduation_rates_ipeds_metadata.csv b/statvar_imports/ipeds/ipeds_graduationrates_national/graduation_rates_ipeds_metadata.csv new file mode 100644 index 0000000000..15a131f187 --- /dev/null +++ b/statvar_imports/ipeds/ipeds_graduationrates_national/graduation_rates_ipeds_metadata.csv @@ -0,0 +1,3 @@ +parameter,value +output_columns,"observationDate,observationPeriod,value,unit,observationAbout,variableMeasured" +dc_api_root,https://api.datacommons.org \ No newline at end of file diff --git a/statvar_imports/ipeds/ipeds_graduationrates_national/graduation_rates_ipeds_pvmap.csv b/statvar_imports/ipeds/ipeds_graduationrates_national/graduation_rates_ipeds_pvmap.csv new file mode 100644 index 0000000000..4d49431e11 --- /dev/null +++ b/statvar_imports/ipeds/ipeds_graduationrates_national/graduation_rates_ipeds_pvmap.csv @@ -0,0 +1,48 @@ +key,p1,v1,p2,v2,p3,v3,p4,v4,p5,v5 +National Center for Education Statistics,statType,measuredValue,populationType,Student,observationAbout,country/USA,,,, +Total,value,{Number},institutionType,"""""",statType,measuredValue,measuredProperty,graduationRate,unit,Percent +Overall,value,{Number},institutionType,"""""",statType,measuredValue,measuredProperty,graduationRate,unit,Percent +All institutions,value,{Number},institutionType,"""""",statType,measuredValue,measuredProperty,graduationRate,unit,Percent 
+Public,value,{Number},institutionType,PublicInstitute,statType,measuredValue,measuredProperty,graduationRate,unit,Percent +not-for-profit,value,{Number},institutionType,PrivateNotForProfitInstitute,statType,measuredValue,measuredProperty,graduationRate,unit,Percent +nonprofit,value,{Number},institutionType,PrivateNotForProfitInstitute,statType,measuredValue,measuredProperty,graduationRate,unit,Percent +for-profit,value,{Number},institutionType,PrivateForProfitInstitute,statType,measuredValue,measuredProperty,graduationRate,unit,Percent +attending 4-year,educationalAttainment,BachelorsDegree,#Header,educationalAttainment,,,,,, +attending 2-year,educationalAttainment,AssociateDegreeOrCertificate,#Header,educationalAttainment,,,,,, +attending 2-year,educationalAttainment,AssociateDegreeOrCertificate,#Header,educationalAttainment,,,,,, +less-than- 2-year,educationalAttainment,PostSecondaryCertificate,#Header,educationalAttainment,,,,,, +less-than-2-year,educationalAttainment,PostSecondaryCertificate,#Header,educationalAttainment,,,,,, +attending less-than-,educationalAttainment,PostSecondaryCertificate,#Header,educationalAttainment,,,,,, +within 100%,courseCompletionTime,CourseCompletedWithin100PercentOfNormalTime,,,,,,,, +within 150%,courseCompletionTime,CourseCompletedWithin150PercentOfNormalTime,,,,,,,, +within 200%,courseCompletionTime,CourseCompletedWithin200PercentOfNormalTime,,,,,,,, +Within 100 percent,courseCompletionTime,CourseCompletedWithin100PercentOfNormalTime,,,,,,,, +Within 150 percent,courseCompletionTime,CourseCompletedWithin150PercentOfNormalTime,,,,,,,, +Within 200 percent,courseCompletionTime,CourseCompletedWithin200PercentOfNormalTime,,,,,,,, +cohort years 2000 and 2004,observationDate,2004,observationPeriod,P4Y,,,,,, +cohort years 2001 and 2005,observationDate,2005,observationPeriod,P4Y,,,,,, +cohort years 2002 and 2006,observationDate,2006,observationPeriod,P4Y,,,,,, +cohort years 2003 and 2007,observationDate,2007,observationPeriod,P4Y,,,,,, 
+cohort years 2004 and 2008,observationDate,2008,observationPeriod,P4Y,,,,,, +cohort years 2005 and 2009,observationDate,2009,observationPeriod,P4Y,,,,,, +cohort years 2006 and 2010,observationDate,2010,observationPeriod,P4Y,,,,,, +cohort years 2007 and 2011,observationDate,2011,observationPeriod,P4Y,,,,,, +cohort years 2008 and 2012,observationDate,2012,observationPeriod,P4Y,,,,,, +cohort years 2009 and 2013,observationDate,2013,observationPeriod,P4Y,,,,,, +cohort years 2010 and 2014,observationDate,2014,observationPeriod,P4Y,,,,,, +cohort years 2011 and 2015,observationDate,2015,observationPeriod,P4Y,,,,,, +cohort years 2012 and 2016,observationDate,2016,observationPeriod,P4Y,,,,,, +cohort years 2013 and 2017,observationDate,2017,observationPeriod,P4Y,,,,,, +cohort years 2014 and 2018,observationDate,2018,observationPeriod,P4Y,,,,,, +cohort years 2015 and 2019,observationDate,2019,observationPeriod,P4Y,,,,,, +cohort years 2016 and 2020,observationDate,2020,observationPeriod,P4Y,,,,,, +cohort years 2017 and 2021,observationDate,2021,observationPeriod,P4Y,,,,,, +cohort years 2018 and 2022,observationDate,2022,observationPeriod,P4Y,,,,,, +cohort years 2019 and 2023,observationDate,2023,observationPeriod,P4Y,,,,,, +cohort years 2020 and 2024,observationDate,2024,observationPeriod,P4Y,,,,,, +cohort years 2021 and 2025,observationDate,2025,observationPeriod,P4Y,,,,,, +cohort years 2022 and 2026,observationDate,2026,observationPeriod,P4Y,,,,,, +cohort years 2023 and 2027,observationDate,2027,observationPeriod,P4Y,,,,,, +cohort years 2024 and 2028,observationDate,2028,observationPeriod,P4Y,,,,,, +cohort years 2025 and 2029,observationDate,2029,observationPeriod,P4Y,,,,,, +cohort years 2026 and 2030,observationDate,2030,observationPeriod,P4Y,,,,,, \ No newline at end of file diff --git a/statvar_imports/ipeds/ipeds_graduationrates_national/manifest.json b/statvar_imports/ipeds/ipeds_graduationrates_national/manifest.json new file mode 100644 index 0000000000..0a61f76f9d 
--- /dev/null +++ b/statvar_imports/ipeds/ipeds_graduationrates_national/manifest.json @@ -0,0 +1,21 @@ +{ + "import_specifications": [ + { + "import_name": "IPEDS_GraduationRates_National", + "curator_emails": ["support@datacommons.org"], + "provenance_url": "https://nces.ed.gov/ipeds/search", + "provenance_description": "", + "scripts": [ + "../../tools/statvar_importer/stat_var_processor.py --input_data=gs://unresolved_mcf/IPEDS/graduation_rates_national/input_files/*.csv --pv_map=graduation_rates_ipeds_pvmap.csv --config_file=graduation_rates_ipeds_metadata.csv --output_path=output/graduation_rates_ipeds_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf" + ], + "import_inputs": [ + { + "template_mcf": "output/graduation_rates_ipeds_output.tmcf", + "cleaned_csv": "output/graduation_rates_ipeds_output.csv" + } + ], + "source_files": ["input_files/*.csv"], + "cron_schedule": "0 0 15 8 *" + } + ] +} diff --git a/statvar_imports/ipeds/ipeds_graduationrates_national/test_data/graduation_rates_ipeds_data2019.csv b/statvar_imports/ipeds/ipeds_graduationrates_national/test_data/graduation_rates_ipeds_data2019.csv new file mode 100644 index 0000000000..add32cb6e9 --- /dev/null +++ b/statvar_imports/ipeds/ipeds_graduationrates_national/test_data/graduation_rates_ipeds_data2019.csv @@ -0,0 +1,29 @@ +National Center for Education Statistics,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6 +"Table 10. 
Graduation rates within 100, 150, and 200 percent of normal program completion time at Title IV institutions among the students who started as full-time, first-time degree/certificate-seeking undergraduate students, by control of institution, degree or certificate sought, and level of institution: United States, cohort years 2010 and 2014",,,,,, +,,,,,, +,,,,,, +,,,,,, +,,,,,, +,,,,,, +Degree or certificate sought and level of institution,,,,,Private , +,,All institutions,Public,,Nonprofit,For-profit +,,,,,, +Bachelor’s or equivalent degree-seeking students attending 4-year institutions and completing bachelor’s or equivalent degree (cohort year 2010),,,,,, +,,,,,, +Within 100 percent of normal program completion time,,40.9,35.7,,53.8,13 +Within 150 percent of normal program completion time,,60,58.9,,65.9,20.3 +Within 200 percent of normal program completion time,,62,61.5,,67,21.3 +,,,,,, +Degree- or certificate-seeking students attending 2-year institutions and completing a degree or certificate (cohort year 2014),,,,,, +,,,,,, +Within 100 percent of normal program completion time,,18.5,14.5,,26.8,41 +Within 150 percent of normal program completion time,,33.1,26.7,,62.2,63.5 +Within 200 percent of normal program completion time,,37.8,32.3,,63.7,64.4 +,,,,,, +Degree- or certificate-seeking students attending less-than-2-year institutions and completing a degree or certificate (cohort year 2014),,,,,, +,,,,,, +Within 100 percent of normal program completion time,,45.8,64.5,,58.3,42.4 +Within 150 percent of normal program completion time,,69.3,73.6,,72.6,68.5 +Within 200 percent of normal program completion time,,70.2,74.5,,73.1,69.5 +"NOTE: Title IV institutions are those with a written agreement with the U.S. Department of Education that allows the institution to participate in any of the Title IV federal student financial assistance programs. United States includes the 50 states and the District of Columbia. The four U.S. 
service academies that are not Title IV eligible are included in the Integrated Postsecondary Education Data System (IPEDS) universe because they are federally funded and open to the public and are included in this table. The rates in this table reflect graduation rates at institutions regardless of the length of programs, unless otherwise indicated. The graduation rate was calculated as required for disclosure and reporting purposes under the Student Right-to-Know Act. This rate was calculated as the total number of completers within 100, 150, or 200 percent of normal time (e.g. “normal” program completion time for a bachelor’s degree would be 4 years) divided by the adjusted cohort (revised cohort minus any allowable exclusions). The revised cohort is the number of students entering the institution as full-time, first-time degree- or certificate-seeking undergraduates in the reference year. Allowable exclusions include those students who died or were totally and permanently disabled; students who left school to serve in the armed forces (or have been called up to active duty); those who left to serve with a foreign aid service of the federal government, such as the Peace Corps; and those who left to serve on official church missions. Definitions for terms used in this table may be found in the collection year’s archived downloadable glossary located at https://nces.ed.gov/ipeds/use-the-data/annual-survey-forms-packages-archived?year=2018.",,,,,, +"SOURCE: U.S. 
Department of Education, National Center for Education Statistics, IPEDS, Winter 2018–19, 200 Percent Graduation Rates component (final data).",,,,,, diff --git a/statvar_imports/ipeds/ipeds_graduationrates_national/test_data/graduation_rates_ipeds_output.csv b/statvar_imports/ipeds/ipeds_graduationrates_national/test_data/graduation_rates_ipeds_output.csv new file mode 100644 index 0000000000..67fc7f3139 --- /dev/null +++ b/statvar_imports/ipeds/ipeds_graduationrates_national/test_data/graduation_rates_ipeds_output.csv @@ -0,0 +1,37 @@ +observationDate,observationPeriod,value,unit,observationAbout,variableMeasured +2014,P4Y,40.9,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentBachelorsDegree +2014,P4Y,35.7,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentBachelorsDegree_PublicInstitute +2014,P4Y,53.8,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentBachelorsDegree_PrivateNotForProfitInstitute +2014,P4Y,13,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentBachelorsDegree_PrivateForProfitInstitute +2014,P4Y,60,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentBachelorsDegree +2014,P4Y,58.9,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentBachelorsDegree_PublicInstitute +2014,P4Y,65.9,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentBachelorsDegree_PrivateNotForProfitInstitute +2014,P4Y,20.3,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentBachelorsDegree_PrivateForProfitInstitute 
+2014,P4Y,62,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentBachelorsDegree +2014,P4Y,61.5,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentBachelorsDegree_PublicInstitute +2014,P4Y,67,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentBachelorsDegree_PrivateNotForProfitInstitute +2014,P4Y,21.3,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentBachelorsDegree_PrivateForProfitInstitute +2014,P4Y,18.5,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate +2014,P4Y,14.5,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate_PublicInstitute +2014,P4Y,26.8,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate_PrivateNotForProfitInstitute +2014,P4Y,41,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate_PrivateForProfitInstitute +2014,P4Y,33.1,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate +2014,P4Y,26.7,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate_PublicInstitute +2014,P4Y,62.2,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate_PrivateNotForProfitInstitute 
+2014,P4Y,63.5,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate_PrivateForProfitInstitute +2014,P4Y,37.8,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate +2014,P4Y,32.3,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate_PublicInstitute +2014,P4Y,63.7,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate_PrivateNotForProfitInstitute +2014,P4Y,64.4,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentAssociateDegreeOrCertificate_PrivateForProfitInstitute +2014,P4Y,45.8,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate +2014,P4Y,64.5,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate_PublicInstitute +2014,P4Y,58.3,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate_PrivateNotForProfitInstitute +2014,P4Y,42.4,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate_PrivateForProfitInstitute +2014,P4Y,69.3,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate +2014,P4Y,73.6,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate_PublicInstitute 
+2014,P4Y,72.6,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate_PrivateNotForProfitInstitute +2014,P4Y,68.5,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin150PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate_PrivateForProfitInstitute +2014,P4Y,70.2,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate +2014,P4Y,74.5,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate_PublicInstitute +2014,P4Y,73.1,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate_PrivateNotForProfitInstitute +2014,P4Y,69.5,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin200PercentOfNormalTime_EducationalAttainmentPostSecondaryCertificate_PrivateForProfitInstitute diff --git a/statvar_imports/ipeds/ipeds_graduationrates_national/test_data/graduation_rates_ipeds_output.tmcf b/statvar_imports/ipeds/ipeds_graduationrates_national/test_data/graduation_rates_ipeds_output.tmcf new file mode 100644 index 0000000000..88bade15c6 --- /dev/null +++ b/statvar_imports/ipeds/ipeds_graduationrates_national/test_data/graduation_rates_ipeds_output.tmcf @@ -0,0 +1,8 @@ +Node: E:graduation_rates_ipeds_output->E0 +observationDate: C:graduation_rates_ipeds_output->observationDate +observationPeriod: C:graduation_rates_ipeds_output->observationPeriod +value: C:graduation_rates_ipeds_output->value +unit: C:graduation_rates_ipeds_output->unit +observationAbout: C:graduation_rates_ipeds_output->observationAbout +variableMeasured: C:graduation_rates_ipeds_output->variableMeasured +typeOf: dcs:StatVarObservation
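
Review note: the TMCF template above references six columns of `graduation_rates_ipeds_output.csv` through `C:<table>-><column>` entries. As a rough illustration of that mapping, the following Python sketch extracts the referenced column names and checks that each one appears in the CSV header. It is not part of the import tooling; the helper names `tmcf_columns` and `missing_columns` are hypothetical, and the inline strings are copied from the test data in this change. The `lint` run described in the README remains the actual validation step.

```python
import csv
import io
import re

# Inline copy of the TMCF template from test_data in this change.
TMCF = """\
Node: E:graduation_rates_ipeds_output->E0
observationDate: C:graduation_rates_ipeds_output->observationDate
observationPeriod: C:graduation_rates_ipeds_output->observationPeriod
value: C:graduation_rates_ipeds_output->value
unit: C:graduation_rates_ipeds_output->unit
observationAbout: C:graduation_rates_ipeds_output->observationAbout
variableMeasured: C:graduation_rates_ipeds_output->variableMeasured
typeOf: dcs:StatVarObservation
"""

# Header and first data row of the expected output CSV from test_data.
CSV_TEXT = """\
observationDate,observationPeriod,value,unit,observationAbout,variableMeasured
2014,P4Y,40.9,Percent,country/USA,dcid:GraduationRate_Student_CourseCompletedWithin100PercentOfNormalTime_EducationalAttainmentBachelorsDegree
"""


def tmcf_columns(tmcf_text):
    """Return the column names referenced as C:<table>-><column> (hypothetical helper)."""
    return re.findall(r"C:\w+->(\w+)", tmcf_text)


def missing_columns(tmcf_text, csv_text):
    """Return TMCF-referenced columns that are absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [col for col in tmcf_columns(tmcf_text) if col not in header]


if __name__ == "__main__":
    print("Columns referenced by the TMCF:", tmcf_columns(TMCF))
    print("Missing from the CSV header:", missing_columns(TMCF, CSV_TEXT))
```

The column list recovered this way matches the `output_columns` value in `graduation_rates_ipeds_metadata.csv`, which is one way to spot drift between the config, the template, and the generated CSV before running the import tool.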