3 changes: 1 addition & 2 deletions copydata.sh
@@ -47,8 +47,7 @@ for folder in ${!foldermap[@]}; do
# In GCS, there are /internal folders that should not be overwritten.
# TODO(shifucun): Handle copy GCS reference. In this case, user would
# store a link but not actual data in this repo.
- gsutil cp -r "$folder/$import_group/data/*" "$gcs_path/$import_group/data/"
+ gcloud storage cp --recursive "$folder/$import_group/data/*" "$gcs_path/$import_group/data/"
medium

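For copying the contents of a local directory, consider specifying the source directory with a trailing slash instead of a wildcard (*). Because the path is quoted, the shell does not expand the wildcard; it is passed through for gcloud storage cp to resolve, and a pattern that matches nothing (for example, in an empty directory) can cause the command to fail. Before applying this change, confirm that the trailing-slash form produces the same destination layout, since a recursive copy of a directory may recreate the directory itself under the destination.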

Suggested change
- gcloud storage cp --recursive "$folder/$import_group/data/*" "$gcs_path/$import_group/data/"
+ gcloud storage cp --recursive "$folder/$import_group/data/" "$gcs_path/$import_group/data/"
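
Whether the shell or the copy tool expands the wildcard depends on quoting. The following is a minimal sketch, assuming bash and a throwaway temporary directory (the file names are hypothetical); it does not call `gcloud storage` and only shows what the shell passes along in each case.

```shell
#!/bin/bash
# Build a scratch directory with two files to match against.
tmpdir="$(mktemp -d)"
touch "$tmpdir/a.csv" "$tmpdir/b.csv"

# Quoted: the shell passes the literal pattern through as a single argument,
# leaving any wildcard handling to the program that receives it.
quoted_args=("$tmpdir/*")

# Unquoted: the shell expands the pattern into one argument per matching file.
unquoted_args=("$tmpdir"/*)

echo "quoted: ${#quoted_args[@]} argument(s)"
echo "unquoted: ${#unquoted_args[@]} argument(s)"

rm -rf "$tmpdir"
```

In `copydata.sh` the source path is quoted, so the first case applies and the pattern reaches the copy command unexpanded.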

done
fi
done

2 changes: 1 addition & 1 deletion scripts/us_census/decennial/download.sh
@@ -1,5 +1,5 @@
#!/bin/bash
mkdir -p scratch
- gsutil -m cp -R gs://datcom-csv/census/decennial/* scratch/
+ gcloud storage cp --recursive gs://datcom-csv/census/decennial/* scratch/
medium

For clarity and consistency, it's better to specify the GCS source directory path without a trailing /* when copying its contents recursively. The gcloud storage cp --recursive command will correctly copy the contents of the specified GCS directory.

Suggested change
- gcloud storage cp --recursive gs://datcom-csv/census/decennial/* scratch/
+ gcloud storage cp --recursive gs://datcom-csv/census/decennial/ scratch/
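
The two forms in this suggestion are not necessarily interchangeable. With recursive copies, naming a directory as the source typically recreates that directory under the destination, while a wildcard copies only what is inside it. The sketch below uses local `cp -r` on a temporary directory as a stand-in to show the layout difference. It does not call `gcloud storage`, whose exact behavior should be confirmed against its documentation before applying the change; the directory and file names are hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# Scratch layout: decennial/2020/table.csv plus two empty destinations.
work="$(mktemp -d)"
mkdir -p "$work/decennial/2020" "$work/by_glob" "$work/by_dir"
touch "$work/decennial/2020/table.csv"

# Wildcard source: the children of decennial/ land directly in the destination.
cp -r "$work"/decennial/* "$work/by_glob/"

# Directory source: decennial/ itself is recreated under the destination.
cp -r "$work/decennial" "$work/by_dir/"

# List the resulting files relative to the scratch directory.
(cd "$work" && find by_glob by_dir -type f | sort)
```

If later steps in `download.sh` expect files directly under `scratch/`, an extra `decennial/` level would break them, so the resulting layout is worth checking after the change.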


mkdir -p output
2 changes: 1 addition & 1 deletion scripts/us_nces/demographics/private_school/run.sh
@@ -1,2 +1,2 @@
mkdir -p gcs_folder/input_files
- gsutil -m cp -r gs://unresolved_mcf/us_nces/demographics/private_school/semi_automation_input_files/* gcs_folder/input_files/
+ gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/private_school/semi_automation_input_files/* gcs_folder/input_files/
medium

For clarity and consistency, it's better to specify the GCS source directory path without a trailing /* when copying its contents recursively. The gcloud storage cp --recursive command will correctly copy the contents of the specified GCS directory.

Suggested change
- gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/private_school/semi_automation_input_files/* gcs_folder/input_files/
+ gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/private_school/semi_automation_input_files/ gcs_folder/input_files/

2 changes: 1 addition & 1 deletion scripts/us_nces/demographics/public_school/run.sh
@@ -1,2 +1,2 @@
mkdir -p gcs_folder/input_files
- gsutil -m cp -r gs://unresolved_mcf/us_nces/demographics/public_school/semi_automation_input_files/* gcs_folder/input_files/
+ gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/public_school/semi_automation_input_files/* gcs_folder/input_files/
medium

For clarity and consistency, it's better to specify the GCS source directory path without a trailing /* when copying its contents recursively. The gcloud storage cp --recursive command will correctly copy the contents of the specified GCS directory.

Suggested change
- gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/public_school/semi_automation_input_files/* gcs_folder/input_files/
+ gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/public_school/semi_automation_input_files/ gcs_folder/input_files/

2 changes: 1 addition & 1 deletion scripts/us_nces/demographics/school_district/run.sh
@@ -1,2 +1,2 @@
mkdir -p gcs_folder/input_files
- gsutil -m cp -r gs://unresolved_mcf/us_nces/demographics/school_district/semi_automation_input_files/* gcs_folder/input_files/
+ gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/school_district/semi_automation_input_files/* gcs_folder/input_files/
medium

For clarity and consistency, it's better to specify the GCS source directory path without a trailing /* when copying its contents recursively. The gcloud storage cp --recursive command will correctly copy the contents of the specified GCS directory.

Suggested change
- gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/school_district/semi_automation_input_files/* gcs_folder/input_files/
+ gcloud storage cp --recursive gs://unresolved_mcf/us_nces/demographics/school_district/semi_automation_input_files/ gcs_folder/input_files/

4 changes: 2 additions & 2 deletions statvar_imports/cdc/social_vulnerability_index/README.md
@@ -35,7 +35,7 @@ Copies all latest input .csv files from GCS using:

bash

- gsutil cp gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/* source_files/
+ gcloud storage cp gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/* source_files/
medium

To improve clarity in the documentation, you could remove the trailing /* from the GCS path. Note, however, that this command is not recursive, and a non-recursive copy may skip a bare directory path rather than copy the objects under it, so verify the behavior before applying this suggestion.

Suggested change
- gcloud storage cp gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/* source_files/
+ gcloud storage cp gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/ source_files/


Adds a year column using add_years.py:

@@ -59,7 +59,7 @@ Download the latest .csv/xlsx file manually from the CDC SVI source page.

bash

- gsutil cp <local_csv_file> gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/
+ gcloud storage cp <local_csv_file> gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/

## Ensure the file is named using the format:
SVI_<year>_US_county.csv (e.g., SVI_2024_US_county.csv)
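
The naming rule above can be checked locally before uploading. The following is a minimal sketch, assuming bash; the helper name `is_valid_svi_name` is hypothetical, and the four-digit-year requirement is an assumption, since the README only shows `<year>`.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the file name follows
# SVI_<year>_US_county.csv, assuming <year> is exactly four digits.
is_valid_svi_name() {
  local name
  # Ignore any leading directories and check only the file name.
  name="$(basename "$1")"
  [[ "$name" =~ ^SVI_[0-9]{4}_US_county\.csv$ ]]
}

# Example check on one conforming and one non-conforming name.
for candidate in SVI_2024_US_county.csv svi_2024.csv; do
  if is_valid_svi_name "$candidate"; then
    echo "$candidate: ok"
  else
    echo "$candidate: rename before uploading"
  fi
done
```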
3 changes: 1 addition & 2 deletions statvar_imports/cdc/social_vulnerability_index/run.sh
@@ -4,10 +4,9 @@ set -e # Exit on error

echo "Step 1: Copying SVI files from GCS..."
mkdir -p source_files
- gsutil cp gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/*.csv source_files/
+ gcloud storage cp gs://unresolved_mcf/cdc/social_vulnerability_index/latest/input_files/*.csv source_files/

echo "Step 2: Adding 'year' column to all files except 2010..."
python3 add_years.py

echo "Preprocessing done. StatVar processing will now continue as per manifest."

2 changes: 1 addition & 1 deletion statvar_imports/southkorea_statistics/run.sh
@@ -9,4 +9,4 @@ GCS_PATH=$1
DESTINATION_DIR="source_files"

mkdir -p "${DESTINATION_DIR}"
- gsutil -m cp "${GCS_PATH}*.csv" "${DESTINATION_DIR}/"
+ gcloud storage cp "${GCS_PATH}*.csv" "${DESTINATION_DIR}/"
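
Because the pattern is built as `"${GCS_PATH}*.csv"`, it only targets files inside the folder when `GCS_PATH` ends with a slash. The following is a minimal sketch of one way to normalize the argument, assuming bash; the helper name `normalize_prefix` and the bucket name are hypothetical and not part of this change.

```shell
#!/bin/bash
# Hypothetical helper: return the given prefix with exactly one trailing
# slash, so that "${prefix}*.csv" matches objects inside the folder.
normalize_prefix() {
  local prefix="$1"
  # Strip any trailing slashes, then add a single one back.
  while [[ "$prefix" == */ ]]; do
    prefix="${prefix%/}"
  done
  printf '%s/\n' "$prefix"
}

# Both calls print the same normalized prefix.
normalize_prefix "gs://example-bucket/input_files"
normalize_prefix "gs://example-bucket/input_files/"
```

In `run.sh` this could be applied as `GCS_PATH="$(normalize_prefix "$1")"` before the copy command.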
2 changes: 1 addition & 1 deletion statvar_imports/us_cdc/single_race/download.sh
@@ -3,4 +3,4 @@
set -e -o pipefail

mkdir -p input_files
- gsutil -m cp gs://unresolved_mcf/cdc/UnderlyingCause/Single_Race/latest/input_files/*.csv input_files/
+ gcloud storage cp gs://unresolved_mcf/cdc/UnderlyingCause/Single_Race/latest/input_files/*.csv input_files/
3 changes: 1 addition & 2 deletions statvar_imports/us_newyork/ny_diabetes/run.sh
@@ -4,7 +4,6 @@ set -e # Exit on error

echo "Step 1: Copying SVI files from GCS..."
mkdir -p input_files
- gsutil cp gs://unresolved_mcf/nyu_diabetes/adult_diabetes/input_files/*.csv input_files/
+ gcloud storage cp gs://unresolved_mcf/nyu_diabetes/adult_diabetes/input_files/*.csv input_files/

echo "Preprocessing done. StatVar processing will now continue as per manifest."
