|
14 | 14 | "\n", |
15 | 15 | "In previous tutorials we've loaded single lat-lon-cap NetCDF tile files (granules) for ECCO state estimate variables and model grid parameters. Here we will demonstrate merging `Datasets` together. Some benefits of merging `Datasets` include having a tidier workspace and simplifying subsetting operations (e.g., using ``xarray.isel`` or ``xarray.sel`` as shown in the [previous tutorial](https://ecco-v4-python-tutorial.readthedocs.io/ECCO_v4_Loading_the_ECCOv4_state_estimate_fields_on_the_native_model_grid.html)). \n", |
16 | 16 | "\n", |
17 | | - "First, we'll load three ECCOv4 NetCDF state estimate variables (each centered on different coordinates) as well as the model grid file. For this, you will need to download 2 datasets of monthly mean fields for the year 2010. The ShortNames for the 2 datasets are:\n", |
| 17 | + "First, we'll load three ECCOv4 NetCDF state estimate variables (each centered on different coordinates) as well as the model grid file. For this, you will need two datasets of monthly mean fields for the year 2010, as well as the grid parameters file. The ShortNames for these three datasets are:\n", |
18 | 18 | "\n", |
19 | | - "- **ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4**\n", |
20 | | - "- **ECCO_L4_OCEAN_3D_TEMPERATURE_FLUX_LLC0090GRID_MONTHLY_V4R4**\n", |
| 19 | + "- **ECCO_L4_GEOMETRY_LLC0090GRID_V4R4** (no time dimension)\n", |
| 20 | + "- **ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4** (Jan-Dec 2010)\n", |
| 21 | + "- **ECCO_L4_OCEAN_3D_TEMPERATURE_FLUX_LLC0090GRID_MONTHLY_V4R4** (Jan-Dec 2010)\n", |
21 | 22 | "\n", |
22 | | - "If you did the previous tutorial you already have the SSH files.\n", |
| 23 | + "The `ecco_access` library used in this notebook handles downloading or retrieving the necessary data, provided you have set it up [in your Python path](https://ecco-v4-python-tutorial.readthedocs.io/ECCO_access_intro.html#Setting-up-ecco_access).\n", |
23 | 24 | "\n", |
24 | | - "Once you have the required ECCOv4 output downloaded, let's define our environment." |
| 25 | + "Let's define our environment:" |
25 | 26 | ] |
26 | 27 | }, |
27 | 28 | { |
|
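The markdown cell above motivates merging `Datasets`; a minimal sketch (with synthetic data, not the ECCO granules themselves) of the benefit it describes: after `xr.merge`, a single `.isel`/`.sel` call subsets every variable at once.

```python
import numpy as np
import xarray as xr

time = np.arange(3)
# two separate datasets sharing a 'time' coordinate
ds_a = xr.Dataset({"SSH": ("time", np.random.rand(3))}, coords={"time": time})
ds_b = xr.Dataset({"THETA": ("time", np.random.rand(3))}, coords={"time": time})

# merge aligns the shared coordinates and combines the variables
merged = xr.merge([ds_a, ds_b])
subset = merged.isel(time=0)  # subsets SSH and THETA together
print(sorted(merged.data_vars))  # ['SSH', 'THETA']
```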
36 | 37 | "import matplotlib.pyplot as plt\n", |
37 | 38 | "import json\n", |
38 | 39 | "\n", |
| 40 | + "import ecco_access as ea\n", |
39 | 41 | "\n", |
40 | | - "# indicate whether you are working in a cloud instance (True if yes, False otherwise)\n", |
41 | | - "incloud_access = False" |
| 42 | + "# indicate mode of access\n", |
| 43 | + "# options are:\n", |
| 44 | + "# 'download': direct download from internet to your local machine\n", |
| 45 | + "# 'download_ifspace': like download, but only proceeds \n", |
| 46 | + "# if your machine has sufficient storage\n", |
| 47 | + "# 's3_open': access datasets in-cloud from an AWS instance\n", |
| 48 | + "# 's3_open_fsspec': use jsons generated with fsspec and \n", |
| 49 | + "# kerchunk libraries to speed up in-cloud access\n", |
| 50 | + "# 's3_get': direct download from S3 in-cloud to an AWS instance\n", |
| 51 | + "# 's3_get_ifspace': like s3_get, but only proceeds if your instance \n", |
| 52 | + "# has sufficient storage\n", |
| 53 | + "access_mode = 'download_ifspace'" |
42 | 54 | ] |
43 | 55 | }, |
44 | 56 | { |
|
72 | 84 | "## Set top-level file directory for the ECCO NetCDF files\n", |
73 | 85 | "## =================================================================\n", |
74 | 86 | "\n", |
75 | | - "## currently set to ~/Downloads/ECCO_V4r4_PODAAC, \n", |
76 | | - "## the default if ecco_podaac_download was used to download dataset granules\n", |
77 | | - "ECCO_dir = join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC')" |
| 87 | + "## currently set to ~/Downloads/ECCO_V4r4_PODAAC\n", |
| 88 | + "ECCO_dir = join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC')\n", |
| 89 | + "\n", |
| 90 | + "# # for access_mode = 's3_open_fsspec', need to specify the root directory \n", |
| 91 | + "# # containing the jsons\n", |
| 92 | + "# jsons_root_dir = join('/efs_ecco','mzz-jsons')" |
78 | 93 | ] |
79 | 94 | }, |
80 | 95 | { |
|
83 | 98 | "metadata": {}, |
84 | 99 | "outputs": [], |
85 | 100 | "source": [ |
86 | | - "## if working in the AWS cloud, access datasets needed for this tutorial\n", |
| 101 | + "## Access datasets needed for this tutorial\n", |
87 | 102 | "\n", |
88 | 103 | "ShortNames_list = [\"ECCO_L4_GEOMETRY_LLC0090GRID_V4R4\",\\\n", |
89 | 104 | " \"ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4\",\\\n", |
90 | 105 | " \"ECCO_L4_OCEAN_3D_TEMPERATURE_FLUX_LLC0090GRID_MONTHLY_V4R4\"]\n", |
91 | | - "if incloud_access == True:\n", |
92 | | - " from ecco_s3_retrieve import ecco_podaac_s3_get_diskaware\n", |
93 | | - " files_dict = ecco_podaac_s3_get_diskaware(ShortNames=ShortNames_list,\\\n", |
| 106 | + "\n", |
| 107 | + "ds_dict = ea.ecco_podaac_to_xrdataset(ShortNames_list,\\\n", |
94 | 108 | " StartDate='2010-01',EndDate='2010-12',\\\n", |
95 | | - " max_avail_frac=0.5,\\\n", |
96 | | - " download_root_dir=ECCO_dir)" |
| 109 | + " mode=access_mode,\\\n", |
| 110 | + " download_root_dir=ECCO_dir,\\\n", |
| 111 | + " max_avail_frac=0.5)" |
97 | 112 | ] |
98 | 113 | }, |
99 | 114 | { |
|
110 | 125 | "outputs": [], |
111 | 126 | "source": [ |
112 | 127 | "# load dataset containing monthly SSH in 2010\n", |
113 | | - "if incloud_access == True:\n", |
114 | | - " # use list comprehension to list file path(s)\n", |
115 | | - " file_paths = [filepath for filepath in files_dict[ShortNames_list[1]] if '_2010-' in filepath]\n", |
116 | | - " ecco_dataset_A = xr.open_mfdataset(file_paths)\n", |
117 | | - "else:\n", |
118 | | - " ecco_dataset_A = xr.open_mfdataset(join(ECCO_dir,'*SSH*MONTHLY*','*_2010-??_*.nc'))" |
| 128 | + "ecco_dataset_A = ds_dict[ShortNames_list[1]]" |
119 | 129 | ] |
120 | 130 | }, |
121 | 131 | { |
|
224 | 234 | } |
225 | 235 | ], |
226 | 236 | "source": [ |
227 | | - "# load dataset containing monthly mean 3D temperature fluxes in 2010\n", |
228 | | - "if incloud_access == True:\n", |
229 | | - " file_paths = [filepath for filepath in files_dict[ShortNames_list[2]] if '_2010-' in filepath]\n", |
230 | | - " ecco_dataset_B = xr.open_mfdataset(file_paths)\n", |
231 | | - "else:\n", |
232 | | - " ecco_dataset_B = xr.open_mfdataset(join(ECCO_dir,'*3D_TEMPERATURE_FLUX_LLC0090GRID_MONTHLY*','*_2010-??_*.nc'))\n", |
| 237 | + "# open dataset containing monthly mean 3D temperature fluxes in 2010\n", |
| 238 | + "ecco_dataset_B = ds_dict[ShortNames_list[2]]\n", |
233 | 239 | "\n", |
234 | 240 | "ecco_dataset_B.data_vars" |
235 | 241 | ] |
|
1568 | 1574 | }, |
1569 | 1575 | "outputs": [], |
1570 | 1576 | "source": [ |
1571 | | - "# merge together\n", |
| 1577 | + "# merge together and load into memory\n", |
1572 | 1578 | "ecco_dataset_AB = xr.merge([ecco_dataset_A['SSH'], ecco_dataset_B[['ADVx_TH','ADVy_TH']]]).compute()" |
1573 | 1579 | ] |
1574 | 1580 | }, |
|
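Note the two selection styles passed to `xr.merge` in the cell above: single brackets (`ecco_dataset_A['SSH']`) return a `DataArray`, while a list of names (`ecco_dataset_B[['ADVx_TH','ADVy_TH']]`) returns a `Dataset`; `xr.merge` accepts both, and `.compute()` loads any lazy (dask-backed) arrays into memory. A sketch with synthetic data in place of the ECCO variables:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"SSH": ("time", np.zeros(2)),
     "ADVx_TH": ("time", np.ones(2)),
     "ADVy_TH": ("time", np.ones(2))},
    coords={"time": [0, 1]},
)

da = ds["SSH"]                     # single brackets -> DataArray
sub = ds[["ADVx_TH", "ADVy_TH"]]   # list of names -> Dataset

# merge a named DataArray with a Dataset; .compute() is a no-op
# here since the data are already in memory (not dask-backed)
merged = xr.merge([da, sub]).compute()
print(sorted(merged.data_vars))  # ['ADVx_TH', 'ADVy_TH', 'SSH']
```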
2499 | 2505 | ], |
2500 | 2506 | "source": [ |
2501 | 2507 | "# Load the llc90 grid parameters\n", |
2502 | | - "if incloud_access == True:\n", |
2503 | | - " grid_dataset = xr.open_dataset(files_dict[ShortNames_list[0]])\n", |
2504 | | - "else:\n", |
2505 | | - " import glob\n", |
2506 | | - " grid_dataset = xr.open_dataset(glob.glob(join(ECCO_dir,'*GEOMETRY*','*.nc'))[0])\n", |
| 2508 | + "grid_dataset = ds_dict[ShortNames_list[0]].compute()\n", |
| 2509 | + "\n", |
2507 | 2510 | "grid_dataset.coords" |
2508 | 2511 | ] |
2509 | 2512 | }, |
|
3382 | 3385 | "name": "python", |
3383 | 3386 | "nbconvert_exporter": "python", |
3384 | 3387 | "pygments_lexer": "ipython3", |
3385 | | - "version": "3.11.8" |
| 3388 | + "version": "3.11.9" |
3386 | 3389 | } |
3387 | 3390 | }, |
3388 | 3391 | "nbformat": 4, |
|