|
16 | 16 | "\n", |
17 | 17 | "Before you begin, make sure you have the monthly mean temperature/salinity files for 2010 downloaded. If you have done the previous tutorial about Dataset and DataArray objects, you already have these; if not, you should run at least the first code cell of that tutorial before this one.\n", |
18 | 18 | "\n", |
19 | | - "As we showed in the previous tutorial, we can use the `open_mfdataset` method from `xarray` to load multiple NetCDF files into Python as a `Dataset` object. `open_mfdataset` is very convenient because it automatically parses and concatenates NetCDF files, constructing a `Dataset` object using all of the dimensions, coordinates, variables, and metadata information. \n", |
| 19 | + "As we showed in the previous tutorial, we can use the [`ecco_access` library](https://ecco-v4-python-tutorial.readthedocs.io/ECCO_access_intro.html#Setting-up-ecco_access) which downloads (or retrieves in the AWS Cloud) the requested ECCO output. We can either use the `ecco_podaac_access` function and then explicitly call `xarray`'s `open_mfdataset` to load multiple NetCDF files into Python as a `Dataset` object, or combine both steps into one with `ecco_access`'s `ecco_podaac_to_xrdataset`. This is very convenient because it opens the requested output as an `xarray` Dataset object with all of the dimensions, coordinates, variables, and metadata information. \n", |
20 | 20 | "\n", |
21 | 21 | "In the last tutorial we analyzed the contents of the ECCOv4 monthly mean potential temperature and salinity files for the year 2010. Let's load these files up again as the `Dataset` object *theta_dataset*." |
22 | 22 | ] |
|
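For quick reference, here is a minimal, self-contained sketch of the two access patterns described in the cell above, using the same call signatures that appear in the code cells of this notebook. The `expanduser`-based home path is an illustrative stand-in for the notebook's `user_home_dir` setup, and the dates and directory mirror the 2010 example below.

```python
from os.path import expanduser, join

import xarray as xr
import ecco_access as ea

# illustrative stand-in for the notebook's user_home_dir / ECCO_dir setup
ECCO_dir = join(expanduser('~'), 'Downloads', 'ECCO_V4r4_PODAAC')
ShortName = "ECCO_L4_TEMP_SALINITY_LLC0090GRID_MONTHLY_V4R4"

# Method 1: query/download the granules, then open them explicitly with xarray
files_dict = ea.ecco_podaac_access(ShortName,
                                   StartDate='2010-01', EndDate='2010-12',
                                   mode='download_ifspace',
                                   download_root_dir=ECCO_dir,
                                   max_avail_frac=0.5)
theta_dataset = xr.open_mfdataset(files_dict[ShortName], parallel=True,
                                  data_vars='minimal', coords='minimal',
                                  compat='override')

# Method 2: combine retrieval and opening in a single call
theta_dataset = ea.ecco_podaac_to_xrdataset(ShortName,
                                            StartDate='2010-01', EndDate='2010-12',
                                            mode='download_ifspace',
                                            download_root_dir=ECCO_dir,
                                            max_avail_frac=0.5)
```

Both approaches end with the same lazily opened `Dataset`; Method 1 additionally gives you the underlying file list (`files_dict`), which can be handy if you want to work with the file paths directly.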
32 | 32 | "import sys\n", |
33 | 33 | "import matplotlib.pyplot as plt\n", |
34 | 34 | "%matplotlib inline\n", |
35 | | - "import json" |
| 35 | + "import json\n", |
| 36 | + "\n", |
| 37 | + "import ecco_access as ea\n", |
| 38 | + "\n", |
| 39 | + "# indicate mode of access\n", |
| 40 | + "# options are:\n", |
| 41 | + "# 'download': direct download from internet to your local machine\n", |
| 42 | + "# 'download_ifspace': like download, but only proceeds \n", |
| 43 | + "# if your machine have sufficient storage\n", |
| 44 | + "# 's3_open': access datasets in-cloud from an AWS instance\n", |
| 45 | + "# 's3_open_fsspec': use jsons generated with fsspec and \n", |
| 46 | + "# kerchunk libraries to speed up in-cloud access\n", |
| 47 | + "# 's3_get': direct download from S3 in-cloud to an AWS instance\n", |
| 48 | + "# 's3_get_ifspace': like s3_get, but only proceeds if your instance \n", |
| 49 | + "# has sufficient storage\n", |
| 50 | + "access_mode = 'download_ifspace'" |
36 | 51 | ] |
37 | 52 | }, |
38 | 53 | { |
|
65 | 80 | "## ================\n", |
66 | 81 | "\n", |
67 | 82 | "\n", |
68 | | - "# indicate whether you are working in a cloud instance (True if yes, False otherwise)\n", |
69 | | - "incloud_access = False\n", |
| 83 | + "## Set top-level file directory for the ECCO NetCDF files\n", |
| 84 | + "## =================================================================\n", |
70 | 85 | "\n", |
71 | | - "\n", |
72 | | - "## change ECCO_dir as needed\n", |
| 86 | + "## currently set to /Downloads/ECCO_V4r4_PODAAC\n", |
73 | 87 | "ECCO_dir = join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC')\n", |
74 | 88 | "\n", |
| 89 | + "# # for access_mode = 's3_open_fsspec', need to specify the root directory \n", |
| 90 | + "# # containing the jsons\n", |
| 91 | + "# jsons_root_dir = join('/efs_ecco','mzz-jsons')\n", |
75 | 92 | "\n", |
76 | | - "ShortNames_list = [\"ECCO_L4_TEMP_SALINITY_LLC0090GRID_MONTHLY_V4R4\"]\n", |
77 | | - "if incloud_access == True:\n", |
78 | | - " from ecco_s3_retrieve import ecco_podaac_s3_get_diskaware\n", |
79 | 93 | "\n", |
80 | | - " # retrieve files (download to instance if there is sufficient storage)\n", |
81 | | - " files_dict = ecco_podaac_s3_get_diskaware(ShortNames=ShortNames_list,\\\n", |
82 | | - " StartDate='2010-01',EndDate='2010-12',\\\n", |
83 | | - " max_avail_frac=0.5,\\\n", |
84 | | - " download_root_dir=ECCO_dir)\n", |
| 94 | + "ShortName = \"ECCO_L4_TEMP_SALINITY_LLC0090GRID_MONTHLY_V4R4\"\n", |
| 95 | + "\n", |
| 96 | + "# # Method 1: use ecco_podaac_access\n", |
| 97 | + "# # \n", |
| 98 | + "# # retrieve files\n", |
| 99 | + "# files_dict = ea.ecco_podaac_access(ShortName,\\\n", |
| 100 | + "# StartDate='2010-01',EndDate='2010-12',\\\n", |
| 101 | + "# mode=access_mode,\\\n", |
| 102 | + "# download_root_dir=ECCO_dir,\\\n", |
| 103 | + "# max_avail_frac=0.5)\n", |
| 104 | + "# # load file into workspace\n", |
| 105 | + "# theta_dataset = xr.open_mfdataset(files_dict[ShortName],parallel=True,\\\n", |
| 106 | + "# data_vars='minimal',coords='minimal',compat='override')\n", |
| 107 | + "\n", |
| 108 | + "\n", |
| 109 | + "# # Method 2: use ecco_podaac_to_xrdataset\n", |
85 | 110 | "\n", |
86 | | - " # load file into workspace\n", |
87 | | - " theta_dataset = xr.open_mfdataset(files_dict[ShortNames_list[0]],parallel=True,\\\n", |
88 | | - " data_vars='minimal',coords='minimal',compat='override') \n", |
89 | | - "else:\n", |
90 | | - " curr_dir = join(ECCO_dir,ShortNames_list[0])\n", |
91 | | - " import glob\n", |
92 | | - " \n", |
93 | | - " # find files on disk (assumes that they were downloaded in the last tutorial)\n", |
94 | | - " files_to_load = list(glob.glob(join(curr_dir,'*2010*nc')))\n", |
95 | | - " \n", |
96 | | - " # load file into workspace\n", |
97 | | - " theta_dataset = xr.open_mfdataset(files_to_load, parallel=True,\\\n", |
98 | | - " data_vars='minimal',coords='minimal',compat='override')" |
| 111 | + "theta_dataset = ea.ecco_podaac_to_xrdataset(ShortName,\\\n", |
| 112 | + " StartDate='2010-01',EndDate='2010-12',\\\n", |
| 113 | + " mode=access_mode,\\\n", |
| 114 | + " download_root_dir=ECCO_dir,\\\n", |
| 115 | + " max_avail_frac=0.5)" |
99 | 116 | ] |
100 | 117 | }, |
101 | 118 | { |
|
4205 | 4222 | "source": [ |
4206 | 4223 | "## All ECCOv4 coordinates\n", |
4207 | 4224 | "\n", |
4208 | | - "Now that we have been oriented to the dimensions and coordinates used by ECCOv4, let's examine a ``Dataset`` that uses all of them, an ECCOv4r4 NetCDF grid file. The grid file for the native LLC90 grid has ShortName **ECCO_L4_GEOMETRY_LLC0090GRID_V4R4**. It does not have time dimensions, but we can put any StartDate and EndDate between 1992-01-01 and 2018-01-01 into the `ecco_podaac_download` function (or `ecco_podaac_s3_get_diskaware` function in the cloud) and it should retrieve the file. Then the file can be opened; in this case we'll use `open_dataset` that loads a single file into memory in our workspace." |
| 4225 | + "Now that we have been oriented to the dimensions and coordinates used by ECCOv4, let's examine a ``Dataset`` that uses all of them, an ECCOv4r4 NetCDF grid file. The grid file for the native LLC90 grid has ShortName **ECCO_L4_GEOMETRY_LLC0090GRID_V4R4**. It does not have time dimensions, so we do not need to specify a StartDate or EndDate, though if they are specified any dates in the ECCOv4 data range (1992-2017) should work. The function `ecco_podaac_to_xrdataset` uses *lazy* opening of the dataset (`open_mfdataset`) by default so it is not loaded into memory; if we want to load the data into our workspace memory we can append `.compute()`." |
4209 | 4226 | ] |
4210 | 4227 | }, |
4211 | 4228 | { |
|
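As a small illustration of the lazy-versus-in-memory distinction described in the cell above (the next code cell does this in a single line; `access_mode` and `ECCO_dir` are the variables defined earlier in the notebook):

```python
# lazily open the grid dataset: dimensions, coordinates, and metadata are read,
# but the variable values stay on disk (as dask arrays) until they are needed
grid_lazy = ea.ecco_podaac_to_xrdataset("ECCO_L4_GEOMETRY_LLC0090GRID_V4R4",
                                        mode=access_mode,
                                        download_root_dir=ECCO_dir,
                                        max_avail_frac=0.5)

# appending .compute() pulls the values into workspace memory
grid_dataset = grid_lazy.compute()
```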
4251 | 4268 | "source": [ |
4252 | 4269 | "## download file containing grid parameters and load into workspace\n", |
4253 | 4270 | "\n", |
4254 | | - "ShortNames_list = [\"ECCO_L4_GEOMETRY_LLC0090GRID_V4R4\"]\n", |
4255 | | - "if incloud_access == True:\n", |
4256 | | - " from ecco_s3_retrieve import ecco_podaac_s3_get_diskaware\n", |
4257 | | - "\n", |
4258 | | - " # retrieve file (download to instance if there is sufficient storage)\n", |
4259 | | - " files_dict = ecco_podaac_s3_get_diskaware(ShortNames=ShortNames_list,\\\n", |
4260 | | - " StartDate='2010-01',EndDate='2010-12',\\\n", |
4261 | | - " max_avail_frac=0.5,\\\n", |
4262 | | - " download_root_dir=ECCO_dir)\n", |
4263 | | - "\n", |
4264 | | - " # load file into workspace\n", |
4265 | | - " grid_dataset = xr.open_dataset(files_dict[ShortNames_list[0]])\n", |
4266 | | - " \n", |
4267 | | - "else:\n", |
4268 | | - " from ecco_download import ecco_podaac_download\n", |
| 4271 | + "ShortName = \"ECCO_L4_GEOMETRY_LLC0090GRID_V4R4\"\n", |
4269 | 4272 | "\n", |
4270 | | - " # download grid file\n", |
4271 | | - " file_to_load = ecco_podaac_download(ShortName=grid_params_shortname,\\\n", |
4272 | | - " StartDate=\"2010-01-01\",EndDate=\"2010-01-01\",\\\n", |
4273 | | - " download_root_dir=ECCO_dir,n_workers=6,force_redownload=False,\\\n", |
4274 | | - " return_downloaded_files=True)\n", |
4275 | | - " \n", |
4276 | | - " # load file into workspace\n", |
4277 | | - " grid_dataset = xr.open_dataset(file_to_load)" |
| 4273 | + "# retrieve file (download to instance if there is sufficient storage)\n", |
| 4274 | + "grid_dataset = ea.ecco_podaac_to_xrdataset(ShortName,\\\n", |
| 4275 | + " mode=access_mode,\\\n", |
| 4276 | + " download_root_dir=ECCO_dir,\\\n", |
| 4277 | + " max_avail_frac=0.5).compute()" |
4278 | 4278 | ] |
4279 | 4279 | }, |
4280 | 4280 | { |
|
5387 | 5387 | "source": [ |
5388 | 5388 | "## download file containing monthly mean ocean velocities for March 2010, and load into workspace\n", |
5389 | 5389 | "\n", |
5390 | | - "ShortNames_list = [\"ECCO_L4_OCEAN_VEL_LLC0090GRID_MONTHLY_V4R4\"]\n", |
5391 | | - "if incloud_access == True:\n", |
5392 | | - " # retrieve file (download to instance if there is sufficient storage)\n", |
5393 | | - " files_dict = ecco_podaac_s3_get_diskaware(ShortNames=ShortNames_list,\\\n", |
5394 | | - " StartDate='2010-03',EndDate='2010-03',\\\n", |
5395 | | - " max_avail_frac=0.5,\\\n", |
5396 | | - " download_root_dir=ECCO_dir)\n", |
| 5390 | + "ShortName = \"ECCO_L4_OCEAN_VEL_LLC0090GRID_MONTHLY_V4R4\"\n", |
5397 | 5391 | "\n", |
5398 | | - " # load file into workspace\n", |
5399 | | - " vel_dataset = xr.open_dataset(files_dict[ShortNames_list[0]])\n", |
5400 | | - " \n", |
5401 | | - "else:\n", |
5402 | | - " # download velocity file\n", |
5403 | | - " file_to_load = ecco_podaac_download(ShortName=vel_shortname,\\\n", |
5404 | | - " StartDate='2010-03',EndDate='2010-03',\\\n", |
5405 | | - " download_root_dir=ECCO_dir,n_workers=6,force_redownload=False,re)\n", |
5406 | | - " \n", |
5407 | | - " # load file into workspace\n", |
5408 | | - " vel_dataset = xr.open_dataset(file_to_load)" |
| 5392 | + "# retrieve file (download to instance if there is sufficient storage)\n", |
| 5393 | + "vel_dataset = ea.ecco_podaac_to_xrdataset(ShortName,\\\n", |
| 5394 | + " StartDate='2010-03',EndDate='2010-03',\\\n", |
| 5395 | + " mode=access_mode,\\\n", |
| 5396 | + " download_root_dir=ECCO_dir,\\\n", |
| 5397 | + " max_avail_frac=0.5).compute()" |
5409 | 5398 | ] |
5410 | 5399 | }, |
5411 | 5400 | { |
|
6211 | 6200 | "name": "python", |
6212 | 6201 | "nbconvert_exporter": "python", |
6213 | 6202 | "pygments_lexer": "ipython3", |
6214 | | - "version": "3.11.8" |
| 6203 | + "version": "3.11.9" |
6215 | 6204 | } |
6216 | 6205 | }, |
6217 | 6206 | "nbformat": 4, |
|