output at flexible time levels#217
output at flexible time levels#217guoqing-noaa wants to merge 9 commits intoufs-community:gsl/developfrom
Conversation
…ttribute Introduce a new `output_timelevels` attribute for MPAS streams that enables variable output intervals. With this capability, we may outoput every 15 minutes in the first hour, every hour in the first 3 days, every 3 hours for the next 4 days, and every 6 hours in the last 3 days. We can also use this to only write out forecast files during a given period, such as: output_timelevels="6-12h" Check the PR description for details on how to specify the time levles. Here is a quick example: output_timelevels="0-3-15m 4-72 75-168-3 174-240-6"
|
|
The following situations have been tested: |
output_timelevels stream attrbute|
For a conus12km 12 hours forecasts, I tried different settings for the the |
SamuelTrahanNOAA
left a comment
There was a problem hiding this comment.
These code changes are overcomplicated and will be difficult to extend or maintain. Any significant changes to the functionality will require a complicated rewrite.
More specific overview comments:
- Most of the parser code is a reimplementation of the Fortran standard
splitfunction. It would be better to callsplit. - Much of the time-related code reimplements the MPAS and ESMF alarm functionality. It would be better to extend the alarm functionality.
- Parser errors aren't reported. If you type something wrong, you don't know what it was.
- There are out-of-bounds array reads due to off-by-one errors.
- The new syntax deviates from the MPAS days_HH:MM:SS time specification.
- Unlike the rest of the MPAS I/O interface, this uses a cryptic string instead of XML tags. If you used XML tags, the parser code wouldn't be necessary at all.
If you want to use a parser generator, the original grammar specification should be in the repository, not the automatically-generated code. |
No, I don't intend to use a parser generator. I put EBNF there to make sure the overall timelevel specification logic is consistent and complete, and there are no hidden surprises. |
@SamuelTrahanNOAA Thanks for the comments and discussions! This is the first version to get things work. I also feel the implementation is kind of complicated and I would like to descope to meet our current needs only. My thought is to limit the timelevel specifications to only support minutes and hours (may also limit the output to have to start at 0h). I intentionally try to avoid using the same time format |
It is, indeed, a finely-polished EBNF with perfect indentation. However, it doesn't describe the code. Your |
What is more user-friendly is splitting them into individual XML tags so it is clear what is going on. That would use the existing MPAS parsing code (eliminating the parser). Also, it'll be less confusing to experienced MPAS users. <output_interval start="01:15:00" stop="02:30:00" step="00:15:00"/> <!-- every 15 minutes from 1:15 to 2:30 -->
<output_interval start="06:00:00"/> <!-- Only at hour 6 -->I can see how your string would be easier to pass through the rrfs-workflow bash scripts, but bash limitations shouldn't take precedence over MPAS code consistency and quality. |
|
Can you please provide:
|
|
@clark-evans I am sorry, this may not be ready for an official review yet. I should have put this in the draft mode since the beginning. |
This looks good. Does current MPAS code support this? |
No. I'm suggesting it as an alternative implementation to your recursive descent parser. |
|
@SamuelTrahanNOAA Thanks for lots of great discussions! For the moment, this PR is mainly for a test/demo purpose. I think my initial need is much simpler that the goal in this PR. We want to output For our current needs, I think NCAR Andy Stokely's changes may work. I will test that first. I was not aware of Andy's changes until 2 days ago. |
|
WRF was limited to start...step...end for each stream. We were able to work around that by having multiple output streams. (Recall that the operational models HRRR, HWRF, NAM, and RAP were all WRF-based, along with other quasi-operational models.) |
|
@guoqing-noaa @SamuelTrahanNOAA For example, output_fh = 0.2 0.4 0.6 0.8 1 2 3 6 9 12, would provide output every 12min for the first hour, every hour for the next three hours, and every three hours for the remainder of the simulation. |
@dustinswales It is good to know this capability in UFS.
Thanks! |
I believe output_fh would be just need to be really long in this case,output_fh: 0.25 0.50 0.75 1 2 3 4 5 .... 72 75 78 .... 168 174 180 ... 240. I'm sure we could find a way to distill this information if it become a pain.
This is mostly true.
This is going to be the biggest change for Standalone users when moving to inline MPAS. We use the MPAS framework for native input/output, but we do not use the MPAS stream manager as is done in MPAS standalone. |
|
FYI: NOAA GSL doesn't do climate or seasonal research. Hence, the capability to output on multi-week timescales won't affect our work. It is a good question to ask, though, since others in NOAA do seasonal forecasts. I'd expect forecasts out to 90 days for some applications, just not ours. (I just wanted to clarify things since the word "climate" showed up in discussion.) |
@dustinswales I think it is totally fine to enumerate all output hours inside the model. However, it will be very beneficial for users for specify this using a simplified interface, for example, |
|
This PR is ready for review now.
(Fortran 2023's All the previous tests listed in post#3 worked as expected. We plan to use this PR for the upcoming Spring Forecast Experiment. Thanks! |
(currently the only output immutable stream is restart. We don't need flexilbe output for restart files. But add this capability to be complete)
- improve the parse_output_timelevel_spec(...) function using mpas_split_string(...) - use ABSOLUTE time from start_time to avoid floating-point drift to ensure discrete times like 0m 15m 1h3m 2 3 hit exactly
|
I will comment on the changes directly hoping to help with the review process, but here is a summary: MPAS-Model uses alarms to trigger outputs, each stream registers its own alarmID. all other code changes are helper functions to fulfill the |
| character(len=StrKIND) :: filename_template | ||
| character(len=StrKIND) :: filename_interval | ||
| character(len=StrKIND) :: output_timelevels = '' | ||
| type (MPAS_timelevel_spec_type) :: timelevel_spec |
There was a problem hiding this comment.
In this file, we introduced two variables output_timelevels and timelevel_spec.
output_timelevelsis the newly added stream attribute to allow users to specify flexible output time levels.timelevel_specis to store segments parsed from theoutput_timelevelsspecification. It uses a derive data typeMPAS_timelevel_spec_type, which contains the start, end hours of segments, as well as the interval minutes (for itemized time levels, start_hour=end_hour and interval_minutes is ignored). This allows us to parseoutput_timelevelsonly once at the beginning and store the results for subsequent use.
| call update_variable_output_alarm(manager, stream, alarm_cursor % name, ierr=local_ierr) | ||
| else | ||
| call mpas_reset_clock_alarm(manager % streamClock, alarm_cursor % name, ierr=local_ierr) | ||
| end if |
There was a problem hiding this comment.
Lines 1513-1520:
If we use output time levels, call update_variable_output_alarm(...)
otherwise, nothing changes, call mpas_reset_clock_alarm(...) as before
| ! But ONLY if no variable stream already handled it | ||
| if (.not. resetAlarms) then | ||
| call mpas_reset_clock_alarm(manager % streamClock, alarm_cursor % name, ierr=local_ierr) | ||
| end if |
There was a problem hiding this comment.
lines 1549-1566:
This part is similar to lines 1513-1520, but deals with the situation when MPAS_stream_mgr_reset_alarms(...) is called without a streamID.
This usually means we may have different alarms ringing at the same time for a given alarmID. So we check all linked streams for a given alarmID, if using the output time levels, call update_variable_output_alarm(...). Otherwise, if none of the linked streams using the the output time levels, nothing changes, call mpas_reset_clock_alarm(...) as before.
| ! Saft guard: reset timelevel_spec if output_timelevels is somehow cleared | ||
| stream_cursor % timelevel_spec % n_segments = 0 | ||
| stream_cursor % timelevel_spec % is_parsed = .false. | ||
| end if |
There was a problem hiding this comment.
Lines 1888-1901: This part is inside MPAS_stream_mgr_set_property_char(...) and easy to understand.
if users specify output_timelevels, assign it to stream_cursor % output_timelevels and if it is a non-empty string, call parse_all_timelevels(...) to parse this specification to the timelevel_spec derived data type so that we only do the parser one time and use it easily later.
| end do | ||
| return | ||
| end if | ||
| end if |
There was a problem hiding this comment.
Lines 3193-3233: this part is in write_stream(...).
If we don't use ouptut_timelevels, this part is skipped and nothing changes.
If we use output_timelevels, we do some additional check to see whether if current forecast hour is within defined range to prevent output when the alarm rings but we're past the timelevels range. And it will schedule a new alarm if we still have segments in timelevels_spec
| ! Use actual write time for filename (each output gets unique file) | ||
| call mpas_get_time(writeTime, dateTimeString=time_string) | ||
| call mpas_expand_string(time_string, blockID, stream % filename_template, stream % filename) | ||
| else if ( stream % filename_interval /= 'none' ) then |
There was a problem hiding this comment.
Lines 3263-3267: if we use output_timelevels, stream % filename_interval is set to output internally and then the filename will be generated here before each output.
If we don't use output_timelevels, nothing changes and we will the previous method to generate filenames based on filename_interval which is assigned from output_interval.
| ! Use actual write time for filename (each output gets unique file) | ||
| call mpas_get_time(writeTime, dateTimeString=time_string) | ||
| call mpas_expand_string(time_string, blockID, stream % filename_template, temp_filename) | ||
| else if ( stream % filename_interval /= 'none' ) then |
There was a problem hiding this comment.
Lines 3342-3346: same as lines 3263-3267, but handle the situation when a stream has been created (and no need to start from scratch)
| else | ||
| call mpas_set_timeInterval(filename_interval, timeString=stream % filename_interval) | ||
| call mpas_build_stream_filename(stream % referenceTime, filenameTime, filename_interval, stream % filename_template, blockID_local, test_filename, ierr=local_ierr) | ||
| end if |
There was a problem hiding this comment.
Lines 3777-3960: all these changes are in read_stream(...). It allows we read a stream correctly for an input/output stream with output_timelevels configured.
| ! Use actual time for filename | ||
| call mpas_get_time(now_time, dateTimeString=when_string, ierr=err_local) | ||
| call mpas_expand_string(when_string, blockID_local, streamCursor % filename_template, filename) | ||
| else if ( streamCursor % filename_interval /= 'none' ) then |
There was a problem hiding this comment.
Lines 4191-4195: similar as before, now in mpas_get_stream_filename(...)
| ! Write output_interval to stream | ||
| ! | ||
| IF (stream %filename_interval(1:4) /= "none") THEN | ||
| IF (stream %filename_interval(1:4) /= "none" .and. stream %filename_interval /= "output") THEN |
There was a problem hiding this comment.
If we use output_timelevels, we don't have a fix output_interval, so skip writing output_interval to stream
| ierr = 1 | ||
| end if | ||
|
|
||
| end subroutine parse_all_timelevels!}}} |
There was a problem hiding this comment.
lines 6037-6123: Parse the entire output_timelevels string into a timelevel_spec structure
This is a helper function and rewritten using mpas_split_string(...). I hope it is much easier to read now.
| return | ||
| end if | ||
|
|
||
| end subroutine parse_time_string!}}} |
There was a problem hiding this comment.
lines 6126-6215: This is also a helper function.
Parses time strings like "1h30m", "45m", "6", "90s", "1d" into total minutes
Supported units: d/D (days), h (hours), m (minutes), s (seconds)
Plain integers without units are interpreted as hours
| 'expected format is start-stop-step, but appears to be start-step-stop', MPAS_LOG_ERR) | ||
| end if | ||
|
|
||
| end subroutine parse_output_timelevel_spec!}}} |
There was a problem hiding this comment.
lines 6218-6341: This is also a helper function.
Parses a single segment using format: start, start-stop, or start-stop-step
Time strings can be integers (hours) or duration format (e.g., "1h30m", "45m")
Examples: "6", "0-3", "0-1h-15m", "1h30m-2h-15m"
| ierr = MPAS_STREAM_MGR_ERROR ! Signal no more output | ||
| end if | ||
|
|
||
| end subroutine get_output_interval_from_timelevels!}}} |
There was a problem hiding this comment.
lines 6344-6441: this is also a helper function:
Given a pre-parsed timelevel_spec and current forecast hour, determines the appropriate output interval in minutes. It handles both the situations of itemized timelevels or time level ranges.
| ierr = MPAS_STREAM_MGR_ERROR ! No future ranges | ||
| end if | ||
|
|
||
| end subroutine get_next_timelevel_start!}}} |
There was a problem hiding this comment.
lines 6444-6502: a helper function again.
Given a time levels string and current forecast hour, finds the start hour of the next timelevel range (if any). Returns -1 if no future ranges exist.
|
|
||
| if (present(ierr)) ierr = local_ierr | ||
|
|
||
| end subroutine update_variable_output_alarm!}}} |
There was a problem hiding this comment.
lines 6505-6597: This is the core function of this PR, update_variable_output_alarm
It updates a variable output alarm after it has rung by:
- Checking if the stream uses output_timelevels (via timelevel_spec)
- Calculating current forecast hour (current time - start time)
- Getting the next interval from get_output_interval_from_timelevels
- Removing the old alarm and creating a new one with the updated interval
| ierr_c = 1 | ||
| end if | ||
|
|
||
| end subroutine stream_mgr_add_variable_output_alarm_c !}}} |
There was a problem hiding this comment.
lines 7022-7166: This part provides two functions stream_mgr_set_property_c and stream_mgr_add_variable_output_alarm_c for the xml_stream_parser.c to use at the beginning time when read the streams.atmosphere XML file: decode the output_timelevles part, set the stream properties and create an initial alarm if output_timelevles is defined.
| *status = 1; | ||
| return; | ||
| } | ||
| snprintf(msgbuf, MSGSIZE, " %-20s%s", "output alarm:", "variable (from output_timelevels)"); |
There was a problem hiding this comment.
The changes on xml_stream_parser.c are more straightforward, most are copy and paste and then minor changes to match the output_timelevels situation. Feel free to add comments for any questions.
In a nutshell, this PR attaches to the main code base a new orphan island, which does not affect any existing capabilities if output_timelevels is not used. When output_timelevels is used, it enables the capability to output at flexible time levels.
SamuelTrahanNOAA
left a comment
There was a problem hiding this comment.
I wouldn't approve this for merge to the NCAR authoritative MPAS, but I am willing to approve it for merge to the GSL branch as a temporary workaround. GSL JEDI/MPAS data assimilation development needs this feature, and we can't stall much longer. This is our only available solution, it works, and the fork will probably still be manageable.
Note that the UFS MPAS has a totally different I/O subsystem, so it will replace this workaround eventually.
My objections are:
- The overall length and complexity of the changes will make it harder to maintain the fork.
- Using a custom string format necessitates an over-complicated parser since Fortran is ill-suited as a string processing language.
- A string list instead of XML tags deviates from the MPAS design in a way that reduces flexibility and end-user understandability.
- The implementation restricts the number of output time specifications to a fixed number rather than using proper data storage, such as a linked list.
- Sending the output intervals from C to Fortran after the rest of the tag adds lots of code to send that information later. Had the output interval been sent at the same time as the other attributes, this would have been much shorter.
Introduce a new
output_timelevelsattribute for MPAS streams that enables variable output intervals.With this capability, we may outoput every 15 minutes in the first hour, every hour in the first 3 days, every 3 hours for the next 4 days, and every 6 hours in the last 3 days.
We can also use this to only write out forecast files during a given period, such as:
output_timelevels="6-12h"Check the PR description below for details on how to specify the time levles.
Here is a quick example:
output_timelevels="0-3-15m 4-72 75-168-3 174-240-6"FYI, with this PR, we may write out mpasout files at
"0 1 2 3 6 9 12", while history files at"4 5 7 8 11 13-99999"to avoid duplicate output of both mpasout and history files at the same time levels, which are almost the same.This PR solves issue #214
Priority Reviewers