Skip to content

Timed infinite loop fix#80

Open
strohel wants to merge 4 commits into
mainfrom
timed-infinite-loop-fix
Open

Timed infinite loop fix#80
strohel wants to merge 4 commits into
mainfrom
timed-infinite-loop-fix

Conversation

@strohel
Copy link
Copy Markdown
Member

@strohel strohel commented Jan 12, 2024

timed tests: rename test, assert only after shutdown, extend timeline comments

Add test for issue #79

Not currently failing, just demonstrates the unfortunate behavior.

timed: fix #79 by enqueueing (rather than handling) delayed messages when due

Also rename fire_at to enqueue_at to be explicit about the fact.

This is a trade-off that prevents 2 sorts of bad behavior:

See the tweaked tests for the change of behavior.


Checking how the test results change commit-by-commit should give a good story.

Not currently failing, just demonstrates the unfortunate behavior.
…when due

Also rename `fire_at` to `enqueue_at` to be explicit about the fact.

This is a trade-off that prevents 2 sorts of bad behavior:
- actors with send-sending messages never handling any delayed/recurring messages (#72)
- actors with recurring messages that are slower to handle than their interval eventually not processing any outside messages (#79)

See the tweaked tests for the change of behavior.
Copy link
Copy Markdown
Member

@bschwind bschwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I have a small question, and a spelling fix.

Comment thread src/timed.rs Outdated

Ok(())
}
// Message 2 is a recurring one, sleep based on a paremeter.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Message 2 is a recurring one, sleep based on a paremeter.
// Message 2 is a recurring one, sleep based on a parameter.

Comment thread src/timed.rs Outdated
Comment on lines +204 to +205
// Recurring and Delayed messages are only added to the queue when handled, and then go
// through actors priority inboxes again when actually enqueued.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble visualizing the message flow based on this comment.

only added to the queue when handled

In my mind, "handled" means the actor has run the handle() function on the message, so it's already out of the queue.

Is it possible to be more specific in the language here about the flow of the message from when send_delayed is called to when it actually ends up in the actor handle() function?

Copy link
Copy Markdown
Member

@goodhoko goodhoko Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. What about

Suggested change
// Recurring and Delayed messages are only added to the queue when handled, and then go
// through actors priority inboxes again when actually enqueued.
// These priorities apply to the set-up of Delayed and Recurring messages and we want to handle that pronto.
// The resulting inner message then comes back as `Instant` and is prioritized per its internal priority.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goodhoko you suggestion reads very good to me using it (with very small tweaks)

Copy link
Copy Markdown
Member

@goodhoko goodhoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for materialising our long discussion into an actual fix @strohel!

Behaviorally this looks all correct! I only have some suggestion for improving the comments. Let's give them good amount of attention as they're important for understanding the convoluted logic of Timed wrapper. .)

Comment thread src/timed.rs Outdated
Comment thread src/timed.rs Outdated
Comment on lines +204 to +205
// Recurring and Delayed messages are only added to the queue when handled, and then go
// through actors priority inboxes again when actually enqueued.
Copy link
Copy Markdown
Member

@goodhoko goodhoko Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. What about

Suggested change
// Recurring and Delayed messages are only added to the queue when handled, and then go
// through actors priority inboxes again when actually enqueued.
// These priorities apply to the set-up of Delayed and Recurring messages and we want to handle that pronto.
// The resulting inner message then comes back as `Instant` and is prioritized per its internal priority.

@strohel
Copy link
Copy Markdown
Member Author

strohel commented Jan 16, 2024

@bschwind @goodhoko I've pushed a fixup commit that extends comments as suggested (+ also extends the docs directly on TimedMessage variants). PTAL.

@strohel strohel force-pushed the timed-infinite-loop-fix branch from 6e473f3 to 294b065 Compare January 16, 2024 21:50
@strohel strohel force-pushed the timed-infinite-loop-fix branch from 294b065 to b4b0bc2 Compare January 16, 2024 21:50
Copy link
Copy Markdown
Member

@goodhoko goodhoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perfect! Especially the doc comments on TimedMessage variants are very helpfull. Good to merge IMO!

I suppose this can be released with a mere patch bump. Or do we consider the change in delivery order to be breaking?

@strohel
Copy link
Copy Markdown
Member Author

strohel commented Jan 17, 2024

I suppose this can be released with a mere patch bump. Or do we consider the change in delivery order to be breaking?

Good point. The rename of the TimedMessage::Delayed fire_at field to enqueue_at is technically semver-incompatible, so we should bump the minor version. It will at make it easier for donwstreams to downgrate actor version shuold they need it.

I'll prepare portal version that uses this PR before merging to get some real-life testing. It doesn't technically require any code changes (the renamed fields are not directly used). Also opportunity for @bschwind to check clarity of comments of the latest version.

@goodhoko
Copy link
Copy Markdown
Member

goodhoko commented Jan 17, 2024

The rename of the TimedMessage::Delayed fire_at field to enqueue_at is technically semver-incompatible

Ahaaa! I assumed that the enum field is private by default (as are struct fields) but actually enum variants and their fields always share the visibility of the enum itself. I.e. the field is part of the crate's public API.

so we should bump the minor version. It will at make it easier for donwstreams to downgrate actor version shuold they need it.

Per the semver spec, if it's backward incompatible we should bump the major version. Minor version bumps are for additions (backward compatible changes). It feels excessive but hey, it's just a version number.

@strohel
Copy link
Copy Markdown
Member Author

strohel commented Jan 17, 2024

Per the semver spec, if it's backward incompatible we should bump the major version. Minor version bumps are for additions (backward compatible changes). It feels excessive but hey, it's just a version number.

Yup, but 0.x.y versions are special, in both semver and cargo's semantics https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-cratesio

  • 0.x.y means "pre-production" in both. actor is arguably production-ready now, though I don't feel like doing that step (of updating from 0.x to 1.0) now
  • SemVer says e.g. 0.3.0 and 0.3.1 are never compatible
  • cargo says that 0.3.1 can be used when "0.3.0" or "0.3" are specified (patch version backward compatibility)
    • 0.3 and 0.4 are always incompatible even with cargo (while 1.4 can be used when 1.3 is requsted)

@goodhoko
Copy link
Copy Markdown
Member

@strohel Got it. I didn't realize actor is still pre-1.0. Thanks!

@strohel
Copy link
Copy Markdown
Member Author

strohel commented Jul 29, 2024

Tonari internal note: if we decide to revive this, the logic in https://github.com/tonarino/portal/pull/3692 may stop working.

@strohel
Copy link
Copy Markdown
Member Author

strohel commented Sep 19, 2025

This full version of the PR seems to cause some other issues like https://github.com/tonarino/portal/pull/2958#issuecomment-1901172737

But there may be a more minimal and less controversial version: "just" change TimedActor::process_queue() to use

if self.queue.peek().map(|m| m.enqueue_at <= Instant::now()).unwrap_or(false)

instead of

while self.queue.peek().map(|m| m.enqueue_at <= Instant::now()).unwrap_or(false)

I.e. process only one message of the queue at a time. The existing code should then schedule processing of the next one right away, but still yield back to the actor system in the mean time. Thoough the suggestion needs some scrutiny and a lot of testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inifinite loop when Timed actor's handle() is slower than recurring message interval

3 participants