Skip to content

Conversation

@parzivale
Copy link
Contributor

@parzivale parzivale commented Sep 28, 2025

The main objective behind this pr is to simplify working with streaming responses from text generation models, however the design should be flexible enough for other applications.

For now the Model and StreamableModel are left to the user to implement for their use case, but the ideal end goal for this pr is to define types for all the I/O interfaces workers-ai uses and then seal Model and StreamableModel.

@guybedford
Copy link
Collaborator

Very interesting approach. It would be great to see some tests for the sorts of workflows you're seeking to enable here, and that would also help for review too.

@parzivale
Copy link
Contributor Author

Sure! should I create a new entry in the test folder for this? would there be a better place to put the tests?

@guybedford
Copy link
Collaborator

Yes, in the tests folder, we have one big test app that is built altogether and tested as one.

@parzivale
Copy link
Contributor Author

There isn't currently an AI binding in the tests wrangler.toml, I can test locally on my account via a new binding but this will break CI right?

@guybedford
Copy link
Collaborator

You should just be able to add a new binding there, if there's issues with that happy to look into it further.

@parzivale
Copy link
Contributor Author

@guybedford I've added new test cases to show a very simple example implementation

Copy link
Collaborator

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work, this looks incredible. Thank you for working on this. Just a couple of points about standardizing the input and output options - I know this gets into the deeper bindings questions, but would be good to have a plan at least.

test/src/ai.rs Outdated

use crate::SomeSharedData;

pub struct Llama4Scout17b16eInstruct;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this seems like the right approach!

@parzivale
Copy link
Contributor Author

Hey! I'm eager to merge this, do you have any feedback on what needs to be changed?

Copy link
Collaborator

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parzivale thanks for your patience here, the implementation looks really clean although I am a little bit concerned we may want to adapt the API in future.

We are okay with making breaking changes though so it would certainly be better to land and release than not.

I'm still not quite sure I follow why we need a custom Stream type for AI specifically over just improving the lower-level streaming primitives to support the use case. I'm also unclear on why we need a new type of typed array binding.

@parzivale
Copy link
Contributor Author

parzivale commented Oct 31, 2025

I'm still not quite sure I follow why we need a custom Stream type for AI specifically over just improving the lower-level streaming primitives to support the use case.

The idea behind that was to split out the different return types, for streaming replies we don't expect a type T that implements Deserialize/Serialize like we would normally do with a model response. Instead we expect a stream of T. I would be happy to break out the streaming implementation into something else, but I do think it is important to distinguish between streaming and non-streaming responses.

@guybedford
Copy link
Collaborator

The idea behind that was to split out the different return types, for streaming replies we don't expect a type T that implements Deserialize/Serialize like we would normally do with a model response. Instead we expect a stream of T. I would be happy to break out the streaming implementation into something else, but I do think it is important to distinguish between streaming and non-streaming responses.

Would typed ReadableStream solve this as an alternative if we had that generic?

@parzivale
Copy link
Contributor Author

parzivale commented Oct 31, 2025

The idea behind that was to split out the different return types, for streaming replies we don't expect a type T that implements Deserialize/Serialize like we would normally do with a model response. Instead we expect a stream of T. I would be happy to break out the streaming implementation into something else, but I do think it is important to distinguish between streaming and non-streaming responses.

Would typed ReadableStream solve this as an alternative if we had that generic?

No as there needs to be an additional wrapper as workers ai stream returns see events rather than just the T we need, so there needs to be a bit of postprocessing for things like tool calls when streaming.

@guybedford
Copy link
Collaborator

No as there needs to be an additional wrapper as workers ai stream returns see events rather than just the T we need, so there needs to be a bit of postprocessing for things like tool calls when streaming.

Could this be interpreted as a transform stream?

@parzivale
Copy link
Contributor Author

parzivale commented Oct 31, 2025

Yes it could! That would be a pretty good solution, the only problem is documentation, could we type alias it so in docs we can describe usage?

@parzivale
Copy link
Contributor Author

My normal first stop when dealing with return types is to go to definition and using a transform stream would make that harder

Copy link
Collaborator

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for your patience. It's looking really great and so close to the ideal output that we want.

Feedback is all again for alignment with the ideal bindgen we are aiming for with upstream changes.

As much as we can get right now is a great help.

Comment on lines 20 to 48
impl RoleScopedChatInputArray {
pub fn custom_role(&self, role: &str, content: &str) -> &Self {
let message = RoleScopedChatInput::new();
message.set_role_inner(role);
message.set_content_inner(content);
self.push(&message);
self
}

pub fn user(self, content: &str) -> Self {
self.custom_role("user", content);
self
}

pub fn assistant(self, content: &str) -> Self {
self.custom_role("assistant", content);
self
}

pub fn system(self, content: &str) -> Self {
self.custom_role("system", content);
self
}

pub fn tool(self, content: &str) -> Self {
self.custom_role("tool", content);
self
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than putting these on the array, let's put these on the message builder itself.

So we would rather use directly:

messages.push(
    RoleScopedChatInput::builder()
        .role("user")
        .content("hello")
        .build()?
);

We may yet go either way on array generics - wrapper types or not. I don't want to assume one direction yet though. We definitely want builders at least.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually a generic array builder would be a useful pattern I suppose, but the trick with this would be constructing a builder API that can in theory support optimized creation of the whole array in a single buffer serialization (as an optimization, we don't actually have to do it yet).

That is, something more like:

messages.builder()
  .push_builder(|b| b.role("user").content("hello"))
  .push_builder(|b| b.role("user").content("hello"))
  .build()

where the push_builder method on TypedArray<T> is works for T impl Builder for some definition of a builder trait?

Or something along those lines. Let me know if that make sense where I'm trying to go with this?

Copy link
Contributor Author

@parzivale parzivale Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create a generic array builder. I'm a little confused with the single buffer serialization optimization as the items in the array are jsvalues and are not serializeable

Copy link
Contributor Author

@parzivale parzivale Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also currently the generic array isn't really a generic array just a helper macro which generates an array with typed helper methods, I can refactor to make a proper generic array. Though that might come with complications as I would need to cast every time an access is made rather than just use wasm-bindgen to define the return type

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In due course we will be getting a generic array upstream, just using that term here for now. I understand your macro is a custom type and I'm fine with this approach unflagged with the assumption that the change won't be too destructive down the line.

If we could do a generic array that would in theory be preferable (TypedArray<T> wrapping Array), but I've set enough requirements on this PR here I feel! Separately I would be interested to dig into your cast concern more as I didn't follow that, and it would help the upstream conversation there.

As for the optimization - the idea is that every call from Wasm to JS incurs a performance overhead. If we can batch those calls through an optimization as a single call that should be more performant, turning an array of N values from N calls out to JS to 1. Again not a hard constraint, just sharing the design space thinking.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically just that a Rust builder could theoretically implement such optimizations, but we don't actually need to do them now of course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the casting concerns that was due to me misunderstanding what you wanted and assuming you wanted a new array wrapper which looked like this

pub struct Array<T> {
   array: js_sys::Array,
   phantom: PhantomData<T>
}

Which wrapped all the associated methods on array with an extra casting step to T.

Regarding the optimization, I see what you mean I got a little confused due to the serialized wording and assumed you wanted something like a serde_wasm_bindgen call as part of the array builder build call.

Copy link
Contributor Author

@parzivale parzivale Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed an implementation of this, I tried to keep the bulk move to js in mind when constructing the builder, the push functionality to the array builder should probably have some refinement but I wanted to get it in front of your eyes first. Finally I would like to move the typedArray macro over to a proc macro before merging as I'm reaching for dirtier and dirtier hacks to avoid polluting the name space (that's why the exported typed array is referenced as RoleScopedChatInputArray::RoleScopedChatInputArray


#[derive(Debug)]
#[pin_project]
pub struct AiStream<T: StreamableModel> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said I'm still not clear why AiStream is unique from a generic stream. Couldn't we just have TypedReadeableStream like you did for array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep we can I'll do what I said previously here and type alias it so we can document the output

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on this tommorow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants