-
Notifications
You must be signed in to change notification settings - Fork 354
feat: create trait definitions for model and streamable model #833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Very interesting approach. It would be great to see some tests for the sorts of workflows you're seeking to enable here, and that would also help for review too. |
|
Sure! should I create a new entry in the test folder for this? would there be a better place to put the tests? |
|
Yes, in the tests folder, we have one big test app that is built altogether and tested as one. |
|
There isn't currently an AI binding in the tests wrangler.toml, I can test locally on my account via a new binding but this will break CI right? |
|
You should just be able to add a new binding there, if there's issues with that happy to look into it further. |
|
@guybedford I've added new test cases to show a very simple example implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work, this looks incredible. Thank you for working on this. Just a couple of points about standardizing the input and output options - I know this gets into the deeper bindings questions, but would be good to have a plan at least.
test/src/ai.rs
Outdated
|
|
||
| use crate::SomeSharedData; | ||
|
|
||
| pub struct Llama4Scout17b16eInstruct; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed this seems like the right approach!
…associated methods exsist
|
Hey! I'm eager to merge this, do you have any feedback on what needs to be changed? |
guybedford
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parzivale thanks for your patience here, the implementation looks really clean although I am a little bit concerned we may want to adapt the API in future.
We are okay with making breaking changes though so it would certainly be better to land and release than not.
I'm still not quite sure I follow why we need a custom Stream type for AI specifically over just improving the lower-level streaming primitives to support the use case. I'm also unclear on why we need a new type of typed array binding.
The idea behind that was to split out the different return types, for streaming replies we don't expect a type T that implements Deserialize/Serialize like we would normally do with a model response. Instead we expect a stream of T. I would be happy to break out the streaming implementation into something else, but I do think it is important to distinguish between streaming and non-streaming responses. |
Would typed ReadableStream solve this as an alternative if we had that generic? |
No as there needs to be an additional wrapper as workers ai stream returns see events rather than just the T we need, so there needs to be a bit of postprocessing for things like tool calls when streaming. |
Could this be interpreted as a transform stream? |
|
Yes it could! That would be a pretty good solution, the only problem is documentation, could we type alias it so in docs we can describe usage? |
|
My normal first stop when dealing with return types is to go to definition and using a transform stream would make that harder |
guybedford
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for your patience. It's looking really great and so close to the ideal output that we want.
Feedback is all again for alignment with the ideal bindgen we are aiming for with upstream changes.
As much as we can get right now is a great help.
worker-sys/src/types/ai.rs
Outdated
| impl RoleScopedChatInputArray { | ||
| pub fn custom_role(&self, role: &str, content: &str) -> &Self { | ||
| let message = RoleScopedChatInput::new(); | ||
| message.set_role_inner(role); | ||
| message.set_content_inner(content); | ||
| self.push(&message); | ||
| self | ||
| } | ||
|
|
||
| pub fn user(self, content: &str) -> Self { | ||
| self.custom_role("user", content); | ||
| self | ||
| } | ||
|
|
||
| pub fn assistant(self, content: &str) -> Self { | ||
| self.custom_role("assistant", content); | ||
| self | ||
| } | ||
|
|
||
| pub fn system(self, content: &str) -> Self { | ||
| self.custom_role("system", content); | ||
| self | ||
| } | ||
|
|
||
| pub fn tool(self, content: &str) -> Self { | ||
| self.custom_role("tool", content); | ||
| self | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than putting these on the array, let's put these on the message builder itself.
So we would rather use directly:
messages.push(
RoleScopedChatInput::builder()
.role("user")
.content("hello")
.build()?
);
We may yet go either way on array generics - wrapper types or not. I don't want to assume one direction yet though. We definitely want builders at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually a generic array builder would be a useful pattern I suppose, but the trick with this would be constructing a builder API that can in theory support optimized creation of the whole array in a single buffer serialization (as an optimization, we don't actually have to do it yet).
That is, something more like:
messages.builder()
.push_builder(|b| b.role("user").content("hello"))
.push_builder(|b| b.role("user").content("hello"))
.build()where the push_builder method on TypedArray<T> is works for T impl Builder for some definition of a builder trait?
Or something along those lines. Let me know if that make sense where I'm trying to go with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create a generic array builder. I'm a little confused with the single buffer serialization optimization as the items in the array are jsvalues and are not serializeable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also currently the generic array isn't really a generic array just a helper macro which generates an array with typed helper methods, I can refactor to make a proper generic array. Though that might come with complications as I would need to cast every time an access is made rather than just use wasm-bindgen to define the return type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In due course we will be getting a generic array upstream, just using that term here for now. I understand your macro is a custom type and I'm fine with this approach unflagged with the assumption that the change won't be too destructive down the line.
If we could do a generic array that would in theory be preferable (TypedArray<T> wrapping Array), but I've set enough requirements on this PR here I feel! Separately I would be interested to dig into your cast concern more as I didn't follow that, and it would help the upstream conversation there.
As for the optimization - the idea is that every call from Wasm to JS incurs a performance overhead. If we can batch those calls through an optimization as a single call that should be more performant, turning an array of N values from N calls out to JS to 1. Again not a hard constraint, just sharing the design space thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically just that a Rust builder could theoretically implement such optimizations, but we don't actually need to do them now of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the casting concerns that was due to me misunderstanding what you wanted and assuming you wanted a new array wrapper which looked like this
pub struct Array<T> {
array: js_sys::Array,
phantom: PhantomData<T>
}Which wrapped all the associated methods on array with an extra casting step to T.
Regarding the optimization, I see what you mean I got a little confused due to the serialized wording and assumed you wanted something like a serde_wasm_bindgen call as part of the array builder build call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed an implementation of this, I tried to keep the bulk move to js in mind when constructing the builder, the push functionality to the array builder should probably have some refinement but I wanted to get it in front of your eyes first. Finally I would like to move the typedArray macro over to a proc macro before merging as I'm reaching for dirtier and dirtier hacks to avoid polluting the name space (that's why the exported typed array is referenced as RoleScopedChatInputArray::RoleScopedChatInputArray
|
|
||
| #[derive(Debug)] | ||
| #[pin_project] | ||
| pub struct AiStream<T: StreamableModel> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said I'm still not clear why AiStream is unique from a generic stream. Couldn't we just have TypedReadeableStream like you did for array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep we can I'll do what I said previously here and type alias it so we can document the output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll work on this tommorow
The main objective behind this pr is to simplify working with streaming responses from text generation models, however the design should be flexible enough for other applications.
For now the Model and StreamableModel are left to the user to implement for their use case, but the ideal end goal for this pr is to define types for all the I/O interfaces workers-ai uses and then seal Model and StreamableModel.