Skip to content

Conversation

@cpubot
Copy link
Collaborator

@cpubot cpubot commented Dec 1, 2025

Background

The struct_extensions attribute, introduced in v0.1.0, provides convenience methods for manual SchemaRead implementations. In particular, it provides the following methods on the derived struct:

uninit_<field_name>_mut: Gets a mutable MaybeUninit projection to the <field_name> slot.
read_<field_name>: Reads into a MaybeUninit’s <field_name> slot from the given Reader.
write_uninit_<field_name>: Writes a MaybeUninit’s <field_name> slot with the given value.

While this does significantly reduce boilerplate, it doesn't do anything to help users with drop safety.

Consider the following example:

#[derive(SchemaRead, SchemaWrite)]
#[wincode(struct_extensions)]
struct Bytes {
    bytes: Vec<u8>,
    ar: Vec<[u8; 32]>,
}

struct Data {
    bytes: Bytes,
}

impl<'de> SchemaRead<'de> for Data {
    type Dst = Data;

    fn read(reader: &mut impl Reader<'de>, dst: &mut MaybeUninit<Self::Dst>) -> ReadResult<()> {
        let mut bytes = MaybeUninit::<Bytes>::uninit();
        Bytes::read_bytes(reader, &mut bytes)?;
        Bytes::read_ar(reader, &mut bytes)?;
        let init = unsafe { bytes.assume_init() };
        dst.write(Data { bytes: init });
        Ok(())
    }
}

Because there is no tracking of initialized fields, an error in the middle of deserializing will not drop initialized fields. Specifically these lines from the above example:

Bytes::read_bytes(reader, &mut bytes)?;
// If the following line errors, the above Vec<u8> leaks
Bytes::read_ar(reader, &mut bytes)?;

This isn't a problem with struct_extensions per se, but rather an implicit challenge when manually initializing MaybeUninit structs.

This PR

This PR reworks struct_extensions to provide drop-safe "Builder" structs that take care of field initialization bookkeeping. In particular, it generates a struct that takes a mutable reference to the MaybeUninit<T>, and provides the same functionality as the original, but with automatic initialization tracking and a Drop implementation that drops all initialized fields. The same code as above, with the builder:

#[derive(SchemaRead, SchemaWrite)]
#[wincode(struct_extensions)]
struct Bytes {
    bytes: Vec<u8>,
    ar: Vec<[u8; 32]>,
}

struct Data {
    bytes: Bytes,
}

impl<'de> SchemaRead<'de> for Data {
    type Dst = Data;

    fn read(reader: &mut impl Reader<'de>, dst: &mut MaybeUninit<Self::Dst>) -> ReadResult<()> {
        let mut bytes = MaybeUninit::<Bytes>::uninit();
        let mut bytes_builder = BytesUninitBuilder::from_maybe_uninit_mut(&mut bytes);
        bytes_builder.read_bytes(reader)?;
        bytes_builder.read_ar(reader)?;
        bytes_builder.finish()
        let init = unsafe { bytes.assume_init() };
        dst.write(Data { bytes: init });
        Ok(())
    }
}

This code is completely drop safe. If deserialization fails, the initialized fields are dropped when the builder is dropped. builder.assume_init_forget forgets the builder so that drop logic is no longer executed, so that we can subsequently assume_init on the underlying MaybeUninit.

There are additional affordances for more complicated cases, like multiple nested structs, for example.

#[derive(SchemaRead, SchemaWrite)]
#[wincode(struct_extensions)]
struct Bytes {
    bytes: Vec<u8>,
    ar: Vec<[u8; 32]>,
}

#[derive(SchemaWrite, SchemaRead)]
#[wincode(struct_extensions)]
struct Data {
    bytes: Bytes,
    other_byte_vec: Vec<u8>,
}

struct Outer {
    data: Data,
}

impl<'de> SchemaRead<'de> for Outer {
    type Dst = Outer;

    fn read(reader: &mut impl Reader<'de>, dst: &mut MaybeUninit<Self::Dst>) -> ReadResult<()> {
        let data = unsafe { &mut *(&raw mut (*dst.as_mut_ptr()).data).cast::<MaybeUninit<Data>>() };
        let mut data_builder = DataUninitBuilder::from_maybe_uninit_mut(data);
        unsafe {
            // Marks `bytes` field of `Data` as initialized if the function succeeds.
            data_builder.init_bytes_with(|bytes| {
                // Individual fields of `Bytes` are marked as initialized as reading progresses.
                // Initialized fields are dropped in reverse order on error or panic.
                let mut bytes_builder = BytesUninitBuilder::from_maybe_uninit_mut(bytes);
                bytes_builder.read_bytes(reader)?.read_ar(reader)?;
                bytes_builder.finish();
                Ok(())
            })?;
        }
        data_builder.read_other_byte_vec(reader)?;
        data_builder.finish();
        // Outer now fully initialized.
        Ok(())
    }
}

This could alternatively be written like the following (rather than init_<field>_with function), where nested struct is manually marked as initialized:

#[derive(SchemaRead, SchemaWrite)]
#[wincode(struct_extensions)]
struct Bytes {
    bytes: Vec<u8>,
    ar: Vec<[u8; 32]>,
}

#[derive(SchemaWrite, SchemaRead)]
#[wincode(struct_extensions)]
struct Data {
    bytes: Bytes,
    other_byte_vec: Vec<u8>,
}

struct Outer {
    data: Data,
}

impl<'de> SchemaRead<'de> for Outer {
    type Dst = Outer;

    fn read(reader: &mut impl Reader<'de>, dst: &mut MaybeUninit<Self::Dst>) -> ReadResult<()> {
        let data = unsafe { &mut *(&raw mut (*dst.as_mut_ptr()).data).cast::<MaybeUninit<Data>>() };
        let mut data_builder = DataUninitBuilder::from_maybe_uninit_mut(data);
        // Project `bytes` field of `Data` into a `&mut MaybeUninit<Bytes>`.
        let mut bytes_builder =
            BytesUninitBuilder::from_maybe_uninit_mut(data_builder.uninit_bytes_mut());
        // Individual fields of `Bytes` are marked as initialized as reading progresses.
        // Initialized fields are dropped in reverse order on error or panic.
        bytes_builder.read_bytes(reader)?.read_ar(reader)?;
        // `Bytes` is fully initialized, so we forget the builder.
        bytes_builder.finish();

        // `bytes` field of `Data` can now be marked as initialized
        // so that it's dropped if the next field fails to read.
        unsafe { data_builder.assume_init_bytes() };

        data_builder.read_other_byte_vec(reader)?;

        // `Data` is fully initialized, so we forget the builder.
        data_builder.finish();
        // Outer now fully initialized.
        Ok(())
    }
}

@cpubot cpubot requested a review from alessandrod December 1, 2025 20:03
let dst = get_src_dst(args);
let impl_generics = append_de_lifetime(&args.generics);
let (_, ty_generics, where_clause) = args.generics.split_for_impl();
let builder_ident = format_ident!("{struct_ident}UninitBuilder");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't polluting the namespace a concern here? should those builder be put into some wincode_gen mod-scope... just thinking, maybe it's not necessary

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessarily a problem because the struct name is suffixed with UninitBuilder, so unlikely to conflict. This feature is also opt-in, and the docs specify that it will be added to the same namespace, so I think it's okay personally.

quote!(#builder_bit_set_ty::MAX)
} else {
let field_bits = LitInt::new(&fields.len().to_string(), Span::call_site());
quote!(((1 as #builder_bit_set_ty) << #field_bits) - 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this could be calculated as fully compile-time constant, right?

let field_init_bits = LitInt::new(((1u128 << fields.len() - 1).to_string(), Span::call_site());
quote!((#field_init_bits as #builder_bit_set_ty))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be constant folded by the compiler https://rust.godbolt.org/z/qaPE5Krbj

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, if you think this makes the library code simpler, though with explicit calculation we could get rid of the branch with quote!(#builder_bit_set_ty::MAX) and just have one code always calculating the constant

Copy link
Collaborator

@kskalski kskalski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, though I do still have some concerns about naming for generated code functions.

/// Assuming the source `MaybeUninit<T>` when the content is not yet fully initialized causes undefined behavior.
/// It is up to the caller to guarantee that the `MaybeUninit<T>` really is in an initialized state.
#[inline]
#vis const unsafe fn assume_init_forget(self) {
Copy link
Collaborator

@kskalski kskalski Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like this name either... since it doesn't actually run the assume_init* on underlying MaybeUninit, which the name might suggest.
Maybe it should be something like finish_init_forget or init_done_forget ?

We could make it return the underlying &mut MaybeUnnit and call it into_inner_done, which might be convenient in some cases (this way the other into_assume_init_mut would basically be equivalent to into_inner_done().assume_init_mut()), but also be a bit confusing if user just calls it ignoring the result.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could technically just call this function forget and remove unsafe as a simple convenience over mem::forget. The user still has to assume_init_mut on the original reference or assume_init on the original MaybeUninit<T> to get access to T -- the contract of both of those methods requires that they ensure the underlying T is fully initialized.

I would also be fine with just unsafe finish. Finish implies that we're done with the builder, and the fact that it's unsafe requires the user to understand the safety assumptions.

Copy link
Collaborator

@kskalski kskalski Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finish seems fine, not sure it really needs unsafe, since it doesn't do anything unsafe by itself and the user should be aware of need to initialize all parts of memory when starting to use the builder in the first place. It's also a call that always needs to be made unless into_assume_init_mut is used, so forcing to add another unsafe block seems off.

I would be in fact more concerned with how to ensure user do not forget to call finished.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for finish and no unsafe.

I don't think we need to ensure it's called?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll call it finish. I like that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to ensure it's called?

We actually do -- otherwise the drop logic will execute, which will call drop on all the fields. UB if the user then calls assume_init or any of the assume_init* variants


/// Write a value to the maybe uninitialized field.
#[inline]
#vis const fn #write_uninit_field_ident(&mut self, val: #target) -> &mut Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the choice of read_field_name and write_field_name names is a bit confusing here (though I guess it's already inherited from previous design of struct_extensions), since read operates on Reader, while write doesn't have anything to do with wincode io Writer.

I would definitely prefer to change write into something like store_field_name or init_field_name, which would also convey the message that field is not manually initialized

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think write is fine, it matches the semantics of MaybeUninit::write

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair argument, there are some issues with that though - MaybeUninit also has _read method, which means something completely different than read generated for builder.
I think because in context of wincode functions doing read and write have their specific meaning, we should try to avoid naming things with different purpose that way.

            builder
                .read_a(reader)?
                .read_b(reader)?
                .write_c(test.c);

is basically an instant head-scratcher... are we reading a few things into the builder and then writing out the builder contents to Writer stored in test.c ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this too in the past and I think I've discussed it with zach.
I think write should definitely be write, it's read I'm not sure about.

That said this PR is fixing a concrete bug, and I think we should land it. We
can improve the naming later imo, this is 0.2.2 we have a long way to 1.0

Copy link
Collaborator Author

@cpubot cpubot Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's confusing if you look at the type signature or read the function documentation. Would you prefer read_a_from or write_from_reader?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally I would avoid the names that have overloaded meaning around the same API / snippets of code and in those cases would opt for init_a and init_a_from_reader.

alessandrod
alessandrod previously approved these changes Dec 4, 2025
Copy link
Collaborator

@alessandrod alessandrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving as this fixes a bug. I do think the naming can probably be improved, but I don't have good proposals at the moment, don't think that calling write store is right, and just generally think we can improve the naming as we keep iterating

/// Assuming the source `MaybeUninit<T>` when the content is not yet fully initialized causes undefined behavior.
/// It is up to the caller to guarantee that the `MaybeUninit<T>` really is in an initialized state.
#[inline]
#vis const unsafe fn assume_init_forget(self) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for finish and no unsafe.

I don't think we need to ensure it's called?


/// Write a value to the maybe uninitialized field.
#[inline]
#vis const fn #write_uninit_field_ident(&mut self, val: #target) -> &mut Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this too in the past and I think I've discussed it with zach.
I think write should definitely be write, it's read I'm not sure about.

That said this PR is fixing a concrete bug, and I think we should land it. We
can improve the naming later imo, this is 0.2.2 we have a long way to 1.0

alessandrod
alessandrod previously approved these changes Dec 4, 2025
kskalski
kskalski previously approved these changes Dec 4, 2025
Copy link
Collaborator

@kskalski kskalski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we should be open to naming adjustments in the future then. Otherwise LGTM

@cpubot cpubot dismissed stale reviews from kskalski and alessandrod via 6975db9 December 4, 2025 15:31
@cpubot cpubot merged commit 2e64626 into master Dec 4, 2025
2 checks passed
@cpubot cpubot deleted the struct_extensions_builder branch December 4, 2025 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants