Skip to content

Conversation

@lzaeh
Copy link
Contributor

@lzaeh lzaeh commented Oct 13, 2025

Why?

Previously, my understanding of the type system was insufficient, which led to confusion between array and slice usage. As a result, the deserialization step of the slice serializer did not work properly in cross-language scenarios due to missing alignment rules (for example, when the types were structurally identical and should have taken the fast path, the previous version failed to correctly interpret the byte stream).

What does this PR do?

Serializer-related changes:

  • Adjusted the slice serializer to better align serialization and deserialization rules.
  • Standardized the type system to ensure that the slice serializer is consistently used where appropriate, eliminating mixed usage with array serializers.
  • Moved the serializer for primitive type arrays into primitive_array.go. If you think it’s not appropriate, I can revert the change.

Test-related changes:

  • Re-enabled previously commented-out test cases in commonSlice/Array.
  • Moved the [n]String test out of commonArray. Although it can be correctly deserialized into []String, the current test logic does not treat [n]String and []String as equivalent.
  • Updated several primitive array tests in TestXLangSerializer to use array types instead of slices.
  • Modified the check function in TestXLangSerializer to deserialize using the precise reflected type instead of interface{}.
    Rationale:
    • For example, []interface{}{"a", "b", "c"} and []string{"a", "b", "c"} are considered the same type and written via the fast path, resulting in identical byte sequences.
    • However, during deserialization in Go, the default behavior is to use []interface{} unless the target type is explicitly provided.(Just my opinion)
    • This means that if the test expects []string{}, it would fail. While we could force construction of a []T{} when the same type is detected, if the test’s starting point is []interface{}, it would still fail. Therefore, the deserialization target type is now explicitly matched via reflection.

Known Issues

  • After pulling the latest code (which includes modifications related to field order hashing in structs), the complex struct tests that passed 10 mins ago now fail due to Python-side deserialization errors.

    • A. Direct testing shows identical hashes, but deserialization in Python triggers an “invalid ID (id = 0)” error in the listSerializer—possibly related to changes in this PR.
    • B. Simplifying the struct on both the Python and Go sides (by removing fields) caused the hash values to differ.
      I will follow up on this issue to determine whether the cause lies in the list serializer or incomplete handling of struct field order consistency.
  • Cross-language list serialization tests (serializing in one language, deserializing in another—Python ↔ Go, Java ↔ Go) revealed that Go’s int and float64 types cannot be correctly deserialized in the other languages.

    • All other basic types work fine, and Go can correctly deserialize bytes from both Python and Java.
    • When encountering floating-point values, Go must use float32 to deserialize successfully.
  • The slice serializer currently mimics Python’s fast-path logic, but the matching rules for integer and floating-point types are somewhat ambiguous. My understanding here may be incomplete, and this part likely needs further refinement.

Related issues

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

@lzaeh lzaeh requested a review from chaokunyang as a code owner October 13, 2025 10:58
@pandalee99 pandalee99 self-requested a review October 13, 2025 11:15
@chaokunyang
Copy link
Collaborator

@lzaeh could you merge main branch first? there are some conflict

r = reflect.New(type_).Elem()

case reflect.Interface:
arrT := reflect.ArrayOf(length, reflect.TypeOf(int8(0))) // [length]int8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't use [n]T for deserialization, the n used by user is fixed at compile-time, but the array length can be different. We should use slice []numeric for primitive array type in fory type system.

For golang serialization, we just don't allow [n]T being deserialized. Or only allow [n]T being serialized, but this seems useless. If you can't deserialzie it, serialize it seems unmeaningful too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks for telling me this.

Copy link
Contributor Author

@lzaeh lzaeh Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't use [n]T for deserialization, the n used by user is fixed at compile-time, but the array length can be different. We should use slice []numeric for primitive array type in fory type system.

For golang serialization, we just don't allow [n]T being deserialized. Or only allow [n]T being serialized, but this seems useless. If you can't deserialzie it, serialize it seems unmeaningful too

Sorry for the delay—I just wrapped up my graduation stuff. If all arrays are deserialized into slices, should arrays inside structs also be deserialized as slices? If so, I think the tests will need to be updated as well. I was considering this while planning comprehensive tests for the contents of commonArray.@chaokunyang

@lzaeh
Copy link
Contributor Author

lzaeh commented Oct 15, 2025

@lzaeh could you merge main branch first? there are some conflict

i will try asap tomorrow, sorry for the delay.

@lzaeh lzaeh closed this Oct 20, 2025
@lzaeh lzaeh force-pushed the fix/sliceSerializer-Partial branch from 597f2ae to 390b623 Compare October 20, 2025 14:53
@lzaeh lzaeh reopened this Oct 20, 2025
@lzaeh
Copy link
Contributor Author

lzaeh commented Oct 22, 2025

Scope of this change

This change removes array support only along the usage path, reflected in:

  1. getTypeInfo behavior: except for []byte (which still uses its concrete type to enable zero-copy, out-of-band storage optimizations, etc.), all other []T are treated as lists.

  2. Reflect checks: code paths that handled reflect.Array have been removed. Since arrays aren’t supported, they’re considered unavailable to users.

  3. User-supplied arrays: if a user forces an array, per (1) the system reports an error indicating [N]T is not supported.

What I did not change

I’m unsure whether we should purge array-related constructs from the type system itself. Fully removing them would touch a lot of code. Keeping the definitions around but unused might serve as a transitional/contingency option, even if arrays are effectively deprecated right now.
That said, if we do want to remove arrays from the type system entirely, let me know and I’ll follow up.

Cross-language test adjustments

For TestXLangSerializer: since other languages may interact with the same unit as Python, I removed the array-related code paths on both the Go and Python sides within this test. I then added a separate test_cross_language_serializer_go case in Python’s cross-language tests so that Go still covers this part. If this setup isn’t appropriate, please let me know.

Remaining work (Relatively higher-priority tasks, Just my opinion)

Update the struct field ordering logic accordingly. After updating the Python module, two struct tests failed due to hash mismatches. I haven’t changed the handling of complex structs, which means the code path for fields that are arrays within complex structs remains untouched—and this test includes arrays. If I tweak just one spot now, it’ll be hard to localize the issue. It’s better to make the corresponding changes after I spend time learning the new sorting/ordering optimization algorithm, so we can update everything in sync.

@chaokunyang

@pandalee99
Copy link
Contributor

Update the struct field ordering logic accordingly. After updating the Python module, two struct tests failed due to hash mismatches. I haven’t changed the handling of complex structs, which means the code path for fields that are arrays within complex structs remains untouched—and this test includes arrays. If I tweak just one spot now, it’ll be hard to localize the issue. It’s better to make the corresponding changes after I spend time learning the new sorting/ordering optimization algorithm, so we can update everything in sync.

Recently, we have updated the hash calculation algorithm. Maybe this PR(#2814) can solve this problem



@cross_language_test
def test_cross_language_serializer_go(data_file_path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need a new test for go specially?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use same tests as java used. Otherwise, we still don't get full xlang serialization features

@chaokunyang
Copy link
Collaborator

chaokunyang commented Nov 11, 2025

Could you refer to both java and rust implementation? In fory rust, both array and vec are supported, they can all be serialized as a list, and deserialize a list into target type. We should do similiar thing in go too.

For fory rust, you can refer to list.rs and array.rs.

@lzaeh
Copy link
Contributor Author

lzaeh commented Nov 14, 2025

Could you refer to both java and rust implementation? In fory rust, both array and vec are supported, they can all be serialized as a list, and deserialize a list into target type. We should do similiar thing in go too.

For fory rust, you can refer to list.rs and array.rs.

Sorry—I was tied up with some real-life errands and didn’t see this message in time.

I’ve implemented a version along the lines you suggested by removing the array serializers. Details:

(1) Primitive arrays: unless the user provides a specific destination type (only array or slice are considered valid), deserialization checks the type and length and deserializes accordingly; otherwise (e.g., when the target is interface{}) we always deserialize to a slice.
This is reflected in some local tests (still supporting array as a receiver, and when the receiver is interface{} we convert and compare the produced slice against the expected array), as well as in the cross-language test(TestXLangSerializer) where arrays are deserialized as slices.

(2) I think the support in (1) is still necessary. While we don’t recommend arrays to users, if a struct field is defined as an array (rather than interface{}), deserialization will work fine because of (1).

(3) One confusing point: the complex-struct tests had been failing for a while. For various reasons I kept pulling the latest code and reinstalling pyfory. I did the same this time, and before I had a chance to dig into the complex-struct issue, the tests started passing again.

I only opened GitHub after finishing this part of the code—bad workflow on my side T_T . I’ll follow your suggestions going forward. Thanks!

@lzaeh
Copy link
Contributor Author

lzaeh commented Nov 14, 2025

Some files only appeared as modified when I synced to the new version—I didn’t actually change them. They won't be treated as new code during the merge.

I’ll keep studying those two files and figure out suitable approach for Go. This version is only based on earlier discussion and hasn’t yet incorporated learnings from the other language implementations. :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants