

@yjmade commented Nov 17, 2024

Issue

A subtle bug was discovered in the nuScenes dataset loader: when it builds the canbus input for the BEV head, it assigns the ego vehicle's rotation quaternion incorrectly. Specifically, in the assignment to the quaternion slots (indices 3 to 6, i.e. can_bus[3:7]), all four components receive the same value, rotation.w.

Reproduction Example

In [1]: import pyquaternion

In [2]: import numpy as np
   ...: 

In [3]: q=pyquaternion.Quaternion([1,0,0,0])

In [4]: b=np.zeros([10])

In [5]: b
Out[5]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [6]: b[3:7]=q

In [7]: b
Out[7]: array([0., 0., 0., 1., 1., 1., 1., 0., 0., 0.])
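Why does this happen? As far as we can tell, Quaternion does not expose a sequence protocol, so numpy treats it as a single scalar, converts it with float() (which pyquaternion appears to implement as returning the real part w), and broadcasts that one value across the slice. The sketch below reproduces the effect without pyquaternion installed, using a hypothetical stand-in class FakeQuaternion that only defines __float__; it is an illustration of the mechanism under that assumption, not pyquaternion's actual code.

```python
import numpy as np


class FakeQuaternion:
    """Hypothetical stand-in for pyquaternion.Quaternion (illustration only).

    Stores (w, x, y, z) in .q but defines no __len__/__getitem__/__array__,
    so numpy cannot see it as a sequence; float() returns only w.
    """

    def __init__(self, w, x, y, z):
        self.q = np.array([w, x, y, z], dtype=float)

    def __float__(self):
        return float(self.q[0])


# 90-degree rotation about the z axis: w = z = sqrt(0.5), x = y = 0.
q = FakeQuaternion(np.sqrt(0.5), 0.0, 0.0, np.sqrt(0.5))

buggy = np.zeros(10)
buggy[3:7] = q    # converted to one float (w), then broadcast to all 4 slots

fixed = np.zeros(10)
fixed[3:7] = q.q  # underlying array is copied element by element

print(buggy[3:7])  # four copies of w
print(fixed[3:7])  # w, x, y, z
```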

Fix

The solution is to use the quaternion's underlying numpy array for assignment:

In [8]: b[3:7]=q.q

In [9]: b
Out[9]: array([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
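For reference outside an IPython session, here is a minimal sketch of the corrected assignment inside a loader-style helper. Several details are assumptions rather than facts from this thread: an 18-element can_bus vector, translation in slots 0 to 2, the quaternion as (w, x, y, z) in slots 3 to 6, yaw in radians and degrees in the last two slots, and degrees wrapped to [0, 360). build_can_bus and quaternion_yaw_wxyz are hypothetical names; yaw is computed with the standard formula (the heading of the rotated x axis projected onto the xy plane) so the sketch does not need pyquaternion.

```python
import numpy as np


def quaternion_yaw_wxyz(q):
    """Yaw in radians of a unit quaternion given as (w, x, y, z).

    Equals atan2 of the rotated x axis' y and x components.
    """
    w, x, y, z = q
    return np.arctan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


def build_can_bus(translation, rotation_wxyz, size=18):
    """Hypothetical helper: fill the pose-related slots of a can_bus vector.

    Assumed layout: [0:3] translation, [3:7] quaternion (w, x, y, z),
    [-2] yaw in radians, [-1] yaw in degrees wrapped to [0, 360).
    """
    can_bus = np.zeros(size)
    rotation = np.asarray(rotation_wxyz, dtype=float)
    can_bus[:3] = translation
    can_bus[3:7] = rotation  # plain array: all four components are copied
    yaw_deg = np.degrees(quaternion_yaw_wxyz(rotation))
    if yaw_deg < 0:
        yaw_deg += 360.0
    can_bus[-2] = np.radians(yaw_deg)
    can_bus[-1] = yaw_deg
    return can_bus


cb = build_can_bus([1.0, 2.0, 0.0], [np.sqrt(0.5), 0.0, 0.0, np.sqrt(0.5)])
print(cb[3:7])  # w, x, y, z
print(cb[-1])   # approximately 90.0
```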

Why the Bug Remained Undetected

Two main factors contributed to the delayed discovery:

  1. The canbus vector also stores the yaw angle (in degrees and radians) as its last two values, and these are computed from the rotation itself, so they are unaffected by the bug. Within the rotation, yaw is the most critical quantity for planning.
  2. Since every dataset sample contained the same error, models learned (overfit to) this incorrect pattern.

As a result, the model still performs well in open-loop evaluation, where the same dataloader is used for both training and evaluation.

Discovery Process

We tried to integrate the VAD model into our simulation pipeline but ran into a problem: no matter how we supplied the rotation data from the simulator, the model could not generate the correct turning trajectory. To troubleshoot, we inspected the training pipeline line by line and found the bug. We confirmed that the VAD model behaves correctly only when we input 'w, w, w, w' as the rotation in the canbus during simulation.
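Until a model is retrained, a closed-loop pipeline can only match what existing checkpoints saw during training by reproducing the buggy layout, i.e. writing w into all four quaternion slots, as observed above. A minimal sketch of such a compatibility switch follows; set_rotation_slots and the legacy_checkpoint flag are hypothetical names, and the 18-element vector size is an assumption.

```python
import numpy as np


def set_rotation_slots(can_bus, rotation_wxyz, legacy_checkpoint):
    """Hypothetical helper: write the ego rotation into can_bus[3:7].

    legacy_checkpoint=True reproduces the buggy training-time layout
    (w, w, w, w) that checkpoints trained with the old loader expect;
    False writes the true quaternion (w, x, y, z) for retrained models.
    """
    q = np.asarray(rotation_wxyz, dtype=float)
    if legacy_checkpoint:
        can_bus[3:7] = q[0]  # scalar broadcast, same values as the original bug
    else:
        can_bus[3:7] = q
    return can_bus


# Example pose from a simulator: 90-degree yaw about the z axis.
sim_rotation = [np.sqrt(0.5), 0.0, 0.0, np.sqrt(0.5)]
legacy = set_rotation_slots(np.zeros(18), sim_rotation, legacy_checkpoint=True)
fixed = set_rotation_slots(np.zeros(18), sim_rotation, legacy_checkpoint=False)
print(legacy[3:7], fixed[3:7])
```

This keeps old checkpoints usable in simulation, but it only mirrors the bug; it does not replace retraining with the fixed dataloader.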

Impact

We conclude that this issue does not affect open-loop evaluation. However, anyone who uses the model in a closed-loop environment, such as a simulator or a real vehicle, will run into it.

We traced the issue to its origin: it first appears in the widely adopted BEVFormer codebase, where it still exists. This means that all work derived from BEVFormer that reuses this dataloader shares the same problem.

Required Actions

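  1. Existing checkpoints will not work correctly with the fixed dataloader, since they were trained on the 'w, w, w, w' input
  2. Models need to be retrained with the fixed dataloader
  3. Closed-loop systems (simulation or real vehicles) that use these models need to be updated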
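Before retraining or updating a closed-loop system, it can help to check whether a given pipeline (dataloader, cached samples, or simulator bridge) still emits the buggy layout. One heuristic, assuming slots 3 to 6 should hold a unit quaternion: a correct quaternion has norm 1, while the buggy (w, w, w, w) has norm 2|w|, which equals 1 only in the corner case |w| = 0.5. looks_buggy is a hypothetical helper, not part of BEVFormer or VAD.

```python
import numpy as np


def looks_buggy(can_bus, atol=1e-3):
    """Hypothetical heuristic for the quaternion slots can_bus[3:7].

    Flags vectors whose four slots are identical and not a unit quaternion.
    The buggy layout (w, w, w, w) has norm 2|w|, so it cannot be told apart
    from a valid quaternion when |w| == 0.5; that case returns False.
    """
    q = np.asarray(can_bus[3:7], dtype=float)
    all_equal = np.allclose(q, q[0], atol=atol)
    unit_norm = np.isclose(np.linalg.norm(q), 1.0, atol=atol)
    return bool(all_equal and not unit_norm)


w = np.sqrt(0.5)
buggy = np.zeros(18)
buggy[3:7] = w
fixed = np.zeros(18)
fixed[3:7] = [w, 0.0, 0.0, w]
print(looks_buggy(buggy), looks_buggy(fixed))  # True False
```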

@kaitolucifer

@yjmade
Hi, do you know where can_bus[3:7] is used?
It seems like only can_bus[:3], can_bus[-1] and can_bus[-2] are used, so a wrong can_bus[3:7] probably doesn't affect anything?

@kaitolucifer

OK, I found it.
Besides being used as information for pre/post-processing, can_bus is also fed to the neural network as input data.
So it makes sense that closed-loop evaluation in the simulator doesn't work:
can_bus = self.can_bus_mlp(can_bus)[None, :, :]
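To illustrate why this matters: a learned projection over the whole vector reads every slot, including 3 to 6, so the same physical pose encoded as (w, w, w, w) during training and as (w, x, y, z) in a simulator produces different features. The sketch below uses a hypothetical two-layer numpy MLP with random weights and a tanh activation as a stand-in for can_bus_mlp; the real module's layers, activation, and the 18-element input size are assumptions here, not taken from this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, HIDDEN, OUT_DIM = 18, 32, 16  # assumed sizes, for illustration only
W1, b1 = rng.normal(size=(IN_DIM, HIDDEN)), np.zeros(HIDDEN)
W2, b2 = rng.normal(size=(HIDDEN, OUT_DIM)), np.zeros(OUT_DIM)


def toy_can_bus_mlp(can_bus):
    """Hypothetical stand-in for can_bus_mlp: Linear -> tanh -> Linear."""
    return np.tanh(can_bus @ W1 + b1) @ W2 + b2


w = np.sqrt(0.5)
train_style = np.zeros(IN_DIM)
train_style[3:7] = w                # what the buggy loader produced
sim_style = np.zeros(IN_DIM)
sim_style[3:7] = [w, 0.0, 0.0, w]   # true quaternion from a simulator

# Same pose, different slots 3:7 -> different embedding passed to the head.
gap = np.abs(toy_can_bus_mlp(train_style) - toy_can_bus_mlp(sim_style)).max()
print(gap > 0)
```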
