Could you tell me why only the 32-channel feature map is used as the attention map for fusion?