Value error when loading saved DIEN model #511

@jefflao

Description

Describe the bug(问题描述)

I am training the DIEN model on a dataset with around 20 categorical features and 5 user-behavior columns, all of which are strings. I can save the model with keras.save_model in .h5 format, but loading it back with keras.load_model raises the following error:

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/utils/generic_utils.py", line 668, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 670, in from_config
    input_tensors, output_tensors, created_layers = reconstruct_from_config(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 1298, in reconstruct_from_config
    process_node(layer, node_data)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 1244, in process_node
    output_tensors = layer(input_tensors, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer_v1.py", line 764, in __call__
    self._maybe_build(inputs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer_v1.py", line 2086, in _maybe_build
    self.build(input_shapes)
  File "/usr/local/lib/python3.8/dist-packages/deepctr/layers/sequence.py", line 255, in build
    raise ValueError('A `AttentionSequencePoolingLayer` layer requires '
ValueError: A `AttentionSequencePoolingLayer` layer requires inputs of a 3 tensor with shape (None,1,embedding_size),(None,T,embedding_size) and (None,1) Got different shapes: [TensorShape([None, 15, 35]), TensorShape([None, 1, 35]), TensorShape([None, 1])]

This seems to be an issue in the model reconstruction when calling load_model(). More details in the Additional context section.

To Reproduce(复现步骤)

Model:

    from deepctr.models import DIEN
    from tensorflow.keras.optimizers import Adam

    model = DIEN(
        feature_columns,
        behavior_feat_list,
        dnn_hidden_units=[256, 128, 64],
        dnn_dropout=0.5,
        gru_type='AUGRU',
        use_negsampling=False,
        att_activation='sigmoid',
    )
    model.compile(Adam(learning_rate=1e-5), 'binary_crossentropy', metrics=['binary_crossentropy'])

Train and save model:

    from tensorflow.keras.models import save_model

    history = model.fit(train_inputs,
                        train_inputs['click'],  # label array for the 'click' target
                        verbose=True,
                        epochs=1,
                        batch_size=32,
                        validation_split=0.1)   # must be a float in [0, 1)
    save_model(
        model,
        'dien.h5',
        save_format='h5',
    )

Load model (the part that raises the exception):

    from deepctr.layers import custom_objects
    from tensorflow.keras.models import load_model

    loaded_model = load_model('dien.h5', custom_objects)
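As a possible workaround (not tested against DIEN itself), the weights can be restored without going through load_model's graph reconstruction at all: rebuild the model in code with the same arguments used at training time, then call load_weights on the full-model .h5 file. A minimal sketch of the pattern with a tiny stand-in tf.keras model (for DIEN, the build step would be the same DIEN(...) call as above followed by model.load_weights('dien.h5')):

```python
import numpy as np
import tensorflow as tf

def build():
    # Stand-in architecture; for DIEN, rebuild with the original DIEN(...) call.
    inp = tf.keras.Input(shape=(4,), name="x")
    out = tf.keras.layers.Dense(1, name="dense")(inp)
    return tf.keras.Model(inp, out)

model = build()
tf.keras.models.save_model(model, "model.h5", save_format="h5")  # like dien.h5

restored = build()                  # same architecture, fresh weights
restored.load_weights("model.h5")   # reads weights out of the full-model file

x = np.ones((1, 4), dtype="float32")
assert np.allclose(model.predict(x), restored.predict(x))
```

Because the graph is rebuilt by the original model code rather than from the saved config, the node-ordering problem described below never comes into play.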

Operating environment(运行环境):

  • python version: 3.8
  • tensorflow version: 2.2-2.5 (TF >= 2.6 hits numpy/TF compatibility issues)
  • deepctr version: 0.9.3
  • CUDA version: 11.7
  • NVIDIA driver version: 515.65.01
  • base docker image: tensorflow/tensorflow:2.5.1-gpu

Additional context

I could not try TensorFlow older than 2.2 due to driver compatibility issues, and DeepCTR also doesn't work with TF 2.6 through 2.11.

My model has the following structure (from `model.summary()`):

genre (InputLayer)              [(None, 1)]          0                                            
__________________________________________________________________________________________________
hist_genre (InputLayer)         [(None, 15)]         0                                            
__________________________________________________________________________________________________
...
hash_28 (Hash)                  (None, 1)            0           genre[0][0]                      
__________________________________________________________________________________________________
hash_15 (Hash)                  (None, 1)            0           genre[0][0]                      
__________________________________________________________________________________________________
hash_3 (Hash)                   (None, 15)           0           hist_genre[0][0]                 
__________________________________________________________________________________________________
...
sparse_seq_emb_hist_genre (Embe multiple             404         hash_3[0][0]                     
                                                                 hash_15[0][0]                    
                                                                 hash_28[0][0]                    
__________________________________________________________________________________________________
concat (Concat)                 (None, 15, 35)       0           sparse_seq_emb_hist_category[0][0
                                                                 sparse_seq_emb_hist_channel[0][0]
                                                                 sparse_seq_emb_hist_episode[0][0]
                                                                 sparse_seq_emb_hist_genre[0][0]  
                                                                 sparse_seq_emb_hist_part[0][0]   
                                                                 sparse_seq_emb_hist_feature0[0][
__________________________________________________________________________________________________
seq_length (InputLayer)         [(None, 1)]          0    
__________________________________________________________________________________________________
gru1 (DynamicGRU)               (None, 15, 35)       7455        concat[0][0]                     
                                                                 seq_length[0][0]                 
__________________________________________________________________________________________________
concat_2 (Concat)               (None, 1, 35)        0           sparse_seq_emb_hist_category[2][0
                                                                 sparse_seq_emb_hist_channel[2][0]
                                                                 sparse_seq_emb_hist_episode[2][0]
                                                                 sparse_seq_emb_hist_genre[2][0]  
                                                                 sparse_seq_emb_hist_part[2][0]   
                                                                 sparse_seq_emb_hist_feature0[2][
__________________________________________________________________________________________________
attention_sequence_pooling_laye (None, 1, 15)        10081       concat_2[0][0]                   
                                                                 gru1[0][0]                       
                                                                 seq_length[0][0]              
...

Following the stack trace with a lot of extra debug logging, I believe load_model does not wire the inputs to the shared embedding layers in the same order as the original model when reconstructing it. In tensorflow/python/keras/engine/functional.py, reconstruct_from_config(config, custom_objects, created_layers) processes each node as soon as all of its inputs are ready. As a result, a shared embedding layer in the reconstructed model, such as sparse_seq_emb_hist_genre, can end up with its call on the historical behavior sequence (shape (None, 15)) recorded before its call on the sparse query feature (shape (None, 1)), i.e. output[0] becomes the embedded behavior sequence instead of the query embedding.
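The reordering can be illustrated without TensorFlow. Below is a stdlib-only sketch (the class and variable names are made up for illustration, not the real Keras internals) of how a shared layer's node indices flip when reconstruction processes its inputs in a different readiness order:

```python
class SharedLayer:
    """Toy stand-in for a shared Keras layer that records each call."""
    def __init__(self, name):
        self.name = name
        self.calls = []  # analogous to layer._inbound_nodes
    def __call__(self, input_name):
        self.calls.append(input_name)
        return (self.name, len(self.calls) - 1)  # (layer, node_index)

# Original build order: query embedding first, then the behavior sequence.
emb = SharedLayer("sparse_seq_emb_hist_genre")
query = emb("genre")       # node 0 -> shape (None, 1, emb_size)
keys = emb("hist_genre")   # node 1 -> shape (None, 15, emb_size)
original_order = list(emb.calls)

# Reconstruction processes whichever node has all inputs ready first.
# If "hist_genre" happens to become ready before "genre", the indices flip:
emb2 = SharedLayer("sparse_seq_emb_hist_genre")
for ready_input in ["hist_genre", "genre"]:  # different readiness order
    emb2(ready_input)

print(original_order)  # ['genre', 'hist_genre']
print(emb2.calls)      # ['hist_genre', 'genre']
```

A downstream layer that was saved as consuming "emb, node 0" now receives the (None, 15, emb_size) sequence instead of the (None, 1, emb_size) query, which matches the swapped shapes in the ValueError above.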

The model also creates multiple Hash layers for the same input when it initializes the key and query embeddings for the attention layer, because there is no sharing mechanism. This probably isn't a real problem, since the duplicate hashes should produce identical outputs.

I was able to work around this by changing the order of the embedding-lookup initialization in deepctr/models/sequence/dien.py:

    keys_emb_list = embedding_lookup(embedding_dict, features, history_feature_columns,
                                     return_feat_list=history_fc_names, to_list=True)
    dnn_input_emb_list = embedding_lookup(embedding_dict, features, sparse_feature_columns,
                                          mask_feat_list=history_feature_list, to_list=True)
    # Move query embeddings from the first being initialized to the last.
    query_emb_list = embedding_lookup(embedding_dict, features, sparse_feature_columns,
                                      return_feat_list=history_feature_list, to_list=True)

This modification is definitely not safe. Please let me know if anyone has a better solution. Thank you in advance.
