Value error when loading saved DIEN model #511

@jefflao

Description

Describe the bug(问题描述)

I am training the DIEN model on a dataset with around 20 categorical features and 5 user-behavior columns, all of which are strings. I can save the model with keras.save_model in .h5 format, but loading it back with keras.load_model raises the following error:

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/utils/generic_utils.py", line 668, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 670, in from_config
    input_tensors, output_tensors, created_layers = reconstruct_from_config(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 1298, in reconstruct_from_config
    process_node(layer, node_data)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 1244, in process_node
    output_tensors = layer(input_tensors, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer_v1.py", line 764, in __call__
    self._maybe_build(inputs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer_v1.py", line 2086, in _maybe_build
    self.build(input_shapes)
  File "/usr/local/lib/python3.8/dist-packages/deepctr/layers/sequence.py", line 255, in build
    raise ValueError('A `AttentionSequencePoolingLayer` layer requires '
ValueError: A `AttentionSequencePoolingLayer` layer requires inputs of a 3 tensor with shape (None,1,embedding_size),(None,T,embedding_size) and (None,1) Got different shapes: [TensorShape([None, 15, 35]), TensorShape([None, 1, 35]), TensorShape([None, 1])]

This seems to be an issue in the model reconstruction when calling load_model(). More details in the Additional context section.

To Reproduce(复现步骤)

Model:

    from deepctr.models import DIEN
    from tensorflow.keras.optimizers import Adam

    model = DIEN(
        feature_columns,
        behavior_feat_list,
        dnn_hidden_units=[256, 128, 64],
        dnn_dropout=0.5,
        gru_type='AUGRU',
        use_negsampling=False,
        att_activation='sigmoid',
    )
    model.compile(Adam(learning_rate=1e-5), 'binary_crossentropy', metrics=['binary_crossentropy'])

Train and save model:

    from tensorflow.keras.models import save_model

    history = model.fit(train_inputs,
                        train_inputs['click'],  # label array for the 'click' target
                        verbose=True,
                        epochs=1,
                        batch_size=32,
                        validation_split=0.1)   # must be a float in [0, 1)
    save_model(
        model,
        'dien.h5',
        save_format='h5',
    )

Load model (the part that raises the exception):

    from deepctr.layers import custom_objects
    from tensorflow.keras.models import load_model

    loaded_model = load_model('dien.h5', custom_objects)
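As a possible workaround (not tested against DIEN itself), the weights can be restored without going through load_model's graph reconstruction at all: rebuild the model in code with the same arguments used at training time, then call load_weights on the full-model .h5 file. A minimal sketch of the pattern with a tiny stand-in tf.keras model (for DIEN, the build step would be the same DIEN(...) call as above followed by model.load_weights('dien.h5')):

```python
import numpy as np
import tensorflow as tf

def build():
    # Stand-in architecture; for DIEN, rebuild with the original DIEN(...) call.
    inp = tf.keras.Input(shape=(4,), name="x")
    out = tf.keras.layers.Dense(1, name="dense")(inp)
    return tf.keras.Model(inp, out)

model = build()
tf.keras.models.save_model(model, "model.h5", save_format="h5")  # like dien.h5

restored = build()                  # same architecture, fresh weights
restored.load_weights("model.h5")   # reads weights out of the full-model file

x = np.ones((1, 4), dtype="float32")
assert np.allclose(model.predict(x), restored.predict(x))
```

Because the graph is rebuilt by the original model code rather than from the saved config, the node-ordering problem described below never comes into play.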

Operating environment(运行环境):

  • python version: 3.8
  • tensorflow version: 2.2-2.5 (TF >= 2.6 hits numpy/TF compatibility issues)
  • deepctr version: 0.9.3
  • CUDA version: 11.7
  • NVIDIA driver version: 515.65.01
  • base docker image: tensorflow/tensorflow:2.5.1-gpu

Additional context

I could not try TensorFlow older than 2.2 due to driver compatibility issues, and DeepCTR also doesn't work with TF 2.6 through 2.11.

My model has the following structure (from `model.summary()`):

genre (InputLayer)              [(None, 1)]          0                                            
__________________________________________________________________________________________________
hist_genre (InputLayer)         [(None, 15)]         0                                            
__________________________________________________________________________________________________
...
hash_28 (Hash)                  (None, 1)            0           genre[0][0]                      
__________________________________________________________________________________________________
hash_15 (Hash)                  (None, 1)            0           genre[0][0]                      
__________________________________________________________________________________________________
hash_3 (Hash)                   (None, 15)           0           hist_genre[0][0]                 
__________________________________________________________________________________________________
...
sparse_seq_emb_hist_genre (Embe multiple             404         hash_3[0][0]                     
                                                                 hash_15[0][0]                    
                                                                 hash_28[0][0]                    
__________________________________________________________________________________________________
concat (Concat)                 (None, 15, 35)       0           sparse_seq_emb_hist_category[0][0
                                                                 sparse_seq_emb_hist_channel[0][0]
                                                                 sparse_seq_emb_hist_episode[0][0]
                                                                 sparse_seq_emb_hist_genre[0][0]  
                                                                 sparse_seq_emb_hist_part[0][0]   
                                                                 sparse_seq_emb_hist_feature0[0][
__________________________________________________________________________________________________
seq_length (InputLayer)         [(None, 1)]          0    
__________________________________________________________________________________________________
gru1 (DynamicGRU)               (None, 15, 35)       7455        concat[0][0]                     
                                                                 seq_length[0][0]                 
__________________________________________________________________________________________________
concat_2 (Concat)               (None, 1, 35)        0           sparse_seq_emb_hist_category[2][0
                                                                 sparse_seq_emb_hist_channel[2][0]
                                                                 sparse_seq_emb_hist_episode[2][0]
                                                                 sparse_seq_emb_hist_genre[2][0]  
                                                                 sparse_seq_emb_hist_part[2][0]   
                                                                 sparse_seq_emb_hist_feature0[2][
__________________________________________________________________________________________________
attention_sequence_pooling_laye (None, 1, 15)        10081       concat_2[0][0]                   
                                                                 gru1[0][0]                       
                                                                 seq_length[0][0]              
...

Following the stack trace with a lot of extra debug logging, I believe load_model does not wire the inputs to the shared embedding layers in the same order as the original model when reconstructing it. In tensorflow/python/keras/engine/functional.py, reconstruct_from_config(config, custom_objects, created_layers) processes each node as soon as all of its inputs are ready. As a result, a shared embedding layer in the reconstructed model, such as sparse_seq_emb_hist_genre, can end up with its call on the historical behavior sequence (shape (None, 15)) recorded before its call on the sparse query feature (shape (None, 1)), i.e. output[0] becomes the embedded behavior sequence instead of the query embedding.
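The reordering can be illustrated without TensorFlow. Below is a stdlib-only sketch (the class and variable names are made up for illustration, not the real Keras internals) of how a shared layer's node indices flip when reconstruction processes its inputs in a different readiness order:

```python
class SharedLayer:
    """Toy stand-in for a shared Keras layer that records each call."""
    def __init__(self, name):
        self.name = name
        self.calls = []  # analogous to layer._inbound_nodes
    def __call__(self, input_name):
        self.calls.append(input_name)
        return (self.name, len(self.calls) - 1)  # (layer, node_index)

# Original build order: query embedding first, then the behavior sequence.
emb = SharedLayer("sparse_seq_emb_hist_genre")
query = emb("genre")       # node 0 -> shape (None, 1, emb_size)
keys = emb("hist_genre")   # node 1 -> shape (None, 15, emb_size)
original_order = list(emb.calls)

# Reconstruction processes whichever node has all inputs ready first.
# If "hist_genre" happens to become ready before "genre", the indices flip:
emb2 = SharedLayer("sparse_seq_emb_hist_genre")
for ready_input in ["hist_genre", "genre"]:  # different readiness order
    emb2(ready_input)

print(original_order)  # ['genre', 'hist_genre']
print(emb2.calls)      # ['hist_genre', 'genre']
```

A downstream layer that was saved as consuming "emb, node 0" now receives the (None, 15, emb_size) sequence instead of the (None, 1, emb_size) query, which matches the swapped shapes in the ValueError above.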

The model also creates multiple Hash layers for the same input when it initializes the key and query embeddings for the attention layer, because there is no sharing mechanism. This probably isn't a real problem, since the duplicate hashes should produce identical outputs.

I was able to work around this by changing the order of the embedding-lookup initialization in deepctr/models/sequence/dien.py:

    keys_emb_list = embedding_lookup(embedding_dict, features, history_feature_columns,
                                     return_feat_list=history_fc_names, to_list=True)
    dnn_input_emb_list = embedding_lookup(embedding_dict, features, sparse_feature_columns,
                                          mask_feat_list=history_feature_list, to_list=True)
    # Move query embeddings from the first being initialized to the last.
    query_emb_list = embedding_lookup(embedding_dict, features, sparse_feature_columns,
                                      return_feat_list=history_feature_list, to_list=True)

This modification is definitely not safe. Please let me know if anyone has a better solution. Thank you in advance.
