
[Question] Text classification models whose names start with CNN stay at about 0.2 accuracy regardless of dataset or parameter tuning #488

@hwq458362228

You must follow the issue template and provide as much information as possible. Otherwise, this issue will be ignored and closed.

Check List

Thanks for considering opening an issue. Before you submit your issue, please confirm these boxes are checked.

You can post pictures, but if specific text or code is required to reproduce the issue, please provide the text in a plain text format for easy copy/paste.

Environment

  • OS [e.g. Mac OS, Linux]: Win10
  • Python Version: 3.7
  • requirements.txt: TensorFlow 2.3 kashgari 2.0.1

Question

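Whether I use SMP2018ECDTCorpus or my own dataset, the accuracy is poor with every text classification model whose name starts with CNN. I have also tried changing the learning rate, the number of epochs, and other parameters, but it made little difference. I don't know whether something is wrong with these models themselves.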

from kashgari.corpus import SMP2018ECDTCorpus
from kashgari.tasks.classification import CNN_Model
from kashgari.callbacks import EvalCallBack

import logging
logging.basicConfig(level='DEBUG')

train_x, train_y = SMP2018ECDTCorpus.load_data('train')
valid_x, valid_y = SMP2018ECDTCorpus.load_data('valid')
test_x, test_y = SMP2018ECDTCorpus.load_data('test')

model = CNN_Model()
model.fit(train_x, train_y, valid_x, valid_y, batch_size=64, epochs=14)
model.evaluate(test_x, test_y, batch_size=64)
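
One check that may help interpret an accuracy near 0.2 is the majority-class baseline: the accuracy obtained by always predicting the most frequent label. The sketch below is a minimal illustration using only the standard library. `majority_baseline` and `toy_labels` are hypothetical names that do not come from Kashgari, and the toy labels are made up; in the script above, `train_y` would be passed instead.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Hypothetical toy labels for illustration only; not taken from the corpus.
toy_labels = ["chat", "chat", "website", "poetry", "chat"]
label, acc = majority_baseline(toy_labels)
print(label, acc)
```

If the value computed on the real training labels turns out to be close to the reported accuracy, that would suggest the model is collapsing to a single class rather than learning from the input.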

Output:
2022-04-14 18:08:55,276 [DEBUG] kashgari - loaded 1881 samples from C:\Users\hwq45.kashgari\datasets\SMP2018ECDTCorpus\train.csv. Sample:
x[0]: ['打', '开', '河', '南', '英', '东', '网', '站']
y[0]: website
2022-04-14 18:08:55,280 [DEBUG] kashgari - loaded 418 samples from C:\Users\hwq45.kashgari\datasets\SMP2018ECDTCorpus\valid.csv. Sample:
x[0]: ['来', '一', '首', ',', '灵', '岩', '。']
y[0]: poetry
2022-04-14 18:08:55,284 [DEBUG] kashgari - loaded 770 samples from C:\Users\hwq45.kashgari\datasets\SMP2018ECDTCorpus\test.csv. Sample:
x[0]: ['给', '曹', '广', '义', '打', '电', '话']
y[0]: telephone
Preparing text vocab dict: 100%|██████████| 1881/1881 [00:00<00:00, 943831.30it/s]
Preparing text vocab dict: 100%|██████████| 418/418 [00:00<00:00, 416936.76it/s]
2022-04-14 18:08:55,291 [DEBUG] kashgari - --- Build vocab dict finished, Total: 875 ---
2022-04-14 18:08:55,291 [DEBUG] kashgari - Top-10: ['[PAD]', '[UNK]', '[CLS]', '[SEP]', '的', '么', '我', '。', '怎', '你']
Preparing classification label vocab dict: 100%|██████████| 1881/1881 [00:00<?, ?it/s]
Preparing classification label vocab dict: 100%|██████████| 418/418 [00:00<?, ?it/s]
Calculating sequence length: 100%|██████████| 1881/1881 [00:00<00:00, 1894234.29it/s]
Calculating sequence length: 100%|██████████| 418/418 [00:00<00:00, 419430.40it/s]
2022-04-14 18:08:55,309 [DEBUG] kashgari - Calculated sequence length = 15
2022-04-14 18:08:55,337 [DEBUG] kashgari - Model: "functional_43"
Layer (type)                  Output Shape         Param #
==========================================================
input (InputLayer)            [(None, None)]       0
layer_embedding (Embedding)   (None, None, 100)    87500
conv1d_6 (Conv1D)             (None, None, 128)    64128
global_max_pooling1d_4 (Glob  (None, 128)          0
dense_14 (Dense)              (None, 64)           8256
dense_15 (Dense)              (None, 31)           2015
activation_10 (Activation)    (None, 31)           0
==========================================================
Total params: 161,899
Trainable params: 161,899
Non-trainable params: 0
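
The parameter counts in the summary can be reproduced by hand, which confirms that the model was built with the 875-token vocabulary and 31 output labels reported in the log. The sketch below uses the standard formulas for embedding, 1D convolution, and dense layers. The helper names are hypothetical, and the kernel size of 5 is not shown in the log; it is inferred here because it is the value that yields 64128 parameters.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry, no bias.
    return vocab_size * dim


def conv1d_params(kernel_size, in_channels, filters):
    # Kernel weights plus one bias per filter.
    return kernel_size * in_channels * filters + filters


def dense_params(in_units, out_units):
    # Weight matrix plus one bias per output unit.
    return in_units * out_units + out_units


counts = {
    "layer_embedding": embedding_params(875, 100),
    "conv1d_6": conv1d_params(5, 100, 128),  # kernel_size=5 is an inferred assumption
    "dense_14": dense_params(128, 64),
    "dense_15": dense_params(64, 31),
}
total = sum(counts.values())
print(counts, total)
```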


Epoch 1/14
29/29 [==============================] - 0s 8ms/step - loss: 3.3098 - accuracy: 0.1735 - val_loss: 3.1836 - val_accuracy: 0.1901
Epoch 2/14
29/29 [==============================] - 0s 5ms/step - loss: 3.0778 - accuracy: 0.1992 - val_loss: 3.0883 - val_accuracy: 0.1953
Epoch 3/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0232 - accuracy: 0.1992 - val_loss: 3.0700 - val_accuracy: 0.2005
Epoch 4/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0164 - accuracy: 0.1987 - val_loss: 3.0591 - val_accuracy: 0.1901
Epoch 5/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0395 - accuracy: 0.1943 - val_loss: 3.0622 - val_accuracy: 0.1979
Epoch 6/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0327 - accuracy: 0.2003 - val_loss: 3.0659 - val_accuracy: 0.1875
Epoch 7/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0361 - accuracy: 0.1948 - val_loss: 3.0711 - val_accuracy: 0.1953
Epoch 8/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0347 - accuracy: 0.1987 - val_loss: 3.0581 - val_accuracy: 0.1901
Epoch 9/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0155 - accuracy: 0.1981 - val_loss: 3.0576 - val_accuracy: 0.2005
Epoch 10/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0415 - accuracy: 0.2036 - val_loss: 3.0651 - val_accuracy: 0.1953
Epoch 11/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0296 - accuracy: 0.1992 - val_loss: 3.0850 - val_accuracy: 0.1849
Epoch 12/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0132 - accuracy: 0.2053 - val_loss: 3.0643 - val_accuracy: 0.1953
Epoch 13/14
29/29 [==============================] - 0s 4ms/step - loss: 3.0523 - accuracy: 0.1899 - val_loss: 3.0639 - val_accuracy: 0.2005
Epoch 14/14
29/29 [==============================] - 0s 4ms/step - loss: 3.7734 - accuracy: 0.2075 - val_loss: 3.0653 - val_accuracy: 0.2031
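
For reference, the logged loss can be compared with the cross-entropy of a uniform guess over the 31 output classes, which is ln 31. The sketch below assumes the model is trained with categorical cross-entropy, which the log does not state explicitly. `num_classes` and `uniform_loss` are illustrative names.

```python
import math

# Assumption: categorical cross-entropy over the 31 labels in the model summary.
num_classes = 31

# Loss of a model that assigns probability 1/31 to every class.
uniform_loss = -math.log(1.0 / num_classes)
print(round(uniform_loss, 4))
```

A loss that settles a little below this value while accuracy stays flat would be consistent with the network predicting something close to the label frequencies instead of using the input text, although the log alone does not prove that.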

Labels: question, wontfix
