What is Slot Filling?

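Slot filling is a basic problem in natural language understanding and a deliberately simplified treatment of meaning. The idea resembles frame-based approaches in linguistics: first define a set of typed semantic slots, then assign each word of the input to a slot; the meaning of an utterance is later extracted and retrieved from the contents of those slots. Our task here is to fill the semantic slots for a set of sentences that all express one speech act, booking a flight (the ATIS dataset).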
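To make the idea of "filling slots" concrete, here is a minimal sketch that groups per-word labels into named slots. It assumes the B-/I-/O labeling scheme that the ATIS labels use; `extract_slots` is a hypothetical helper written for illustration and is not part of the code in this post.

```python
def extract_slots(words, labels):
    """Group BIO-tagged words into (slot_name, text) pairs."""
    slots = []
    current_name, current_words = None, []
    for word, label in zip(words, labels):
        if label.startswith('B-'):
            # A new slot begins; flush the previous one if there is any.
            if current_name is not None:
                slots.append((current_name, ' '.join(current_words)))
            current_name, current_words = label[2:], [word]
        elif label.startswith('I-') and current_name == label[2:]:
            # Continuation of the slot that is currently open.
            current_words.append(word)
        else:
            # 'O' (or an inconsistent I- tag) closes any open slot.
            if current_name is not None:
                slots.append((current_name, ' '.join(current_words)))
            current_name, current_words = None, []
    if current_name is not None:
        slots.append((current_name, ' '.join(current_words)))
    return slots


# A shortened ATIS-style sentence with its labels.
words = ['i', 'want', 'to', 'fly', 'from', 'boston', 'at', 'DIGITDIGITDIGIT', 'am']
labels = ['O', 'O', 'O', 'O', 'O', 'B-fromloc.city_name', 'O',
          'B-depart_time.time', 'I-depart_time.time']
slots = extract_slots(words, labels)
print(slots)
# [('fromloc.city_name', 'boston'), ('depart_time.time', 'DIGITDIGITDIGIT am')]
```

The model's job is only the labeling step; turning labels into slot values, as above, is plain bookkeeping.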

Why SimpleRNN?

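Slot filling is a one-to-one RNN application in the sense that every input word gets exactly one output label; once the model is trained, each word can be placed in the appropriate slot.
An RNN differs from an ordinary feed-forward neural network in that its output at time t depends not only on the current input and the weights but also on the earlier inputs. A feed-forward model, by contrast, treats the input at each time step independently, with no connection between steps. In language understanding, an utterance is a linear sequence that unfolds over time, and each word is clearly influenced by the words before it, so we use an RNN for this problem.
We choose SimpleRNN because it is the simplest RNN and makes good practice for getting familiar with the framework; it can later be replaced by a more effective recurrent model such as an LSTM.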
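The dependence on earlier inputs can be seen in a few lines of NumPy. This is a minimal sketch of an Elman-style recurrence, not the Keras implementation; the weights and the toy sequences are made-up values chosen only for illustration.

```python
import numpy as np


def simple_rnn_forward(inputs, W_x, W_h, b):
    """Return the hidden state at every time step of a simple recurrence."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)


# Hypothetical weights: input dim 2 -> hidden dim 2.
W_x = np.array([[0.5, -0.3], [0.2, 0.4]])
W_h = np.array([[0.1, 0.6], [-0.4, 0.3]])
b = np.zeros(2)

# Two sequences that differ only in their first element.
seq_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
seq_b = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 1.0]])

states_a = simple_rnn_forward(seq_a, W_x, W_h, b)
states_b = simple_rnn_forward(seq_b, W_x, W_h, b)

# A feed-forward layer sees each step in isolation.
ff_a = np.tanh(seq_a @ W_x + b)
ff_b = np.tanh(seq_b @ W_x + b)

print(np.allclose(ff_a[-1], ff_b[-1]))          # True: same last input, same output
print(np.allclose(states_a[-1], states_b[-1]))  # False: the first input still matters
```

The last input is identical in both sequences, so the feed-forward outputs agree, while the recurrent states differ because the first input has been carried forward through the hidden state.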

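Overview of the approach:

  • Load the data, using chsasank's modified version of mesnilgr's load.py.
  • Define the model. We build it with the Keras Sequential model. First, a 100-dimensional word embedding layer maps each input word to a vector in a high-dimensional space (in which words that are semantically and syntactically similar lie closer together). We then add a dropout layer to reduce overfitting, a SimpleRNN layer, and a TimeDistributed layer that applies the same Dense softmax classifier at every time step. Finally we assemble the layers and choose an optimizer and a loss function. The optimizer is rmsprop, which adapts the step size per parameter and so can still find good solutions late in training; the loss is categorical_crossentropy, which suits a multi-class labeling problem with one-hot targets such as this one.
  • Train the model. To save computing resources, models are usually trained in mini-batches. Our data, however, consists of individual sentences; splitting it by a fixed batch_size could introduce spurious connections, because consecutive sentences are independent of each other. We therefore treat each sentence as one batch for training, validation, and prediction, and compute the average loss over an epoch by hand.
  • Evaluate and predict. We assess the model by watching the validation loss and the F1 score of its predictions. The F1 score is computed with conlleval.py, written by signsmile.
  • Save the model.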
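The "one sentence per batch" step comes down to shaping each sentence as a batch of size one and one-hot encoding its labels. The sketch below uses plain NumPy as a stand-in for `keras.utils.to_categorical`; the index values are the first nine entries of the sample sentence printed later in this post, and 127 is the number of classes reported by the model summary.

```python
import numpy as np

n_classes = 127  # number of slot labels, as reported by the model summary

# First nine word and label indices of the sample training sentence.
sent_ids = np.array([232, 542, 502, 196, 208, 77, 62, 10, 35])
label_ids = np.array([126, 126, 126, 126, 126, 48, 126, 35, 99])

# One-hot encode each label: shape (T, n_classes).
one_hot = np.eye(n_classes, dtype='float32')[label_ids]

# Add a leading batch axis so one sentence forms one batch.
x_batch = sent_ids[np.newaxis, :]
y_batch = one_hot[np.newaxis, :]

print(x_batch.shape, y_batch.shape)  # (1, 9) (1, 9, 127)
```

Because every batch holds a single sentence, sentences of different lengths never need padding to a common length.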
import numpy as np
import pickle
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense, Dropout, TimeDistributed
from keras.utils import to_categorical
from matplotlib import pyplot as plt

import data.load
from metrics.accuracy import evaluate
Using TensorFlow backend.

Load Data

train_set,valid_set,dicts = data.load.atisfull()
# print(train_set[:1])
# dicts = {'labels2idx':{},'words2idx':{},'tables2idx':{}}
w2idx,labels2idx = dicts['words2idx'],dicts['labels2idx']
train_x,_,train_label = train_set
val_x,_,val_label = valid_set
idx2w = {w2idx[i]:i for i in w2idx}
idx2lab = {labels2idx[i]:i for i in labels2idx}
n_classes = len(idx2lab)
n_vocab = len(idx2w)
words_train = [[idx2w[i] for i in w[:]] for w in train_x]
labels_train = [[idx2lab[i] for i in w[:]] for w in train_label]

words_val = [[idx2w[i] for i in w[:]] for w in val_x]
# labels_val = [[idx2lab[i] for i in w[:]] for w in val_label]
labels_val =[]
for w in val_label:
    for i in w[:]:
        labels_val.append(idx2lab[i])

print('Real Sentence : {}'.format(words_train[0]))
print('Encoded Form : {}'.format(train_x[0]))
print('='*40)
print('Real Label : {}'.format(labels_train[0]))
print('Encoded Form : {}'.format(train_label[0]))
    
    
Real Sentence : ['i', 'want', 'to', 'fly', 'from', 'boston', 'at', 'DIGITDIGITDIGIT', 'am', 'and', 'arrive', 'in', 'denver', 'at', 'DIGITDIGITDIGITDIGIT', 'in', 'the', 'morning']
Encoded Form : [232 542 502 196 208  77  62  10  35  40  58 234 137  62  11 234 481 321]
========================================
Real Label : ['O', 'O', 'O', 'O', 'O', 'B-fromloc.city_name', 'O', 'B-depart_time.time', 'I-depart_time.time', 'O', 'O', 'O', 'B-toloc.city_name', 'O', 'B-arrive_time.time', 'O', 'O', 'B-arrive_time.period_of_day']
Encoded Form : [126 126 126 126 126  48 126  35  99 126 126 126  78 126  14 126 126  12]

Define and Compile the model

model = Sequential()
model.add(Embedding(n_vocab,100))
model.add(Dropout(0.25))
model.add(SimpleRNN(100,return_sequences=True))
model.add(TimeDistributed(Dense(n_classes,activation='softmax')))
model.compile(optimizer = 'rmsprop',loss = 'categorical_crossentropy')
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, None, 100)         57200     
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 100)         0         
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, None, 100)         20100     
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 127)         12827     
=================================================================
Total params: 90,127
Trainable params: 90,127
Non-trainable params: 0
_________________________________________________________________
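The parameter counts in the summary can be checked by hand. The sketch below assumes the standard layouts (an embedding table, a SimpleRNN with input weights, recurrent weights and a bias, and a Dense layer with a bias); the vocabulary size of 572 is inferred from 57,200 / 100 and is not printed explicitly above.

```python
n_vocab = 572      # inferred from the embedding parameter count
embed_dim = 100
hidden = 100
n_classes = 127

# Embedding: one embed_dim-sized vector per vocabulary entry.
embedding_params = n_vocab * embed_dim

# SimpleRNN: input-to-hidden weights, hidden-to-hidden weights, bias.
rnn_params = embed_dim * hidden + hidden * hidden + hidden

# Dense (shared across time steps by TimeDistributed): weights plus bias.
dense_params = hidden * n_classes + n_classes

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
# 57200 20100 12827 90127
```

Note that TimeDistributed adds no parameters of its own: the same Dense weights are reused at every time step.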

Train the model

def train_the_model(n_epochs,train_x,train_label,val_x,val_label):
    epoch,train_avgloss,val_avgloss,f1s = [],[],[],[]
    for i in range(1,n_epochs+1):
        epoch.append(i)
        
        ## training
        train_avg_loss =0
        
        for n_batch,sent in enumerate(train_x):
            label = train_label[n_batch]
            # label to one-hot
            label = to_categorical(label,num_classes=n_classes)[np.newaxis,:]
            sent = sent[np.newaxis,:]
            loss = model.train_on_batch(sent,label)
            train_avg_loss += loss
            
        # n_batch is the last index (count - 1), so divide by the sentence count instead
        train_avg_loss = train_avg_loss/len(train_x)
        train_avgloss.append(train_avg_loss)
        
        ## evaluate&predict
        val_pred_label,pred_label_val,val_avg_loss  = [],[],0
        
        for n_batch,sent in enumerate(val_x):
            label = val_label[n_batch]
            label = to_categorical(label,num_classes=n_classes)[np.newaxis,:]
            sent = sent[np.newaxis,:]
            loss = model.test_on_batch(sent,label)
            val_avg_loss += loss
            
            pred = model.predict_on_batch(sent)
            pred = np.argmax(pred,-1)[0]
            val_pred_label.append(pred)
            
        # n_batch is the last index (count - 1), so divide by the sentence count instead
        val_avg_loss = val_avg_loss/len(val_x)
        val_avgloss.append(val_avg_loss)
        

        for w in val_pred_label:
            for k in w[:]:
                pred_label_val.append(idx2lab[k])
            
        prec, rec, f1 = evaluate(labels_val,pred_label_val, verbose=False)
        print('Training epoch {}\t train_avg_loss = {} \t val_avg_loss = {}'.format(i,train_avg_loss,val_avg_loss))
        print('precision: {:.2f}% \t recall: {:.2f}% \t f1 :{:.2f}%'.format(prec,rec,f1))
        print('-'*60)
        f1s.append(f1)

        
#     return epoch,pred_label_train,train_avgloss,pred_label_val,val_avgloss
    return epoch,f1s,val_avgloss,train_avgloss
epoch,f1s,val_avgloss,train_avgloss = train_the_model(40,train_x,train_label,val_x,val_label)

Output:

    Training epoch 1     train_avg_loss = 0.5546463992293973     val_avg_loss = 0.4345020865901363
    precision: 84.79%    recall: 80.79%      f1 :82.74%
    ------------------------------------------------------------
    Training epoch 2     train_avg_loss = 0.2575569036037627     val_avg_loss = 0.36228470020366654
    precision: 86.64%    recall: 83.86%      f1 :85.22%
    ------------------------------------------------------------
    Training epoch 3     train_avg_loss = 0.2238766908014994     val_avg_loss = 0.33974187403771694
    precision: 88.03%    recall: 85.55%      f1 :86.77%
    ------------------------------------------------------------
    ……
    ------------------------------------------------------------
    Training epoch 40    train_avg_loss = 0.09190682124901069    val_avg_loss = 0.2697056618613356
    precision: 92.51%    recall: 91.47%      f1 :91.99%
    ------------------------------------------------------------
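The F1 values in the log are the harmonic mean of the precision and recall printed next to them. The sketch below only checks that arithmetic; `f1_score` is a hypothetical helper, not the function from conlleval.py, and small differences are expected because the logged precision and recall are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_epoch_1 = f1_score(84.79, 80.79)
f1_epoch_40 = f1_score(92.51, 91.47)
print(round(f1_epoch_1, 2), round(f1_epoch_40, 2))  # 82.74 91.99
```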

Visualization

Inspect the validation loss to choose a suitable number of epochs.

%matplotlib inline
plt.xlabel('epoch')
plt.ylabel('loss')
plt.plot(epoch,train_avgloss,'b',label='training error')
plt.plot(epoch,val_avgloss,'r',label='validation error')
plt.legend()
plt.show()
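Besides reading the epoch off the plot, it can be picked programmatically. The following is a minimal sketch of a patience-based rule, assuming the per-epoch validation losses are collected in a list as in the training loop; `best_epoch` and the loss values are hypothetical and serve only as an illustration.

```python
def best_epoch(val_losses, patience=3):
    """Return the 1-indexed epoch with the lowest validation loss,
    stopping the scan once `patience` epochs pass without improvement."""
    best_loss, best_idx, waited = float('inf'), 0, 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_idx, waited = loss, i, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx


# Hypothetical loss curve: improves, drifts upward, then improves again late.
losses = [0.43, 0.36, 0.34, 0.35, 0.36, 0.37, 0.30]
print(best_epoch(losses))               # 3
print(best_epoch(losses, patience=10))  # 7
```

A small patience stops early and saves training time, at the risk of missing a later improvement, as the second call shows.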


print('Max F1: {:.2f}%'.format(max(f1s)))
Max F1: 92.56%

Save the model

model.save('slot_filling_with_simpleRNN.h5')

Analysis of results

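With SimpleRNN the best F1 score is 92.56%, still well short of the 95.47% obtained by a senior labmate. The gap is mainly due to the choice of model: SimpleRNN only carries the influence of preceding words into the prediction, whereas in language later words also affect how earlier ones are interpreted. The result could therefore be improved by choosing a more complex model or by adding a layer that captures information from the following words.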
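One common way to capture information from following words is a bidirectional recurrence: a second RNN reads the sentence right to left and its states are concatenated with the forward ones. Keras offers this as the `Bidirectional` layer wrapper; the NumPy sketch below only illustrates the mechanism, with made-up weights and toy inputs, and is not a tuned replacement for the model above.

```python
import numpy as np


def rnn_states(inputs, W_x, W_h):
    """Hidden state at every step of a simple tanh recurrence."""
    h = np.zeros(W_h.shape[0])
    out = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_x + h @ W_h)
        out.append(h)
    return np.stack(out)


def bidirectional_states(inputs, fwd, bwd):
    """Concatenate left-to-right and right-to-left states per position."""
    forward = rnn_states(inputs, *fwd)
    # Run over the reversed sequence, then flip back to the original order.
    backward = rnn_states(inputs[::-1], *bwd)[::-1]
    return np.concatenate([forward, backward], axis=-1)


# Hypothetical weights, one set per direction.
fwd = (np.array([[0.5, -0.3], [0.2, 0.4]]), np.array([[0.1, 0.6], [-0.4, 0.3]]))
bwd = (np.array([[0.3, 0.1], [-0.2, 0.5]]), np.array([[0.2, -0.5], [0.4, 0.1]]))

# Two sequences that differ only in their last element.
seq_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
seq_b = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

uni_a, uni_b = rnn_states(seq_a, *fwd), rnn_states(seq_b, *fwd)
bi_a, bi_b = bidirectional_states(seq_a, fwd, bwd), bidirectional_states(seq_b, fwd, bwd)

print(np.allclose(uni_a[0], uni_b[0]))  # True: the first state cannot see the last word
print(np.allclose(bi_a[0], bi_b[0]))    # False: the backward pass brings it in
```

In the unidirectional case the state at the first word is blind to what follows, which is exactly the limitation described above; the backward pass removes it at the cost of roughly doubling the recurrent parameters.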

References