Utilizing BERT Intermediate Layers for Aspect Based Sentiment Analysis and Natural Language Inference

Problem

Might each of BERT's intermediate layers also encode different semantic knowledge? Existing BERT-based models have not made good use of this information.

Solution Idea

The outputs of the transformer layers at every level are merged into a single representation by a pooling module, and that representation is passed through a classifier to obtain the final polarity.

The approach is applied not only to ABSA but also to the NLI (Natural Language Inference) task, to show that the intermediate layers all contribute to semantic understanding.

Model

[Figure1] Overview of the proposed BERT-LSTM model.

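The key to using the intermediate layers is attaching a pooling module. BERT-base consists of 12 transformer layers, and each of them outputs its own [CLS] token representation. These representations are treated as a single sequence that stands for the sentence and are fed into a module that can pool over them.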

$h_{CLS} = \{ h_{CLS}^1, h_{CLS}^2, ..., h_{CLS}^L \}$
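Collecting this sequence amounts to taking the first token's vector from every layer. The sketch below is illustrative only: `collect_cls_states` and the toy nested list are hypothetical stand-ins for real model outputs. It assumes the per-layer states arrive with the embedding output in front, which is how Hugging Face `transformers` returns them when `output_hidden_states=True` is set (one entry for the embeddings plus one per layer).

```python
def collect_cls_states(hidden_states, has_embedding_output=True):
    """Return [h_CLS^1, ..., h_CLS^L] from per-layer token states.

    hidden_states[l][t] is the vector of token t at layer l.
    [CLS] is assumed to be token 0 of the input.
    """
    # Skip the embedding output so that only the L transformer layers remain.
    layers = hidden_states[1:] if has_embedding_output else hidden_states
    return [layer[0] for layer in layers]


# Toy stand-in for model outputs: embeddings + 12 layers, 4 tokens, hidden size 2.
num_layers, num_tokens = 12, 4
hidden_states = [
    [[float(layer), float(token)] for token in range(num_tokens)]
    for layer in range(num_layers + 1)
]
h_cls = collect_cls_states(hidden_states)
```

Here `h_cls` holds one vector per transformer layer, ordered from the lowest layer to the top one, which is the order the pooling module consumes.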

LSTM-Pooling: $o = h_{LSTM}^L = \overrightarrow {LSTM}(h_{CLS}^i), i \in [1, L]$
Attention-Pooling: $o = W_h^T softmax(qh_{CLS}^T) h_{CLS}$ ($q$ and $W_h$ are learnable weights)
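The two pooling strategies can be sketched in plain Python to make the shapes concrete. This is a minimal illustration rather than the authors' implementation: it uses lists instead of tensors, the function names and the `params` layout (one weight matrix and bias per gate, applied to the concatenation of input and previous hidden state) are assumptions, and the attention variant reads the formula as "score each layer with $q$, take the softmax-weighted sum of the [CLS] vectors, then project with $W_h^T$".

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lstm_pooling(h_cls, params, hidden_size):
    """Run a forward LSTM over the L [CLS] vectors and return the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in h_cls:  # layer 1 first, layer L last
        z = list(x) + h  # concatenate input and previous hidden state
        pre = {}
        for gate in ("i", "f", "g", "o"):
            weight, bias = params[gate]  # weight: hidden_size rows of len(z)
            pre[gate] = [
                sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(weight, bias)
            ]
        i = [sigmoid(v) for v in pre["i"]]    # input gate
        f = [sigmoid(v) for v in pre["f"]]    # forget gate
        g = [math.tanh(v) for v in pre["g"]]  # candidate cell state
        o = [sigmoid(v) for v in pre["o"]]    # output gate
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def attention_pooling(h_cls, q, w_h):
    """Return (o, weights): o = W_h^T (sum_i a_i h_CLS^i), a = softmax(q . h_CLS^i)."""
    scores = [sum(qj * hj for qj, hj in zip(q, h)) for h in h_cls]
    weights = softmax(scores)
    d = len(q)
    context = [sum(a * h[j] for a, h in zip(weights, h_cls)) for j in range(d)]
    d_out = len(w_h[0])  # w_h is d x d_out, given as a list of rows
    o = [sum(w_h[j][k] * context[j] for j in range(d)) for k in range(d_out)]
    return o, weights
```

Both functions map the $L$ per-layer [CLS] vectors to a single vector $o$. LSTM pooling is order-sensitive and keeps the state after the top layer, while attention pooling learns one weight per layer and is a weighted average followed by a projection.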

After the pooling module: $y = softmax(W_o^T o + b_o)$
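As a small worked example of this last step, the classifier is a single linear layer followed by a softmax over the polarity classes. The function name `classify`, the list-of-rows layout for $W_o$, and the three-class setup are assumptions made for illustration.

```python
import math


def classify(o, w_o, b_o):
    """Compute y = softmax(W_o^T o + b_o).

    o: pooled vector of size d; w_o: d x num_classes (list of rows); b_o: num_classes.
    """
    num_classes = len(b_o)
    logits = [
        sum(w_o[j][k] * o[j] for j in range(len(o))) + b_o[k]
        for k in range(num_classes)
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-class setup (e.g. negative / neutral / positive), untrained weights.
pooled = [1.0, 2.0]
w_o = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b_o = [0.0, 0.0, 0.0]
y = classify(pooled, w_o, b_o)
```

With all-zero weights and biases every class receives the same probability; training moves $W_o$ and $b_o$ so that the probability mass concentrates on the correct polarity.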

How is the aspect passed to the input?

The aspect word is placed in the Sentence2 slot. [github]
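A rough sketch of this sentence-pair layout follows. It assumes the standard BERT pair format (`[CLS] sentence [SEP] aspect [SEP]` with segment ids 0 and 1) and uses whitespace splitting in place of the real WordPiece tokenizer; `build_absa_input` is a hypothetical helper, not code from the linked repository.

```python
def build_absa_input(sentence, aspect):
    """Place the review sentence in segment A and the aspect term in segment B."""
    # Whitespace splitting is a simplification; BERT itself uses WordPiece.
    sentence_tokens = sentence.split()
    aspect_tokens = aspect.split()
    tokens = ["[CLS]"] + sentence_tokens + ["[SEP]"] + aspect_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the sentence and the first [SEP];
    # segment 1 covers the aspect and the final [SEP].
    segment_ids = [0] * (len(sentence_tokens) + 2) + [1] * (len(aspect_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_absa_input("the battery life is great", "battery life")
```

Because the aspect sits in its own segment, the same sentence can be paired with different aspects to obtain a separate polarity prediction for each one.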

Experiment

Dataset

SemEval-2014 Task 4 and the ACL 14 Twitter dataset