An Advantage Actor-Critic Algorithm with Confidence Explorat… — Dose of AI

Open Information Extraction (OIE) is the task of generating structured representations of information from natural language sentences. In recent years, many works have trained an end-to-end OIE extractor based on a Sequence-to-Sequence (Seq2Seq) model and applied the REINFORCE algorithm to update the model. However, model performance often suffers from large training variance and limited exploration. This paper introduces a reinforcement learning framework that enables an Advantage Actor-Critic (AAC) algorithm to update the Seq2Seq model with samples from a novel Confidence Exploration (CE). The AAC algorithm reduces the training variance with a fine-grained evaluation of each individual word. The confidence exploration provides effective training samples by exploring words at key positions. Empirical evaluations show that our Advantage Actor-Critic algorithm and Confidence Exploration outperform the comparison methods.
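To make the contrast between a sequence-level REINFORCE update and a per-word advantage signal concrete, the following is a minimal sketch and not the paper's implementation. The function names, the one-step advantage form A_t = r_t + gamma * V_{t+1} - V_t, and the toy numbers are all assumptions chosen for illustration. The abstract does not specify how rewards or critic values are computed, and Confidence Exploration is not modeled here.

```python
import math


def reinforce_weights(seq_reward, baseline, length):
    """REINFORCE-style signal: every token shares one sequence-level weight."""
    return [seq_reward - baseline] * length


def advantage_weights(token_rewards, values, gamma=1.0):
    """Per-token one-step advantage: A_t = r_t + gamma * V_{t+1} - V_t.

    `values` holds one critic estimate per token (hypothetical inputs);
    the value after the final token is taken to be 0.
    """
    advantages = []
    for t, reward in enumerate(token_rewards):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        advantages.append(reward + gamma * next_value - values[t])
    return advantages


def policy_loss(log_probs, weights):
    """Negative weighted log-likelihood; weights are treated as constants."""
    return -sum(lp * w for lp, w in zip(log_probs, weights))


if __name__ == "__main__":
    # Toy three-token output with a reward only at the end (assumed setup).
    log_probs = [math.log(0.5), math.log(0.4), math.log(0.9)]
    shared = reinforce_weights(seq_reward=1.0, baseline=0.5, length=3)
    per_token = advantage_weights([0.0, 0.0, 1.0], values=[0.5, 0.2, 0.9])
    print("sequence-level weights:", shared)
    print("per-token advantages:  ", per_token)
    print("REINFORCE loss:", policy_loss(log_probs, shared))
    print("actor-critic loss:", policy_loss(log_probs, per_token))
```

In the sequence-level case every token receives the same weight regardless of its individual contribution, whereas the per-token advantages can differ in sign and magnitude. This is one way to read the abstract's "fine-grained evaluation of each individual word".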

An Advantage Actor-Critic Algorithm with Confidence Exploration for Open Information Extraction

Abstract