当前位置:网站首页>CIKM 2020 | a hierarchical query graph generation method for complex questions in knowledge base

CIKM 2020 | a hierarchical query graph generation method for complex questions in knowledge base

2020-11-10 13:38:50 osc_sg74u54s

PaperWeekly original ·  author | Comfort is constant

School | Master degree student of Nanjing University

Research direction | Knowledge map

Reading guide

This article discusses knowledge base question and answer (KBQA) The reinforcement learning method of query graph generation in . Given a natural language problem , Knowledge base Q & A attempts to automatically get answers from the knowledge base that stores relationships between entities . For a complex problem , Query graph generation is a common method based on semantic parsing . among , Query graph is an abstract graph representation of a problem .

The author thinks that the existing methods usually rely on limited rules , It makes them unable to deal with more complex problems . In this paper, we propose a method similar to ” The director - actor - Film critics “ Framework , To overcome this problem . among , The director determines the type of triples needed for the query graph , Actors generate the corresponding triples by selecting nodes and edges , The reviewer calculates the similarity between the generated triples and the given question .

Usually , The supervision information of knowledge base Q & a mainly includes natural language questions and their answers in the knowledge base , However, the multi-step query graph generation method is difficult to include supervision information in each step of the intermediate process , It is usually Weak supervision Of . therefore , The author puts the algorithm framework on hierarchical reinforcement learning .

Paper title :

Hierarchical Query Graph Generation for Complex Question Answering over Knowledge Graph

Thesis link :

http://www.bigdatalab.ac.cn/~jinxiaolong/publications/CIKM2020QiuZ.pdf

Method

2.1 Query graph

Query graph is a specific class of graph representation λ- calculus , Can be translated into an executable query language , for example   SPARQL. In the query graph designed by the author , Triples are divided into five types :

  • Basics (basic): The side of a triple is KG The predicate in ( Relationship ), The first and last entities of a triple are KG Node in ( Linked or not linked to the knowledge base )

  • Merge (union): The side is also KG The predicate in , But two of these nodes can be connected by more than one edge , Between triples through Logic or Connected to a

  • Filter (filter): Numerical or time comparison , Include <, ≤, >, ≥, =, ≠ Such as relationship

  • Ordinal number (ordinal): The head entities need to be sorted , Edges indicate that the sort is ascending or descending

  • polymerization (aggregation): Edges represent aggregate functions , Include count、limit etc.

Through these types of triples , The generated query graph has a basic processing sort 、 The ability to count and so on . As mentioned later “ Options ”, That is, the model selects one of these triples to generate .

2.2 “ The director - actor - Film critics ” frame

The common form of reinforcement learning is Markov decision-making process (MDP), It is agent (agent) And Environmental Science (environment) Interaction process of . among , Agents are learners and decision makers , The environment contains all the elements except the agent .

Because the knowledge base is made up of triples , The author describes query graph generation as about Options (option) Of MDP, Each option represents generating a triplet of the corresponding type . Different from traditional MDP, Through the options , Agents can choose multiple actions at a time .

The framework consists of three modules , The author compares them to “ The director ”、“ actor ” and “ Film critics ”:

  • The director According to the current state , Select an option , Determine what type of triples is required .

  • actor According to the current state And options , Choose an action . Options It will be retained in the next few time steps , Until it's realized .

  • Film critics Calculate the similarity between the newly generated triples and the problem , To be responsible for assessing Whether it is realized or not

And by choosing these options ( That is, the option will stay in the following time step ), Execute to terminate state , The resulting decision-making process , It's called a semi Markov decision process (Semi-MDP, SMDP). The main components of this decision process include :

2.2.1 state

Time step The state of It's a tuple , among It's a given problem , In time step The decision history of . For the director , The decision history is the current query graph ; For actors , It contains the current selection of The original action (primitive actions).

2.2.2 Options

An option contains three components :

  • Initial state set , Contains a state in which options can be initialized

  • Strategy within options , Select the original action for an option

  • Termination conditions Specifies whether the option is complete

If you start an option , The original action will be based on Selected , Until the options are based on End .

2.2.3 The original action

For each option , There are three types of primitive actions :1) In an existing query graph , Select an existing node ;2) According to the options , Add a predicate or edge ;3) Connect the newly generated node and the existing node

For different triples options , The author imposes different constraints on the original action .

2.2.4 Reward function

Because only the final answer is weak supervision , Predicting the answer F1 Value is used as an external reward , And only triple options, not original actions, are rewarded .

2.3 Constant node Links

A constant node represents a linked KG Entity or type , It can be in the form of a given time or numerical information in a problem . For a given natural language problem , The method in this paper first identifies such constant nodes .

  • Entity link : With the help of S-MART [2]( A famous knowledge base entity linking tool ) Generate (mention, entity) Yes .

  • Type link : adopt Stanford CoreNLP toolkit [3] Mark the part of speech for each word in the question , Extract all nouns and proper nouns . contain KG Words of type synonyms are chosen to construct (mention, type) Yes .

  • Time and value links : Use CoreNLP Named entity recognition tool , Get candidate time nodes or numerical nodes .

2.4 Problem and graph coding

For problem coding , The author uses a two-way GRU The network transforms natural language problems into vectors .

For graph coding , The author considers that the query graph is dynamically updated , It may be too expensive to encode query graph using graph neural network , Because we need to calculate the adjacency matrix every time .

The graph encoder used by the author consists of two modules :

  • 1 individual LSTM The network encodes all triples in the query graph respectively ; A query graph can be considered as a triple

  • transformer The model captures the interactions between triples , And in the case of no adjacency matrix , Generate a representation of the entire graph

In order to get a graphical representation of , All triples are encoded . Encoding different triples LSTM It's independent , The relationship between triples is not explicitly represented . To obtain the combined semantics of multiple triples from the global ,transformer Is applied to the representation of triples . Through the mechanism of self attention , The encoder weighs the importance of each triplet , And update the representation of all triples in the query graph .

What is worth discussing here is , Whether word embedding or knowledge mapping embedding using pre training can be compared with using GRU Learning embeddedness for better performance .

2.5 Hierarchical decision making

The search strategies of directors and actors are learned through status information and decision history . At each time step , The director accepts the State , And according to the calculated probability distribution, choose the option . Each option is assigned a dense embedding vector , The option space is obtained by stacking the embedded encoding of all available options .

When the director chooses the option , The actor chooses a series of original actions to complete the options . The decision history contains the current query graph and options The original action selected in . The action space is obtained by stacking the embedded codes of all available actions .

When a triplet is generated and added to the query graph , Options End , The director chooses a new option . When the decision process is complete , The answer can be obtained from the generated query graph . The algorithm framework is as follows 1 Shown .

experiment

The author in WebQuestions And two namesake ComplexQuestions Experiments on data sets , Some of the algorithms are listed at the end of the paper “ Extended reading ” in . In this paper, the experiment , The algorithm achieves SOTA.

In the test question ,56.8% There is a triplet option in the problem ( The query graph contains a triple ), Other problems contain two or more triples , And F1 It is more stable with the number of options , It shows that it has a certain ability to deal with complex problems .

Summary

The progress of artificial intelligence from perception to cognition , Need more research on Knowledge Mapping reasoning . This paper adopts the idea of strengthening learning , Improve the current query graph generation method for complex problems .

This paper proposes three modules , The director 、 Actors and critics , In fact, it corresponds to the hierarchical decision process of using reinforcement learning to generate query graph of complex problems :1) Select the option to execute ;2) Select the actions in the options ;3) Determine whether the option is complete .

And execute an option (option), It essentially corresponds to the process of adding the information of a single triple to the query graph . For one MDP, It may be unrealistic to deal with the whole triplet directly and completely in one step , And only deal with a single node or edge , You may not be able to associate actions well with triples .

The latter is the defect of most existing methods . Through the options - action (option-action) This hierarchical decision-making process , You can split the processing of a triplet into three actions ( It can be respectively corresponding to s、p、o) Realization .

The external reward of the reinforcement learning algorithm , From the triple option instead of the original action , To a certain extent, it solves the problem that there is no relationship between modeling actions in the previous methods , That is, the action about a triplet is associated with .

Personally think that , The triples that make up a query graph , In fact, there must be an explicitly representable Association , At least the query graph cannot contain some disconnected triples . And the graph representation method used in this paper , The relationship between triples is only through transformer Capture , It seems difficult to explain the association between triples .

in addition , Knowledge mapping embeds (KGE) Has been applied to triad classification (triple classification) Tasks , Whether to introduce... In the graph coding part KGE Ability to enhance the model to judge whether a triplet is reasonable or not ?

The author focuses on the graph representation of query graph and obtains SOTA, But few studies seem to focus on the design of problem representation , How to design a problem representation method more suitable for query graph generation , Maybe it's an open question .

some time , The author plans to study the mitigation of errors caused by external linking tools , And knowledge base Q & A , How to identify the problem of KG Relationships are still a challenge .

reference

[1] CIKM 2020 | Hierarchical Query Graph Generation for Complex Question Answering over Knowledge Graph

[2] ACL 2015 | S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

[3] ACL 2014 | The Stanford CoreNLP Natural Language Processing Toolkit

Extended reading

Query graph generation of knowledge base Q & A :

  • STAGG,ACL 2015,Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowl- edge Base

  • QUINT,WWW 2017 | Automated Template Generation for Question Answering over Knowledge Graphs

  • STF,EMNLP 2018,A State-transition Framework to Answer Complex Questions over Knowledge Base

  • CompQA,EMNLP 2018 | Knowledge Base Question Answering via Encoding of Complex Query Graphs

  • NFF,TKDE 2018 | Answering natural language questions by subgraph matching over knowledge graphs

  • Tree2Seq,Neurocomputing 2020,Knowledge-based question answering by tree-to-sequence learning

Read more

# cast draft   through Avenue #

  Let your paper be seen by more people  

How to make more high-quality content reach the reader group in a shorter path , How about reducing the cost of finding quality content for readers ? The answer is : People you don't know .

There are always people you don't know , Know what you want to know .PaperWeekly Maybe it could be a bridge , Push different backgrounds 、 Scholars and academic inspiration in different directions collide with each other , There are more possibilities . 

PaperWeekly Encourage university laboratories or individuals to , Share all kinds of quality content on our platform , It can be Interpretation of the latest paper , It can also be The learning or Technical dry cargo . We have only one purpose , Let knowledge really flow .

????  Standards for contributions :

• The manuscript is really personal Original works , The author's personal information should be indicated in the manuscript ( full name + School / Work unit + Education / Position + Research direction ) 

• If the article is not the first one , Please remind and attach all published links to your contributions  

• PaperWeekly By default, every article is the first , Will add “ original ” sign

????  Send email :

• Send email :hr@paperweekly.site 

• All articles are illustrated , Please send it separately in the attachment  

• Please leave an instant contact ( Wechat or mobile phone ), So that we can communicate with the author when editing and Publishing

????

Now? , stay 「 You know 」 We can also be found

Go to Zhihu home page and search 「PaperWeekly」

Click on 「 Focus on 」 Subscribe to our column

About PaperWeekly

PaperWeekly It's a recommendation 、 Reading 、 Discuss 、 An academic platform for reporting the achievements of the frontier papers on artificial intelligence . If you study or engage in AI field , Welcome to clicking on the official account 「 Communication group 」, The little assistant will take you into PaperWeekly In the communication group .

版权声明
本文为[osc_sg74u54s]所创,转载请带上原文链接,感谢