当前位置：网站首页>CIKM 2020 | a hierarchical query graph generation method for complex questions in knowledge base
CIKM 2020 | a hierarchical query graph generation method for complex questions in knowledge base
2020-11-10 13:38:50 【osc_sg74u54s】
PaperWeekly original · author ｜ Comfort is constant
School ｜ Master degree student of Nanjing University
Research direction ｜ Knowledge map
This article discusses knowledge base question and answer （KBQA） The reinforcement learning method of query graph generation in . Given a natural language problem , Knowledge base Q & A attempts to automatically get answers from the knowledge base that stores relationships between entities . For a complex problem , Query graph generation is a common method based on semantic parsing . among , Query graph is an abstract graph representation of a problem .
The author thinks that the existing methods usually rely on limited rules , It makes them unable to deal with more complex problems . In this paper, we propose a method similar to ” The director - actor - Film critics “ Framework , To overcome this problem . among , The director determines the type of triples needed for the query graph , Actors generate the corresponding triples by selecting nodes and edges , The reviewer calculates the similarity between the generated triples and the given question .
Usually , The supervision information of knowledge base Q & a mainly includes natural language questions and their answers in the knowledge base , However, the multi-step query graph generation method is difficult to include supervision information in each step of the intermediate process , It is usually Weak supervision Of . therefore , The author puts the algorithm framework on hierarchical reinforcement learning .
Paper title ：
Hierarchical Query Graph Generation for Complex Question Answering over Knowledge Graph
Thesis link ：
2.1 Query graph
Query graph is a specific class of graph representation λ- calculus , Can be translated into an executable query language , for example SPARQL. In the query graph designed by the author , Triples are divided into five types ：
Basics （basic）： The side of a triple is KG The predicate in （ Relationship ）, The first and last entities of a triple are KG Node in （ Linked or not linked to the knowledge base ）
Merge （union）： The side is also KG The predicate in , But two of these nodes can be connected by more than one edge , Between triples through Logic or Connected to a
Filter （filter）： Numerical or time comparison , Include <, ≤, >, ≥, =, ≠ Such as relationship
Ordinal number （ordinal）： The head entities need to be sorted , Edges indicate that the sort is ascending or descending
polymerization （aggregation）： Edges represent aggregate functions , Include count、limit etc.
Through these types of triples , The generated query graph has a basic processing sort 、 The ability to count and so on . As mentioned later “ Options ”, That is, the model selects one of these triples to generate .
2.2 “ The director - actor - Film critics ” frame
The common form of reinforcement learning is Markov decision-making process （MDP）, It is agent （agent） And Environmental Science （environment） Interaction process of . among , Agents are learners and decision makers , The environment contains all the elements except the agent .
Because the knowledge base is made up of triples , The author describes query graph generation as about Options （option） Of MDP, Each option represents generating a triplet of the corresponding type . Different from traditional MDP, Through the options , Agents can choose multiple actions at a time .
The framework consists of three modules , The author compares them to “ The director ”、“ actor ” and “ Film critics ”：
The director According to the current state , Select an option , Determine what type of triples is required .
actor According to the current state And options , Choose an action . Options It will be retained in the next few time steps , Until it's realized .
Film critics Calculate the similarity between the newly generated triples and the problem , To be responsible for assessing Whether it is realized or not
And by choosing these options （ That is, the option will stay in the following time step ）, Execute to terminate state , The resulting decision-making process , It's called a semi Markov decision process （Semi-MDP, SMDP）. The main components of this decision process include ：
Time step The state of It's a tuple , among It's a given problem , In time step The decision history of . For the director , The decision history is the current query graph ; For actors , It contains the current selection of The original action （primitive actions）.
An option contains three components ：
Initial state set , Contains a state in which options can be initialized
Strategy within options , Select the original action for an option
Termination conditions Specifies whether the option is complete
If you start an option , The original action will be based on Selected , Until the options are based on End .
2.2.3 The original action
For each option , There are three types of primitive actions ：1） In an existing query graph , Select an existing node ;2） According to the options , Add a predicate or edge ;3） Connect the newly generated node and the existing node
For different triples options , The author imposes different constraints on the original action .
2.2.4 Reward function
Because only the final answer is weak supervision , Predicting the answer F1 Value is used as an external reward , And only triple options, not original actions, are rewarded .
2.3 Constant node Links
A constant node represents a linked KG Entity or type , It can be in the form of a given time or numerical information in a problem . For a given natural language problem , The method in this paper first identifies such constant nodes .
Entity link ： With the help of S-MART （ A famous knowledge base entity linking tool ） Generate （mention, entity） Yes .
Type link ： adopt Stanford CoreNLP toolkit  Mark the part of speech for each word in the question , Extract all nouns and proper nouns . contain KG Words of type synonyms are chosen to construct （mention, type） Yes .
Time and value links ： Use CoreNLP Named entity recognition tool , Get candidate time nodes or numerical nodes .
2.4 Problem and graph coding
For problem coding , The author uses a two-way GRU The network transforms natural language problems into vectors .
For graph coding , The author considers that the query graph is dynamically updated , It may be too expensive to encode query graph using graph neural network , Because we need to calculate the adjacency matrix every time .
The graph encoder used by the author consists of two modules ：
1 individual LSTM The network encodes all triples in the query graph respectively ; A query graph can be considered as a triple
transformer The model captures the interactions between triples , And in the case of no adjacency matrix , Generate a representation of the entire graph
In order to get a graphical representation of , All triples are encoded . Encoding different triples LSTM It's independent , The relationship between triples is not explicitly represented . To obtain the combined semantics of multiple triples from the global ,transformer Is applied to the representation of triples . Through the mechanism of self attention , The encoder weighs the importance of each triplet , And update the representation of all triples in the query graph .
What is worth discussing here is , Whether word embedding or knowledge mapping embedding using pre training can be compared with using GRU Learning embeddedness for better performance .
2.5 Hierarchical decision making
The search strategies of directors and actors are learned through status information and decision history . At each time step , The director accepts the State , And according to the calculated probability distribution, choose the option . Each option is assigned a dense embedding vector , The option space is obtained by stacking the embedded encoding of all available options .
When the director chooses the option , The actor chooses a series of original actions to complete the options . The decision history contains the current query graph and options The original action selected in . The action space is obtained by stacking the embedded codes of all available actions .
When a triplet is generated and added to the query graph , Options End , The director chooses a new option . When the decision process is complete , The answer can be obtained from the generated query graph . The algorithm framework is as follows 1 Shown .
The author in WebQuestions And two namesake ComplexQuestions Experiments on data sets , Some of the algorithms are listed at the end of the paper “ Extended reading ” in . In this paper, the experiment , The algorithm achieves SOTA.
In the test question ,56.8% There is a triplet option in the problem （ The query graph contains a triple ）, Other problems contain two or more triples , And F1 It is more stable with the number of options , It shows that it has a certain ability to deal with complex problems .
The progress of artificial intelligence from perception to cognition , Need more research on Knowledge Mapping reasoning . This paper adopts the idea of strengthening learning , Improve the current query graph generation method for complex problems .
This paper proposes three modules , The director 、 Actors and critics , In fact, it corresponds to the hierarchical decision process of using reinforcement learning to generate query graph of complex problems ：1） Select the option to execute ;2） Select the actions in the options ;3） Determine whether the option is complete .
And execute an option （option）, It essentially corresponds to the process of adding the information of a single triple to the query graph . For one MDP, It may be unrealistic to deal with the whole triplet directly and completely in one step , And only deal with a single node or edge , You may not be able to associate actions well with triples .
The latter is the defect of most existing methods . Through the options - action （option-action） This hierarchical decision-making process , You can split the processing of a triplet into three actions （ It can be respectively corresponding to s、p、o） Realization .
The external reward of the reinforcement learning algorithm , From the triple option instead of the original action , To a certain extent, it solves the problem that there is no relationship between modeling actions in the previous methods , That is, the action about a triplet is associated with .
Personally think that , The triples that make up a query graph , In fact, there must be an explicitly representable Association , At least the query graph cannot contain some disconnected triples . And the graph representation method used in this paper , The relationship between triples is only through transformer Capture , It seems difficult to explain the association between triples .
in addition , Knowledge mapping embeds （KGE） Has been applied to triad classification （triple classification） Tasks , Whether to introduce... In the graph coding part KGE Ability to enhance the model to judge whether a triplet is reasonable or not ？
The author focuses on the graph representation of query graph and obtains SOTA, But few studies seem to focus on the design of problem representation , How to design a problem representation method more suitable for query graph generation , Maybe it's an open question .
some time , The author plans to study the mitigation of errors caused by external linking tools , And knowledge base Q & A , How to identify the problem of KG Relationships are still a challenge .
 CIKM 2020 | Hierarchical Query Graph Generation for Complex Question Answering over Knowledge Graph
 ACL 2015 | S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking
 ACL 2014 | The Stanford CoreNLP Natural Language Processing Toolkit
Query graph generation of knowledge base Q & A ：
STAGG,ACL 2015,Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowl- edge Base
QUINT,WWW 2017 | Automated Template Generation for Question Answering over Knowledge Graphs
STF,EMNLP 2018,A State-transition Framework to Answer Complex Questions over Knowledge Base
CompQA,EMNLP 2018 | Knowledge Base Question Answering via Encoding of Complex Query Graphs
NFF,TKDE 2018 | Answering natural language questions by subgraph matching over knowledge graphs
Tree2Seq,Neurocomputing 2020,Knowledge-based question answering by tree-to-sequence learning
# cast draft through Avenue #
Let your paper be seen by more people
How to make more high-quality content reach the reader group in a shorter path , How about reducing the cost of finding quality content for readers ？ The answer is ： People you don't know .
There are always people you don't know , Know what you want to know .PaperWeekly Maybe it could be a bridge , Push different backgrounds 、 Scholars and academic inspiration in different directions collide with each other , There are more possibilities .
PaperWeekly Encourage university laboratories or individuals to , Share all kinds of quality content on our platform , It can be Interpretation of the latest paper , It can also be The learning or Technical dry cargo . We have only one purpose , Let knowledge really flow .
???? Standards for contributions ：
• The manuscript is really personal Original works , The author's personal information should be indicated in the manuscript （ full name + School / Work unit + Education / Position + Research direction ）
• If the article is not the first one , Please remind and attach all published links to your contributions
• PaperWeekly By default, every article is the first , Will add “ original ” sign
???? Send email ：
• Send email ：firstname.lastname@example.org
• All articles are illustrated , Please send it separately in the attachment
• Please leave an instant contact （ Wechat or mobile phone ）, So that we can communicate with the author when editing and Publishing
Now? , stay 「 You know 」 We can also be found
Go to Zhihu home page and search 「PaperWeekly」
Click on 「 Focus on 」 Subscribe to our column
PaperWeekly It's a recommendation 、 Reading 、 Discuss 、 An academic platform for reporting the achievements of the frontier papers on artificial intelligence . If you study or engage in AI field , Welcome to clicking on the official account 「 Communication group 」, The little assistant will take you into PaperWeekly In the communication group .
- C++ 数字、string和char*的转换
- Won the CKA + CKS certificate with the highest gold content in kubernetes in 31 days!
- C + + number, string and char * conversion
- C + + Learning -- capacity() and resize() in C + +
- C + + Learning -- about code performance optimization
C + + programming experience (6): using C + + style type conversion
Latest party and government work report ppt - Park ppt
Online ID number extraction birthday tool
Field pointer? Dangling pointer? This article will help you understand!
GVRP of hcna Routing & Switching
- LeetCode 91. 解码方法
- Seq2seq implements chat robot
- [chat robot] principle of seq2seq model
- Leetcode 91. Decoding method
- HCNA Routing＆Switching之GVRP
- GVRP of hcna Routing & Switching
- HDU7016 Random Walk 2
- [Code+＃1]Yazid 的新生舞会
- CF1548C The Three Little Pigs
- HDU7033 Typing Contest
- HDU7016 Random Walk 2
- [code + 1] Yazid's freshman ball
- CF1548C The Three Little Pigs
- HDU7033 Typing Contest
- Qt Creator 自动补齐变慢的解决
- HALCON 20.11：如何处理标定助手品质问题
- HALCON 20.11：标定助手使用注意事项
- Solution of QT creator's automatic replenishment slowing down
- Halcon 20.11: how to deal with the quality problem of calibration assistant
- Halcon 20.11: precautions for use of calibration assistant
- "Top ten scientific and technological issues" announced| Young scientists 50 ² forum
- Reverse linked list
- JS data type
- Remember the bug encountered in reading and writing a file
- Singleton mode
- 在这个 N 多编程语言争霸的世界，C++ 究竟还有没有未来？
- In this world of N programming languages, is there a future for C + +?
- js Promise
- js 数组方法 回顾
- ES6 template characters
- js Promise
- JS array method review
- 【Golang】️走进 Go 语言️ 第一课 Hello World
- [golang] go into go language lesson 1 Hello World