
A ten-thousand-word survey: recent advances in industry knowledge graphs

2020-12-08 10:11:16 osc_otuqqtuq

Authors | Li Jingyang [1], Niu Guanglin [2], Tang Chengguang [1], Yu Haiyang [1], Li Yang [1], Fu Bin [1], Sun Jian [1]

Affiliations | Alibaba DAMO Academy, AliMe Conversational AI team [1]; School of Computer Science, Beihang University (Beijing University of Aeronautics and Astronautics) [2]

Abstract

The industry knowledge graph is the cornerstone of cognitive intelligence applications in vertical industries. At present, in most specialized vertical domains, schema construction for the industry knowledge graph relies on heavy involvement from domain experts. This mode carries high labor costs and long construction cycles; at the same time, in the absence of large-scale supervised data, information extraction performs poorly. These problems limit the adoption of industry knowledge graphs and reduce their acceptance.

This article organizes and analyzes the latest technical developments related to the problems above, namely the difficulty of schema construction and the difficulty of low-resource extraction. It covers our practice in semi-automatic schema construction, and also analyzes and discusses Document AI and long-context structured language models for document-level information extraction, in the hope of offering some inspiration and help to peers' research work.

Introduction

That AI technology develops along the path from computation to perception to cognition has become the consensus of most AI researchers and practitioners. For machines to possess cognitive intelligence, and in turn to reason, generalize, make decisions, and even create, they need, to some extent, a brain full of knowledge. Knowledge graphs [4, 18, 19], an increasingly popular framework for formally describing semantic knowledge in the Internet era, have become an important vehicle for advancing artificial intelligence from perception to cognition.

Knowledge graphs are now applied very widely. In the general domain, search companies such as Google and Baidu use them to provide intelligent search services; the IBM Watson question-answering system, Apple's Siri voice assistant, and Wolfram Alpha all use graphs for question understanding, reasoning, and answering. In vertical domains, industry data is also rapidly evolving from large-scale raw data into graph-structured knowledge, and industry knowledge in graph form powers intelligent customer service, intelligent decision making, intelligent marketing, and other intelligent services.

The knowledge graph question-answering system developed by the Alibaba Cloud AliMe (Xiaomi) team mainly serves government affairs, telecom operators, insurance, taxation, education, healthcare, and other fields. In the practice of applying knowledge graphs in these industries, we found that industry graphs face the following challenges:

 

  • Manual schema construction is hard: industry knowledge graph schema construction is usually undertaken by business experts who know the business best. Although business experts excel at the business itself, understanding and applying the concepts of graphs and schemas carries substantial start-up costs, which directly prevents them from quickly abstracting and organizing their business knowledge into a schema that meets application requirements;

  • Low-resource information extraction is hard: unlike the general domain, which has accumulated large-scale supervised data, most specialized vertical domains have limited supervised resources for information extraction. How to improve the efficiency and performance of triple extraction under limited supervision, from the perspectives of both models and industry data, is the core challenge of industry information extraction.

In addition, more and more vertical-domain graph applications take documents as their direct source data. How to effectively parse various types of document data, and how to reasonably design document-level information extraction models, also occupies an increasingly central position among the many challenges of industry graph construction.

In the following sections (see the figure below), this article first introduces the key technologies of schema construction, including the semi-automatic schema construction scheme we use in KBQA to help business experts land a schema; it then describes the challenges faced by the entity recognition module and the corresponding technical solutions, discussed from three angles: domain knowledge injection, semi-supervised learning, and complex entity recognition.

For relation extraction, the article describes key challenges and existing solutions from the perspectives of distant supervision, few-shot learning, joint entity-relation extraction, and document-level relation extraction. Finally, combined with actual business needs, it raises the new challenges of document-level information extraction and discusses potential solutions.

Schema construction

Knowledge graph schema construction is the first step in building a knowledge graph, but it is also one of the links that most affects how quickly a project can progress. When building graph-based applications across industries, most industries have had no prior exposure to knowledge graphs, so there is no accumulated in-industry schema from which to build the industry graph.

At the same time, because the concept of the knowledge graph is relatively new, industry business experts need to go through a process from understanding to skillfully building a schema, and this process often requires frequent intervention from algorithm engineers. As a result, in graph-related applications for a new industry, our project experience shows that completing schema construction is often measured in weeks or even months.

When landing a graph in a new industry, to save the time and labor costs of schema construction, we need a semi-automatic schema construction solution that reduces the time cost of schema construction to days. In terms of information extraction technology, when facing a new industry, the business knowledge is characterized by its openness and independence from past domain knowledge; we therefore borrow techniques and ideas from open information extraction (OpenIE) to meet our needs.

Accordingly, in the rest of this section, we will introduce some technical advances in OpenIE, and then present our exploration of semi-automatic schema construction algorithms.

1.1 Open information extraction

1.1.1 Brief introduction

Open information extraction (OpenIE) means that machines read, integrate, and organize open free text with no fixed entity or relation types, automatically extracting structured knowledge from it. In general, OpenIE includes open entity recognition and open entity-relation extraction. Because schema construction involves both entities and relations, the OpenIE discussed here refers to open entity-relation extraction.

For example, from the sentence "Alibaba is a technology company headquartered in Hangzhou, China", OpenIE extracts the two triples ("Alibaba", "headquartered in", "Hangzhou, China") and ("Alibaba", "is a", "technology company"). What OpenIE extracts is usually called an SPO triple, for Subject, Predicate, Object.

Common datasets in this direction include FewRel [1,2], NYT-FB [6], and OIE2016 [3]; the evaluation metrics are the precision, recall, and F1 of the predictions.

1.1.2 Models

(1) Classical extraction systems

Classic OpenIE systems are basically based on the syntactic and grammatical rules of the sentence, plus a triple discriminator, to perform SPO extraction. Take TextRunner [5] as an example; it is mainly divided into three steps:

1. Classifier training: noun phrases are obtained by syntactic parsing; the words between phrases are taken as candidate relations and filtered by rules to build positive triple samples, while negative samples are built by random substitution; a Naive Bayes classifier is then trained on hand-constructed features.
2. Preliminary extraction: noun phrases and relations are extracted from sentences as above, and the classifier judges whether each extracted triple is trustworthy.
3. Triple filtering: the extracted relations are normalized by rules, and triple frequencies are counted.
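For concreteness, below is a minimal sketch of the candidate-generation step in a TextRunner-style system, assuming spaCy and its small English model are available; the trained Naive Bayes trustworthiness classifier is stubbed out with a simple part-of-speech rule, so this illustrates the idea rather than reproducing the original system.

```python
# Sketch of TextRunner-style SPO candidate generation (assumes spaCy +
# en_core_web_sm are installed; the trust classifier is a rule stub).
import spacy

nlp = spacy.load("en_core_web_sm")

def candidate_triples(sentence):
    doc = nlp(sentence)
    chunks = list(doc.noun_chunks)
    triples = []
    for left, right in zip(chunks, chunks[1:]):
        # Words between two adjacent noun phrases become the candidate
        # predicate; keep only verb-like and prepositional tokens.
        between = doc[left.end:right.start]
        pred = [t for t in between if t.pos_ in ("VERB", "AUX", "ADP", "PART")]
        if pred:  # crude stand-in for the trained triple classifier
            triples.append((left.text, " ".join(t.text for t in pred), right.text))
    return triples

print(candidate_triples("Alibaba is a technology company headquartered in Hangzhou."))
# e.g. [('Alibaba', 'is', 'a technology company'),
#       ('a technology company', 'headquartered in', 'Hangzhou')]
```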

 

With the development of deep learning and the continual enrichment of related datasets, recent years have also seen supervised and unsupervised deep-learning-based OpenIE methods.

(2) Unsupervised extraction

The DRWE [7] model (see the figure below) performs open relation identification in an unsupervised way. Specifically, it uses existing tools to identify key entities, entity pairs, and the shortest dependency path between them; it then combines pre-trained word vectors, the shortest dependency path, and entity types to construct a feature vector and applies PCA for dimensionality reduction; finally, hierarchical clustering yields the relation clusters.
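A minimal sketch of this unsupervised pipeline, assuming the feature vectors for entity pairs (pre-trained word vectors, shortest dependency path, entity types) have already been built; random vectors stand in for them here:

```python
# DRWE-style pipeline skeleton: feature vectors -> PCA -> hierarchical
# clustering with an unfixed number of clusters (scikit-learn).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
pair_features = rng.normal(size=(200, 300))      # one row per entity pair

reduced = PCA(n_components=50).fit_transform(pair_features)
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=25.0).fit_predict(reduced)
print("discovered relation clusters:", len(set(labels)))
```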

The RSN model [8] (see the figure below) trains a CNN-based semantic matching model between sentences on existing relation-annotated data, uses the model to compute a similarity matrix between test sentences, and then applies the graph-based clustering algorithm Louvain, which does not fix the number of clusters. RSN has achieved good results on semi-supervised and distantly supervised relation identification tasks. Models of this kind are limited by existing entity recognition and syntactic parsing tools, or need prior annotated data for more accurate clustering; moreover, they only cluster relations without extracting them explicitly.

(3) Supervised extraction

RnnOIE [9] (see the figure below) takes a supervised approach, modeling OpenIE's SPO extraction as a sequence tagging problem. Specifically, it concatenates each word's word vector and part-of-speech vector, feeds them into a BiLSTM, and performs label classification with a final softmax. Since BERT was proposed, large-scale pre-trained models have brought better generalization, and span-selection methods, which can exploit richer semantic information, have begun to surpass traditional CRF-based OpenIE methods.
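A minimal PyTorch sketch of this kind of tagger: word and POS embeddings are concatenated, encoded by a BiLSTM, and a linear layer produces per-token tag logits (softmax is applied inside the loss); all sizes are illustrative.

```python
import torch
import torch.nn as nn

class RnnOIETagger(nn.Module):
    def __init__(self, vocab, pos_vocab, n_tags, w_dim=100, p_dim=16, hid=200):
        super().__init__()
        self.wemb = nn.Embedding(vocab, w_dim)
        self.pemb = nn.Embedding(pos_vocab, p_dim)
        self.lstm = nn.LSTM(w_dim + p_dim, hid, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hid, n_tags)

    def forward(self, words, pos):
        # concat word + POS vectors, encode with BiLSTM, emit tag logits
        x = torch.cat([self.wemb(words), self.pemb(pos)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)

model = RnnOIETagger(vocab=5000, pos_vocab=50, n_tags=7)
logits = model(torch.randint(0, 5000, (2, 12)), torch.randint(0, 50, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 7]) -> BIO-style SPO tags per token
```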

Because large-scale annotated data is hard to obtain, the RnnOIE-SupervisedRL [73] model (see the figure below) first performs large-scale automatic extraction based on syntactic and semantic rules, then trains an RnnOIE model on this data to obtain a preliminary extraction model. To improve accuracy, RnnOIE-SupervisedRL further trains this preliminary model with a reinforcement learning regime, whose reward is the product of a head-match-based syntactic satisfaction score and a semantic matching score given by a BERT pre-trained model.

Experiments confirm that this model raises the F1 value on the OIE2016 dataset from 20.4% to 32.5%, with the two sub-models contributing roughly 4% and 8% of the improvement. The model currently handles only relatively simple SPO forms; complex cases (e.g., sentences with one SP and multiple Os) require further research.

(4) Generative models

Neural OpenIE [11] introduces the Encoder-Decoder architecture into the OpenIE task, transforming the extraction paradigm into a generation paradigm. This paradigm can effectively handle implicit-predicate extraction, for example extracting ("Zhang San", "birth era", "post-90s") from the sentence "Zhang San, post-90s, loves anime", where "birth era" is an implicit predicate. Like the supervised methods above, this class of methods still faces the difficulties of complex information extraction and information normalization.


1.2 Semi-automatic schema construction

In knowledge-graph-based question answering (KBQA), we have implemented semi-automatic schema construction based on user questions. Take the housing provident fund scenario as an example: the figure below shows part of the provident fund schema. What the algorithm does is extract, from a large number of user questions, "provident fund" as the subject and "deposit", "withdrawal", "unsealing" as predicates.

Some compound value type (CVT) properties are also involved: for example, "withdrawal" is a compound property, because it has restrictive sub-properties such as "withdrawal location" and "purpose of the provident fund". As shown in the later figure on GNN-based extraction, the algorithm extracts (provident fund, withdrawal, renting housing) from the question set, and the business side then checks it and further abstracts it into (provident fund, withdrawal, purpose of the provident fund).

Therefore, the algorithm ultimately extracts the three parts subject, predicate, and constraint from the questions: in the example above, "provident fund", "withdrawal", and "renting housing".

Pipeline extraction based on syntax

We use a subject-predicate-constraint pipeline extraction model. The logic of the scheme is roughly: first cluster the question texts (with an unfixed number of clusters), then extract one triple from each cluster (entity, main attribute, constraint/sub-attribute value), where the entity, main attribute, and constraint/sub-attribute value are each a word or phrase, for example the triple (provident fund, withdrawal, renting housing).

We first implemented a deductive extraction process with dependency parsing at its core (as shown in the figure below), which mainly includes modules for hierarchical clustering, keyword/phrase extraction and alignment, part-of-speech distribution induction, and subject/predicate/constraint extraction.

▲ Pipeline extraction based on syntax

GNN-based extraction

We found that the scheme above does not adequately consider the combined relationships among the various dependency-syntax patterns, and its generalization is limited. Therefore, on top of it, we designed and implemented a cluster-graph structure and borrowed graph convolutional network methods from the knowledge graph literature for modeling. To achieve domain independence, the embedding of each node in the graph is generated from the one-hot representation of the word's position in the cluster's vocabulary.

In practice, the GNN-based model shows better generalization and precision/recall than the first version. The figure below shows a structured example of the cluster graph related to (provident fund, withdrawal, renting housing).

▲ GNN-based extraction
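As a rough illustration only, here is a two-layer GCN sketch of the kind of model described above, with one-hot node features; the adjacency matrix, the sizes, and the subject/predicate/constraint/other role set are illustrative stand-ins rather than our production model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerGCN(nn.Module):
    def __init__(self, n_nodes, hid=64, n_roles=4):  # roles: S/P/C/other
        super().__init__()
        self.w1 = nn.Linear(n_nodes, hid)  # input is one-hot, so in=n_nodes
        self.w2 = nn.Linear(hid, n_roles)

    def forward(self, adj):
        x = torch.eye(adj.size(0))              # one-hot node features
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        x = F.relu(self.w1((adj @ x) / deg))    # mean-aggregate neighbours
        return self.w2((adj @ x) / deg)         # per-node role logits

adj = (torch.rand(10, 10) > 0.7).float()
adj = ((adj + adj.T) > 0).float()               # make the toy graph undirected
print(TwoLayerGCN(10)(adj).shape)               # torch.Size([10, 4])
```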

1.3 Summary

Starting from industry knowledge graph schema construction, this section introduced the relationship between open information extraction (OpenIE) and schema construction, and surveyed rule-based, supervised, and generative OpenIE models. It also briefly introduced our semi-automatic schema construction algorithm based on user questions, applied in the KBQA scenario and inspired by OpenIE.

Although we have implemented a preliminary version of question-based semi-automatic schema construction, many challenges and difficulties remain in real deployments. In the future we may explore further in the following directions:

  • Complex samples, for example a cluster containing one SP with multiple Os;

  • Introducing industry pre-trained language models to improve model generalization;

  • Borrowing OpenIE's handling of implicit predicates to extract attribute or condition information implied in questions; for example, in "I am 56 this year, can I buy Kangning insurance?", the clause "I am 56 this year" implies the condition "age".

 

A knowledge graph schema is analogous to the table names and column names in a relational database; the tables then need to be filled with real data. Because a knowledge graph is composed of (entity, relation, entity) triples, the keys to subsequent construction are entity recognition and relation extraction.

 

Entity recognition

2.1 Brief introduction

Named entity recognition (NER) refers to identifying entities with specific meanings in text. The entity types in commonly used NER datasets mainly include person names, place names, organization names, proper nouns, and the like, as well as time, number, currency, and ratio values. A named entity is something that can be identified by a proper noun; a named entity generally denotes a unique concrete individual, such as a person or place name.

2.2 Datasets and metrics

Common Chinese NER datasets include OntoNotes 4.0 [12], MSRA [13], and Weibo [14]; the first two are drawn from news text and the last from social media. Commonly used English datasets include CoNLL2003 [15], ACE 2004 [16], and OntoNotes 5.0 [17]. For more details about datasets, please refer to [74].

For data tagging, there are two main annotation schemes: BIO (Beginning, Inside, Outside) and BIOES (Beginning, Inside, End, Outside, Single). In addition, there are improved annotation schemes for complex entity extraction, which will be introduced in Section 2.4.4.
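For concreteness, a hypothetical English sentence tagged under both schemes:

```python
# BIO vs. BIOES tags for the same (made-up) sentence.
tokens = ["Barack", "Obama", "visited", "Paris", "."]
bio    = ["B-PER",  "I-PER", "O",       "B-LOC", "O"]
bioes  = ["B-PER",  "E-PER", "O",       "S-LOC", "O"]  # E=end, S=single
```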

For model evaluation, because named entity recognition involves both boundary and type recognition, an entity is considered correctly recognized only when both its boundary and its type are correctly identified. Depending on how strictly entity boundaries must match, evaluation is divided into exact match and relaxed match, scored with precision, recall, and F1. At present, exact-match micro precision, recall, and F1 are most commonly used.

2.3 Challenges

At present, named entity recognition faces the following challenges in industry knowledge graph construction:

  • Scarce annotated corpora in vertical domains lead to poor models

    Vertical domains are finely segmented into many categories. When entering a new vertical domain, the available supervised data is often limited, and the recognition performance of models trained on it is unsatisfactory.

  • Prior knowledge in vertical domains is not effectively utilized

    Compared with the limited supervised data, other types of prior knowledge in an industry are relatively abundant, but such industry data has not yet been applied to the NER task in a way that effectively improves model performance.

  • Complex entities in vertical domains are hard to recognize

    Entity recognition in research and deployment mostly targets contiguous entities, but the share of complex entities in practical applications keeps rising, especially in medical entity extraction.

2.4 Mainstream deep learning models for NER

We surveyed the technical advances related to these challenges; this section reports and analyzes them accordingly.

2.4.1 Classic models

Deep-learning-based NER models mostly cast the NER task as sequence tagging and adopt an Encoder-Decoder architecture.

The first model to apply deep learning to NER was arguably the LSTM+CRF model [20]. Unlike classic hand-crafted feature design, LSTM+CRF learns features from data, and it achieved good results, greatly advancing the application of deep learning to NER.

Later work replaced the LSTM with a BiLSTM [21] as the encoder. Besides RNN-family recurrent models represented by the LSTM, there is also practice using convolutional neural networks (CNNs) as the encoder.

More recently, ID-CNNs [22] use dilated CNNs (see the figure below) to overcome the original CNN's limitation that the receptive field grows only linearly with the number of convolution layers, enlarging the encoder's receptive field so that longer-range information can be integrated for prediction.

With the emergence of pre-trained language models represented by BERT [23], using BERT as the encoder has become the new strongest baseline. In deployment, knowledge distillation is often applied to the BERT model to improve online inference efficiency.
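A minimal BiLSTM-CRF sketch is shown below, assuming the third-party pytorch-crf package (torchcrf) is installed; dimensions are illustrative.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pytorch-crf

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab, n_tags, emb=100, hid=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hid, n_tags)
        self.crf = CRF(n_tags, batch_first=True)

    def loss(self, words, tags):
        h, _ = self.lstm(self.emb(words))
        return -self.crf(self.proj(h), tags)   # negative log-likelihood

    def decode(self, words):
        h, _ = self.lstm(self.emb(words))
        return self.crf.decode(self.proj(h))   # Viterbi best tag paths

model = BiLSTMCRF(vocab=5000, n_tags=9)
print(model.decode(torch.randint(0, 5000, (2, 12))))
```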


2.4.2 Knowledge-enhanced models

(1) Vocabulary enhancement

For Chinese tasks, the lexical information in a sentence is clearly important, but architectures that first segment the sentence into words and then perform sequence tagging over the word sequence are limited by word segmentation accuracy. How to integrate a sentence's lexical information into a character-based sequence tagging model is therefore one of the main research directions in Chinese NER.

Lattice-LSTM [24] represents a sentence and the words within it as a lattice structure (see the figure below). On top of a character-sequence LSTM, Lattice-LSTM replicates the LSTM's information-passing mechanism to merge each word's information into the representations of the word's first and last characters. Character-level and word-level information are thus combined organically, enriching the model's semantic representation and making it robust to noise caused by word segmentation.

On the Chinese datasets MSRA [13] and Weibo [14], Lattice-LSTM's F1 exceeds the best character-based and word-based models by more than 2%.



The LR-CNN [25] model uses a CNN and introduces a rethinking mechanism into it, addressing Lattice-LSTM's inability to parallelize as well as the confusion between words in a sentence. Concretely, LR-CNN treats the convolution outputs of different layers as vector representations of different n-gram character groups, then uses attention to merge the sentence's word vectors into the representations of their corresponding n-gram character groups, thereby incorporating lexical information.

To resolve word confusion, LR-CNN applies attention again between the feature vectors of the CNN's last layer and the vector representations of every layer, using the last layer's features to tune the screening and expression of earlier features, so that the model can adaptively adjust for confusion between words.

On the Chinese datasets MSRA [13] and Weibo [14], LR-CNN improves F1 over Lattice-LSTM by 0.6% and 1.2%, respectively.

FLAT [26] introduces a Transformer to model the lattice structure that fuses characters and words. Compared with the RNN- and CNN-based models above, FLAT can integrate longer-range information and make better use of GPU resources for parallel training and inference.

Its main points are: first, the lattice structure is flattened into a sequence according to the positions of characters and of each word's first and last characters; second, because the Transformer's absolute position encoding cannot model order well, FLAT defines four distances based on head and tail positions (head-to-head, head-to-tail, tail-to-head, tail-to-tail) and encodes these four distances as vectors.

Attention weights are then computed from the vector representations of each character/word, those of other characters/words, and the distance encodings. On the Chinese datasets MSRA [13] and Weibo [14], FLAT improves F1 over LR-CNN by 0.6% and 3%, respectively.

(2) Entity type information enhancement

BERT-MRC [27] feeds the description of the entity type to be predicted into the model as prior knowledge, modeling NER as a machine reading comprehension (MRC) problem, finally implemented with BERT.

Concretely, given a sentence S and an entity type to extract, such as "organization", a question-generation module turns "organization" into a question Q, "find organizations including companies, agencies and institutions", and Q and S are fed into BERT as a sentence pair for training.

Thanks to the added prior knowledge of entity types, on the Chinese dataset OntoNotes 4.0 and with only half of the training data, BERT-MRC matches the performance of a sequence-tagging BERT model that takes only the sentence S as input and is trained on the full data.

Moreover, because each entity type is recognized separately, this kind of model can effectively handle the overlapping and nested entities found in complex entity recognition (see 2.4.4). On the Chinese dataset MSRA [13], BERT-MRC improves over the aforementioned FLAT model by 1.4%, reaching an F1 of 95.75%.
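A sketch of how the MRC-style input pair is built, assuming the HuggingFace transformers package; the query texts per entity type are a design choice, and the model head (not shown) predicts start/end positions of spans of that type.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

type_queries = {  # entity-type prior knowledge phrased as questions
    "ORG": "find organizations including companies, agencies and institutions",
    "PER": "find persons including names of people",
}

sentence = "Alibaba was founded by Jack Ma in Hangzhou."
enc = tokenizer(type_queries["ORG"], sentence, return_tensors="pt")
print(enc["input_ids"].shape)
# One forward pass is run per entity type, which is what lets the scheme
# recognize nested/overlapping entities of different types separately.
```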

TriggerNER [28] also takes entity type information as part of the model input; unlike BERT-MRC, its type information comes from certain words in the sentence, called trigger words. In the example below, the blue trigger words in the sentence allow one to infer that Rumble Fish is a restaurant name. In terms of implementation, TriggerNER is divided into a Trigger Encoder & Matcher part and a Trigger-Enhanced Sequence Tagging part, both built on the same BiLSTM that provides word representations.


The Trigger Encoder & Matcher part handles entity type prediction based on triggers and the matching between the original sentence representation and the trigger word-sequence representation; the Trigger-Enhanced part merges the representations provided by the BiLSTM with those provided by the trigger encoding, and finally outputs through a CRF layer.

At prediction time, the trigger words for test sentences are obtained by matching against a trigger dictionary built from the training set. On the English CoNLL2003 dataset, TriggerNER trained on 20% of the training set with trigger annotations matches BiLSTM-CRF trained on 70% of the original training set.


▲ Examples of trigger words

 

2.4.3 Semi-supervised models

Semi-supervised algorithms are designed to model both labeled and unlabeled data (the overall taxonomy is shown in the figure below). Semi-supervised neural learning with unlabeled data has been widely studied in NER.

Pre-trained language models represented by BERT [23] build on large-scale unlabeled data, using mechanisms such as random masking to model the joint probability distribution of word sequences and thus carry out self-supervised training, ultimately baking textual knowledge into the word-vector representations. Fine-tuning on labeled data on top of this yields a strong NER model.

NCRF-AE [29] models the label information as latent variables and uses an autoencoder to model both labeled and unlabeled data. Specifically, by modeling the label information as a latent variable y and replacing the target distribution P(y|x) with an encoder-decoder model, the reconstruction loss on unlabeled data can be used to strengthen the modeling of label information.

Unlike NCRF-AE, which models label information directly as the latent variable, VSL-G [30] introduces pure latent variables with a hierarchical structure among them, and uses a variational lower bound to construct the reconstruction loss, thereby separating the supervised loss from the unsupervised loss. The significance of this model is that the hierarchical latent variables, together with the VAE lower-bound loss, act as a good regularizer on the supervised model's parameters, allowing good generalization when training on small datasets.

Translating a sentence A into another language B and back into C yields a parallel pair (A, C). LADA [31] observes that A and C mostly contain the same number of target-class entities. Based on this observation, LADA sums the model's output vectors over each token of the unlabeled sentences A and C, obtaining for each sentence a count vector of the entities of each type it contains, and takes the l2 norm of the difference between the two vectors as the loss on unsupervised samples. Large-scale unsupervised data can thus be used for training, improving model accuracy when labeled data is scarce.

In addition, LADA [31] brings the Mixup data augmentation method from the image domain into NER. The core of Mixup is to interpolate feature vectors to obtain new training data. Because NER is a sequence labeling problem, an interpolation scheme over multiple token hidden vectors must be designed. LADA obtains two interpolation variants, one recombining token sequences within the original sentence and one clustering the training sentence set with KNN, and experiments show that such interpolation works for NER.
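The core interpolation step is simple; a sketch, with Beta-distributed mixing as in the original Mixup (hidden sizes and label counts are illustrative):

```python
import torch

def mixup(h1, h2, y1, y2, alpha=0.4):
    # interpolate token hidden states and (soft) label vectors
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * h1 + (1 - lam) * h2, lam * y1 + (1 - lam) * y2

h1, h2 = torch.randn(12, 256), torch.randn(12, 256)   # two aligned sentences
y1 = torch.eye(9)[torch.randint(0, 9, (12,))]         # one-hot tag rows
y2 = torch.eye(9)[torch.randint(0, 9, (12,))]
h_mix, y_mix = mixup(h1, h2, y1, y2)
```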

Compared with LADA's augmentation at the hidden-vector level, ENS-NER [32] adds Gaussian noise to word vectors for statistical data augmentation, and applies random token masking and synonym substitution for linguistic data augmentation. Experiments on related datasets show that such augmentation benefits NER, and that the linguistic and statistical variants perform comparably.
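Both augmentation families reduce to a few lines; a sketch, with illustrative noise scale and masking rate:

```python
import random
import torch

def gaussian_augment(word_vecs, sigma=0.1):
    # statistical augmentation: jitter word vectors with Gaussian noise
    return word_vecs + sigma * torch.randn_like(word_vecs)

def mask_augment(tokens, mask_token="[MASK]", rate=0.15):
    # linguistic augmentation: randomly mask tokens
    return [mask_token if random.random() < rate else t for t in tokens]

print(mask_augment(["Alibaba", "is", "based", "in", "Hangzhou"]))
```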

It is worth noting that, apart from BERT-style language models, the semi-supervised models above show clear gains when the labeled data is a small fraction of the original training set, around 10%; as the proportion of labeled training data grows, the gain from unlabeled data becomes insignificant.

2.4.4 Complex entities

The models above mainly target the extraction of contiguous entities; in practice, complex entity recognition also arises. Complexity here means the existence of discontinuous single entities, and of containment and crossing relationships between multiple entities. The figure below shows examples of discontinuous entities, nested entities, and overlapping entities.

To recognize overlapping entity mentions that include discontinuous ones, [33] introduced a variant of the BIO tagging scheme: on top of BIO it adds four tags, BD, ID, BH, and IH, denoting Beginning of Discontinuous body, Inside of Discontinuous body, Beginning of Head, and Inside of Head. Taking case c in the figure above as an example, the character-level Chinese phrase 肌肉疼痛和疲劳 ("muscle pain and fatigue") is tagged 肌 (BH) 肉 (IH) 疼 (B) 痛 (I) 和 (O) 疲 (BD) 劳 (ID). The drawback of this approach is that if a sentence contains more than one discontinuous entity, the entities can be confused with one another.

[34] uses a transition-based method, introducing a richer action set to solve the recognition of overlapping and discontinuous entities. Concretely, it uses a stack to store processed spans and a buffer to store unprocessed tokens. NER is recast as: given the parser state, predict an action that changes the state; repeat until the parser reaches the end state (stack and buffer both empty), as shown in the figure below. Clearly, this method handles not only discontinuous entities but also nesting and partial overlap, so although its design is more complex than the tagging scheme above, it provides a unified framework for contiguous and complex entity recognition. Moreover, since this is a sequential decision problem, a possible direction is to use deep reinforcement learning to reshape the objective function and optimization process.

Further, [35] introduces a hypergraph representation of sentences to handle nesting and discontinuity across multiple entity classes; compared with classic sequence-prediction models, its final goal is to predict a local subgraph.

 

2.5 Summary

This section focused on three challenges faced by entity recognition: scarce labeled data, under-utilized industry knowledge, and hard-to-extract complex entities, and introduced the related technical progress, mainly covering classic models represented by BiLSTM+CRF, knowledge-enhanced models, semi-supervised models, and complex entity recognition models.

In practical applications, classic models combined with industry dictionaries or entity/relation descriptions are already widely used, but for complex entity recognition there is as yet no well-established model structure or simple, effective solution.

Relation extraction

3.1 Brief introduction

Relation extraction refers to classifying the relation type between a given pair of entities. In contrast to OpenIE's extraction of unfixed relation types, the relation extraction discussed in this part assumes a fixed set of relation categories.

3.2 Datasets and metrics

At present, relation extraction benchmark datasets mainly include:

  • Sentence-level relation extraction: ACE 2005 [36], the SemEval 2010 Task-8 dataset [37], TACRED [38];

  • Distantly supervised relation extraction: the NYT dataset (NYT10) [39];

  • Few-shot relation extraction: FewRel [1], FewRel 2.0 [2];

  • Document-level relation extraction: the DocRED dataset [40].

For evaluation metrics, supervised relation extraction tasks use standard precision, recall, and F-measure. Distantly supervised relation extraction models are evaluated with held-out and/or manual evaluation. Labels obtained by aligning text with a knowledge base are not golden; in held-out evaluation, only relational facts already in the knowledge base are considered correct on the test set, and newly predicted relations are counted as wrong.

Because this assumption does not reflect reality, manual evaluation is sometimes needed. Few-shot relation extraction uses the N-way K-shot configuration, where N is the number of relations (classes) and K is the number of instances per relation. Models are tested under different data configurations and reported by accuracy on the test set.

3.3 Challenges

At present, relation extraction is one of the most important and difficult tasks in automatic knowledge graph construction. Its main challenges in practical applications and algorithm research are:

  • Data annotation is expensive: extracting relations from text requires contextual information and is a hard task even for humans; producing high-quality annotated data takes a long time, so the cost of manual annotation is very high.

  • Long-tail relations perform poorly: in the real world there are inevitably many long-tail relations with very little training data, making it difficult to train general relation extraction methods, especially deep-learning-based ones.

  • Relation extraction in complex scenarios is difficult: two kinds of complex relation extraction often arise in real scenarios and pose major technical challenges:

     (1) Paragraph-level relation extraction: the relation between entities cannot be obtained directly from a single sentence; multiple sentences in the paragraph must be read, extracting relations in a machine-reading-comprehension manner.

     (2) Text containing multiple relations: when a text contains multiple relations, current methods use graph neural networks to capture the topological structure of the whole text; at the same time, implicit relations between entities must sometimes be inferred from the multiple relations in a sentence.

  • Error propagation from entity recognition to relation extraction: a pipeline of entity recognition followed by relation extraction easily propagates errors into relation extraction. Joint entity-relation extraction can effectively avoid this; one effective approach casts entity recognition and relation extraction together as a sequence tagging task, modeling the full triple.

 

3.4 Mainstream deep learning models for relation extraction

We surveyed recent research advances related to these challenges; the rest of this section reports on them along with some of our own thoughts. The figure below summarizes the main contents.

In addition, we searched dblp for relation extraction papers published in 2020 and, grouping them by title keywords, obtained the data shown in the figure below, from which the distribution of research interest in related topics can be seen.

3.4.1 Classic models

In relation extraction, capturing the global features of a relation in a sentence is essential. Convolutional neural networks (CNNs) can combine local features into globally representative ones. [41] was the earliest to combine a CNN with max pooling and word embeddings to encode a whole sentence and use the sentence encoding for relation classification, outperforming traditional relation extraction methods.

Later, [42] proposed a multi-level attention CNN, introducing an attention mechanism into the CNN to give larger weights to words that better reflect the relation, improving relation extraction.

Because CNNs only extract local features, they handle poorly the case of long distances between the two entities in a sentence. Recurrent neural networks (RNNs), especially long short-term memory networks (LSTMs), can learn long-distance dependencies between entities; [43] used an RNN for relation extraction and obtained better results than CNN-based extraction.

Building on this, [44] found that the shortest dependency path between entities (in the syntactic dependency tree, the shortest path from the two entities to their common ancestor node) best reflects their relation, and encoded it with an LSTM to perform relation extraction.

In 2018, the pre-trained language model BERT [23] showed powerful performance on many NLP tasks; a natural idea is to replace the CNN or RNN with BERT for sentence encoding in relation extraction.

In 2019, [45] was the earliest to apply BERT to relation extraction, proposing the BERT-based R-BERT model: a sentence is fed into BERT, and BERT's outputs go through a fully connected layer for multi-class classification to complete relation extraction. At the time, this method surpassed all previous deep-learning-based relation extraction.

[46] proposed the EPGNN model (figure below), which combines sentence features extracted by BERT with topological features of the entity pair's subgraph in the knowledge graph, extracted by a graph neural network, to perform relation extraction.

3.4.2 Distant supervision models

Deep-learning-based relation extraction needs a lot of training data, but manually annotating it is time-consuming and expensive. To solve this problem, [47] first used distant supervision in 2009, aligning sentences in text with triples in the Freebase knowledge graph, the triples providing the supervision signal (a toy sketch of this alignment, and of how noise arises, follows the list below). However, relation extraction with distant supervision has two main problems:

 

  • Overlapping relations cannot be modeled: two entities may hold multiple different relations, for example (Jack Ma, founded, Alibaba) and (Jack Ma, CEO of, Alibaba), so it is impossible to determine which relation in the knowledge graph should be the one the current sentence expresses.

  • Noisy (wrong) labels: triples in the knowledge graph assign incorrect relation labels to entity pairs in some sentences, confusing and corrupting model training.
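A toy sketch of the alignment step mentioned above, which also shows how a noise label arises (the KB and corpus are made-up stand-ins):

```python
# Distant supervision: any sentence mentioning both entities of a KB triple
# inherits that triple's relation label.
kb = {("Jack Ma", "Alibaba"): "founder_of"}
corpus = ["Jack Ma founded Alibaba in 1999.",        # correct label
          "Jack Ma spoke at an Alibaba event."]      # noisy label

labeled = [(s, rel) for s in corpus
           for (h, t), rel in kb.items() if h in s and t in s]
for s, rel in labeled:
    print(rel, "|", s)
```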

To address these problems, current distantly supervised relation extraction mainly proceeds from three angles: multi-instance multi-label learning, introducing more effective knowledge, and denoising.

(1) Multi-instance multi-label learning (MIML)

To handle overlapping relations, multi-instance multi-label learning can be applied to relation extraction. Single-instance learning predicts one relation category from one sentence; multi-instance multi-label learning relaxes this condition, predicting multiple relation categories from a bag of sentences. The figure below is a typical example: the entity pair (Obama, United States) corresponds to multiple instances (sentences), while the knowledge base (DB) provides two labels for this pair.

[48] first proposed the MIML-based relation extraction method MIML-RE, using probabilistic graphical models to represent the "multiple instances" and "multiple labels" of entity pairs. Since MIML can resolve overlapping relations, most subsequent distant supervision work focuses on the noisy-label problem. In multi-instance learning, finding the most relevant sentence in a bag is particularly important.

PCNN [49] takes entity positions into account when building sentence feature vectors, encoding each sentence with piecewise pooling, and updates parameters using the sentence in each bag with the highest probability of correctly predicting the relation label.
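The piecewise pooling itself is easy to state; a sketch with illustrative shapes (treat the filter count and sequence length as stand-ins):

```python
import torch

def piecewise_max_pool(conv_out, e1_pos, e2_pos):
    # conv_out: (n_filters, seq_len); e1_pos < e2_pos are entity indices.
    # Split the feature map into three segments at the entity positions
    # and max-pool each segment separately.
    segs = [conv_out[:, :e1_pos + 1],
            conv_out[:, e1_pos + 1:e2_pos + 1],
            conv_out[:, e2_pos + 1:]]
    return torch.cat([s.max(dim=1).values for s in segs])  # (3*n_filters,)

print(piecewise_max_pool(torch.randn(230, 40), 5, 20).shape)  # (690,)
```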

Considering that different sentences in a bag matter differently for a relation, [50] introduced a sentence-level attention mechanism: sentences with larger weights contribute more to parameter updates, and vice versa, making full use of all training data. Because relation extraction must consider the context of the entity pair in a sentence, dependency structure information is very important.

C-GCN [51] uses a GCN to encode the sentence's dependency tree for relation extraction, and designs a path-centered pruning method to remove irrelevant paths from the dependency tree.

(2) How to introduce external knowledge

To improve entity representations and provide more semantic information for relation extraction, thereby reducing the influence of noise, APCNN [52] introduces external entity descriptions on top of PCNN [49]; entity descriptions supply more semantics for improving entity representations and predicting relations. Meanwhile, inspired by the knowledge graph representation learning model TransE [53], it constrains the relation representation with the triple constraint relation ≈ tail entity representation − head entity representation, and applies this relation representation in the sentence-level attention mechanism for relation extraction.

Previous studies treated different relations as independent, but relation sets actually carry structured high-level semantics; in the Freebase knowledge graph, for example, relations are organized hierarchically, with the top level of each relation denoting its general type. Semantic relevance between relations can therefore be captured at the relation-hierarchy level.

Based on this property, [54] exploits the hierarchical knowledge of relations and designs a hierarchical attention mechanism that attends to the relevance between relations within each sentence bag, implementing coarse-to-fine instance selection and improving distantly supervised relation extraction. [55] uses a GCN to obtain relation embeddings from knowledge graph embedding, and proposes a coarse-to-fine knowledge-aware attention mechanism to integrate relevant knowledge into the relation extraction model.

 

(3) How to remove noisy labels

Another, more direct answer to noisy labels in distant supervision is to remove them; the two main approaches are reinforcement learning and adversarial training.

Denoising with reinforcement learning

For distantly supervised relation extraction, the best way to deal with wrongly labeled candidate sentences is a hard, deterministic decision rather than the soft attention weights of earlier work. To this end, [56] proposed a more radical solution: training a deep reinforcement learning policy to generate false-positive indicators that dynamically identify the false-positive samples of each relation type and redistribute them into the true negatives, mitigating the impact of noisy data.

Similarly, [57,58] both adopt reinforcement-learning-based relation extraction, decomposing the problem into two tasks: instance selection and relation classification. The instance selector is a reinforcement learning agent that uses the relation classifier's weak supervision to select instances. The advantage of this approach is that the relation extraction model is decoupled from the RL-based instance selection model, so it adapts easily to any neural relation extraction model.

Denoising with adversarial training

[59] was the first to apply adversarial training, adding adversarial noise to word embeddings to learn CNN- and RNN-based relation extraction within the multi-instance multi-label (MIML) framework. DSGAN [60] eliminates noisy data in distantly supervised relation extraction by jointly learning a generator and a discriminator of sentence-level true positive samples.

[68] targets two shortcomings of current noise-elimination models: (1) no effective way to introduce explicit supervision into the denoising process; (2) optimization difficulty caused by the sampling operations used to evaluate denoising results. It proposes an adversarial denoising framework that offers an effective way to introduce human supervision and, in a unified framework, uses both that supervision and the potentially useful information behind the noisy data (model shown in the figure below).


3.4.3 Few-shot relation extraction

In most datasets the relation distribution is long-tailed, and the training data available for long-tail relations is often small. Professor Liu Zhiyuan of Tsinghua University first proposed the few-shot relation extraction task and built the first large-scale few-shot relation extraction dataset, FewRel [1]; in 2019, FewRel 2.0 [2] added the tasks of domain transfer and "none-of-the-above" detection.

The vast majority of few-shot relation extraction studies evaluate on these two datasets. Few-shot learning is usually realized in two ways, metric learning and meta learning, and current few-shot relation extraction likewise follows these two lines.

Metric learning models

The latest metric-learning approach is Google's MTB model [61]. It uses the idea of contrastive learning, introducing the matching-the-blanks objective: if two sentences contain the same entity pair, their relation representations should be as similar as possible, and otherwise as dissimilar as possible. It also masks the entities in a sentence with a certain probability (p=0.7 in the paper) to improve the model's ability to express relational semantics when entities are missing.
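The data side of the matching-the-blanks objective is straightforward; a sketch of the entity blanking step (span handling simplified):

```python
import random

def blank_entities(tokens, entity_spans, p=0.7):
    # Replace each entity mention with a single [BLANK] with probability p;
    # spans are processed right-to-left so earlier indices stay valid.
    out = list(tokens)
    for start, end in sorted(entity_spans, reverse=True):
        if random.random() < p:
            out[start:end] = ["[BLANK]"]
    return out

print(blank_entities(["Jack", "Ma", "founded", "Alibaba"], [(0, 2), (3, 4)]))
# sentence pairs sharing the same entity pair are then treated as positives
```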

In the past year and a half, this model has remained SOTA on all evaluation settings of the FewRel [1] dataset, and exceeds human performance on two of them. Questionably, though, MTB depends on a self-built dataset of 600 million sentence pairs from Wikipedia, and on supervised low-resource extraction tasks such as SemEval 2010 Task-8 [37] and TACRED [38] it does not match the effect of training the base model on the full data.

Meta learning models

[62] uses a Bayesian meta-learning method to effectively learn the posterior distribution of relation prototype vectors: the initial parameters of the prototype vectors are learned from a global graph with a graph neural network, then optimized with the SGLD method associated with the model-agnostic meta-learning algorithm MAML, and the optimized prototype vectors are used to predict relations.


3.4.4 Joint entity-relation extraction

All the methods above must first use named entity recognition to determine entity mentions and their types, then apply relation extraction. Such a pipeline easily propagates errors: if named entity recognition makes a mistake, the error is amplified at the relation extraction stage and degrades its results. Joint entity-relation extraction effectively avoids this propagation. Moreover, the purpose of both entity recognition and relation extraction is the automatic construction of triple knowledge, so the two tasks should be integrated.

(1) Sequence-tagging-based models

[63] proposed a novel tagging scheme (see the figure below) that treats joint entity-relation extraction as a sequence tagging task, simplifying the task's complexity, with performance superior to earlier pipeline and joint extraction methods; this work also won an ACL 2017 Outstanding Paper Award. However, it cannot handle overlapping relations.

To handle relation overlap in joint entity-relation extraction, Wei et al. proposed the novel cascade binary tagging framework CasRel [64]. Unlike traditional relation extraction models that predict relation labels for entity pairs, CasRel's core idea is to model each relation as a function mapping subjects to objects; in other words, given a relation and a head entity, all possible tail entities are identified. CasRel neatly solves the overlapping-relation problem and achieves significant performance improvements on public datasets.
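A sketch of CasRel's two tagging heads, assuming a BERT-style encoder output h of shape (batch, seq, dim); the way the subject representation conditions the object tagger is simplified to an additive shift here:

```python
import torch
import torch.nn as nn

class CasRelHeads(nn.Module):
    def __init__(self, dim=768, n_rel=24):
        super().__init__()
        self.subj = nn.Linear(dim, 2)         # subject start/end per token
        self.obj = nn.Linear(dim, 2 * n_rel)  # object start/end per relation

    def forward(self, h, subj_vec):
        subj_logits = self.subj(h)
        # condition object tagging on the chosen subject's representation
        obj_logits = self.obj(h + subj_vec.unsqueeze(1))
        return subj_logits, obj_logits

h = torch.randn(2, 30, 768)            # encoder output (stand-in)
subj_vec = h[:, 5]                     # e.g. representation of a subject span
s, o = CasRelHeads()(h, subj_vec)
print(s.shape, o.shape)                # (2, 30, 2) (2, 30, 48)
```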


(2) Dynamic graph models based on text spans

DyGIE [65] (see the figure below) models entity recognition and relation extraction as the construction of a graph over sentence spans and the classification of its nodes, where the graph's nodes are spans in the sentence. This model departs from the one-dimensional tagging-and-prediction regime of sequence labeling, annotating and predicting on a two-dimensional graph structure instead.

DyGIE++ [66] adds the task of event argument recognition on top of DyGIE, and fuses node representations after multiple rounds of graph message passing for the final prediction; moreover, it replaces the original BiLSTM with BERT for the underlying representation.

Such models can effectively handle entity nesting in entity recognition, but discontinuous entities and overlapping relations have not been fully studied.

 

3.4.5 Paragraph-level relation extraction

Most existing relation extraction methods work at the sentence level; in the real world, however, many relations between entities are expressed across multiple sentences of a text.

Consider a text such as: "Alibaba DAMO Academy was founded on October 11, 2017. It is a research institute dedicated to exploring the unknowns of technology, driven by a vision for humanity; its president is Zhang Jianfeng." This text contains multiple entities, and in particular the "president" relation between the entities "Alibaba DAMO Academy" and "Zhang Jianfeng" can only be obtained from more than one sentence. Extracting such cross-sentence relations requires reading multiple sentences of the whole document in a way similar to machine reading comprehension.

[67] considers the different ways sentences in a document are associated, such as coreference and semantic dependency trees, and proposes the GCNN model: five different association types are used to build separate graphs, graph convolution is performed on each, and the results are summed, combining inter-sentence association features for relation extraction.

To advance the generality of paragraph-level relation extraction, Yao Yuan and others from Liu Zhiyuan's team at Tsinghua proposed the DocRED dataset [40] in 2019. Built from Wikipedia text and the WikiData knowledge graph, it is a large-scale, manually annotated paragraph-level relation extraction dataset. In more than 40% of DocRED's relational facts, the relation can only be extracted from multiple sentences, so models need a strong holistic understanding of the document's information, especially the ability to extract relations across sentences.

The DocRED paper benchmarks the then state-of-the-art relation extraction methods on the dataset and shows that they struggle to achieve good results, indicating that document-level relation extraction is a direction well worth further study.

 

3.5 Summary

This section surveyed several major challenges in relation extraction, namely few-shot learning, the difficulty of guaranteeing the quality of distantly supervised data, joint entity-relation extraction, and document-level relation extraction, and introduced the related technical methods. Comparatively speaking, distant supervision and joint entity-relation extraction have attracted more research, and both have indeed delivered good results in deployment. Few-shot and document-level relation extraction, however, are becoming more and more important in practical applications. In particular, more and more real extraction tasks take a whole document as input, yet research in this area is still scarce.

 

New challenges

4.1 The document-level information extraction problem

In actual projects, beyond extracting entities and relations from sentences and paragraphs, we also face new challenges in extracting information from whole documents. Take insurance-contract PDF documents as an example (the original article shows screenshots of two such contracts, referred to below as the first and second documents). When processing such documents, we face two tasks:

(1) Document structure extraction

Many vertical industries, as this example shows, have large volumes of semi-structured documents. How to parse the data according to the inherent hierarchical structure of the document content, and then summarize and organize the knowledge graph schema according to that hierarchy, is a huge new challenge we now face. Industry documents come in various formats, including PDF, Word, and TXT; PDFs further divide into standard PDFs, searchable PDFs, and scanned PDFs, and Word documents also differ across versions.

The internal layout of documents varies endlessly: single column or double column, horizontal or (less commonly) vertical text, headings that are obvious or inconspicuous; some segments, such as titled clauses, carry high-value content, while others, such as notes, carry relatively little. Beyond that, there are also problems such as confusion in recognizing the many embedded tables, images, and other elements.
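As a first step toward taming this variety, layout-aware PDF parsing libraries expose text blocks together with their coordinates and font information. Below is a minimal sketch using PyMuPDF (the fitz package); the font-size heading heuristic is our own naive assumption and would need far more robust rules for real contracts.

```python
# A minimal sketch of layout-aware PDF parsing with PyMuPDF.
import fitz  # pip install PyMuPDF

def extract_blocks(path):
    doc = fitz.open(path)
    for page in doc:
        for block in page.get_text("dict")["blocks"]:
            if block.get("type") != 0:   # 0 = text block; skip images
                continue
            spans = [s for line in block["lines"] for s in line["spans"]]
            text = " ".join(s["text"] for s in spans).strip()
            max_size = max(s["size"] for s in spans)
            # Crude assumption: unusually large font suggests a heading.
            role = "heading" if max_size > 14 else "paragraph"
            yield {"page": page.number, "bbox": block["bbox"],
                   "role": role, "text": text}
```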

(2) Information extraction given a schema

Given the knowledge graph schema, the task is to extract specific information from such documents, for example the eligible age for taking out the insurance. The diversity of document formats, the variety of industry phrasing, and cross-references within documents make it very difficult to extract such information directly: in the first document the insured age appears under "Coverage", while in the second document the section "Age of insurance" actually refers the reader to the content of Section 10.1. Extracting this kind of information well therefore requires document-level semantic understanding and logical reasoning.
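One plausible baseline, once the document text has been recovered, is to phrase each schema slot as a question and answer it with an extractive machine reading comprehension model, echoing the MRC framing above. The sketch below uses the HuggingFace transformers question-answering pipeline with a public checkpoint chosen purely for illustration; it does not handle the cross-reference resolution that real contracts require.

```python
# Schema-slot extraction as extractive question answering (a sketch).
from transformers import pipeline

qa = pipeline("question-answering",
              model="deepset/roberta-base-squad2")

# Hypothetical schema slots, each rephrased as a natural question.
schema_slots = {
    "insured_age": "What is the eligible age range of the insured?",
    "coverage_period": "How long is the coverage period?",
}

document_text = "..."  # text recovered by the document parser above

for slot, question in schema_slots.items():
    answer = qa(question=question, context=document_text)
    print(slot, answer["answer"], round(answer["score"], 3))
```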

4.2 Cutting edge research

Facing the challenge of document-level information extraction, we find that two kinds of technology could be integrated to eventually yield a solution. They are briefly introduced below:

4.2.1 Document AI

For the document-level information extraction task described above, the first thing to consider is parsing such documents, that is, extracting the data in a document according to its original structure. This involves many subtasks, such as reading multi-source documents, distinguishing segments/paragraphs, and discriminating the relationships between segments/paragraphs. Obviously, the visual (layout) information of such documents is crucial for this parsing.

Document Intelligence (also known as Document AI) is a research area [69] dedicated to analyzing the layout information and internal structure of documents. Its goal is to decompose a document or document image into separate regions (the physical layout), and to recover each region's role (e.g. title or paragraph) and the relationships between regions (the logical structure), such as the relationship between a title and its subtitle, or between a title and its body. Models from the Document AI field can therefore be applied to the problem above.
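To fix ideas, physical layout and logical structure can be carried in one small data model, sketched below; the field names and the parent/children encoding of the logical structure are our own assumptions, not a scheme from [69].

```python
# A sketch of one possible joint representation of physical layout
# and logical structure.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Region:
    region_id: int
    bbox: Tuple[float, float, float, float]  # physical layout: x0, y0, x1, y1
    role: str                                # logical role: "title", "paragraph", ...
    text: str = ""
    parent: Optional[int] = None             # logical structure: enclosing heading
    children: List[int] = field(default_factory=list)

# A heading region and a paragraph logically nested under it:
heading = Region(0, (72, 90, 520, 110), "title", "10.1 Age of insurance")
body = Region(1, (72, 120, 520, 300), "paragraph",
              "The insured must be between ...", parent=0)
heading.children.append(body.region_id)
```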


More advanced Document AI models have since appeared, such as LayoutLM [70], which is mainly used for structured recognition of form and receipt content. The most cutting-edge dataset is DocBank [71], which automatically builds Document AI training data from the correspondence between the PDF files and LaTeX sources of a large number of arXiv papers. However, DocBank only covers region identification, e.g. recognizing abstract, introduction, caption, and table regions; it lacks annotation of the logical structure between regions, and it is precisely this logical-structure discrimination that is crucial for structuring the information of the documents described above.
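For region-role identification itself, LayoutLM is available through the HuggingFace transformers library: each token carries a bounding box (normalized to a 0-1000 grid in the pre-training data) alongside its id. The sketch below shows the input plumbing for token-level role tagging; the label set and toy coordinates are invented, and special tokens are omitted for brevity.

```python
# A minimal sketch of region-role tagging with LayoutLM.
import torch
from transformers import LayoutLMTokenizer, LayoutLMForTokenClassification

labels = ["title", "paragraph", "caption", "table"]  # invented label set
tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=len(labels))

words = ["10.1", "Age", "of", "insurance"]
boxes = [[72, 90, 110, 104], [115, 90, 150, 104],
         [155, 90, 170, 104], [175, 90, 250, 104]]

# LayoutLM takes one bounding box per token, so each word's box is
# repeated for its subword pieces.
tokens, token_boxes = [], []
for word, box in zip(words, boxes):
    pieces = tokenizer.tokenize(word)
    tokens.extend(pieces)
    token_boxes.extend([box] * len(pieces))

input_ids = tokenizer.convert_tokens_to_ids(tokens)
outputs = model(input_ids=torch.tensor([input_ids]),
                bbox=torch.tensor([token_boxes]))
pred = outputs.logits.argmax(-1)  # one role label per token
```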


Therefore, in this line of research, whether for large-scale dataset construction or for joint extraction models that integrate physical layout and logical structure, the relevant literature is still scarce, and more attention and in-depth research are needed.

4.2.2 Long structured language models

Document-level information extraction can only succeed if the representation of each document segment integrates the overall information of the document. Therefore, assuming the structure of a document has been extracted effectively, the second challenge we face is how to encode and represent both the macro-level and local information of such structured data.

Recently proposed language models for encoding long inputs (on the order of 10,000 to 100,000 characters) may effectively address this challenge. Concretely, the ETC model [72] uses a global-local attention mechanism to realize pre-trained representations of long, structured inputs, and its effectiveness has been verified on key-phrase extraction over hierarchical web-page data. However, ETC's encoding of multi-level input structure is still not well designed, and the sparsity of global-local attention also risks information loss.
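The essence of global-local attention is a sparse attention mask: a handful of global tokens attend everywhere, while ordinary long-input tokens only see a local window. The sketch below builds such a boolean mask; the shapes and window convention are ours, illustrating the sparsity pattern rather than ETC's exact formulation.

```python
# A minimal sketch of an ETC-style global-local attention mask.
import numpy as np

def global_local_mask(num_global, num_long, radius):
    n = num_global + num_long
    mask = np.zeros((n, n), dtype=bool)
    # Global tokens: full attention in both directions.
    mask[:num_global, :] = True
    mask[:, :num_global] = True
    # Long tokens: attention restricted to a local window.
    for i in range(num_long):
        lo = max(0, i - radius)
        hi = min(num_long, i + radius + 1)
        mask[num_global + i, num_global + lo:num_global + hi] = True
    return mask  # mask[q, k] == True means query q may attend to key k

m = global_local_mask(num_global=2, num_long=8, radius=1)
# Attention cost grows roughly as O(n_global * n + n_long * radius)
# instead of O(n^2), which is what makes inputs of tens of thousands
# of characters feasible.
```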

Therefore, how to design new architectures on top of the recently emerged long-input language models, building on the document structure above, so that the language model encodes the structural and textual information of documents more effectively, also needs more attention and in-depth research.

 

4.3 Summary

This section described the new document-level information extraction challenges we face in real business, and introduced two technical building blocks, Document AI and long-input language models, as components of a potential solution. In short, neither Document AI models nor current language models can be directly adapted to this extraction task, and there is not yet a systematic solution to the document-level extraction challenge above; this research direction therefore deserves more attention from researchers and engineering peers alike.

 

Summary and Outlook

This paper has focused on the construction of industry knowledge graphs, introducing and analyzing the related technologies and latest developments in schema construction, entity recognition, and relation extraction. It has also introduced the new challenge of document-level information extraction, and analyzed and discussed Document AI and long structured language models as the cutting-edge technologies relevant to this challenge.

As knowledge graphs, the cognitive substrate, continue to develop and mature, their applications have spread from the Internet into various vertical industries, and the efficient construction of graphs from industry knowledge will be key to bringing knowledge graph applications into ToB markets. From our point of view, the future construction of industry knowledge graphs shows several trends:

  • Schema construction automation: The industry knowledge graph field will develop effective standards and specifications for schema construction, giving automatic schema construction algorithms clear optimization targets and sound architecture design references. With the rapid development of NLP, the information extraction and abstraction-integration capabilities involved in schema construction will also improve greatly. The human-to-machine effort ratio in industry schema construction will thus evolve from 7:3 toward 5:5, then 3:7, and eventually fully automated schema construction.

  • Unified and low-resource information extraction: Extraction schemes will increasingly favor comprehensive extraction that unifies entities, relations, events, and other information, which will bring great changes to model architecture design and to the data engineering pipeline. Meanwhile, beyond the continued development of large-scale language models, techniques for generating implicit data resources and integrating explicit industry prior knowledge will also mature, and together these will make low-resource information extraction models the mainstream solution.

  • From sentence level and paragraph level to document level: As data grows in scale, as knowledge becomes more structured and macroscopic, and as content becomes multimodal, document-level information extraction will receive more and more research. End-to-end graph construction pipelines that take large-scale structured or even unstructured industry documents as input and directly output graph-shaped industry knowledge will become popular in the future.

 

Finally, we hope this survey of recent progress brings some inspiration and help to readers' research work. Thank you for your patient reading; if there are any mistakes in this article, we welcome your corrections.

Appended note: the architecture diagrams and formulas for the algorithm models in this paper are all taken from the original papers; the other pictures are provided by the authors of this article.

 

reference

[1] Xu Han, Hao Zhu, Pengfei Yu, Ziyun Wang, Yuan Yao, Zhiyuan Liu, and Maosong Sun. 2018. FewRel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In Proceedings of EMNLP, pages 4803–4809.

[2] Tianyu Gao, Xu Han, Hao Zhu, Zhiyuan Liu, Peng Li, Maosong Sun, and Jie Zhou. 2019. FewRel 2.0: Towards more challenging few-shot relation classification. In Proceedings of EMNLP-IJCNLP, pages 6251–6256.

[3] https://github.com/gabrielStanovsky/oie-benchmark

[4] Knowledge Graph: Methods, Practice and Applications. Edited by Wang Haofen, Qi Guilin, and Chen Huajun. Publishing House of Electronics Industry, 2019.

[5] Yates, A.; Banko, M.; Broadhead, M.; Cafarella, M.; Etzioni,O.; and Soderland, S. 2007. Textrunner: Open information extraction on the web. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), 25–26.

[6] Diego Marcheggiani and Ivan Titov. 2016. Discretestate variational autoencoders for joint discovery and factorization of relations. Transactions of ACL.

[7] Elsahar, H., Demidova, E., Gottschalk, S., Gravier, C., & Laforest, F. (2017, May). Unsupervised open relation extraction. In European Semantic Web Conference (pp. 12-16). Springer, Cham.

[8] Wu, R., Yao, Y., Han, X., Xie, R., Liu, Z., Lin, F., ... & Sun, M. (2019, November). Open relation extraction: Relational knowledge transfer from supervised data to unsupervised data. In EMNLP-IJCNLP (pp. 219-228).

[9] Stanovsky, G., Michael, J., Zettlemoyer, L., & Dagan, I. (2018, June). Supervised open information extraction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 885-895).

[10] Zhan, J., & Zhao, H. (2020, April). Span model for open information extraction on accurate corpus. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 9523-9530).

[11] Cui, L., Wei, F., & Zhou, M. (2018). Neural open information extraction. arXiv preprint arXiv:1805.04270.

[12] Sameer Pradhan, Mitchell P. Marcus, Martha Palmer, Lance A. Ramshaw, Ralph M. Weischedel, and Nianwen Xue, editors. 2011. Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, CoNLL 2011, Portland, Oregon, USA, June 23-24, 2011. ACL.

[13] Gina-Anne Levow. 2006. The third international Chinese language processing bakeoff: Word segmentation and named entity recognition. In Proceedings of the Fifth SIGHANWorkshop on Chinese Language Processing, pages 108–117, Sydney, Australia. Association for Computational Linguistics.

[14] Nanyun Peng and Mark Dredze. 2015. Named entity recognition for Chinese social media with jointly trained embeddings. In EMNLP. pages 548–554.

[15] Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the conll-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning, CoNLL 2003, Held in cooperation with HLT-NAACL 2003, Edmonton, Canada, May 31 - June 1, 2003, pages 142–147.

[16] George R Doddington, Alexis Mitchell, Mark A Przybocki, Stephanie M Strassel, Lance A Ramshaw, and Ralph M Weischedel. 2005. The automatic content extraction (ACE) program: tasks, data, and evaluation. In LREC, 2:1.

[17] Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Bj¨orkelund, Olga Uryupina, Yuchen Zhang, and Zhi Zhong. 2013. Towards robust linguistic analysis using OntoNotes. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 143–152, Sofia, Bulgaria. Association for Computational Linguistics.

[18] Ruan Tong, Wang Mengjie, Wang Haofen, & Hu Fanghuai. (2016). Research on the construction and application of vertical knowledge graphs. Knowledge Management Forum (3).

[19] Wu, T.; Qi, G.; Li, C.; Wang, M. A Survey of Techniques for Constructing Chinese Knowledge Graphs and Their Applications. Sustainability 2018, 10, 3245.

[20] Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of machine learning research, 12(ARTICLE), 2493-2537.

[21] Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.

[22] Strubell, E., Verga, P., Belanger, D., & McCallum, A. (2017). Fast and accurate entity recognition with iterated dilated convolutions. arXiv preprint arXiv:1702.02098.

[23] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

[24] Zhang, Y., & Yang, J. (2018). Chinese ner using lattice lstm. arXiv preprint arXiv:1805.02023.

[25] Gui, T., Ma, R., Zhang, Q., Zhao, L., Jiang, Y. G., & Huang, X. (2019, August). CNN-Based Chinese NER with Lexicon Rethinking. In IJCAI (pp. 4982-4988).

[26] Li, X., Yan, H., Qiu, X., & Huang, X. (2020). FLAT: Chinese NER Using Flat-Lattice Transformer. arXiv preprint arXiv:2004.11795.

[27] Li, X., Feng, J., Meng, Y., Han, Q., Wu, F., & Li, J. (2019). A unified mrc framework for named entity recognition. arXiv preprint arXiv:1910.11476.

[28] Yuchen Lin, B., Lee, D. H., Shen, M., Moreno, R., Huang, X., Shiralkar, P., & Ren, X. (2020). TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition. arXiv, arXiv-2004.

[29] Zhang, X., Jiang, Y., Peng, H., Tu, K., & Goldwasser, D. (2017). Semi-supervised structured prediction with neural crf autoencoder. Association for Computational Linguistics (ACL).

[30] Chen, M., Tang, Q., Livescu, K., & Gimpel, K. (2019). Variational sequential labelers for semi-supervised learning. arXiv preprint arXiv:1906.09535.

[31] Chen, J., Wang, Z., Tian, R., Yang, Z., & Yang, D. (2020). Local Additivity Based Data Augmentation for Semi-supervised NER. arXiv preprint arXiv:2010.01677.

[32] Lakshmi Narayan, P. (2019). Exploration of Noise Strategies in Semi-supervised Named Entity Classification.

[33] Alejandro Metke-Jimenez and Sarvnaz Karimi. 2015. Concept extraction to identify adverse drug reactions in medical forums: A comparison of algorithms. CoRR abs/1504.06936.

[34] Xiang Dai, Sarvnaz Karimi, Ben Hachey, Cécile Paris. An Effective Transition-based Model for Discontinuous NER. ACL 2020: 5860-5870

[35] Wei Lu and Dan Roth. 2015. Joint mention extraction and classification with mention hypergraphs. In Conference on Empirical Methods in Natural Language Processing, pages 857–867, Lisbon, Portugal.

[36] Walker, C., Strassel, S., Medero, J., and Maeda, K. 2005. ACE 2005 multilingual training corpus-linguistic data consortium.

[37] Szpakowicz, S. 2009. Semeval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions, pages 94–99. Association for Computational Linguistics.

[38] Zhang, Yuhao and Zhong, Victor and Chen, Danqi and Angeli, Gabor and Manning, Christopher D. 2017. Position-aware Attention and Supervised Data Improve Slot Filling. In Proceedings of EMNLP. Pages 35-45.

[39] Riedel, S., Yao, L., and McCallum, A. 2010. Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 148-163. Springer.

[40] Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, and Maosong Sun. 2019. DocRED: A large-scale document-level relation extraction dataset. In Proceedings of ACL, pages 764–777.

[41] Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING, pages 2335–2344.

[42] Linlin Wang, Zhu Cao, Gerard De Melo, and Zhiyuan Liu. 2016. Relation classification via multi-level attention cnns. In Proceedings of ACL, pages 1298–1307.

[43] Dongxu Zhang and Dong Wang. 2015. Relation classification via recurrent neural network. arXiv preprint arXiv:1508.01006.

[44] Xu, Y., Mou, L., Li, G., Chen, Y., Peng, H., and Jin, Z. 2015. Classifying relations via long short term memory networks along shortest dependency paths. In proceedings of EMNLP, pages 1785–1794.

[45] Shanchan Wu and Yifan He. 2019. Enriching pre-trained language model with entity information for relation classification.

[46] Zhao, Y., Wan, H., Gao, J., and Lin, Y. 2019. Improving relation classification by entity pair graph. In Asian Conference on Machine Learning, pages 1156–1171.

[47] Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of ACL-IJCNLP, pages 1003–1011.

[48] Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of EMNLP, pages 455–465.

[49] Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao. 2015. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of EMNLP, pages 1753–1762.

[50] Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural relation extraction with selective attention over instances. In Proceedings of ACL, pages 2124–2133.

[51] Yuhao Zhang, Peng Qi, and Christopher D. Manning. 2018. Graph convolution over pruned dependency trees improves relation extraction. In Proceedings of EMNLP, pages 2205–2215.

[52] Guoliang Ji, Kang Liu, Shizhu He, Jun Zhao, et al. 2017. Distant supervision for relation extraction with sentence-level attention and entity descriptions. In AAAI, pages 3060–3066.

[53] Bordes A, Usunier N, Garcia-Duran A, et al. 2013. Translating embeddings for modeling multi-relational data. Advances in neural information processing systems. pages 2787-2795.

[54] Xu Han, Pengfei Yu, Zhiyuan Liu, Maosong Sun, and Peng Li. 2018. Hierarchical relation extraction with coarse-to-fine grained attention. In Proceedings of EMNLP, pages 2236–2245.

[55] Ningyu Zhang, Shumin Deng, Zhanlin Sun, Guanying Wang, Xi Chen, Wei Zhang, and Huajun Chen. 2019. Long-tail relation extraction via knowledge graph embeddings and graph convolution networks. In Proceedings of NAACL-HLT, pages 3016–3025.

[56] Qin, P., Xu, W., and Wang, W. Y. 2018b. Robust distant supervision relation extraction via deep reinforcement learning. arXiv preprint arXiv:1805.09927.

[57] Xiangrong Zeng, Shizhu He, Kang Liu, and Jun Zhao. 2018. Large scaled relation extraction with reinforcement learning. In Proceedings of AAAI, pages 5658–5665.

[58] Jun Feng, Minlie Huang, Li Zhao, Yang Yang, and Xiaoyan Zhu. 2018. Reinforcement learning for relation classification from noisy data. In Proceedings of AAAI, pages 5779–5786.

[59] Yi Wu, David Bamman, and Stuart Russell. 2017. Adversarial training for relation extraction. In Proceeding of EMNLP, pages 1778–1783.

[60] Pengda Qin, Weiran Xu, William Yang Wang. 2018. DSGAN: Generative Adversarial Training for Distant Supervision Relation Extraction. In Proceeding of ACL, pages 496–505.

[61] Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski. 2019. Matching the blanks: Distributional similarity for relation learning. In Proceedings of ACL, pages 2895–2905.

[62] Meng Qu, Tianyu Gao, Louis-Pascal Xhonneux, Jian Tang. 2020. Few-shot Relation Extraction via Bayesian Meta-learning on Task Graphs. In Proceedings of ICML.

[63] Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao,Peng Zhou, Bo Xu. 2017. Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1227–1236.

[64] Wei, Zhepei and Su, Jianlin and Wang, Yue and Tian, Yuan and Chang, Yi. 2020. A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1476–1488.

[65] Luan, Y., Wadden, D., He, L., Shah, A., Ostendorf, M., & Hajishirzi, H. (2019). A general framework for information extraction using dynamic span graphs. arXiv preprint arXiv:1904.03296.

[66] Wadden, D., Wennberg, U., Luan, Y., & Hajishirzi, H. (2019). Entity, relation, and event extraction with contextualized span representations. arXiv preprint arXiv:1909.03546.

[67] Sahu, S. K., et al. 2019. Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: 4309–4316.

[68] Liu, B., Gao, H., Qi, G., Duan, S., Wu, T., & Wang, M. (2019, April). Adversarial Discriminative Denoising for Distant Supervision Relation Extraction. In International Conference on Database Systems for Advanced Applications (pp. 282-286). Springer, Cham.

[69] Namboodiri, A. M., & Jain, A. K. (2007). Document structure and layout analysis. In Digital Document Processing (pp. 29-48). Springer, London.

[70] Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020, August). Layoutlm: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1192-1200).

[71] Li, M., Xu, Y., Cui, L., Huang, S., Wei, F., Li, Z., & Zhou, M. (2020). DocBank: A Benchmark Dataset for Document Layout Analysis. arXiv preprint arXiv:2006.01038.

[72] Ainslie, J., Ontanon, S., Alberti, C., Cvicek, V., Fisher, Z., Pham, P., ... & Yang, L. (2020, November). ETC: Encoding Long and Structured Inputs in Transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 268-284).

[73] Tang, J., Lu, Y., Lin, H., Han, X., Sun, L., Xiao, X., & Wu, H. (2020, November). Syntactic and Semantic-driven Learning for Open Information Extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings (pp. 782-792).

[74] https://paperswithcode.com/task/named-entity-recognition-ner

 
