
2021 IEEE paper: Application Status and Performance Analysis of Deep Neural Networks in Document Image Table Recognition

2022-05-14 14:00:01 | Zheng Jianyu (JY)

Received May 12, 2021; accepted June 4, 2021; date of publication June 9, 2021; date of current version June 24, 2021.

Download link for the original paper

Abstract

The first stage of table recognition is to detect the table region in the document; in the second stage, the table structure is recognized so that information can be extracted from each cell. Table detection and structure recognition are thus the key problems in the field of table understanding. However, tables exhibit great diversity and asymmetry, which makes table analysis a complex task and, consequently, an active research area in document image analysis. Recent advances in the computing power of graphics processing units have enabled deep neural networks to outperform traditional state-of-the-art machine learning methods, and table understanding has benefited considerably from these breakthroughs. Nevertheless, there is no unified description of the deep learning methods for table detection and table structure recognition. This paper provides a thorough analysis of modern methods that use deep neural networks, together with a comprehensive account of the state of the art and the open challenges in table understanding for document images. The major datasets and their complexities are described along with quantitative results. In addition, the paper briefly outlines promising directions that could further improve table analysis in document images.

1. Introduction

Over the last decade, table understanding has attracted considerable attention. Tables are a common way to represent and communicate structured data [1]. With the rise of deep neural networks (DNNs), various datasets for table detection, segmentation, and recognition have been published [2], [3], allowing researchers to use DNNs to improve on state-of-the-art results.
Previously, the problem of table recognition was mainly addressed with traditional methods [4]-[7]. Kieninger and Dengel [8], Kieninger [9], and Kieninger and Dengel [10] carried out some of the earliest work in the field of table analysis. Besides detecting the table region, their system T-Recs also extracts the structural information of the table.
Later, machine learning techniques were applied to detect tables. One of the pioneers was Cesarini et al. [11]. Their system, Tabfinder, converts a document into an MXY tree, a hierarchical representation of the document. It searches for block regions enclosed by horizontal and vertical parallel lines, and a depth-first search that tolerates noisy document images then yields the table region. e Silva [12] used rich hidden Markov models to detect table regions based on joint probability distributions.
Support vector machines (SVMs) [13], together with some handcrafted features, have also been used to detect tables [14]. Fan and Kim [15] attempted to detect tables by fusing various classifiers trained on the linguistic and layout information of documents. Another work, by Tran et al. [16], uses regions of interest (ROIs) to detect tables in document images: ROIs whose text blocks satisfy a specific rule set are further filtered as tables.
Wang et al. [17] carried out a comprehensive study that addressed not only table detection but also table decomposition. Their probability-optimization-based algorithm is similar to the well-known X-Y cut algorithm [18]. The system published by Shigarov et al. [19] restores the table structure using the bounding boxes of words. Because the system relies heavily on metadata, the authors used PDF files for their experiments.
Figure 1: Comparison of the standard pipelines of traditional and deep learning approaches to table analysis. Feature extraction in traditional methods is mainly realized with image processing techniques, whereas deep learning techniques use convolutional networks. Unlike traditional methods, deep learning methods for table understanding do not depend on handcrafted, data-specific features and generalize better.

Figure 1 compares the standard pipelines of traditional and deep learning methods for table understanding. Traditional table recognition systems are either not general enough across different datasets or require additional metadata from PDF files; most of them also adopt exhaustive pre-processing and post-processing to improve performance. In deep learning systems, by contrast, convolutional neural networks [20] extract the features instead of relying on handcrafted ones, and an object detection or segmentation network then tries to locate the table regions in the document image, which are further decomposed and recognized.
Figure 2: Organization of the methods covered in this paper. Concepts written in blue represent table detection techniques, methods in red denote table segmentation or table structure recognition, and architectures in green describe table recognition methods, which extract the contents of table cells. As shown in the figure, some architectures have been used for multiple table understanding tasks.

Text documents fall into two categories. The first category is born-digital files, which contain not only text but also layout information and other related metadata; PDF documents are an example. The second category comprises files obtained with devices such as scanners and cameras. As far as we know, there is currently no noteworthy research applying deep learning to table recognition in camera-captured images, although the literature does contain a heuristics-based method [21] for document images captured by a camera. The scope of this survey is to evaluate deep-learning-based methods that perform table recognition on scanned document images.
This article is organized as follows: Section 2 discusses previous surveys in the field of table understanding. Section 3 discusses in detail the methods that apply deep learning concepts to table analysis, and Figure 2 illustrates their structure and flow. Section 4 describes the public datasets for table analysis. Section 5 explains the well-known evaluation metrics and provides a performance analysis of all the methods discussed in Section 3. Section 6 concludes the discussion, and Section 7 highlights various open issues and future directions.

2. Related Work

Figure 3: Growth trend in the field of table analysis. The data were obtained by counting publications on table detection and table recognition from 2015 to 2019.

Table analysis has been a recognized problem for many years. Figure 3 shows the growth in the number of publications over the past five years. Since this is a review article, we want to acknowledge some of the previous surveys and descriptions in the table community. Dougherty defined the table in the document recognition chapter of one of his books [22]. In a survey of document recognition, Handley [23] described the task of table recognition and accurately summarized the earlier work in the field. Later, Lopresti and Nagy [24] introduced a survey on table understanding in which they discussed the heterogeneity of different kinds of tables; they also identified potential areas where the many examples could be used for improvement. That comprehensive survey was itself presented in tabular form and was later published as a book [25].
Zanibbi et al. [26] put forward a detailed survey covering all the latest material and state-of-the-art methods of the time. They define the table recognition problem as "the interaction of models, observations, transformations, and inferences" [27]. Hurst defined the interpretation of tables in his doctoral thesis [28]. e Silva et al. [29] published another survey in 2006; while evaluating existing table processing algorithms, the authors proposed their own end-to-end table processing method and evaluation metrics to address table structure recognition.
Embley et al. [27] wrote an overview expounding table processing paradigms. In 2014, Coüasnon and Lemaitre published another review of table and form recognition [30], giving a brief overview of the state-of-the-art methods of the time. In the following year, to the best of our knowledge, Khusro et al. [31] published the latest review on detecting and extracting tables from PDF documents.

3. Methodologies

As described in [32], we also define the table understanding problem as three steps:
  A. Table detection: detecting the boundary of the table, in terms of a bounding box, within the document image.
  B. Table structure segmentation: defining the structure of the table by analyzing its row and column layout information.
  C. Table recognition: performing the structural segmentation and parsing the information of the table cells.

A. TABLE DETECTION

Figure 4: The basic flow of table detection and the techniques used in the methods discussed. To locate table boundaries, document images are passed through various deep learning architectures.

The first step in extracting information from a table is to identify the table boundary in the document image [33]. Figure 4 illustrates the basic table detection pipeline shared by many of the methods discussed. Various deep learning concepts have been used to detect table regions in document images, and this section reviews them. For the reader's convenience, we group the methods by the deep learning concept they use. Table 1 summarizes all the table detection methods based on object detection, and Table 2 highlights the advantages and limitations of methods that apply other deep-learning-based techniques.
To the best of our knowledge, Hao et al. [34] proposed the first method to apply deep learning to the table detection task. In addition to using a convolutional neural network to extract image features, the authors apply some heuristics based on PDF metadata. Because the technique relies on PDF files rather than document images, we decided not to include this study in the performance analysis.
Table 1: Summary of the advantages and limitations of various deep-learning-based table detection methods built on object detection frameworks.

Table 2: Summary of the advantages and limitations of various table detection methods based on deep learning concepts other than object detection algorithms. Bold horizontal lines separate techniques with different architectures.

1) OBJECT DETECTION ALGORITHMS

Object detection is a branch of deep learning concerned with detecting objects in images or video frames. Region-based object detection algorithms work in two steps: the first step generates suitable proposals, also known as regions of interest; in the second step, a convolutional neural network classifies these regions of interest.

a: TRANSFER LEARNING

Transfer learning means reusing a model pre-trained on a different but related problem [35]. Because labeled datasets are available only in limited numbers, transfer learning is used extensively in vision-based methods [36]-[39]. For similar reasons, researchers in the document image analysis community have also leveraged transfer learning to advance their approaches [40]-[42]. It allows researchers to reuse networks pre-trained on ImageNet [20] or COCO [43] for table detection and table structure recognition in document images. Sections 3-A1.b, 3-A1.c, and 3-A1.f explain table detection methods based on transfer learning, and Section 3-B5 covers techniques that apply transfer learning to table structure recognition.
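As an illustrative sketch of this reuse (our example, not code from any of the surveyed papers), the snippet below loads a torchvision Faster R-CNN whose backbone was pre-trained on COCO and swaps its classification head for a two-class background/table predictor, ready for fine-tuning on a table detection dataset:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a Faster R-CNN detector pre-trained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Replace the COCO classification head with a 2-class head:
# class 0 is reserved for background, class 1 is "table".
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# The model can now be fine-tuned on (image, {"boxes", "labels"}) pairs
# from a table dataset such as ICDAR-2017-POD or TableBank.
```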

b: FASTER R-CNN

After object detection algorithms progressed from Fast R-CNN [54] to Faster R-CNN [55], tables came to be treated as objects in document images. Gilani et al. [44] used deep learning to detect tables. Their technique applies an image transformation as a preprocessing step before performing table detection. In the transformation, a binarized image is taken as input, and its blue, green, and red channels are replaced with a Euclidean distance transform [56], a linear distance transform [57], and a max distance transform [58], respectively. Gilani et al. [44] then feed the transformed images to a region-based object detection model, Faster R-CNN [55], whose region proposal network (RPN) backbone is based on ZFNet [59]. Their method beat the state-of-the-art results on the UNLV dataset [2].
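The preprocessing idea can be sketched as follows (a simplified reconstruction; the exact mapping of distance metrics to color channels is our assumption, and "linear" and "max" are taken to mean the city-block and chessboard metrics):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, distance_transform_cdt

def distance_transform_image(binary):
    """binary: 2-D array, nonzero for background (white) pixels, 0 for ink.
    Each transform assigns every background pixel its distance to the
    nearest ink pixel."""
    d_euclid = distance_transform_edt(binary)                    # Euclidean
    d_linear = distance_transform_cdt(binary, metric="taxicab")  # linear (city-block)
    d_max = distance_transform_cdt(binary, metric="chessboard")  # max (chessboard)

    def to_uint8(d):
        return (255.0 * d / max(d.max(), 1)).astype(np.uint8)

    # Stack the three transforms as the B, G and R channels of a
    # pseudo-color image that is then fed to the detector.
    return np.dstack([to_uint8(d_euclid), to_uint8(d_linear), to_uint8(d_max)])
```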
Schreiber et al. [45] applied the power of deep learning to document images. Their end-to-end system, DeepDeSRT, not only detects the table region but also recognizes the table structure, accomplishing both tasks with dedicated deep learning techniques.
Table detection is realized with Faster R-CNN [55]. They experimented with two different backbone networks: Zeiler and Fergus's ZFNet [59] and the deep VGG-16 network [60]. The models were pre-trained on the Pascal VOC dataset [61]. Section 3-B explains their structure segmentation method.
As graphics processing unit (GPU) memory grew, space opened up for larger public datasets that could exploit GPUs fully. Li et al. [62] recognized this need and proposed TableBank, which comprises 417K labeled tables and their respective document images. They also used Faster R-CNN [55] for the table detection task and proposed it as a baseline model, along with a baseline method for structure recognition, which is explained in Section 3-B.
In another study presented at ICDAR 2019, tables were detected with Faster R-CNN and further refined by locating corner points [63]. The authors define a corner as an 80×80 square drawn around a vertex of the table. Besides locating table boundaries, the corners are detected with the same Faster R-CNN model.
The detected corners are then refined with several heuristics, such as requiring two consecutive corners to lie on the same horizontal line. After this analysis, inaccurate corners are filtered out and the remaining ones are grouped. The authors observe that, most of the time, table boundaries are inaccurate because the left and right sides are detected less precisely than the top and bottom; therefore, only the left and right sides of the detected tables are refined in this experiment. The refinement first finds the corners corresponding to each table by computing the intersection over union between tables and corners, and then moves the horizontal extent of the table to the average of the table boundary and the corresponding corner. Experiments on the ICDAR 2017 page object detection dataset [64] report an F-measure improvement of 2.8% over the plain Faster R-CNN approach.

c: DEFORMABLE CONVOLUTIONS

In 2018, Siddiqui et al. [46] proposed another method as a follow-up to the work of Schreiber et al. [45]. They performed table detection with deformable convolutional neural networks [65] inside a Faster R-CNN model. The authors argue that, because documents contain tables with varied layouts and scales, deformable convolutions perform better than conventional ones. Their DeCNT model showed state-of-the-art results on the ICDAR-2013 [66], ICDAR-2017 POD [64], UNLV [2], and Marmot [3] datasets.
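To make the idea concrete, here is a minimal sketch (not the DeCNT implementation) of a deformable convolution layer using torchvision, in which an ordinary convolution predicts the sampling offsets that let the kernel adapt its receptive field to varying table layouts and scales:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """A 3x3 deformable convolution whose sampling offsets are predicted
    from the input itself by a regular convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # 2 offsets (dx, dy) per kernel position: 2 * 3 * 3 = 18 channels.
        self.offset_pred = nn.Conv2d(in_ch, 18, kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        offsets = self.offset_pred(x)        # where each kernel tap should sample
        return self.deform_conv(x, offsets)  # convolve at the shifted locations

x = torch.randn(1, 64, 128, 128)
y = DeformableBlock(64, 128)(x)  # -> torch.Size([1, 128, 128, 128])
```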
Agarwal et al. [49] proposed a method called CDeC-Net (Composite Deformable Cascade Network) to detect table boundaries in document images. In this work, the authors show empirically that state-of-the-art table detection results can be obtained without additional pre- or post-processing. The work builds on a cascade Mask R-CNN [67] with a composite backbone, a dual-backbone structure of two ResNeXt-101 networks [68], [69]. In the composite backbone, the authors replace conventional convolutions with deformable ones to handle tables of arbitrary layout. By combining the deformable composite backbone with the strong cascade Mask R-CNN, their system produced comparable results on several public datasets in the table community.

d: YOLO

YOLO (You Only Look Once) [70] is a well-known model for efficiently detecting objects in real-world images, and Huang et al. [47] applied it to the table detection task. YOLO differs from region proposal approaches in that it treats object detection as a regression problem rather than a classification problem. YOLOv3 [71], the latest enhanced version of YOLO [70] at the time, was used in this experiment. To make the predictions more precise, blank space is removed from the predicted table regions and noisy page objects are refined away.

e: MASK R-CNN, YOLO, SSD, RETINANET

Another study using object detection algorithms, "The benefits of close-domain fine-tuning for table detection in document images," was published by Casado-García et al. [72]. After a detailed evaluation, the authors showed that table detection performance improves when fine-tuning from a closer domain. They used the object detection algorithms Mask R-CNN [73], YOLO [74], SSD [75], and RetinaNet [76]. Two base datasets were selected for the experiment. The first is Pascal VOC [61], which contains natural scene images that are not closely related to the table community's datasets. The second is TableBank [62], with 417K labeled images, further explained in Section 4-G. Separate models were trained on these two base datasets and tested comprehensively on all the ICDAR table competition datasets as well as others, such as Marmot and UNLV [2], which are described in Section 4. The paper reports that models fine-tuned from the closer-domain dataset achieve on average a 17% improvement over models trained on natural images.

f: CASCADE MASK R-CNN

With the latest improvements in generic spatial feature extraction networks [77], [78] and object detection networks [67], [79], table detection systems have improved markedly. Prasad et al. [48] published CascadeTabNet, an end-to-end table detection and structure recognition method. In this work, the authors use Cascade Mask R-CNN [67] (a multi-stage Mask R-CNN) combined with a novel HRNet [77] hybrid as the base network. Like [44], the paper feeds transformed images, rather than the raw document images, into the strong cascade mask R-CNN [67]. The proposed system achieves state-of-the-art results on the ICDAR-2013 [66], ICDAR-2019 [80], and TableBank [62] datasets.
In recent work, Zheng et al. [52] published a framework for table detection and structure recognition in document images. The authors present their system, GTE (Global Table Extractor), as a general vision-based method that can work with any object detection algorithm. The method feeds the original document image to multiple object detectors that detect both tables and individual cells in order to achieve accurate table detection. The tables predicted by the object detector are further refined with the help of an additional penalty loss and the predicted cell boundaries. The method also refines the predicted cell regions to address table structure recognition, as explained in Section 3-B.

2) SEMANTIC IMAGE SEGMENTATION

In 2018, Kavasidis et al. combined deep convolutional neural networks and graphical models with the concept of saliency to detect charts and tables [81]. The authors argue that table detection can be cast as saliency detection rather than handled with object detection networks. The model is based on semantic image segmentation: it first extracts saliency features and then classifies each pixel according to whether or not it belongs to a region of interest. To capture long-range dependencies, the model adopts dilated convolutions [82]. Finally, the generated saliency map is passed to a fully connected conditional random field (CRF) [83], which further refines the predictions.
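For reference, a dilated (atrous) convolution simply spreads the kernel taps apart, enlarging the receptive field without adding parameters; in PyTorch it is a one-argument change (an illustrative sketch, not the architecture of [81]):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 100, 100)

# A standard 3x3 convolution sees a 3x3 neighborhood.
standard = nn.Conv2d(64, 64, kernel_size=3, padding=1)

# With dilation=2 the same 9 weights cover a 5x5 neighborhood,
# capturing longer-range context at no extra parameter cost.
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

assert standard(x).shape == dilated(x).shape == x.shape
```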

a: FULLY CONVOLUTIONAL NETWORKS

TableNet, a deep-learning-based end-to-end model by Paliwal et al. [84], detects tables and recognizes their structure in document images. The method leverages the concept of fully convolutional networks [85], with pre-trained VGG-19 [60] layers as the base network. The authors argue that identifying the table region and its structure are similar problems that can be solved jointly. They further show how transfer learning can improve performance on new datasets.
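A heavily simplified sketch of such a two-branch fully convolutional design is shown below: a shared VGG-19 encoder with two pixel-wise decoders, one predicting a table mask and one a column mask. This is our minimal reconstruction under stated assumptions; the actual TableNet decoders use skip connections and a different upsampling scheme:

```python
import torch
import torch.nn as nn
import torchvision

class TwoBranchFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder: VGG-19 convolutional layers (downsample by 32).
        self.encoder = torchvision.models.vgg19(pretrained=True).features

        def decoder():
            return nn.Sequential(
                nn.Conv2d(512, 256, kernel_size=1), nn.ReLU(inplace=True),
                nn.Conv2d(256, 1, kernel_size=1),
                nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            )

        self.table_branch = decoder()   # pixel-wise table mask
        self.column_branch = decoder()  # pixel-wise column mask

    def forward(self, x):
        feats = self.encoder(x)
        return (torch.sigmoid(self.table_branch(feats)),
                torch.sigmoid(self.column_branch(feats)))

masks = TwoBranchFCN()(torch.randn(1, 3, 256, 256))  # two 1x1x256x256 masks
```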

3) GRAPH NEURAL NETWORKS

Recently, applications of graph neural networks to table understanding have been on the rise. Riba et al. [87] conducted an experiment using graph neural networks to detect tables in invoice documents. Because the amount of information available in an invoice image is limited, the authors argue that graph neural networks are better suited to detecting the table region. The paper also releases a labeled subset of the original RVL-CDIP dataset [89], which is publicly available.
Holeček et al. [86] extended the application of graph neural networks by proposing graph convolutions for table understanding in structured documents such as invoices. The proposed study also works on PDF documents; however, the authors claim the model is robust enough to handle other kinds of datasets. The study combines line-item table detection with information extraction to solve the table detection problem. Using the line-item approach, any word can easily be classified as belonging to a line item or not. Once all words are classified, the table region can be detected effectively, because the rows of a table are well separated from other text regions in an invoice.

4) GENERATIVE ADVERSARIAL NETWORKS (GANs)

Generative adversarial networks (GANs) [90] have also been used for table detection. The proposed method [88] forces the generator network not to distinguish between ruled tables and less-ruled tables, so that it extracts the same features in both cases. The feature generator is then combined with a semantic segmentation model such as Mask R-CNN [73] or U-Net [91]. After combining the GAN-based feature generator with Mask R-CNN, the method is evaluated on the ICDAR 2017 POD dataset [64]. The authors claim the approach can also help with other object detection and segmentation problems.

B. TABLE STRUCTURAL SEGMENTATION (not fully translated)

Once the boundary of the table has been detected, the next step is to recognize the rows and columns [29]. In this section, we review recent attempts to address the table structure segmentation problem, classifying the methods by the structure of their deep neural networks. Table 3 summarizes these methods by highlighting their advantages and limitations, and Figure 6 shows the basic pipeline of the table structure segmentation techniques discussed. (Translator's note: I am currently studying only the table localization part, so only this first portion has been translated; the second and third parts are not translated yet.)
Figure 6: The basic flow of table structure segmentation and the techniques used in the methods discussed. To recognize the table structure, the networks are fed table images rather than whole document images.

Table 3: Summary of the advantages and limitations of various deep-learning-based table structure recognition methods. Bold horizontal lines separate methods with different architectures.

C. TABLE RECOGNITION (not translated yet)

As mentioned in Section 3, the table recognition task comprises extracting the table structure and the text of the table cells. Comparatively little progress has been made in this particular area. In this section, we introduce the recent experiments that attempt to solve table recognition. Table 4 summarizes these methods by highlighting their advantages and limitations.
Table 4: Summary of the advantages and limitations of deep-learning-based methods used solely for the table recognition task on scanned document images.

4. Datasets

The performance of a deep neural network is directly related to the size of the dataset [45], [46]. In this section, we discuss all the well-known public datasets for table detection and table structure recognition in document images. Table 5 gives a comprehensive account of all the datasets mentioned for performing and comparing table detection, structure segmentation, and recognition in document images. Figure 8 shows samples from some prominent datasets in the table community.
Table 5: Table datasets. TD denotes table detection, TSR table structure recognition, and TR table recognition.

Figure 8: Sample document images taken from the ICDAR-2013 [66], ICDAR-2017-POD [64], UNLV [2], and UW3 [112] datasets. Red borders mark table regions. The differences between samples across the datasets are very noticeable.

1. ICDAR-2013

The International Conference on Document Analysis and Recognition (ICDAR) 2013 table dataset [66] is the best known among community researchers. It was released for the table competition organized at ICDAR 2013 and has annotations for both table detection and table recognition. The dataset consists of PDF documents, which are usually converted into images for the various methods. It contains structured tables, figures, charts, and text. There are 238 images, of which 128 contain a table. The dataset has been widely used to compare state-of-the-art methods. As described in Table 5, it is annotated for all three table understanding tasks discussed in this article. Figure 8(a) shows two samples from this dataset.

2. ICDAR-2017-POD

This dataset [64] was proposed for the ICDAR 2017 page object detection (POD) competition and is widely used to evaluate table detection methods. It is much larger than the ICDAR 2013 table dataset, consisting of 2,417 images that include tables, formulas, and figures. In many cases, the dataset is split into 1,600 images (731 table regions) for training and the remaining 817 images (350 table regions) for testing. A pair of instances from the dataset is shown in Figure 8(b). The dataset contains only the table boundary information explained in Table 5.

3. UNLV

The UNLV dataset [2] is a recognized dataset in the field of document image analysis. It consists of scanned document images from different sources, such as financial reports, journals, and research papers with varied table layouts. Although the dataset contains about 10,000 images, only 427 of them contain a table region; usually, these 427 images are the ones used in the research community's experiments. The dataset has been used for all three table analysis tasks discussed in this article. Figure 8(c) shows several samples from this dataset.

4. UW3

UW3 [112] is another dataset popular with researchers in document image analysis. It contains scanned documents from books and magazines: about 1,600 scanned document images, of which only 165 have a table region. The annotated table coordinates are provided in XML format. Two samples from this dataset are shown in Figure 8(d). Although the dataset has a limited number of table regions, it carries annotations for all three table understanding problems discussed in this article.

5. ICDAR-2019

Recently, a competition on table detection and recognition (cTDaR) was held at ICDAR 2019 [80]. Two new datasets were proposed for the competition: a modern dataset and a historical one. The modern dataset contains samples from scientific papers, forms, and financial documents, while the archival dataset includes handwritten accounting ledgers, train timetables, simple tabular prints from old books, and other images. For the modern dataset, the specified train/test split is 600 images for training and 240 for testing; likewise, for the historical dataset the recommended split is 600 training images and 199 test images. As shown in Table 5, the dataset also contains annotations for table boundaries and cell ranges. This novel dataset is inherently challenging because it contains both modern and historical (archival) document images, and it can be used to evaluate the robustness of table analysis methods. To convey the diversity, Figure 9 depicts two samples from the historical and modern datasets.
Figure 9: Examples of archival and modern document images from the ICDAR-2019 dataset [80], described in Section 4-E. Red borders mark table regions.

6. Marmot

Until recently, Marmot was one of the largest public datasets, widely used by researchers in the field of table understanding. The dataset was assembled by the Institute of Computer Science and Technology of Peking University and later described by Fang et al. [3]. It consists of Chinese and English conference papers from 1970 to 2011, 2,000 images in total. Thanks to its diverse and very complex page layouts, this dataset is very useful for training networks. The ratio of positive to negative samples is about 1:1. Some incorrect ground-truth annotations were reported in the past, and Schreiber et al. later cleaned them up [45]. As described in Table 5, the dataset has annotations for table boundaries and is widely used to train deep neural networks for table detection.

7. TableBank

At the beginning of 2019, Li et al. [62] recognized the table community's need for large datasets and released TableBank, which consists of 417K labeled images with table information. The dataset was collected by crawling available online documents in .docx format; another source is LaTeX documents gathered from the arXiv database. The publishers of this dataset believe the contribution will help researchers exploit the power of deep learning and fine-tuning methods. The authors claim the dataset can be used for both table detection and structure recognition tasks; however, we could not find annotations for structure recognition in the dataset. Table 5 summarizes the dataset's key information.

8. TabStructDB

At the ICDAR 2019 conference, besides the table competition [80], other researchers also released new datasets in the field of table analysis. Siddiqui et al. [50] published a dataset called TabStructDB. Because the ICDAR-2017-POD dataset [64] contains only table boundary information, the authors took that dataset and annotated it with structural information, namely the boundaries of the rows and columns in each table. For consistency, the authors also retained the same dataset splits mentioned in [80]. Table 5 summarizes important information about the dataset. Because it provides row and column boundary information, it is convenient for researchers who treat the table structure recognition task as an object detection or semantic segmentation problem.

9. TABLE2LATEX-450K

Another large dataset released at the recent ICDAR conference is TABLE2LATEX-450K [109]. It contains 450K annotated tables and their corresponding images. This huge dataset was constructed from arXiv articles from 1991 to 2016: all the LaTeX source files were downloaded, and after source extraction and subsequent refinement, a high-quality labeled dataset was obtained. As described in Table 5, it contains annotations for the structural segmentation of tables and the contents of table cells. Along with the dataset, the publishers also released all preprocessing scripts. This dataset is an important contribution to table structure segmentation and table recognition in document images, because it allows researchers to train large-scale deep learning architectures from scratch and then refine them on relatively small datasets.

10. SciTSR

SciTSR is another dataset, released by Chi et al. in 2019 [98]. According to the authors, it is one of the largest public datasets for the table structure recognition task. It consists of 15,000 tables in PDF format along with their annotations. The dataset was built by crawling LaTeX source files from arXiv. About 25% of the dataset consists of complex tables that span multiple rows or columns. The dataset carries both table structure segmentation and table recognition annotations, as shown in Table 5. Thanks to its complex table structures, it can be used to improve state-of-the-art systems for the structural segmentation and recognition of tables with complex layouts.

11. DeepFigures

To the best of our knowledge, DeepFigures [4] is the largest publicly available dataset for the table detection task. It contains more than 1.4 million documents with the corresponding bounding boxes of their tables and figures. The authors built the dataset from online scientific articles in the arXiv and PubMed databases. The ground truth is provided in XML format. As shown in Table 5, the dataset contains only the bounding boxes of tables. To exploit deep neural networks fully for table detection, this large dataset can serve as a base dataset for closer-domain fine-tuning techniques.

12. RVL-CDIP (SUBSET)

RVL-CDIP (Ryerson Vision Lab Complex Document Information Processing) [89] is a well-known dataset in the document analysis community. It contains 400K images evenly divided into 16 classes. Riba et al. [87] leveraged RVL-CDIP by annotating 518 of its invoices. The resulting subset has been released publicly for the table detection task and carries only the table boundary annotations mentioned in Table 5. This subset of the RVL-CDIP dataset [89] is an important contribution for evaluating table detection systems designed specifically for invoice document images.

13. PubTabNet

PubTabNet is another dataset, released by Zhong et al. [32] in December 2019. PubTabNet is currently the largest public dataset, containing more than 568K images with the corresponding table structure information and the content of each cell. It was created by collecting scientific articles from the open-access subset (PMCOA) of PubMed Central. The ground truth of this dataset is formatted in HTML, suitable for web applications. The authors believe the dataset will improve the performance of information extraction systems for tables, and they also plan to publish the ground truth of each table cell in the future. Table 5 summarizes the dataset's key information. Together with the TABLE2LATEX-450K dataset [109], PubTabNet [32] allows researchers to train the complete parameters of deep neural networks for table structure extraction or table recognition from scratch.

14. IIIT-AR-13K

Recently, Mondal et al. [113] contributed a new dataset called IIIT-AR-13K to the graphical page object detection community. The authors generated this dataset by collecting publicly available annual reports in English and other languages. They claim it is the largest manually annotated dataset published for the problem of object detection in graphical pages. Besides tables, the dataset also includes annotations for figures, natural images, logos, and signatures. The publishers provide training, validation, and test splits for the various page object detection tasks: for table detection, 11,000 samples are used for training, while 2,000 and 3,000 samples are used for validation and testing, respectively.

15. CamCap

CamCap is the last dataset we include in this survey; it consists of camera-captured images and was proposed by Seo et al. [21]. It comprises camera-captured tables on curved and on planar surfaces, containing 11,647 and 12,938 cells, respectively. Figure 10 contains several samples from this dataset, illustrating its challenges. The proposed dataset is public and can be used for the table detection and table structure recognition tasks, as shown in Table 5. It is an important contribution for evaluating the robustness of table detection methods on camera-captured document images. It is worth mentioning that Qasim et al. [96] published a method for synthetically creating camera-captured images from the UNLV dataset; Figure 11 depicts an example of a synthetically created camera-captured image.
Figure 10: Examples of real camera-captured images from the CamCap dataset [21], described in Section 4-O. Red borders mark table regions.

Figure 11: An example of a camera-captured image synthesized by a linear perspective transformation method [96].

5. Evaluation

In this section, we introduce the well-known evaluation metrics and give a detailed evaluation and comparison of all the methods cited in Section 3.

A. EVALUATION METRICS

Before presenting the performance evaluation, let us first discuss the evaluation metrics used to assess the performance of the methods in question.

1) PRECISION

Figure 12: Examples of precise and imprecise predictions for the object detection problem with the IoU threshold set to 0.5. The leftmost case is not considered precise; the other two predictions are precise because their IoU exceeds 0.5. Green denotes the ground truth and red the predicted bounding boxes.

Precision [114] is defined as the fraction of the predicted region that belongs to the ground truth. Figure 12 illustrates different cases of precision. The precision formula is:

$$\text{Precision} = \frac{\text{Area}(\text{Prediction} \cap \text{Ground Truth})}{\text{Area}(\text{Prediction})}$$

2) RECALL

Recall [114] is the fraction of the ground truth region that is covered by the prediction. The recall formula is:

$$\text{Recall} = \frac{\text{Area}(\text{Prediction} \cap \text{Ground Truth})}{\text{Area}(\text{Ground Truth})}$$

3) F-MEASURE

The F-measure [114] is computed as the harmonic mean of precision and recall:

$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

4) INTERSECTION OVER UNION (IoU)

IoU [115] is an important evaluation metric, commonly used to judge the performance of object detection algorithms. It measures the degree of overlap between the predicted region and the actual ground truth region, and is defined as:

$$\text{IoU} = \frac{\text{Area}(\text{Prediction} \cap \text{Ground Truth})}{\text{Area}(\text{Prediction} \cup \text{Ground Truth})}$$
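As a small worked example matching this definition (a generic sketch, not tied to any particular paper), IoU for two axis-aligned boxes can be computed in a few lines:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction shifted 25 px against a 100x100 ground-truth table:
# 7500 / (10000 + 10000 - 7500) = 0.6, so it counts as precise at 0.5.
print(iou((0, 0, 100, 100), (25, 0, 125, 100)))  # 0.6
```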

5) BLEU SCORE

BLEU (Bilingual Evaluation Understudy) [116] is an evaluation method used to compare the outputs of machine translation problems. The predicted text is scored after being compared against the ground truth. The BLEU metric assigns the prediction a score from 0 to 1, where 1 is the best possible score for the predicted text.
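A minimal illustration with NLTK (a generic sketch; the token sequences are hypothetical HTML-like structure tokens of the kind emitted by image-to-markup models, not taken from any dataset):

```python
from nltk.translate.bleu_score import sentence_bleu

# Hypothetical structure tokens for one table.
reference = ["<table>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</table>"]
predicted = ["<table>", "<tr>", "<td>", "</td>", "</tr>", "</table>"]

# Default weights give the 4-gram BLEU used, e.g., by the TableBank baseline.
score = sentence_bleu([reference], predicted)
print(round(score, 3))  # < 1.0: the prediction dropped one cell
```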

B. EVALUATIONS FOR TABLE DETECTION

The table detection problem consists of distinguishing the table regions in a document image and regressing the coordinates of the bounding boxes classified as table regions. Table 6 compares the performance of the various table detection methods discussed in detail in Section 3-A. In most cases, table detection performance is evaluated on the ICDAR-2013 [66], ICDAR-2017-POD [64], and UNLV [2] datasets.
Table 6: Table detection performance comparison. Double horizontal lines separate the results obtained on the various datasets. The best results on each relevant dataset are highlighted. For the ICDAR-2019 dataset [80], the three methods are not directly comparable because they report F-measures at different IoU thresholds; therefore, the results on the ICDAR-2019 dataset are not highlighted.

Table 6 also states the intersection over union (IoU) threshold used to compute precision and recall. Figure 13 illustrates what precise and imprecise predictions mean for the table detection task. The results with the highest accuracy on each relevant dataset are highlighted. It is worth mentioning that some methods do not state an IoU threshold; however, they compare their results with other methods that do define one, so we assume the same threshold for those procedures.
Figure 13: Examples of precision for the table detection task. Green denotes the ground truth and red the predicted table region. In the first case, the prediction is imprecise because the IoU between the predicted bounding box and the ground truth is less than 0.5. The table on the right is precise because the prediction covers almost the entire table region.

We could not include the results of Holeček et al. [86] because they did not use any standard dataset for comparison; their new method is compared against logistic regression [117], and the results show that their model outperforms the logistic regression approach.
Qasim et al. [96] proposed another method, explained in Section 3-A3, that does not use any well-known dataset for evaluation. Instead, they tested their method on a synthetic dataset with two types of graph neural networks, [118] and [119]. Besides the graph neural networks, a fully convolutional neural network was included for a fair comparison. After a detailed evaluation, the fusion of graph neural networks with convolutional neural networks surpassed all other methods, with a perfect matching accuracy of 96.9%. Using graph neural networks alone yields a perfect matching accuracy of 65.6%, which still exceeds the accuracy of the method using only a fully convolutional neural network.

C. EVALUATIONS FOR TABLE STRUCTURAL SEGMENTATION

Figure 14: Examples of precision for the table structure segmentation task. Green denotes the ground truth and red the predicted bounding boxes. For simplicity, row and column detection accuracies are shown separately. In the examples shown, the IoU threshold is taken to be 0.5.

The table structure segmentation task is evaluated by how accurately the rows or columns of a table are separated [45], [50]. Figure 14 shows what imprecise and precise predictions mean for both the row and column detection tasks. Table 7 summarizes the performance of the various methods that perform table structure segmentation on the ICDAR 2013 table competition dataset [66].
Table 7: Structure segmentation performance. Outstanding results are highlighted. The results in the last two rows are not directly comparable with the other methods because they use PDF files instead of document images.

Table 8: Structure segmentation performance on the ICDAR-2019 dataset [80]. For simplicity, these results are listed separately in this table.

Recently, the table structure recognition problem has also been evaluated by the accurate prediction of cell boundaries in table images [48], [52], [94]. Since this differs from the earlier protocols [45], [50], we present the results of these methods separately in Table 8.
The results with the highest accuracy are highlighted in the tables. It is worth mentioning that, besides the methods listed in Tables 7 and 8, two further methods are discussed in Section 3-B. We could not include their results in Table 7 because they are not evaluated on any standard dataset or with standard evaluation metrics; their results are explained in the next paragraph instead.
The creators of TableBank [62] proposed baseline models for table structure segmentation and table detection. To test the performance of their table structure recognition baseline on the TableBank dataset, they adopted the 4-gram BLEU score [116] as the evaluation metric. The results show that, when trained on the combined Word+LaTeX dataset, their image-to-text model achieves a BLEU score of 0.7382 and generalizes best across all cases.

D. EVALUATIONS FOR TABLE RECOGNITION

Table recognition comprises segmenting the table structure and extracting information from the cells. In this section, we present the evaluation of the two methods discussed in Section 3-C.
Studying the challenges of end-to-end neural table recognition, Deng et al. [109] tested their image-to-text model on the TABLE2LATEX-450K dataset. The model obtained an exact match accuracy of 32.40% and a BLEU score of 40.33. The authors also examined the model and found that it recognizes the table structure well; they concluded that it struggles in cases with multi-column (multi-row) cells and other complex structures.
Zhong et al. [32] also experimented with the table recognition task in another study. To evaluate their observations, they proposed their own evaluation metric, called TEDS, in which similarity is computed using the tree edit distance proposed by Pawlik and Augsten [120]. Their encoder-dual-decoder (EDD) model beats all other baseline models on the PubTabNet dataset with a TEDS score of 88.3%.
Table 9 summarizes the results of these two methods. It is worth mentioning that, because the two techniques use different datasets and evaluation metrics, the proposed methods cannot be compared with each other directly.
Table 9: Table recognition performance. Because different datasets and evaluation metrics are used, the results mentioned in this table cannot be compared with each other directly.

6. Conclusion

Table analysis is a very important and well-studied problem in the field of document analysis. The development of deep learning concepts has profoundly changed the table understanding problem and set new standards. In this review article, we discussed recent modern approaches that apply deep learning concepts to extracting information from tables in document images. In Section 3, we explained the methods that use deep learning for table detection, structure segmentation, and recognition. Figures 5 and 7 show the most and least well-known methods for table detection and structure segmentation, respectively. Table 5 summarizes all the publicly available datasets and their access information. In Tables 6, 7, 8, and 9, we provide a detailed performance comparison of the discussed methods on the various datasets. The state-of-the-art methods for table detection on the well-known public datasets have achieved almost perfect results. Once the table region is detected, table structure segmentation and table recognition follow. Having studied several recent methods, we believe there is still room for improvement in both of these areas.

7. Future Work

While analyzing and comparing the various methods, we noticed some aspects that deserve emphasis in future work. For table detection, one of the most used evaluation metrics is IoU [45], [46]. Most of the methods discussed in this paper are compared with the prior state of the art on the basis of precision, recall, and F-measure [114]. These three metrics are computed at an IoU threshold chosen by the respective authors. We firmly believe the IoU threshold needs to be standardized so that fair comparisons can be made. Another important factor we noticed when comparing different research methods concerns performance: in a few cases, semantic segmentation proved superior in accuracy to other table structure segmentation approaches; however, execution times are rarely reported.
So far, only traditional methods have been used to detect tables in document images captured by cameras [21]. The power of deep learning methods could be used to advance the state of the art in this area. Deep learning thrives on huge datasets [45], and recently a large number of public datasets have been published [32], [62], [98] that provide annotations not only for table structure extraction but also for table detection. We expect these contemporary datasets to be exploited: integrating various deep learning concepts with the recently published datasets could further enhance the results of table segmentation and recognition methods. To the best of our knowledge, reinforcement learning [121], [122] has not yet been explored in the field of table analysis, although it has been applied to related information extraction problems in document images [123]. It is therefore an exciting and promising future direction for table detection and recognition.

References

[1] S. Sarawagi, ‘‘Information extraction,’’ Databases, vol. 1, no. 3, pp. 261–377, 2007.
[2] A. Shahab, F. Shafait, T. Kieninger, and A. Dengel, ‘‘An open approach towards the benchmarking of table structure recognition systems,’’ in Proc. 8th IAPR Int. Workshop Document Anal. Syst. (DAS), 2010, pp. 113–120.
[3] J. Fang, X. Tao, Z. Tang, R. Qiu, and Y. Liu, ‘‘Dataset, ground-truth and performance metrics for table detection evaluation,’’ in Proc. 10th IAPR Int. Workshop Document Anal. Syst., Mar. 2012, pp. 445–449.
[4] Y.-S. Kim and K.-H. Lee, ‘‘Extracting logical structures from HTML tables,’’ Comput. Standards Interfaces, vol. 30, no. 5, pp. 296–308, Jul. 2008.
[5] H.-H. Chen, S.-C. Tsai, and J.-H. Tsai, ‘‘Mining tables from large scale HTML texts,’’ in Proc. 18th Int. Conf. Comput. Linguistics (COLING), vol. 1, 2000, pp. 166–172.
[6] H. Masuda, S. Tsukamoto, S. Yasutomi, and H. Nakagawa, ‘‘Recognition of HTML table structure,’’ in Proc. 1st Int. Joint Conf. Natural Lang. Process. (IJCNLP), 2004, pp. 183–188.
[7] C.-Y. Tyan, H. K. Huang, and T. Niki, ‘‘Generator for document with HTML tagged table having data elements which preserve layout relationships of information in bitmap image of original document,’’ U.S. Patent 5 893 127, Apr. 6, 1999.
[8] T. Kieninger and A. Dengel, ‘‘A paper-to-HTML table converting system,’’ in Proc. Document Anal. Sys. (DAS), vol. 98, 1998, pp. 356–365.
[9] T. G. Kieninger, ‘‘Table structure recognition based on robust block segmentation,’’ Document Recognit. V, vol. 3305, pp. 22–32, Apr. 1998.
[10] T. Kieninger and A. Dengel, ‘‘Applying the T-RECS table recognition system to the business letter domain,’’ in Proc. 6th Int. Conf. Document Anal. Recognit., 2001, pp. 518–522.
[11] F. Cesarini, S. Marinai, L. Sarti, and G. Soda, ‘‘Trainable table location in document images,’’ in Proc. Object Recognit. Supported User Interact. Service Robots, vol. 3, 2002, pp. 236–240.
[12] A. C. E. Silva, ‘‘Learning rich hidden Markov models in document analysis: Table location,’’ in Proc. 10th Int. Conf. Document Anal. Recognit., 2009, pp. 843–847.
[13] C. Cortes and V. Vapnik, ‘‘Support-vector networks,’’ Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[14] T. Kasar, P. Barlas, S. Adam, C. Chatelain, and T. Paquet, ‘‘Learning to detect tables in scanned document images using line information,’’ in Proc. 12th Int. Conf. Document Anal. Recognit., Aug. 2013, pp. 1185–1189.
[15] M. Fan and D. S. Kim, ‘‘Detecting table region in PDF documents using distant supervision,’’ 2015, arXiv:1506.08891. [Online]. Available: http://arxiv.org/abs/1506.08891
[16] D. N. Tran, T. A. Tran, A. Oh, S. H. Kim, and I. S. Na, ‘‘Table detection from document image using vertical arrangement of text blocks,’’ Int. J. Contents, vol. 11, no. 4, pp. 77–85, Dec. 2015.
[17] Y. Wang, I. T. Phillips, and R. M. Haralick, ‘‘Table structure understanding and its performance evaluation,’’ Pattern Recognit., vol. 37, no. 7, pp. 1479–1497, Jul. 2004.
[18] G. Nagy, ‘‘Hierarchical representation of optically scanned documents,’’ in Proc. 7th Int. Conf. Pattern Recognit., 1984, pp. 347–349.
[19] A. Shigarov, A. Mikhailov, and A. Altaev, ‘‘Configurable table structure recognition in untagged PDF documents,’’ in Proc. ACM Symp. Document Eng., Sep. 2016, pp. 119–122.
[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ in Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 25, Dec. 2012, pp. 1097–1105.
[21] W. Seo, H. I. Koo, and N. I. Cho, ‘‘Junction-based table detection in camera-captured document images,’’ Int. J. Document Anal. Recognit., vol. 18, no. 1, pp. 47–57, Mar. 2015.
[22] E. R. Dougherty, Electronic Imaging Technology, vol. 60. Bellingham, WA, USA: SPIE, 1999.
[23] J. C. Handley, ‘‘Table analysis for multiline cell identification,’’ Document Recognit. Retrieval VIII, vol. 4307, pp. 34–43, Dec. 2000.
[24] D. P. Lopresti and G. Nagy, ‘‘A tabular survey of automated table processing,’’ in Proc. Sel. 3rd Int. Workshop Graph. Recognit. Recent Adv., 1999, pp. 93–120.
[25] D. Lopresti and G. Nagy, ‘‘Automated table processing,’’ in Proc. 3rd Int. Workshop, Graph. Recognit. Recent Adv., no. 1941, 2000, p. 93.
[26] R. Zanibbi, D. Blostein, and J. Cordy, ‘‘A survey of table recognition,’’ Document Anal. Recognit., vol. 7, no. 1, pp. 1–16, Mar. 2004.
[27] D. W. Embley, M. Hurst, D. Lopresti, and G. Nagy, ‘‘Table-processing paradigms: A research survey,’’ Int. J. Document Anal. Recognit., vol. 8, nos. 2–3, pp. 66–86, Jun. 2006.
[28] M. F. Hurst, ‘‘The interpretation of tables in texts,’’ Ph.D. dissertation, Univ. Edinburgh, Edinburgh, U.K., 2000.
[29] A. C. e Silva, A. M. Jorge, and L. Torgo, ‘‘Design of an end-to-end method to extract information from tables,’’ Int. J. Document Anal. Recognit., vol. 8, nos. 2–3, pp. 144–171, Jun. 2006.
[30] B. Coüasnon and A. Lemaitre, ‘‘Recognition of tables and forms,’’ in Handbook of Document Image Processing and Recognition, D. Doermann and K. Tombre, Eds. London, U.K.: Springer, 2014, pp. 647–677.
[31] S. Khusro, A. Latif, and I. Ullah, ‘‘On methods and tools of table detection, extraction and annotation in PDF documents,’’ J. Inf. Sci., vol. 41, no. 1, pp. 41–57, Feb. 2015.
[32] X. Zhong, E. ShafieiBavani, and A. J. Yepes, ‘‘Image-based table recognition: Data, model, and evaluation,’’ 2019, arXiv:1911.10683. [Online]. Available: http://arxiv.org/abs/1911.10683
[33] J. Hu, R. S. Kashi, D. Lopresti, and G. T. Wilfong, ‘‘Evaluating the performance of table processing algorithms,’’ Int. J. Document Anal. Recognit., vol. 4, no. 3, pp. 140–153, Mar. 2002.
[34] L. Hao, L. Gao, X. Yi, and Z. Tang, ‘‘A table detection method for PDF documents based on convolutional neural networks,’’ in Proc. 12th IAPR Workshop Document Anal. Syst. (DAS), Apr. 2016, pp. 287–292.
[35] L. Torrey and J. Shavlik, ‘‘Transfer learning,’’ in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. Hershey, PA, USA: IGI Global, 2010, pp. 242–264.
[36] Y. Zhu, Y. Chen, Z. Lu, S. Pan, G.-R. Xue, Y. Yu, and Q. Yang, ‘‘Heterogeneous transfer learning for image classification,’’ in Proc. AAAI Conf. Artif. Intell., 2011, vol. 25, no. 1, pp. 1304–1309.
[37] B. Kulis, K. Saenko, and T. Darrell, ‘‘What you saw is not what you get: Domain adaptation using asymmetric kernel transforms,’’ in Proc. CVPR, Jun. 2011, pp. 1785–1792.
[38] C. Wang and S. Mahadevan, ‘‘Heterogeneous domain adaptation using manifold alignment,’’ in Proc. IJCAI, 2011, vol. 22, no. 1, p. 1541.
[39] W. Li, L. Duan, D. Xu, and I. W. Tsang, ‘‘Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 6, pp. 1134–1148, Jun. 2014.
[40] M. Loey, F. Smarandache, and N. E. M. Khalifa, ‘‘Within the lack of chest COVID-19 X-ray dataset: A novel detection model based on GAN and deep transfer learning,’’ Symmetry, vol. 12, no. 4, p. 651, Apr. 2020.
[41] M. Z. Afzal, S. Capobianco, M. I. Malik, S. Marinai, T. M. Breuel, A. Dengel, and M. Liwicki, ‘‘DeepDocClassifier: Document classification with deep convolutional neural network,’’ in Proc. 13th Int. Conf. Document Anal. Recognit. (ICDAR), Aug. 2015, pp. 1111–1115.
[42] A. Das, S. Roy, U. Bhattacharya, and S. K. Parui, ‘‘Document image classification with intra-domain transfer learning and stacked generalization of deep convolutional neural networks,’’ in Proc. 24th Int. Conf. Pattern Recognit. (ICPR), Aug. 2018, pp. 3180–3185.
[43] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, ‘‘Microsoft COCO: Common objects in context,’’ in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2014, pp. 740–755.
[44] A. Gilani, S. R. Qasim, I. Malik, and F. Shafait, ‘‘Table detection using deep learning,’’ in Proc. 14th IAPR Int. Conf. Document Anal. Recognit. (ICDAR), Nov. 2017, pp. 771–776.
[45] S. Schreiber, S. Agne, I. Wolf, A. Dengel, and S. Ahmed, ‘‘DeepDeSRT: Deep learning for detection and structure recognition of tables in document images,’’ in Proc. 14th IAPR Int. Conf. Document Anal. Recognit. (ICDAR), vol. 1, Nov. 2017, pp. 1162–1167.
[46] S. A. Siddiqui, M. I. Malik, S. Agne, A. Dengel, and S. Ahmed, ‘‘DeCNT: Deep deformable CNN for table detection,’’ IEEE Access, vol. 6, pp. 74151–74161, 2018.
[47] Y. Huang, Q. Yan, Y. Li, Y. Chen, X. Wang, L. Gao, and Z. Tang, ‘‘A YOLO-based table detection method,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 813–818.
[48] D. Prasad, A. Gadpal, K. Kapadni, M. Visave, and K. Sultanpure, ‘‘CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 572–573.
[49] M. Agarwal, A. Mondal, and C. V. Jawahar, ‘‘CDeC-Net: Composite deformable cascade network for table detection in document images,’’ 2020, arXiv:2008.10831. [Online]. Available: http://arxiv.org/abs/2008.10831
[50] S. A. Siddiqui, I. A. Fateh, S. T. R. Rizvi, A. Dengel, and S. Ahmed, ‘‘DeepTabStR: Deep learning based table structure recognition,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 1403–1409.
[51] K. A. Hashmi, D. Stricker, M. Liwicki, M. N. Afzal, and M. Z. Afzal, ‘‘Guided table structure recognition through anchor optimization,’’ 2021, arXiv:2104.10538. [Online]. Available: http://arxiv.org/abs/2104.10538
[52] X. Zheng, D. Burdick, L. Popa, X. Zhong, and N. X. R. Wang, ‘‘Global table extractor (GTE): A framework for joint table identification and cell structure recognition using visual context,’’ in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis., Jan. 2021, pp. 697–706.
[53] S. Raja, A. Mondal, and C. Jawahar, ‘‘Table structure recognition using top-down and bottom-up cues,’’ in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 70–86.
[54] R. Girshick, ‘‘Fast R-CNN,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[55] S. Ren, K. He, R. Girshick, and J. Sun, ‘‘Faster R-CNN: Towards real-time object detection with region proposal networks,’’ 2015, arXiv:1506.01497. [Online]. Available: http://arxiv.org/abs/1506.01497
[56] H. Breu, J. Gil, D. Kirkpatrick, and M. Werman, ‘‘Linear time Euclidean distance transform algorithms,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 5, pp. 529–533, May 1995.
[57] R. Fabbri, L. D. F. Costa, J. C. Torelli, and O. M. Bruno, ‘‘2D Euclidean distance transform algorithms: A comparative survey,’’ ACM Comput. Surveys, vol. 40, no. 1, pp. 1–44, Feb. 2008.
[58] I. Ragnemalm, ‘‘The Euclidean distance transform in arbitrary dimensions,’’ Pattern Recognit. Lett., vol. 14, no. 11, pp. 883–888, Nov. 1993.
[59] M. D. Zeiler and R. Fergus, ‘‘Visualizing and understanding convolutional networks,’’ in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2014, pp. 818–833.
[60] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for large-scale image recognition,’’ 2014, arXiv:1409.1556. [Online]. Available: http://arxiv.org/abs/1409.1556
[61] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, ‘‘The Pascal visual object classes (VOC) challenge,’’ Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Jun. 2010.
[62] M. Li, L. Cui, S. Huang, F. Wei, M. Zhou, and Z. Li, ‘‘TableBank: Table benchmark for image-based table detection and recognition,’’ in Proc. 12th Lang. Resour. Eval. Conf., 2020, pp. 1918–1925.
[63] N. Sun, Y. Zhu, and X. Hu, ‘‘Faster R-CNN based table detection combining corner locating,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 1314–1319.
[64] L. Gao, X. Yi, Z. Jiang, L. Hao, and Z. Tang, ‘‘ICDAR2017 competition on page object detection,’’ in Proc. 14th IAPR Int. Conf. Document Anal. Recognit. (ICDAR), vol. 1, Nov. 2017, pp. 1417–1422.
[65] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, ‘‘Deformable convolutional networks,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 764–773.
[66] M. Gobel, T. Hassan, E. Oro, and G. Orsi, ‘‘ICDAR 2013 table competition,’’ in Proc. 12th Int. Conf. Document Anal. Recognit., Aug. 2013, pp. 1449–1453.
[67] Z. Cai and N. Vasconcelos, ‘‘Cascade R-CNN: Delving into high quality object detection,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 6154–6162.
[68] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, ‘‘Aggregated residual transformations for deep neural networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1492–1500.
[69] Y. Liu, Y. Wang, S. Wang, T. Liang, Q. Zhao, Z. Tang, and H. Ling, ‘‘CBNet: A novel composite backbone network architecture for object detection,’’ in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 7, pp. 11653–11660.
[70] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, ‘‘You only look once: Unified, real-time object detection,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[71] J. Redmon and A. Farhadi, ‘‘YOLOv3: An incremental improvement,’’ 2018, arXiv:1804.02767. [Online]. Available: http://arxiv.org/abs/1804.02767
[72] Á. Casado-García, C. Domínguez, J. Heras, E. Mata, and V. Pascual, ‘‘The benefits of close-domain fine-tuning for table detection in document images,’’ in Proc. Int. Workshop Document Anal. Syst. Cham, Switzerland: Springer, 2020, pp. 199–215.
[73] K. He, G. Gkioxari, P. Dollár, and R. Girshick, ‘‘Mask R-CNN,’’ in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 2961–2969.
[74] Y. Deng, A. Kanervisto, J. Ling, and A. M. Rush, ‘‘Image-to-markup generation with coarse-to-fine attention,’’ 2016, arXiv:1609.04938. [Online]. Available: http://arxiv.org/abs/1609.04938
[75] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, ‘‘SSD: Single shot multibox detector,’’ in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2016, pp. 21–37.
[76] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, ‘‘Focal loss for dense object detection,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2980–2988.
[77] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, W. Liu, and B. Xiao, ‘‘Deep high-resolution representation learning for visual recognition,’’ IEEE Trans. Pattern Anal. Mach. Intell., early access, Apr. 1, 2020, doi: 10.1109/TPAMI.2020.2983686.
[78] S.-H. Gao, M.-M. Cheng, K. Zhao, X.-Y. Zhang, M.-H. Yang, and P. Torr, ‘‘Res2Net: A new multi-scale backbone architecture,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 2, pp. 652–662, Feb. 2021.
[79] K. Chen, W. Ouyang, C. C. Loy, D. Lin, J. Pang, J. Wang, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, and J. Shi, ‘‘Hybrid task cascade for instance segmentation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 4974–4983.
[80] L. Gao, Y. Huang, H. Déjean, J.-L. Meunier, Q. Yan, Y. Fang, F. Kleber, and E. Lang, ‘‘ICDAR 2019 competition on table detection and recognition (cTDaR),’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 1510–1515.
[81] I. Kavasidis, S. Palazzo, C. Spampinato, C. Pino, D. Giordano, D. Giuffrida, and P. Messina, ‘‘A saliency-based convolutional neural network for table and chart detection in digitized documents,’’ 2018, arXiv:1804.06236. [Online]. Available: http://arxiv.org/abs/1804.06236
[82] F. Yu and V. Koltun, ‘‘Multi-scale context aggregation by dilated convolutions,’’ in Proc. Int. Conf. Learn. Represent. (ICLR), 2016.
[83] P. Krähenbühl and V. Koltun, ‘‘Efficient inference in fully connected CRFs with Gaussian edge potentials,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 24, 2011, pp. 109–117.
[84] S. S. Paliwal, V. D, R. Rahul, M. Sharma, and L. Vig, ‘‘TableNet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 128–133.
[85] J. Long, E. Shelhamer, and T. Darrell, ‘‘Fully convolutional networks for semantic segmentation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440.
[86] M. Holecek, A. Hoskovec, P. Baudiš, and P. Klinger, ‘‘Table understanding in structured documents,’’ in Proc. Int. Conf. Document Anal. Recognit. Workshops (ICDARW), Sep. 2019, pp. 158–164.
[87] P. Riba, A. Dutta, L. Goldmann, A. Fornés, O. Ramos, and J. Llados, ‘‘Table detection in invoice documents by graph neural networks,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 122–127.
[88] Y. Li, L. Gao, Z. Tang, Q. Yan, and Y. Huang, ‘‘A GAN-based feature generator for table detection,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 763–768.
[89] A. W. Harley, A. Ufkes, and K. G. Derpanis, ‘‘Evaluation of deep convolutional nets for document image classification and retrieval,’’ in Proc. 13th Int. Conf. Document Anal. Recognit. (ICDAR), Aug. 2015, pp. 991–995.
[90] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, ‘‘Generative adversarial networks,’’ 2014, arXiv:1406.2661. [Online]. Available: http://arxiv.org/abs/1406.2661
[91] O. Ronneberger, P. Fischer, and T. Brox, ‘‘U-Net: Convolutional networks for biomedical image segmentation,’’ in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Cham, Switzerland: Springer, 2015, pp. 234–241.
[92] S. A. Siddiqui, P. I. Khan, A. Dengel, and S. Ahmed, ‘‘Rethinking semantic segmentation for table structure recognition in documents,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 1397–1402.
[93] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, ‘‘ImageNet large scale visual recognition challenge,’’ Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[94] Y. Zou and J. Ma, ‘‘A deep semantic segmentation model for image-based table structure recognition,’’ in Proc. 15th IEEE Int. Conf. Signal Process. (ICSP), vol. 1, Dec. 2020, pp. 274–280.
[95] M. B. Dillencourt, H. Samet, and M. Tamminen, ‘‘A general approach to connected-component labeling for arbitrary image representations,’’ J. ACM, vol. 39, no. 2, pp. 253–280, Apr. 1992.
[96] S. R. Qasim, H. Mahmood, and F. Shafait, ‘‘Rethinking table recognition using graph neural networks,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 142–147.
[97] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, ‘‘The graph neural network model,’’ IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 61–80, Jan. 2009.
[98] Z. Chi, H. Huang, H.-D. Xu, H. Yu, W. Yin, and X.-L. Mao, ‘‘Complicated table structure recognition,’’ 2019, arXiv:1908.04729. [Online]. Available: http://arxiv.org/abs/1908.04729
[99] W. Xue, Q. Li, and D. Tao, ‘‘ReS2TIM: Reconstruct syntactic structures from table images,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 749–755.
[100] W. Xue, Q. Li, Z. Zhang, Y. Zhao, and H. Wang, ‘‘Table analysis and information extraction for medical laboratory reports,’’ in Proc. IEEE 16th Int. Conf. Dependable, Autonomic Secure Comput., 16th Int. Conf. Pervas. Intell. Comput., 4th Int. Conf. Big Data Intell. Comput. Cyber Sci. Technol. Congr. (DASC/PiCom/DataCom/CyberSciTech), Aug. 2018, pp. 193–199.
[101] C. Tensmeyer, V. I. Morariu, B. Price, S. Cohen, and T. Martinez, ‘‘Deep splitting and merging for table structure decomposition,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 114–121.
[102] S. A. Khan, S. M. D. Khalid, M. A. Shahzad, and F. Shafait, ‘‘Table structure extraction with bi-directional gated recurrent unit networks,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 1366–1371.
[103] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, ‘‘Empirical evaluation of gated recurrent neural networks on sequence modeling,’’ 2014, arXiv:1412.3555. [Online]. Available: http://arxiv.org/abs/1412.3555
[104] S. Hochreiter and J. Schmidhuber, ‘‘Long short-term memory,’’ Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[105] G. Klein, Y. Kim, Y. Deng, J. Senellart, and A. M. Rush, ‘‘OpenNMT: Open-source toolkit for neural machine translation,’’ 2017, arXiv:1701.02810. [Online]. Available: http://arxiv.org/abs/1701.02810
[106] J. Wang, K. Chen, S. Yang, C. C. Loy, and D. Lin, ‘‘Region proposal by guided anchoring,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2965–2974.
[107] F. Yu and V. Koltun, ‘‘Multi-scale context aggregation by dilated convolutions,’’ 2015, arXiv:1511.07122. [Online]. Available: http://arxiv.org/abs/1511.07122
[108] T. N. Kipf and M. Welling, ‘‘Semi-supervised classification with graph convolutional networks,’’ 2016, arXiv:1609.02907. [Online]. Available: http://arxiv.org/abs/1609.02907
[109] Y. Deng, D. Rosenberg, and G. Mann, ‘‘Challenges in end-to-end neural scientific table recognition,’’ in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 894–901.
[110] Y. Deng, A. Kanervisto, J. Ling, and A. M. Rush, ‘‘Image-to-markup generation with coarse-to-fine attention,’’ in Proc. Int. Conf. Mach. Learn., 2017, pp. 980–989.
[111] S. F. Rashid, A. Akmal, M. Adnan, A. A. Aslam, and A. Dengel, ‘‘Table recognition in heterogeneous documents using machine learning,’’ in Proc. 14th IAPR Int. Conf. Document Anal. Recognit. (ICDAR), vol. 1, Nov. 2017, pp. 777–782.
[112] I. Phillips, ‘‘User’s reference manual for the UW English/technical document image database III,’’ UW-III English/Tech. Document Image Database Manual, Univ. Washington, Seattle, WA, USA, 1996.
[113] A. Mondal, P. Lipps, and C. Jawahar, ‘‘IIIT-AR-13K: A new dataset for graphical object detection in documents,’’ in Proc. Int. Workshop Document Anal. Syst. Cham, Switzerland: Springer, 2020, pp. 216–230.
[114] D. M. W. Powers, ‘‘Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation,’’ 2020, arXiv:2010.16061. [Online]. Available: http://arxiv.org/abs/2010.16061
[115] M. B. Blaschko and C. H. Lampert, ‘‘Learning to localize objects with structured output regression,’’ in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2008, pp. 2–15.
[116] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, ‘‘BLEU: A method for automatic evaluation of machine translation,’’ in Proc. 40th Annu. Meeting Assoc. Comput. Linguistics, 2002, pp. 311–318.
[117] D. G. Kleinbaum, K. Dietz, M. Gail, M. Klein, and M. Klein, Logistic Regression. New York, NY, USA: Springer-Verlag, 2002.
[118] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, ‘‘Dynamic graph CNN for learning on point clouds,’’ ACM Trans. Graph., vol. 38, no. 5, pp. 1–12, Nov. 2019.
[119] S. R. Qasim, J. Kieseler, Y. Iiyama, and M. Pierini, ‘‘Learning representations of irregular particle-detector geometry with distance-weighted graph networks,’’ Eur. Phys. J. C, vol. 79, no. 7, pp. 1–11, Jul. 2019.
[120] M. Pawlik and N. Augsten, ‘‘Tree edit distance: Robust and memory-efficient,’’ Inf. Syst., vol. 56, pp. 157–173, Mar. 2016.
[121] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, and S. Petersen, ‘‘Human-level control through deep reinforcement learning,’’ Nature, vol. 518, pp. 529–533, 2015.
[122] A. G. Barto, ‘‘Reinforcement learning,’’ in Neural Systems for Control. Amsterdam, The Netherlands: Elsevier, 1997, pp. 7–30.
[123] J. Park, E. Lee, Y . Kim, I. Kang, H. I. Koo, and N. I. Cho, ‘‘Multi-lingual optical character recognition system using the reinforcement learning of character segmenter,’’ IEEE Access, vol. 8, pp. 174437–174448, 2020.
