ICDAR 2021 Competition on Scientific Literature Parsing: summary of the table recognition part (the remainder covers document layout analysis)
2022-05-14 14:00:25【Zheng Jianyu JY】
Task B is the table recognition part, and this article covers only table recognition.
Abstract (not essential; skip it if you want to go straight to the table recognition part).
Scientific literature contains important information about cutting-edge innovations in many fields. Advances in natural language processing have driven rapid development of automatic information extraction from documents. However, scientific literature is usually available in unstructured PDF format. Although PDF is well suited to preserving basic visual elements on a canvas (characters, lines, shapes, etc.) for presentation to humans, automatic processing of the PDF format by machines poses many challenges. With more than 2.5 trillion PDF documents in existence, these problems are also common in many other important applications.
A key challenge in automatically extracting information from scientific literature is that documents usually contain content that is not natural language, such as figures and tables. Yet this content usually illustrates the key results, messages, or summaries of the research. To fully understand scientific literature, an automated system must be able to recognize the layout of documents and parse the non-natural-language content into a machine-readable format. Our ICDAR 2021 Scientific Literature Parsing competition (ICDAR2021-SLP) aims to promote progress in document understanding. ICDAR2021-SLP uses the PubTabNet data set, which provides hundreds of thousands of training and evaluation examples. In Task A (document layout recognition), the best-performing submissions combine object detection with specialized solutions for the different categories. In Task B (table recognition), the top submissions rely on methods that identify table components and on post-processing methods that generate the table structure and content. The results of both tasks show impressive performance and open up the possibility of high-performance practical applications.
Portable Document Format (PDF) documents are everywhere: more than 2.5 trillion exist across multiple industries, including insurance documents, medical documents, and peer-reviewed scientific articles. PDF is one of the main sources of online and offline knowledge. Although PDF is well suited to preserving basic elements on a canvas (characters, lines, shapes, images, etc.) so that people can view them on different operating systems and devices, it is not a format that machines can understand.
At present, most document understanding methods rely on deep learning, which requires a large number of training examples. We use PubMed Central to generate large data sets automatically. PubMed Central is a large collection of full-text articles in the biomedical field provided by the US National Institutes of Health / National Library of Medicine.
As of today, PubMed Central holds nearly 7 million full-text articles from 2,476 journals, which makes it possible to study document understanding problems over a large number of different article styles. Our data sets are generated from a subset of PubMed Central that is released under a Creative Commons license permitting commercial use.
The competition is divided into two tasks. One addresses document layout understanding by asking participants to identify several types of information in document pages (Task A). The other addresses table understanding by asking participants to generate an HTML version of table images (Task B). The IBM Research AI leaderboard system, which is based on EvalAI, was used to collect and evaluate the participants' submissions.
In Task A, participants had access to all data except the ground truth of the final evaluation test set; the test set was released when PubLayNet was made available.
In Task B, we released the final evaluation test set three days before participants submitted their final results. In the evaluation phase of Task A, we received 281 submissions from 78 different teams. The results of both tasks show that state-of-the-art algorithms achieve impressive performance, a significant improvement over previously reported results, which opens up the possibility of high-performance practical applications.
Tabular information is common in all kinds of documents. Compared with natural language, tables summarize large amounts of data in a more compact and structured format, and they help readers find and compare information. This competition aims to advance research on the automatic recognition of unstructured tables.
Participants in this task need to develop a model that can convert images of tabular data into the corresponding HTML code, following the HTML table representation used by PubMed Central. The HTML code generated by participants should correctly represent the structure of the table and the content of each cell. The cell content should contain the HTML tags that define text style (bold, italic, strikethrough, superscript, and subscript). The HTML code does not need to reproduce the appearance of the table, such as border lines, background color, font, font size, or font color.
There are other table recognition challenges, mainly organized at the International Conference on Document Analysis and Recognition (ICDAR).
The ICDAR 2013 table competition was the first competition on table detection and recognition. It includes 156 tables for evaluating table detection and table recognition methods; however, no training data is provided.
The ICDAR 2019 competition on table detection and recognition provides training, validation, and test samples (3,600 in total) for table detection and recognition. Two types of documents, historical handwritten and modern programmatic, are provided in image format.
The ICDAR 2019 competition consists of three tasks: 1) identifying table regions; 2) recognizing the table structure with the table region given; 3) recognizing the table structure without the table region given. The ground truth includes only the bounding boxes of table cells and excludes cell contents.
Our Task B poses a more challenging problem: the model must recognize both the table structure and the cell contents from the table image alone. In other words, the model needs to infer the tree structure of the table and the attributes (content, row span, column span) of each leaf node (header or body cell). In addition, we do not provide intermediate annotations of cell positions, adjacency relations, or row/column segmentation, which most existing table recognition models need for training. We provide only the final tree representation for supervision. We believe this will motivate participants to develop new image-to-structure mapping models.
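To make the tree view concrete, here is a minimal Python sketch that parses a table's HTML into a tree whose leaf cells carry content, row span, and column span. The names `Node`, `TableTreeBuilder`, `parse_table`, and `iter_cells` are illustrative assumptions and not part of the competition tooling; only the standard-library `html.parser` module is used.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Iterator, List, Optional

STRUCTURE_TAGS = {"table", "thead", "tbody", "tr", "td", "th"}
CELL_TAGS = {"td", "th"}


@dataclass
class Node:
    """One node of the table tree (hypothetical helper type)."""
    tag: str
    rowspan: int = 1
    colspan: int = 1
    content: str = ""
    children: List["Node"] = field(default_factory=list)


class TableTreeBuilder(HTMLParser):
    """Builds a Node tree; style tags inside a cell are kept as cell content."""

    def __init__(self) -> None:
        super().__init__()
        self.root: Optional[Node] = None
        self.stack: List[Node] = []

    def _in_cell(self) -> bool:
        return bool(self.stack) and self.stack[-1].tag in CELL_TAGS

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURE_TAGS:
            a = dict(attrs)
            node = Node(tag, int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            if self.stack:
                self.stack[-1].children.append(node)
            else:
                self.root = node
            self.stack.append(node)
        elif self._in_cell():
            self.stack[-1].content += f"<{tag}>"

    def handle_endtag(self, tag):
        if tag in STRUCTURE_TAGS:
            if self.stack and self.stack[-1].tag == tag:
                self.stack.pop()
        elif self._in_cell():
            self.stack[-1].content += f"</{tag}>"

    def handle_data(self, data):
        if self._in_cell():
            self.stack[-1].content += data


def parse_table(html: str) -> Optional[Node]:
    builder = TableTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def iter_cells(node: Node) -> Iterator[Node]:
    """Yield the leaf cells of the tree in document order."""
    if node.tag in CELL_TAGS:
        yield node
    for child in node.children:
        yield from iter_cells(child)
```

For example, parsing `<table><thead><tr><td colspan="2"><b>Dose</b></td></tr></thead>...</table>` yields a header cell with `colspan == 2` and content `<b>Dose</b>`, which is the kind of leaf attribute a Task B model has to predict.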
This task uses the PubTabNet data set (v2.0.0).
PubTabNet contains more than 500k training samples and 9k validation samples, for which it provides the ground-truth HTML code and the locations of non-empty table cells. Participants can use the training data to train their models and the validation data for model selection and hyperparameter tuning. The final evaluation set of 9k+ samples (images only, no annotations) was released 3 days before the end of the final evaluation phase. Participants submitted their results on this set in the final phase.
Submissions are evaluated with the TEDS (Tree-Edit-Distance-based Similarity) metric. TEDS measures the similarity between two tables using the tree edit distance proposed by Pawlik and Augsten. The cost of insertion and deletion operations is 1. When an edit substitutes a node $n_o$ with $n_s$, the cost is 1 if either $n_o$ or $n_s$ is not a td. When both $n_o$ and $n_s$ are td, the substitution cost is 1 if their column spans or row spans differ; otherwise, the substitution cost is the normalized Levenshtein similarity (in [0,1]) between the contents of $n_o$ and $n_s$. Finally, the TEDS between two trees is computed as

$$\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max(|T_a|, |T_b|)}$$

where $\mathrm{EditDist}$ denotes the tree edit distance and $|T|$ is the number of nodes in $T$. The table recognition performance of a method on a set of test samples is defined as the mean of the TEDS scores between the recognition result and the ground truth of each sample.
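The cost rules and the final normalization can be sketched in a few lines of Python. This is not the official evaluation code: the tuple layout `(tag, rowspan, colspan, content)` and the function names are assumptions, and the tree edit distance itself is taken as an input rather than implemented. The sketch also assumes that the content cost is the Levenshtein distance divided by the longer content length, and that two non-td nodes with the same tag cost 0, so that identical tables score 1.

```python
from typing import Tuple

# (tag, rowspan, colspan, content): hypothetical node representation
NodeInfo = Tuple[str, int, int, str]


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def substitution_cost(o: NodeInfo, s: NodeInfo) -> float:
    """Cost of replacing node o with node s under the rules described above."""
    if o[0] != "td" or s[0] != "td":
        # Assumption: matching identical structural tags is free.
        return 0.0 if o[0] == s[0] else 1.0
    if o[1:3] != s[1:3]:                # row span or column span differs
        return 1.0
    longest = max(len(o[3]), len(s[3]))
    return levenshtein(o[3], s[3]) / longest if longest else 0.0


def teds(edit_dist: float, size_a: int, size_b: int) -> float:
    """TEDS = 1 - EditDist / max(|Ta|, |Tb|)."""
    return 1.0 - edit_dist / max(size_a, size_b)
```

As a usage example, two trees with 8 and 10 nodes at edit distance 2 give `teds(2, 8, 10)`, that is 1 - 2/10.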
The competition is divided into three phases. The format verification phase runs through the whole competition; participants can use the mini development set we provide to check that their result files meet our submission requirements. The development phase runs from the start of the competition until 3 days before its end. In this phase, participants can submit results on the test samples to validate their models. The final evaluation phase runs during the last 3 days of the competition. In this phase, participants can submit their inference results on the final evaluation set. The final ranking and the winning teams are determined by performance in the final evaluation phase.
Table 3.2 shows the sizes of the data sets used in the different phases of Task B.
| Split | Size | Phase |
| Mini development | 20 | Format verification |
| Final evaluation | 9,064 | Final evaluation |
Table 3.2: Task B data set statistics
For Task B, we received 30 submissions from 30 teams in the final evaluation phase. The top 10 systems in the final evaluation, ranked by TEDS, are shown in Table 4. Because of a problem with the final evaluation data set, bold tags (<b>) were not considered in the evaluation.
The top four systems perform similarly, and a more significant gap separates them from the remaining systems. As their system descriptions show, they rely on a combination of several components that identify the relevant elements in the table image and then combine them. Their TEDS performance is better than the previously reported results of the image-to-sequence approach, which were obtained on a data set comparable to the test set of this competition.
Table 4: Overall results (TEDS all), broken down into simple and complex tables
Team: Davar-Lab-OCR, Hikvision Research Institute
Davar-Lab-OCR paper and source code
The table recognition framework consists of two main processes: table cell generation and structure inference.
(1) Table cell generation is built on a Mask R-CNN detection model. Specifically, the model is trained to learn row/column-aligned cell-level bounding boxes together with the corresponding masks of the text content regions. We introduce pyramid mask supervision and use a large HRNet-W48 Cascade Mask R-CNN backbone to obtain reliable aligned bounding boxes. In addition, we train a single-line text detection model and an attention-based text recognition model to provide the OCR information; this is done by selecting instances that contain only a single line of text. We also use a multi-scale ensemble to further improve the performance of the cell and single-line text detection models.
(2) In the structure inference stage, the bounding boxes of cells are connected horizontally/vertically according to their alignment overlap. Row/column information is then generated by a Maximum Clique Search process, during which empty cells can also be located easily.
To handle some special cases, we train another table detection model to filter out text that does not belong to the table.
VCGroup Github repo:
In our method [7,10,14], we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Our table structure recognition algorithm is customized from MASTER, a robust image text recognition algorithm. PSENet is used to detect each text line in the table image. For text line recognition, our model is also based on MASTER. Finally, in the box assignment phase, we associate the text boxes detected by PSENet with the structure items reconstructed from the table structure prediction, and fill the recognized text line content into the corresponding items. Our proposed method achieves a TEDS score of 96.84% on the 9,115 validation samples and 96.32% on the 9,064 samples of the final evaluation phase.
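The box assignment step can be pictured with the following Python sketch. It is only one plausible matching rule, not VCGroup's published procedure: it assumes that each structure item comes with a predicted cell box, assigns every detected text line to the cell containing its center (falling back to the nearest cell center), and joins the lines of a cell in reading order. All names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def assign_text_lines(cell_boxes: List[Box],
                      text_lines: List[Tuple[Box, str]]) -> List[str]:
    """Return one text string per cell, built from the text lines assigned to it."""
    per_cell: List[List[Tuple[float, float, str]]] = [[] for _ in cell_boxes]
    for box, text in text_lines:
        cx, cy = center(box)
        containing = [i for i, (x0, y0, x1, y1) in enumerate(cell_boxes)
                      if x0 <= cx <= x1 and y0 <= cy <= y1]
        if containing:
            idx = containing[0]
        else:
            # Fallback (assumption): nearest cell center by squared distance.
            idx = min(range(len(cell_boxes)),
                      key=lambda i: (center(cell_boxes[i])[0] - cx) ** 2
                                    + (center(cell_boxes[i])[1] - cy) ** 2)
        per_cell[idx].append((cy, cx, text))
    # Sort lines top-to-bottom, then left-to-right, and join them.
    return [" ".join(t for _, _, t in sorted(lines)) for lines in per_cell]
```

Cells that receive no text line come back as empty strings, which corresponds to empty structure items.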
Team: Tomorrow Advancing Life (TAL)
The TAL system consists of two schemes:
(1) The table structure is reconstructed with five detection models: table detection, row detection, column detection, cell detection, and text line detection. Mask R-CNN is chosen as the baseline for these five detection models, with targeted optimization for each detection task. In the recognition part, the results of cell detection and text line detection are fed into a CRNN model to obtain the recognition result for each cell.
(2) The recovery of the table structure is treated as an img2seq problem. To shorten the decoding length, we replace the content of each cell with a distinct number; these numbers come from the text line detection results. We then use a CNN to encode the image and a transformer model to decode the structure of the table. The corresponding text line content can then be obtained with the CRNN model.
Both schemes yield complete table structure and content recognition results. We use a set of selection rules that combines the advantages of the two schemes to output the best final result.
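The effect of replacing cell contents with numbers in scheme (2) can be shown with a short Python sketch. It is a simplification under stated assumptions: the numbers here simply index cells in reading order (in the TAL system they come from text line detection results), and the regular expression and function names are illustrative only.

```python
import re
from typing import List, Tuple

# Matches one <td ...>content</td> cell; assumes cells are not nested.
CELL = re.compile(r"(<td[^>]*>)(.*?)(</td>)", re.S)


def abbreviate(html: str) -> Tuple[str, List[str]]:
    """Replace every cell's content with its index; return the short HTML and the contents."""
    contents: List[str] = []

    def repl(match: "re.Match[str]") -> str:
        contents.append(match.group(2))
        return f"{match.group(1)}{len(contents) - 1}{match.group(3)}"

    return CELL.sub(repl, html), contents


def restore(short_html: str, contents: List[str]) -> str:
    """Put the original cell contents back in place of the index placeholders."""
    return CELL.sub(
        lambda m: m.group(1) + contents[int(m.group(2))] + m.group(3),
        short_html,
    )
```

The decoder then only has to emit the structure tokens plus a short placeholder per cell, and the recognized text is filled in afterwards.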
Team: PaodingAI, Beijing Paoding Technology Co., Ltd
The PaodingAI system is divided into three main parts: text block detection, text block recognition, and table structure recognition. The text block detector is trained with the Cascade R-CNN R50 2x model provided by MMDetection.
The text block recognizer is trained with the SAR_TF model. The table structure recognizer is our own implementation of a model proposed in the literature. In addition to the above models, we also use rules and simple classification models to handle <b> tags and white-space characters. Our system is not an end-to-end model and does not use any ensemble method.
Team: Kaen Context, Kakao Enterprise
The company is located in the southwest of Gyeonggi-do, South Korea.
To solve the table recognition problem effectively, we use a 12-layer decoder-only linear transformer architecture.
Data preparation: we use RGB images (without rescaling) as the input condition and the merged HTML code as the target text sequence. We reshape a table image into a sequence of flattened patches of shape (N, 8×8×3), where 8 is the width and height of each image patch and N is the number of patches. We then map the image sequence to 512 dimensions with a linear projection layer. The target text sequence is converted into 512-dimensional embeddings and appended to the end of the projected image sequence. Finally, we add different positional encodings to the text and image sequences so that the model can distinguish them.
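The patch reshaping can be illustrated with a dependency-free Python sketch. It assumes the image is given as nested lists of shape height × width × 3 and that both sides are multiples of the patch size (how the team handled other sizes is not stated here); the function name `patchify` is hypothetical. Each 8×8 RGB patch is flattened to 8·8·3 = 192 values; the linear projection to 512 dimensions is left out.

```python
from typing import List, Sequence

Pixel = Sequence[float]          # one RGB triple
Image = Sequence[Sequence[Pixel]]  # height x width x 3


def patchify(image: Image, patch: int = 8) -> List[List[float]]:
    """Split an H x W x 3 image into N flattened patches of patch*patch*3 values.

    Patches are emitted row by row, left to right, so N = (H // patch) * (W // patch).
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches: List[List[float]] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [channel
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)
                    for channel in image[r][c]]
            patches.append(flat)
    return patches
```

For a 16×24 image this gives N = 6 patches of 192 values each, which would then be projected and concatenated with the text embeddings as described above.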
Training: the concatenated image-text sequence is used as the model input, and the model is trained with cross-entropy loss under teacher forcing.
Inference: the output of our model is sampled with beam search.
[1] Antonacopoulos, A., Bridson, D., Papadopoulos, C., Pletschacher, S.: A realistic dataset for performance evaluation of document layout analysis. In: 2009 10th International Conference on Document Analysis and Recognition. pp. 296–300. IEEE (2009)
[2] Clausner, C., Antonacopoulos, A., Pletschacher, S.: Icdar2017 competition on recognition of documents with complex layouts-rdcl2017. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol. 1, pp. 1404–1410. IEEE (2017)
[3] Clausner, C., Papadopoulos, C., Pletschacher, S., Antonacopoulos, A.: The enp image and ground truth dataset of historical newspapers. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR). pp. 931–935. IEEE (2015)
[4] Gao, L., Huang, Y., Li, Y., Yan, Q., Fang, Y., Dejean, H., Kleber, F., Lang, E.M.: ICDAR 2019 competition on table detection and recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 1510–1515. IEEE (Sep 2019). https://doi.org/10.1109/ICDAR.2019.00166
[5] Göbel, M., Hassan, T., Oro, E., Orsi, G.: ICDAR 2013 table competition. In: 2013 12th International Conference on Document Analysis and Recognition. pp. 1449–1453. IEEE (2013)
[6] Grygoriev, A., Degtyarenko, I., Deriuga, I., Polotskyi, S., Melnyk, V., Zakharchuk, D., Radyvonenko, O.: HCRNN: A novel architecture for fast online handwritten stroke classification. In: Proc. of Int. Conf. on Document Analysis and Recognition (2021)
[7] He, Y., Qi, X., Ye, J., Gao, P., Chen, Y., Li, B., Tang, X., Xiao, R.: Pingan-vcgroup's solution for icdar 2021 competition on scientific table image recognition to latex. arXiv (2021)
[8] Katharopoulos, A., Vyas, A., Pappas, N., Fleuret, F.: Transformers are rnns: Fast autoregressive transformers with linear attention. In: International Conference on Machine Learning. pp. 5156–5165. PMLR (2020)
[9] Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet physics doklady. vol. 10, pp. 707–710. Soviet Union (1966)
[10] Lu, N., Yu, W., Qi, X., Chen, Y., Gong, P., Xiao, R., Bai, X.: Master: Multi-aspect non-local network for scene text recognition. Pattern Recognition (2021)
[11] Pawlik, M., Augsten, N.: Tree edit distance: Robust and memory-efficient. Information Systems 56, 157–173 (2016)
[12] Staar, P.W., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: A machine learning platform to ingest documents at scale. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 774–782 (2018)
[13] Tensmeyer, C., Morariu, V.I., Price, B., Cohen, S., Martinez, T.: Deep splitting and merging for table structure decomposition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 114–121. IEEE (2019)
[14] Ye, J., Qi, X., He, Y., Chen, Y., Gu, D., Gao, P., Xiao, R.: Pingan-vcgroup's solution for icdar 2021 competition on scientific literature parsing task b: Table recognition to html. arXiv (2021)
[15] Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 697–706 (2021)
[16] Zhong, X., ShafieiBavani, E., Yepes, A.J.: Image-based table recognition: data, model, and evaluation. arXiv preprint arXiv:1911.10683 (2019)
[17] Zhong, X., Tang, J., Yepes, A.J.: Publaynet: largest dataset ever for document layout analysis. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 1015–1022. IEEE (2019)