ICDAR 2021 Competition on Scientific Literature Parsing — table recognition summary (the remainder is document layout analysis)
2022-05-14 14:00:25 [Zheng Jianyu JY]
Task B is the table recognition part; this article covers only table recognition.
Abstract (not essential — skip ahead if you only want the table recognition part).
The scientific literature contains important information about cutting-edge innovations in many fields. Advances in automatic document processing have accelerated progress in natural language information processing. However, scientific literature is usually distributed as unstructured PDF.
Although PDF is well suited to placing basic visual elements such as characters, lines, and shapes on a canvas for presentation to humans, automatic machine processing of the PDF format raises many challenges. With more than 2.5 trillion PDF files in existence, these problems also affect many other important applications.
A key challenge in automatically extracting information from scientific literature is that documents often contain non-natural-language content, such as figures and tables. Yet this content usually conveys the key results, information, or summaries of the research. To fully understand scientific literature, automated systems must be able to recognize the layout of documents and parse the non-natural-language content into a machine-readable format. Our ICDAR 2021 Competition on Scientific Literature Parsing (ICDAR2021-SLP) aims to drive progress in document understanding. ICDAR2021-SLP leverages the PubLayNet and PubTabNet datasets, which provide hundreds of thousands of training and evaluation examples. In Task A (document layout recognition), the highest-performing submissions combine object detection with specialized solutions for the different categories. In Task B (table recognition), the top submissions rely on methods that identify table components, with post-processing that generates the table structure and content. The results of both tasks show impressive performance and open the possibility of high-performance practical applications.
1. Introduction (not essential; skippable — to revisit)
Portable Document Format (PDF) documents are everywhere: more than 2.5 trillion documents exist across multiple industries [12], including insurance documents, medical files, and peer-reviewed scientific articles. PDF is one of the main sources of knowledge both online and offline. Although PDF is well suited to placing basic elements (characters, lines, shapes, images, etc.) on a canvas for human consumption across operating systems and devices, it is not a format that machines can understand.
Most current document understanding methods rely on deep learning, which requires large numbers of training examples. We automatically generate large datasets from PubMed Central, a large collection of full-text articles in the biomedical domain provided by the National Institutes of Health / National Library of Medicine.
As of today, PubMed Central holds nearly 7 million full-text articles from 2,476 journals, making it possible to study a wide range of document understanding problems across diverse article styles. Our datasets are generated from the subset of PubMed Central released under a Creative Commons license that permits commercial use.
The competition is divided into two tasks: one tests document layout understanding by asking participants to identify several types of information on document pages (Task A); the other tests table understanding by asking participants to generate the HTML version of table images (Task B). IBM Research AI hosted the leaderboard system used to collect and evaluate participants' submissions; the system is based on EvalAI.
In Task A, participants had access to all data except the ground truth of the final evaluation test set, which will be published with PubLayNet when available. In Task B, we released the final evaluation test set three days before participants submitted their final results. Across the evaluation phases of the two tasks, we received 281 submissions from 78 different teams. The results of both tasks show that state-of-the-art algorithms deliver impressive performance, a significant improvement over previously reported results, which opens the possibility of high-performance practical applications.
3. Task B — Table Recognition
Tabular information is common in all kinds of documents. Compared with natural language, tables summarize large amounts of data in a more compact, structured format, and they give readers a layout in which to find and compare information. This competition aims to promote research on automatic recognition of unstructured tables.
Participants in this task need to develop a model that converts images of tabular data into the corresponding HTML code, following the HTML representation of tables used in PubMed Central. The generated HTML code should correctly represent the structure of the table and the content of each cell. Cell content should include the HTML tags that define text styles, including bold, italic, strikethrough, superscript, and subscript. The HTML code does not need to reconstruct the appearance of the table, such as border lines, background color, or the font, font size, or font color.
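As a concrete (and entirely invented) illustration of the target format, the sketch below builds a small table in this style — structure tags plus rowspan/colspan attributes and inline style markup, with no appearance styling — and uses Python's `html.parser` to walk it:

```python
from html.parser import HTMLParser

# A hypothetical target for a small table with a two-row header: structure
# tags, rowspan/colspan attributes, and inline style markup (<b>, <i>), but
# no borders, colors, or fonts.
TARGET_HTML = (
    '<table><thead><tr>'
    '<td rowspan="2"><b>Group</b></td><td colspan="2"><b>Result</b></td>'
    '</tr><tr><td><i>mean</i></td><td><i>sd</i></td></tr></thead>'
    '<tbody><tr><td>Control</td><td>4.2</td><td>0.3</td></tr></tbody></table>'
)

class CellCounter(HTMLParser):
    """Count <td> cells and collect any span attributes in the table HTML."""
    def __init__(self):
        super().__init__()
        self.cells = 0
        self.spans = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells += 1
            self.spans.extend(a for a in attrs if a[0] in ("rowspan", "colspan"))

counter = CellCounter()
counter.feed(TARGET_HTML)
print(counter.cells)  # 7 <td> cells in total
print(counter.spans)  # [('rowspan', '2'), ('colspan', '2')]
```

A submission would be scored on how well such generated HTML matches the ground-truth tree, not on visual appearance.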
3.1 Related Work
Other table recognition challenges have been organized, mainly at the International Conference on Document Analysis and Recognition (ICDAR). The ICDAR 2013 Table Competition was the first on table detection and recognition [5]. It included 156 tables used to evaluate table detection and table recognition methods; however, no training data was provided. The ICDAR 2019 Competition on Table Detection and Recognition provided training, validation, and test samples (3,600 in total) for table detection and recognition [4]. Two types of documents, historical handwritten and modern programmatically generated ones, are provided in image format. The ICDAR 2019 competition comprises three tasks: 1) locating table regions; 2) recognizing the table structure given the table region; 3) recognizing the table structure without a given table region. The ground truth includes only the bounding boxes of table cells, excluding cell content.
Our Task B poses a more challenging problem: the model must recognize both the table structure and the content of each cell from the table image alone. In other words, the model needs to infer the tree structure of the table and the properties of each leaf node (header/body cell): its content, row span, and column span. In addition, we provide no intermediate annotations such as cell locations, adjacency, or row/column segmentation, which most existing table recognition models need for training; only the final tree representation is provided for supervision. We believe this will motivate participants to develop novel image-to-structure models.
3.2 Data
This task uses the PubTabNet dataset (v2.0.0) [16]. PubTabNet contains more than 500k training samples and 9k validation samples, providing the ground-truth HTML code as well as the locations of non-empty table cells. Participants can use the training data to train their models and the validation data for model selection and hyperparameter tuning. The 9k+ final evaluation set (images only, no annotations) was released 3 days before the end of the final evaluation phase, during which participants submitted their results on this set.
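To make the annotation format concrete, the sketch below constructs one PubTabNet-style record. The field names mirror the dataset's published JSON-lines layout but should be treated as assumptions here — consult the v2.0.0 release for the authoritative schema.

```python
import json

# One PubTabNet-style annotation (field names are assumptions, values are
# invented): the table structure is a token sequence of HTML tags, and each
# non-empty cell carries its content tokens plus a bounding box.
sample = {
    "filename": "PMC0000000_000_00.png",
    "split": "train",
    "html": {
        "structure": {"tokens": [
            "<thead>", "<tr>", "<td>", "</td>", "</tr>", "</thead>",
            "<tbody>", "<tr>", "<td>", "</td>", "</tr>", "</tbody>",
        ]},
        "cells": [
            {"tokens": ["<b>", "N", "</b>"], "bbox": [10, 5, 42, 18]},
            {"tokens": ["1", "2", "3"], "bbox": [10, 25, 42, 38]},
        ],
    },
}

line = json.dumps(sample)        # the dataset ships one JSON object per line
record = json.loads(line)
n_cells = len(record["html"]["cells"])
n_boxed = sum("bbox" in c for c in record["html"]["cells"])
print(n_cells, n_boxed)          # 2 2
```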
Submissions are evaluated with the TEDS (Tree-Edit-Distance-based Similarity) metric [16]. TEDS uses the tree edit distance proposed in [11] to measure the similarity between two tables. The cost of insertion and deletion operations is 1. For a substitution that replaces node n_o with n_s, the cost is 1 if either n_o or n_s is not a td. When both n_o and n_s are td, the substitution cost is 1 if their column spans or row spans differ; otherwise, the substitution cost is the normalized Levenshtein similarity [9] (in [0, 1]) between the contents of n_o and n_s. Finally, TEDS between two trees T_a and T_b is computed as

TEDS(T_a, T_b) = 1 − EditDist(T_a, T_b) / max(|T_a|, |T_b|)

where EditDist denotes the tree edit distance and |T| is the number of nodes in T. The table recognition performance of a method on a set of test samples is defined as the mean TEDS score between the recognition result and the ground truth over all samples.
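A minimal sketch of the cost model and final normalization follows; the tree edit distance itself is assumed to be computed elsewhere (e.g., with the algorithm of [11]), and the helper names are mine. Note one interpretation choice: the content-substitution cost is implemented as the normalized Levenshtein distance (1 minus the similarity), so identical cell contents cost 0.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain string edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete
                           cur[j - 1] + 1,               # insert
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]

def substitution_cost(tag_o, span_o, text_o, tag_s, span_s, text_s):
    """Cost of replacing node n_o by n_s under the cost model above."""
    if tag_o != "td" or tag_s != "td":
        return 1.0                    # at least one node is not a cell
    if span_o != span_s:              # (rowspan, colspan) pairs differ
        return 1.0
    if not text_o and not text_s:
        return 0.0
    return levenshtein(text_o, text_s) / max(len(text_o), len(text_s))

def teds(edit_dist: float, n_a: int, n_b: int) -> float:
    """TEDS = 1 - EditDist(T_a, T_b) / max(|T_a|, |T_b|)."""
    return 1.0 - edit_dist / max(n_a, n_b)

print(levenshtein("kitten", "sitting"))   # 3
```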
The competition is divided into three phases. The format verification phase runs throughout the competition; participants can use the mini development set we provide to check that their result files meet the submission requirements. The development phase runs from the start of the competition until 3 days before its end; during this phase, participants can submit results on the test samples to validate their models. The final evaluation phase occupies the last 3 days of the competition; during this phase, participants can submit inference results on the final evaluation set. The final ranking and winning teams are determined by performance in the final evaluation phase. Table 3.2 shows the sizes of the datasets used in the different phases of Task B.
| Split | Size | Phase |
|---|---|---|
| Training | 500,777 | N/A |
| Development | 9,115 | N/A |
| Mini development | 20 | Format verification |
| Test | 9,138 | Development |
| Final evaluation | 9,064 | Final evaluation |

Table 3.2: Task B dataset statistics
3.3 Results
For Task B, we received 30 submissions from 30 teams for the final evaluation phase. The top 10 systems in the final evaluation, ranked by TEDS, are shown in Table 4. Entries not considered in the assessment due to issues with the final evaluation dataset are marked in bold.
The top four systems achieve similar performance, after which we see a more significant gap. As the system descriptions show, they rely on a combination of several components that identify the relevant elements in the table image and then assemble them. Their TEDS performance is better than previously reported results using an image-to-sequence method [17], whose dataset is comparable to this competition's test set and was likewise derived from PubMed Central.
Table 4: Overall results (TEDS all), broken down into simple and complex tables [16]
3.4 System Descriptions (only some are described here)
Team: Davar-Lab-OCR, Hikvision Research Institute
Davar-Lab-OCR paper and source code
The table recognition framework consists of two main stages: table cell generation and structure inference.
(1) Table cell generation is built on a Mask R-CNN detection model. Specifically, the model is trained to learn row/column-aligned cell-level bounding boxes together with the corresponding text-content region masks. We introduce pyramid mask supervision and adopt a large HRNet-W48 Cascade Mask R-CNN backbone to obtain reliable aligned bounding boxes. In addition, we train a single-line text detection model and an attention-based text recognition model to provide OCR information; this is achieved by selecting instances that contain only one line of text. We also use a multi-scale ensemble to further improve the cell and single-line text detection models.
(2) In the structure inference stage, the cell bounding boxes are connected horizontally/vertically according to their alignment overlap. Row/column information is then generated through a maximum clique search, during which empty cells are easily located.
To handle some special cases, we train an additional table detection model to filter out text that does not belong to the table.
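The alignment-overlap connection in stage (2) can be sketched as follows — a simplified stand-in that groups cells by vertical-overlap connected components rather than the team's stricter maximum-clique search; the threshold and box coordinates are invented:

```python
def v_overlap(a, b):
    """Vertical overlap ratio of two boxes (x0, y0, x1, y1)."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, inter) / min(a[3] - a[1], b[3] - b[1])

def group_rows(boxes, thresh=0.5):
    """Connected components of the 'same row' relation via union-find."""
    n = len(boxes)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if v_overlap(boxes[i], boxes[j]) >= thresh:
                parent[find(i)] = find(j)

    rows = {}
    for i in range(n):
        rows.setdefault(find(i), []).append(i)
    # order rows top-to-bottom, and cells within a row left-to-right
    return sorted((sorted(g, key=lambda k: boxes[k][0]) for g in rows.values()),
                  key=lambda g: boxes[g[0]][1])

boxes = [(0, 0, 30, 10), (35, 1, 60, 11),    # row 1
         (0, 20, 30, 30), (35, 21, 60, 31)]  # row 2
print(group_rows(boxes))   # [[0, 1], [2, 3]]
```

A clique-based formulation is stricter: every pair in a row must overlap, which is more robust when boxes drift across a wide table.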
Team: VCGroup
VCGroup GitHub repo
In our method [7,10,14], we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Our table structure recognition algorithm is customized from MASTER, a robust image-to-text recognition algorithm. PSENet is used to detect each text line in the table image. For text line recognition, our model is also based on MASTER. Finally, in the box assignment stage, we associate the text boxes detected by PSENet with the structure items of the reconstructed table structure prediction, and fill the recognized text line content into the corresponding items. Our method achieves a TEDS score of 96.84% on the 9,115 validation samples and 96.32% on the 9,064 samples of the final evaluation phase.
Team: Tomorrow Advancing Life(TAL)
The TAL system consists of two schemes:
(1) The table structure is reconstructed from 5 detection models: table detection, row detection, column detection, cell detection, and text line detection. Mask R-CNN is chosen as the baseline for all 5 detection models, with targeted optimization for the different detection tasks. In the recognition part, the results of cell detection and text line detection are fed into a CRNN model to obtain the recognition result for each cell.
(2) Recovering the table structure is treated as an img2seq problem. To shorten the decoding length, we replace the content of each cell with a distinct number taken from the text line detection results. A CNN then encodes the image and a transformer model decodes the table structure, after which a CRNN model can be used to obtain the corresponding text line content.
Both schemes produce complete table structure and content recognition results; a set of selection rules combines the advantages of the two to output the best final result.
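Scheme (2)'s cell-content replacement can be sketched like this — a hypothetical toy, not TAL's actual code; the placeholder format and helper names are invented:

```python
import re

# Replace each cell's text with a short placeholder id (drawn, in the real
# system, from text line detection), so the transformer decodes only the
# structure plus short ids instead of full cell contents.
def shorten_target(html_cells):
    """html_cells: list of (structure fragment, cell text). Returns the
    shortened decoding target and the id -> text lookup used to restore it."""
    target, lookup = [], {}
    for idx, (frag, text) in enumerate(html_cells):
        lookup[idx] = text
        target.append(frag.replace("{content}", f"<{idx}>"))
    return "".join(target), lookup

def restore(target, lookup):
    """Fill recognised text (e.g. from a CRNN) back into the placeholders."""
    return re.sub(r"<(\d+)>", lambda m: lookup[int(m.group(1))], target)

cells = [("<tr><td>{content}</td>", "Accuracy"),
         ("<td>{content}</td></tr>", "0.93")]
short, lookup = shorten_target(cells)
print(short)                    # <tr><td><0></td><td><1></td></tr>
print(restore(short, lookup))   # <tr><td>Accuracy</td><td>0.93</td></tr>
```

Shortening the target sequence this way keeps the autoregressive decoder's output length roughly proportional to the number of cells, not the amount of text.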
Team: PaodingAI, Beijing Paoding Technology Co., Ltd
The PaodingAI system has three main parts: text block detection, text block recognition, and table structure recognition. The text block detector is trained with the Cascade R-CNN R50 2x model provided by MMDetection. The text block recognizer is trained with the SAR-TF model. The table structure recognizer is our own implementation of the model proposed in [13]. Beyond these models, we also use additional models and rules to handle simple classification, `<b>` tags, and whitespace characters. Our system is not an end-to-end model and uses no ensemble methods.
Team: Kaen Context, Kakao Enterprise
(Kakao Enterprise is based in Gyeonggi-do, South Korea.)
To solve the table recognition problem efficiently, we used a 12-layer decoder-only transformer restricted to linear attention [8].
Data preparation: We use the RGB image (without rescaling) as the input condition and the consolidated HTML code as the target text sequence. We reshape a table image into a sequence of flattened patches of shape (N, 8×8×3), where 8 is the width and height of each image patch and N is the number of patches. A linear projection layer then maps the image sequence to 512 dimensions. The target text sequence is converted into 512-dimensional embeddings and appended to the end of the projected image sequence. Finally, we add distinct positional encodings to the text and image sequences so that the model can tell them apart.
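The patch-flattening step can be sketched with NumPy. The dimensions follow the description above; the projection weight here is a random stand-in, not a trained layer:

```python
import numpy as np

def patchify(img: np.ndarray, p: int = 8) -> np.ndarray:
    """(H, W, 3) image with H, W divisible by p -> (N, p*p*3) flat patches."""
    h, w, c = img.shape
    blocks = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(-1, p * p * c)

rng = np.random.default_rng(0)
img = rng.random((32, 64, 3))            # a tiny stand-in "table image"
patches = patchify(img)                  # (32, 192): N = (32/8) * (64/8)
W = rng.normal(size=(8 * 8 * 3, 512))    # stand-in for the linear projection
tokens = patches @ W                     # (32, 512) image token sequence
print(patches.shape, tokens.shape)       # (32, 192) (32, 512)
```

The text embeddings (also 512-dimensional) would then be concatenated after `tokens`, with separate positional encodings for the two modalities.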
Training: The concatenated image-text sequence is fed to the model, which is trained with cross-entropy loss under teacher forcing.
Inference: The model output is sampled via beam search (beam = 32).
References:
[1] Antonacopoulos, A., Bridson, D., Papadopoulos, C., Pletschacher, S.: A realistic dataset for performance evaluation of document layout analysis. In: 2009 10th International Conference on Document Analysis and Recognition. pp. 296–300. IEEE (2009)
[2] Clausner, C., Antonacopoulos, A., Pletschacher, S.: ICDAR2017 competition on recognition of documents with complex layouts - RDCL2017. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol. 1, pp. 1404–1410. IEEE (2017)
[3] Clausner, C., Papadopoulos, C., Pletschacher, S., Antonacopoulos, A.: The ENP image and ground truth dataset of historical newspapers. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR). pp. 931–935. IEEE (2015)
[4] Gao, L., Huang, Y., Li, Y., Yan, Q., Fang, Y., Dejean, H., Kleber, F., Lang, E.M.: ICDAR 2019 competition on table detection and recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 1510–1515. IEEE (Sep 2019). https://doi.org/10.1109/ICDAR.2019.00166
[5] Göbel, M., Hassan, T., Oro, E., Orsi, G.: ICDAR 2013 table competition. In: 2013 12th International Conference on Document Analysis and Recognition. pp. 1449–1453. IEEE (2013)
[6] Grygoriev, A., Degtyarenko, I., Deriuga, I., Polotskyi, S., Melnyk, V., Zakharchuk, D., Radyvonenko, O.: HCRNN: A novel architecture for fast online handwritten stroke classification. In: Proc. of Int. Conf. on Document Analysis and Recognition (2021)
[7] He, Y., Qi, X., Ye, J., Gao, P., Chen, Y., Li, B., Tang, X., Xiao, R.: PingAn-VCGroup's solution for ICDAR 2021 competition on scientific table image recognition to LaTeX. arXiv (2021)
[8] Katharopoulos, A., Vyas, A., Pappas, N., Fleuret, F.: Transformers are RNNs: Fast autoregressive transformers with linear attention. In: International Conference on Machine Learning. pp. 5156–5165. PMLR (2020)
[9] Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady. vol. 10, pp. 707–710. Soviet Union (1966)
[10] Lu, N., Yu, W., Qi, X., Chen, Y., Gong, P., Xiao, R., Bai, X.: MASTER: Multi-aspect non-local network for scene text recognition. Pattern Recognition (2021)
[11] Pawlik, M., Augsten, N.: Tree edit distance: Robust and memory-efficient. Information Systems 56, 157–173 (2016)
[12] Staar, P.W., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: A machine learning platform to ingest documents at scale. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 774–782 (2018)
[13] Tensmeyer, C., Morariu, V.I., Price, B., Cohen, S., Martinez, T.: Deep splitting and merging for table structure decomposition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 114–121. IEEE (2019)
[14] Ye, J., Qi, X., He, Y., Chen, Y., Gu, D., Gao, P., Xiao, R.: PingAn-VCGroup's solution for ICDAR 2021 competition on scientific literature parsing task B: Table recognition to HTML. arXiv (2021)
[15] Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global Table Extractor (GTE): A framework for joint table identification and cell structure recognition using visual context. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 697–706 (2021)
[16] Zhong, X., ShafieiBavani, E., Yepes, A.J.: Image-based table recognition: data, model, and evaluation. arXiv preprint arXiv:1911.10683 (2019)
[17] Zhong, X., Tang, J., Yepes, A.J.: PubLayNet: largest dataset ever for document layout analysis. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 1015–1022. IEEE (2019)