Paper reading (47): DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification
2022-06-23 18:03:22【Inge】
0 Introduction
0.1 Title
0.2 Background
Multiple instance learning (MIL) is increasingly mature in the classification of histopathology whole slide images (WSIs). However, such research still faces difficulties, e.g., small sample cohorts: the number of WSIs (bags) is limited, while a single WSI has an enormous resolution, which in turn yields a huge number of cropped patches (instances).
Tips: I am not sure whether "small sample cohort" is the accurate expression. I have downloaded WSIs before; a single sample can exceed 1 GB, which is really scary.
0.3 Method
The number of bags is virtually increased by introducing pseudo-bags, on top of which a double-tier MIL framework is built to make effective use of their intrinsic features. In addition, the instance probability is derived under the attention-based MIL framework, and this derivation is used to help construct and analyze the proposed framework.
0.4 Bib
@inproceedings{Zhang:2022:double,
  author    = {Hongrun Zhang and Yanda Meng and Yitian Zhao and Yihong Qiao and Xiaoyun Yang and Sarah E. Coupland and Yalin Zheng},
  title     = {{DTFD-MIL}: Double-tier feature distillation multiple instance learning for histopathology whole slide image classification},
  booktitle = {Computer Vision and Pattern Recognition},
  year      = {2022}
}
1 Introduction
Whole slide image (WSI) annotation is one of the major challenges in computer vision. WSIs are widely used in histopathology: digital pathology has improved pathologists' workflow and diagnostic decision-making, and it has also stimulated the demand for intelligent or automatic WSI analysis tools. A single WSI is extremely large, ranging from 100 MB to 10 GB. Because of this unique property, directly applying existing machine learning methods, e.g., models built for natural or medical images, is unrealistic; deep learning models require large-scale data with high-quality annotations, but pixel-level annotation of WSIs is simply unaffordable ( ̄▽ ̄). Hence, learning from few annotations has aroused great enthusiasm among deep learning researchers, e.g., weak supervision and semi-supervision, and most weakly supervised WSI studies can be characterized as MIL research. Under the MIL framework, one WSI is treated as a bag that may contain thousands of patches (instances). As long as at least one instance is positive, the WSI is positive.
In computer vision, many approaches have tackled the MIL problem. However, the innate nature of WSIs makes WSI classification under MIL less straightforward than other computer vision subfields, because the only direct supervision for training is the labels of a few hundred WSIs. This easily leads to overfitting: the model tends to fall into a local minimum during optimization, and the learned features correlate poorly with the target disease, which reduces the generalization ability of the model.
To address overfitting, the guiding principle of MIL-based WSI research is to learn more information from fewer labels. Exploiting the mutual-instance relation is one effective approach; it can be specified as spatial or feature distance, or learned by neural modules such as recurrent neural networks, transformers, and graph neural networks.
Most existing methods can be categorized as attention-based MIL (AB-MIL), differing mainly in how the attention scores are computed. However, explicitly inferring instance probabilities was considered infeasible under the AB-MIL framework, and as a substitute, attention scores are commonly used as indicators of positive activation. In this paper, we argue that the attention score is not a rigorous measure for this purpose, and instead derive the instance probability under the AB-MIL framework.
Given an oversized WSI, the direct processing units are the massive number of patches cropped from it. The purpose of a MIL model for WSIs is to identify the most discriminative patches, since they are the most likely to trigger the bag label. However, there are few WSIs and countless patches, and the label information is at the WSI level. Moreover, in pathology WSIs the positive instances corresponding to lesion regions often occupy only a small portion of the tissue, which makes positive instances extremely scarce. Therefore, identifying these positive instances, in a setting where overfitting is most likely, remains challenging.
In recent years, although many methods have used mutual-instance information to enhance MIL performance, they did not explicitly address the problems caused by the intrinsic characteristics of WSIs described above. To alleviate their negative effects, we introduce the concept of pseudo-bags into the framework: the instances of one WSI are randomly divided into several parts, each corresponding to a pseudo-bag. Each pseudo-bag is assigned the label of its parent bag, i.e., the original bag. This effectively increases the number of bags while ensuring that each pseudo-bag contains only a fraction of the instances, which is the core idea of our double-tier feature distillation MIL model, as shown in Figure 1. Specifically, a Tier-1 AB-MIL model is applied to the pseudo-bags of all WSIs. There is, however, a risk: a pseudo-bag from a positive bag may in fact contain no positive instance, and would thus be assigned a wrong label.
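Before going on, here is a minimal sketch of the random pseudo-bag partition just described (the function name `split_into_pseudo_bags` and the use of PyTorch tensors are my own illustration, not from the paper):

```python
import torch

def split_into_pseudo_bags(features: torch.Tensor, num_pseudo_bags: int):
    """Randomly partition the K instances (patch features) of one bag (WSI)
    into M pseudo-bags of roughly equal size.

    features: (K, D) tensor of patch features; returns a list of M tensors.
    """
    perm = torch.randperm(features.shape[0])        # shuffle instance indices
    chunks = torch.chunk(perm, num_pseudo_bags)     # M roughly equal groups
    return [features[idx] for idx in chunks]

# Each pseudo-bag simply inherits its parent bag's label:
# pseudo_labels = [bag_label] * num_pseudo_bags
```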
To solve this problem, we distill one feature vector from each pseudo-bag and build a Tier-2 AB-MIL model on these vectors, as shown in Figure 3. After such distillation, the Tier-1 model provides refined features from which the Tier-2 model obtains a better representation of the parent bag. Moreover, for feature distillation, we borrow the basic idea of Grad-CAM (gradient-based class activation map), a model originally used to visualize deep learning features, and derive the instance probability within the AB-MIL framework.
Essentially, we approach the WSI problem from a novel perspective, i.e., with a double-tier MIL framework. The main contributions are as follows:
1) The concept of pseudo-bags is introduced to combat the dilemma of insufficient WSIs;
2) Using the basic idea of Grad-CAM, the instance probability is directly derived from the AB-MIL perspective, which can serve as an extension of many future MIL methods;
3) Based on the derived probability, a double-tier MIL framework is developed, and its advantages are demonstrated on two large public WSI datasets.
2 Method
2.1 Review of Grad-CAM and AB-MIL
2.1.1 Grad-CAM
An end-to-end deep learning image classification model usually consists of two modules: a deep convolutional neural network (DCNN) for high-level feature extraction and a multi-layer perceptron (MLP) for classification. Feeding an image to the DCNN yields multiple feature maps, from which a feature vector is obtained by a pooling function. Passing this feature vector to the MLP gives the class probabilities 🤩, as in Figure 2(a).
Suppose the feature map output by the DCNN is $U\in\mathbb{R}^{D\times W\times H}$, where $D$ is the number of channels and $W$ and $H$ are the spatial dimensions. Applying global average pooling to $U$ yields the feature vector representing the image:
$$\boldsymbol{f}=\text{GAP}_{W,H}(U)\in\mathbb{R}^D \tag{1}$$
where $\text{GAP}_{W,H}(\cdot)$ averages over $W$ and $H$, i.e., the $d$-th element of $\boldsymbol{f}$ is $f_d=\frac{1}{WH}\sum_{w=1,h=1}^{W,H}U_{w,h}^d$. Taking $\boldsymbol{f}$ as input, the MLP outputs for each class $c\in\{1,2,\dots,C\}$ a logit $s^c$, which indicates the signal strength of the input belonging to class $c$; the predicted class probabilities are then obtained by a softmax operation. The Grad-CAM activation map for class $c$ is defined as a weighted sum of the feature maps:
$$\boldsymbol{L}^c=\sum_{d}^{D}\beta_d^c U^d,\qquad \beta_d^c=\frac{1}{WH}\sum_{w,h}^{W,H}\frac{\partial s^c}{\partial U_{w,h}^d} \tag{2}$$
where $\boldsymbol{L}^c\in\mathbb{R}^{W\times H}$, and $L_{w,h}^c$ is the magnitude of $\boldsymbol{L}^c$ at position $(w,h)$, indicating how strongly this position responds to class $c$:
$$L_{w,h}^c=\sum_{d=1}^{D}\beta_d^c U_{w,h}^d \tag{3}$$
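A minimal PyTorch sketch of Eqs. (1)-(3), assuming a pre-computed feature map $U$ and a generic MLP head (all names are illustrative):

```python
import torch

def grad_cam_map(U: torch.Tensor, mlp: torch.nn.Module, c: int) -> torch.Tensor:
    """Class activation map of Eqs. (1)-(3) for one image.

    U: (D, W, H) feature map from the DCNN; mlp: head mapping R^D -> C logits.
    """
    U = U.detach().requires_grad_(True)
    f = U.mean(dim=(1, 2))                     # Eq. (1): GAP over W, H
    s = mlp(f)                                 # logits s^1 .. s^C
    s[c].backward()                            # gradients ds^c / dU_{w,h}^d
    beta = U.grad.mean(dim=(1, 2))             # Eq. (2): channel weights beta_d^c
    return (beta[:, None, None] * U).sum(0).detach()   # Eq. (3): (W, H) map
```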
2.1.2 AB-MIL
Given a bag of $K$ instances $X=\{x_1,x_2,\dots,x_K\}$, each instance $x_k$, $k\in\{1,2,\dots,K\}$, holds a hidden label $y_k$ (unknowable), where $y_k=1$ means positive and $y_k=0$ negative. The goal of MIL is to detect whether the bag contains at least one positive instance. The only supervision available during the training phase is the bag label, defined as:
$$Y=\begin{cases}1,&\text{if}\ \sum_{k=1}^{K}y_k>0\\ 0,&\text{otherwise}\end{cases} \tag{4}$$
A simple way to tackle this is to assign each instance the label of its bag, train an instance classifier, and finally aggregate the instance predictions into the bag label by average or max pooling. Another strategy is to learn a bag representation $\boldsymbol{F}$, thereby reducing the problem to a conventional classification task. This strategy is more effective and can be viewed as the embedding-based branch of MIL. The bag embedding is defined as:
$$\boldsymbol{F}=\text{G}(\{\boldsymbol{h}_k\,|\,k=1,2,\dots,K\}) \tag{5}$$
where $\text{G}$ is the aggregation function and $\boldsymbol{h}_k\in\mathbb{R}^D$ is the feature vector extracted from instance $k$. The typical aggregation function is the attention mechanism:
$$\boldsymbol{F}=\sum_{k=1}^{K}\alpha_k\boldsymbol{h}_k\in\mathbb{R}^D \tag{6}$$
where $\alpha_k$ is the learnable weight of instance $\boldsymbol{h}_k$, and $D$ is the dimension of $\boldsymbol{F}$ and $\boldsymbol{h}_k$. This mechanism is shown in Figure 2(b). There are many ways to compute the attention scores; for example, the classic AB-MIL computes the weights as:
$$\alpha_k=\frac{\exp\{\boldsymbol{w}^T(\tanh(\boldsymbol{V}_1\boldsymbol{h}_k)\odot\text{sigm}(\boldsymbol{V}_2\boldsymbol{h}_k))\}}{\sum_{j=1}^{K}\exp\{\boldsymbol{w}^T(\tanh(\boldsymbol{V}_1\boldsymbol{h}_j)\odot\text{sigm}(\boldsymbol{V}_2\boldsymbol{h}_j))\}} \tag{7}$$
where $\boldsymbol{w}$, $\boldsymbol{V}_1$, and $\boldsymbol{V}_2$ are learnable parameters.
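A sketch of the gated attention pooling of Eqs. (6)-(7) in PyTorch (the module name and the dimensions are my own assumptions):

```python
import torch
import torch.nn as nn

class GatedAttentionPool(nn.Module):
    """Gated attention aggregation of Eqs. (6)-(7)."""

    def __init__(self, dim: int = 512, hidden: int = 128):
        super().__init__()
        self.V1 = nn.Linear(dim, hidden, bias=False)   # tanh branch
        self.V2 = nn.Linear(dim, hidden, bias=False)   # sigmoid gate branch
        self.w = nn.Linear(hidden, 1, bias=False)

    def forward(self, h: torch.Tensor):
        # h: (K, D) instance features of one bag
        a = self.w(torch.tanh(self.V1(h)) * torch.sigmoid(self.V2(h)))
        alpha = torch.softmax(a, dim=0)        # Eq. (7): (K, 1) weights
        F = (alpha * h).sum(dim=0)             # Eq. (6): bag embedding (D,)
        return F, alpha.squeeze(-1)
```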
2.2 Derivation of the instance probability in AB-MIL
Although bag-embedding MIL methods perform excellently, computing the instance class probability under them appeared infeasible. The paper proves that obtaining the prediction probability of a single instance is feasible in AB-MIL (proof omitted here); therefore, applying Grad-CAM to AB-MIL to directly infer the signal strength of an instance belonging to a class is also feasible. Analogously to Eq. (2), the signal strength of instance $k$ belonging to class $c$ can be written as:
$$L_k^c=\sum_{d=1}^{D}\beta_d^c\hat{h}_{k,d},\qquad \beta_d^c=\frac{1}{K}\sum_{i=1}^{K}\frac{\partial s^c}{\partial\hat{h}_{i,d}} \tag{8}$$
where $s^c$ is the logit output by the MIL classifier for class $c$, $\hat{h}_{k,d}$ is the $d$-th element of $\hat{\boldsymbol{h}}_k$, and $\hat{\boldsymbol{h}}_k=\alpha_k K\boldsymbol{h}_k$. Via the softmax function, the predicted probability of instance $k$ belonging to class $c$ is:
$$p_k^c=\frac{\exp(L_k^c)}{\sum_{t=1}^{C}\exp(L_k^t)} \tag{9}$$
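A sketch of Eqs. (8)-(9): since the bag embedding of Eq. (6) equals the average of $\hat{\boldsymbol{h}}_k=\alpha_k K\boldsymbol{h}_k$ over the $K$ instances, instance-level Grad-CAM reduces to gradients of the bag logit with respect to $\hat{\boldsymbol{h}}_k$ (names are illustrative, not the paper's code):

```python
import torch

def instance_probs(h, alpha, classifier):
    """Per-instance class probabilities via Eqs. (8)-(9).

    h: (K, D) instance features; alpha: (K,) attention weights;
    classifier: bag-level head mapping R^D -> C logits.
    """
    K = h.shape[0]
    h_hat = (K * alpha[:, None] * h).detach().requires_grad_(True)  # \hat h_k
    s = classifier(h_hat.mean(dim=0))   # mean of \hat h_k equals Eq. (6)'s F
    C = s.shape[0]
    L = h_hat.new_zeros(K, C)
    for c in range(C):
        grad = torch.autograd.grad(s[c], h_hat, retain_graph=True)[0]  # (K, D)
        beta_c = grad.mean(dim=0)          # Eq. (8): (1/K) sum_i of gradients
        L[:, c] = h_hat.detach() @ beta_c  # L_k^c = sum_d beta_d^c \hat h_{k,d}
    return torch.softmax(L, dim=1)         # Eq. (9)
```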
2.3 Double-tier feature distillation MIL
Given $N$ bags (WSIs), each with $K_n$ instances, i.e., $\boldsymbol{X}_n=\{x_{n,k}\,|\,k=1,2,\dots,K_n\}$, $n\in\{1,2,\dots,N\}$, let $Y_n$ denote the ground-truth label of bag $n$. The feature of each instance is denoted $\boldsymbol{h}_{n,k}$ and is extracted by a neural network $\mathbf{H}$, i.e., $\boldsymbol{h}_{n,k}=\mathbf{H}(x_{n,k})$. The instances of each bag are randomly divided into $M$ pseudo-bags with roughly equal numbers of instances, $\boldsymbol{X}_n=\{\boldsymbol{X}_n^m\,|\,m=1,2,\dots,M\}$. Each pseudo-bag is labeled with the label of its parent bag, i.e., $Y_n^m=Y_n$. The Tier-1 AB-MIL model, denoted $\text{T}_1$, processes each pseudo-bag, and the bag probability obtained for a pseudo-bag through $\text{T}_1$ is:
$$y_n^m=\text{T}_1(\{\boldsymbol{h}_k=\mathbf{H}(x_k)\,|\,x_k\in\boldsymbol{X}_n^m\}) \tag{10}$$
The loss function of the $\text{T}_1$ tier is the cross entropy:
$$\mathcal{L}_1=-\frac{1}{MN}\sum_{n=1}^{N}\sum_{m=1}^{M}\left[Y_n^m\log y_n^m+(1-Y_n^m)\log(1-y_n^m)\right] \tag{11}$$
Then the probability of each instance within a pseudo-bag is obtained via Eqs. (8)-(9). Based on the instance probabilities, a feature vector is distilled from each pseudo-bag; the distillation result of the $m$-th pseudo-bag of the $n$-th bag is denoted $\hat{\boldsymbol{f}}_n^m$. All distilled vectors are passed to the Tier-2 AB-MIL model $\text{T}_2$, whose output is the inference of each bag label:
$$\hat{y}_n=\text{T}_2\left(\left\{\hat{\boldsymbol{f}}_n^m\,|\,m=1,2,\dots,M\right\}\right) \tag{12}$$
The loss of $\text{T}_2$ is defined as:
$$\mathcal{L}_2=-\frac{1}{N}\sum_{n=1}^{N}\left[Y_n\log\hat{y}_n+(1-Y_n)\log(1-\hat{y}_n)\right] \tag{13}$$
The total classification loss is:
$$\mathcal{L}=\mathop{\arg\min}_{\boldsymbol{\theta}_1}\mathcal{L}_1+\mathop{\arg\min}_{\boldsymbol{\theta}_2}\mathcal{L}_2 \tag{14}$$
where $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$ are the network parameters of the two tiers.
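Putting Eqs. (10)-(14) together, a hedged sketch of one training step, reusing the `split_into_pseudo_bags` sketch from the introduction. `tier1`/`tier2` stand in for AB-MIL models that return a bag probability (and, for `tier1`, the attention scores), and `distill` for one of the strategies below; unlike Eq. (14)'s per-tier minimization, a single optimizer over both parameter sets is used here for brevity:

```python
import torch
import torch.nn.functional as F

def train_step(bag_feats, bag_label, tier1, tier2, distill, optimizer, M=8):
    """One optimization step for one bag; bag_label is a 0/1 float tensor."""
    pseudo_bags = split_into_pseudo_bags(bag_feats, M)   # random split, sketched earlier
    loss1, distilled = 0.0, []
    for pb in pseudo_bags:
        y_m, alpha = tier1(pb)                 # Eq. (10): pseudo-bag probability
        loss1 = loss1 + F.binary_cross_entropy(y_m, bag_label)  # Eq. (11) term
        distilled.append(distill(pb, alpha))   # distilled feature(s), Sec. 2.3
    loss1 = loss1 / M

    y_hat = tier2(torch.cat(distilled))        # Eq. (12): bag prediction
    loss2 = F.binary_cross_entropy(y_hat, bag_label)            # Eq. (13)

    optimizer.zero_grad()
    (loss1 + loss2).backward()                 # Eq. (14), jointly for brevity
    optimizer.step()
    return float(loss1), float(loss2)
```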
Note that the pseudo-bags contain a considerable number of noisy labels, since random partitioning cannot guarantee that every pseudo-bag from a positive bag contains at least one positive instance. Deep learning has a certain tolerance for noisy labels. Moreover, the noise level roughly scales with $M$; ablation experiments later evaluate the impact of $M$ on the final performance.
Four feature distillation strategies are considered (a sketch implementing them follows the list):
MaxS (maximum selection): after $\text{T}_1$ processing, the feature of the instance with the maximum positive probability in each pseudo-bag is passed to $\text{T}_2$;
MaxMinS (MaxMin selection): the features of the instances with the maximum and the minimum positive probabilities are both selected;
MAS (maximum attention score selection): the feature of the instance with the largest attention score is selected;
AFS (aggregated feature selection): the features are aggregated via Eq. (6).
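A sketch of the four strategies; `probs` are the positive-class instance probabilities from Eqs. (8)-(9), and how the two vectors selected by MaxMinS are passed on (stacked here) is my assumption:

```python
import torch

def distill(h, alpha, probs=None, strategy="AFS"):
    """Distill feature(s) from one pseudo-bag (Sec. 2.3).

    h: (K, D) instance features; alpha: (K,) Tier-1 attention scores;
    probs: (K,) positive probabilities, needed for MaxS / MaxMinS.
    Returns a (1, D) or (2, D) tensor passed on to Tier-2.
    """
    if strategy == "MaxS":      # feature of the most-positive instance
        return h[probs.argmax()].unsqueeze(0)
    if strategy == "MaxMinS":   # most- and least-positive instances
        return torch.stack([h[probs.argmax()], h[probs.argmin()]])
    if strategy == "MAS":       # instance with the largest attention score
        return h[alpha.argmax()].unsqueeze(0)
    if strategy == "AFS":       # attention-aggregated feature, Eq. (6)
        return (alpha[:, None] * h).sum(dim=0, keepdim=True)
    raise ValueError(strategy)
```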