#### DMCA

## Effective feature construction by maximum common subgraph sampling (2011)

Venue: | MACHINE LEARNING |

Citations: | 6 - 1 self |

### Citations

14016 |
Computers and Intractibility: A Guide to the Theory of NPcompleteness W.H
- Garey, S
- 1983
(Show Context)
Citation Context ... in gray. H, denoted G H, if and only if G is isomorphic to a subgraph of H. The subgraph isomorphism problem, that is, the problem of deciding whether G is subgraph isomorphic to H, is NP-complete =-=[7]-=-; this also holds for outerplanar graphs. A block-and-bridge-preserving (BBP) subgraph isomorphism from G to H is a subgraph isomorphism from G to H, denoted G v H, such that (i) {u, v} ∈ E(G) is a br... |

1195 | Graph Theory
- Diestel
- 2010
(Show Context)
Citation Context ... for the feature generation method discussed in Sect. 2.3. 2.1 Graph Theoretical Concepts We now formally define the necessary graph theoretical concepts. For an overview of graph theory, we refer to =-=[6]-=-. A labeled graph is a quadruple G(V,E,Σ, λ), with V a finite set of vertices and E ⊆ {{u, v} | u, v ∈ V } a set of edges. Σ is a finite set of labels and λ : V ∪ E → Σ is a function assigning a label... |

743 | Statistical comparisons of classifiers over multiple data-sets
- Demšar
(Show Context)
Citation Context ...thods in two ways. On the one hand, generalization over datasets follows from the win/loss-ratio. In particular, we use the Friedman test combined with a Nemenyi post-hoc test to compute significance =-=[14]-=-. The Friedman test is a non-parametric test for statistical comparisons of classifiers over multiple datasets. It ranks the algorithms for each dataset separately, with the best performing algorithm ... |

643 | gSpan: Graph-based substructure pattern mining
- Yan, Han
- 2002
(Show Context)
Citation Context ...ncy, or top-k according to a correlation measure (e.g., χ2). Graph mining systems then perform a complete search through the entire graph space, enumerating all subgraphs satisfying these constraints =-=[1, 2]-=- or even exhaustively enumerating all possible subgraphs [3]. Usually the resulting patterns are not used directly. Instead, they are used as features in combination with traditional machine learning ... |

570 | Learning to classify text using support vector machines - Joachims - 2001 |

545 |
A note on inductive generalization
- Plotkin
- 1970
(Show Context)
Citation Context ...eral generalization. Two such frameworks are well-known [16]: when working under θ-subsumption, the minimally general generalization is unique and is therefore called the least general generalization =-=[18]-=-, while when working under OI-subsumption [16] it is – as the MCS – not necessarily unique. On the other hand, the size of the least general generalization under θ-subsumption of two objects may be as... |

452 | Algorithms for the assignment and transportation problems
- Munkres
- 1957
(Show Context)
Citation Context ...g between the set of children of Gri and the set of children of Hsj (matching BSSs with BSSs and BPSs with BPSs, and checking that the connecting edges have identical labels) using Munkres’ algorithm =-=[8]-=-. The weights represent the size of an MCS between the pairs of children. In order to find an MCS of two BSSs G|oG[xi,r[ and H|oH [yj ,s[ (splitting a block BG and a block BH respectively), where r an... |

312 | Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions
- Provost, Fawcett
- 1997
(Show Context)
Citation Context ...dered state-of-theart for the classification of small molecules [10]. As implementation we used SVMlight [11]. To evaluate the classification models, we use the area under the ROC curve (AUROC) score =-=[12]-=- and the H score introduced by Hand [13], who shows that AUROC fails to take into account the relative costs of misclassifications of different classifiers. The H score does not suffer from this probl... |

174 | A graph distance metric based on the maximal common subgraph
- Bunke, Shearer
- 1998
(Show Context)
Citation Context ...general generalization is closely related to that of a distance measure. For instance, the notion of maximum common subgraph under subgraph isomorphism is used in Bunke and Shearer’s distance measure =-=[27]-=-, while the one we are using (based on BBP-subgraph isomorphism) was used by Schietgat et al. to construct a metric [4]. Furthermore, as kernels can be viewed as a kind of similarity measure our work ... |

145 | A survey of kernels for structured data. - Gartner - 2003 |

139 | Frequent sub-structure-based approaches for classifying chemical compounds
- Deshpande, Kuramochi, et al.
- 2003
(Show Context)
Citation Context ...n subgraphs of interest are formulated and all subgraphs satisfying the constraints are generated. A wide variety of different constraints has been considered in graph mining, such as frequency-based =-=[1, 22]-=-, generality-based, using one or two classes, imposing syntactic constraints, and combinations of those with particular subclasses of patterns such as paths [23]. In addition, there has been research ... |

112 |
Propositionalization approaches to relational data mining
- Kramer, Lavrač, et al.
- 2001
(Show Context)
Citation Context ...n be regarded as propositionalizing a relational or graph-based representation as is common in logical and relational learning [16]. Various techniques have been used to generate features of interest =-=[17]-=-. Our approach differs from these propositionalization approaches in that it works bottom-up and also that it computes pairwise minimally general generalizations of the examples that can be used as fe... |

92 |
Molecular feature mining in hiv data
- Kramer, Raedt, et al.
- 2001
(Show Context)
Citation Context ...aph mining, such as frequency-based [1, 22], generality-based, using one or two classes, imposing syntactic constraints, and combinations of those with particular subclasses of patterns such as paths =-=[23]-=-. In addition, there has been research on correlated pattern mining [2], where the goal is to find the top-k patterns according to a statistical significance measure such as χ2 or information gain. In... |

84 | Maximum common subgraph isomorphism algorithms for the matching of chemical structures.
- Raymond, Willett
- 2002
(Show Context)
Citation Context ...on, based on homomorphism is more common. There are different ways to define and compute maximum common subgraphs. A large number of approaches can be related to distance metrics [19]. Raymond et al. =-=[20]-=- give an elaborate overview of existing similarity measures for molecules that are graph-based. Most of these algorithms avoid the computational complexity by computing approximate values, as the maxi... |

75 | Similarity-based virtual screening using 2D fingerprints. Drug Discov.
- Willett
- 2006
(Show Context)
Citation Context ...d against three state-of-the-art methods that construct features for graphs: one method that performs correlated subgraph mining [2] and two methods that exhaustively enumerate all possible subgraphs =-=[3, 10]-=-. We describe these methods using the notation introduced in Sect. 2.3. First, we consider a correlated graph miner [2], which traverses a search space in order to find the top-k correlated graph patt... |

73 | Cyclic pattern kernels for predictive graph mining - Horváth, Gärtner, et al. - 2004 |

66 |
Measuring classifier performance: a coherent alternative to the area under the roc curve
- Hand
- 2009
(Show Context)
Citation Context ...tion of small molecules [10]. As implementation we used SVMlight [11]. To evaluate the classification models, we use the area under the ROC curve (AUROC) score [12] and the H score introduced by Hand =-=[13]-=-, who shows that AUROC fails to take into account the relative costs of misclassifications of different classifiers. The H score does not suffer from this problem. For all experiments, a stratified 10... |

51 |
Information theoretical analysis of multivariate correlation
- Watanabe
- 1960
(Show Context)
Citation Context ...urther discriminated by any classification method. Hence, a high uniqueness is a desirable property. Second, we report the generalization of the mutual information measure, i.e. the total correlation =-=[15]-=- (also known as the multivariate constraint or multiinformation) to express the amount of redundancy existing among the set of features considered as random variables. The total correlation (TC) is de... |

45 | Limitations of Learning Via Embeddings in Euclidean Half Spaces,
- Ben-David, Eiron, et al.
- 2002
(Show Context)
Citation Context ...ure space dimensions will be poorly correlated with the target function. As a consequence, even when using large margin classifiers, one can fail to obtain models with good generalization performance =-=[31]-=-. In order to tackle these issues, several remedies have been proposed, from down-weighting the contribution of larger fragments and/or bounding a priori their size, to a direct manipulation of the Gr... |

43 |
Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity
- Swamidass, Chen, et al.
- 2005
(Show Context)
Citation Context ...ll line, approximately 3,500 compounds are provided together with information on their cancer-inhibiting action, which defines a binary classification problem. We use the datasets of Swamidass et al. =-=[9]-=-, which are available from the authors upon request. Each molecule is described in the Tripos Sybyl MOL2 format.3 From this we extract a graph in which each vertex corresponds to an atom and each edge... |

42 | Comparison of descriptor spaces for chemical compound retrieval and classification.
- Wale, Watson, et al.
- 2007
(Show Context)
Citation Context ...R-MCS Fk(G∗,G∗) graphs no F-MCS arg maxk freq(F(G∗,G∗),G) graphs no C-MCS arg maxk χ 2(F(G∗,G∗),G) graphs no State-of-the-art methods C-GP arg maxk χ 2(A,G) [2] graphs no A-GP size(freq(A,G) ≥ 1) ≤ s =-=[3]-=- graphs no FP2 size(freq(P,G) ≥ 1) ≤ s [10] sequences yes Parameter settings for MCS variants For R-MCS, F-MCS and C-MCS, we chose k = 1000. Since R-MCS is a non-deterministic method, it was always ru... |

42 | Distance induction in first order logic
- Sebag
- 1997
(Show Context)
Citation Context ...the fact that choosing random features can be better than choosing them according to specific criteria, have already been noted in other contexts e.g., for selecting features in distance construction =-=[25]-=-. Recently, the randomization idea has also been suggested in the area of pattern mining. Chaoji et al. [26] have introduced a feature construction method that obtains good patterns by sampling under ... |

39 | Frequent subgraph mining in outerplanar graphs - Horváth, Ramon, et al. - 2006 |

31 | Logical and relational learning
- Raedt
- 2008
(Show Context)
Citation Context ...work is related to various streams of research. Firstly, our technique can be regarded as propositionalizing a relational or graph-based representation as is common in logical and relational learning =-=[16]-=-. Various techniques have been used to generate features of interest [17]. Our approach differs from these propositionalization approaches in that it works bottom-up and also that it computes pairwise... |

27 | S.: Don't be afraid of simpler patterns
- Bringmann, Zimmermann, et al.
(Show Context)
Citation Context ...ncy, or top-k according to a correlation measure (e.g., χ2). Graph mining systems then perform a complete search through the entire graph space, enumerating all subgraphs satisfying these constraints =-=[1, 2]-=- or even exhaustively enumerating all possible subgraphs [3]. Usually the resulting patterns are not used directly. Instead, they are used as features in combination with traditional machine learning ... |

17 | Graphrank: Statistical modeling and mining of significant subgraphs in the feature space
- He, Singh
- 2006
(Show Context)
Citation Context ...or molecular applications – a much simpler approach without strong guarantees may well achieve better results both in terms of predictive performance and efficiency. It is interesting to note that in =-=[24]-=- the authors propose to rank the subgraphs returned by a frequent graph miner according to a notion of statistical significance7 and show that in a chemical database the selected features are typicall... |

14 | ORIGAMI: A Novel and Effective Approach for Mining Representative Orthogonal Graph Patterns
- Chaoji, Hasan, et al.
- 2008
(Show Context)
Citation Context ... already been noted in other contexts e.g., for selecting features in distance construction [25]. Recently, the randomization idea has also been suggested in the area of pattern mining. Chaoji et al. =-=[26]-=- have introduced a feature construction method that obtains good patterns by sampling under diversity constraints. However, the suggested method requires the user to tune and specify two parameters th... |

9 | P.: Classi of small molecules by two- and three-dimensional decomposition kernels - Ceroni, Costa, et al. - 2007 |

8 | Deriving distance metrics from generality relations
- Raedt, Ramon
- 2009
(Show Context)
Citation Context ... the alternative notion, based on homomorphism is more common. There are different ways to define and compute maximum common subgraphs. A large number of approaches can be related to distance metrics =-=[19]-=-. Raymond et al. [20] give an elaborate overview of existing similarity measures for molecules that are graph-based. Most of these algorithms avoid the computational complexity by computing approximat... |

7 | H.: An efficiently computable graph-based metric for the classification of small molecules
- Schietgat, Ramon, et al.
- 2008
(Show Context)
Citation Context ...computing maximum common subgraphs in general is an NP-hard problem, a polynomial-time algorithm exists for outerplanar graphs in combination with the block-and-bridge-preserving subgraph isomorphism =-=[4]-=-. Outerplanar graphs can be embedded in the plane such that all of their vertices lie on the outside of the graph. It is known that 95% of the molecules in the NCI1 collection are outerplanar [5], whi... |

2 |
H.: Learning to classify structured data by graph propositionalization
- Karunaratne, Boström
(Show Context)
Citation Context ... is suitable for molecules [4]. An alternative approach to reduce the complexity could be to consider common substructures which can be computed more easily, such as multisets of common vertex labels =-=[21]-=-. However, an important drawback is that the more complex shared substructures are not taken into account. Secondly, our work is related to the common practice in constraint-based graph mining, where ... |