
Chapter 13 Bayesian Network Practice

2022-08-06 07:06:41 · Sang Zhiwei 0208

1 The derivation, application and classification of Naive Bayes

1.1 Derivation of Naive Bayes

Naive Bayes is a supervised learning algorithm that applies Bayes' theorem under the "naive" assumption that features are mutually independent (i.e., given the class, the value of one feature carries no information about the others).

For a given feature vector x_{1},x_{2},...,x_{n}, the probability of the class y can be obtained from Bayes' theorem:

P(y|x_{1},x_{2},...,x_{n})=\frac{P(y)P(x_{1},x_{2},...,x_{n}|y)}{P(x_{1},x_{2},...,x_{n})}

Applying the naive independence assumption: P(x_{i}|y,x_{1},...,x_{i-1},x_{i+1},...,x_{n})=P(x_{i}|y)

Since P(x_{1},x_{2},...,x_{n}) is constant for a given sample: P(y|x_{1},x_{2},...,x_{n})\propto P(y)\prod_{i=1}^{n}P(x_{i}|y)

Therefore \widehat{y}=\underset{y}{argmax}\,P(y)\prod_{i=1}^{n}P(x_{i}|y)
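As a concrete illustration, the argmax rule above can be evaluated directly once the prior and per-feature likelihoods have been estimated. The probabilities below are invented toy numbers for a spam/ham example, not values from the text:

```python
# Pick ŷ = argmax_y P(y) * Π_i P(x_i | y).
# All probabilities are made-up toy numbers for illustration.
priors = {"spam": 0.4, "ham": 0.6}
likelihood = {  # P(word | class)
    "spam": {"free": 0.30, "money": 0.25, "meeting": 0.02},
    "ham":  {"free": 0.03, "money": 0.05, "meeting": 0.20},
}

def predict(words):
    scores = {}
    for y, prior in priors.items():
        score = prior
        for w in words:
            score *= likelihood[y][w]   # naive independence: multiply per-feature likelihoods
        scores[y] = score
    return max(scores, key=scores.get)  # argmax over classes

print(predict(["free", "money"]))   # spam: 0.4*0.30*0.25 = 0.03 > 0.6*0.03*0.05
print(predict(["meeting"]))         # ham:  0.6*0.20 = 0.12 > 0.4*0.02
```

In practice the products are computed as sums of logarithms to avoid underflow when there are many features.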

1.2 Applications of Naive Bayes

The Naive Bayes algorithm is widely used in practice, for example in text classification, spam filtering, credit scoring, and phishing-website detection.

1.3 Naive Bayesian Classification

  • Gaussian Naive Bayes——P(x_{i}|y)=\frac{1}{\sqrt{2\pi }\sigma _{y}}exp(-\frac{(x_{i}-\mu _{y})^{2}}{2\sigma _{y}^{2}}), where the parameters \mu_{y} and \sigma_{y} are estimated by maximum likelihood (MLE).
  • Multinomial Naive Bayes——for each class y the parameter vector is \theta _{y}=(\theta _{y1},\theta _{y2},...,\theta _{yn}), where n is the number of features and \theta _{yi}=P(x_{i}|y). The smoothed maximum likelihood estimate is \widehat{\theta}_{yi}=\frac{N_{yi}+\alpha }{N_{y}+\alpha \cdot n}, \alpha \geq 0, where, for a training set T, N_{yi}=\sum_{x\in T }x_{i} (the count of feature i summed over the samples of class y) and N_{y}=\sum_{i=1}^{n}N_{yi}. The case \alpha =1 is called Laplace smoothing; \alpha <1 is called Lidstone smoothing.
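The smoothed estimate for the multinomial case can be checked in a few lines. The count table below is invented for illustration; note how the feature that was never observed in class y still receives a nonzero probability:

```python
# θ̂_yi = (N_yi + α) / (N_y + α·n) for one class y with n = 3 features.
# N_yi are made-up per-feature counts over the training samples of class y.
alpha = 1.0                 # α = 1 → Laplace smoothing
N_yi = [4, 0, 2]            # feature 1 never seen in this class
N_y = sum(N_yi)             # 6
n = len(N_yi)

theta = [(c + alpha) / (N_y + alpha * n) for c in N_yi]
print(theta)  # [5/9, 1/9, 3/9]: smoothing avoids a zero probability
```

Without smoothing (α = 0) the unseen feature would get probability 0 and would zero out the whole product P(y)∏P(x_i|y) for any sample containing it.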

2 Processing flow of text data

(1) Crawl data

(2) Segment the text, which can be divided into Chinese word segmentation and English word segmentation. English text can be split on spaces, while Chinese text can be segmented with jieba. Refer to Method 1 of text feature extraction in https://blog.csdn.net/qwertyuiop0208/article/details/125251521.

(3) Preprocess the data (including data cleaning and correction, etc.); refer to https://blog.csdn.net/qwertyuiop0208/article/details/125926133

(4) Standardize the data

(5) Convert strings into vectors by feature extraction methods such as TF-IDF or Word2vec.

(6) Build and evaluate a model with machine-learning algorithms.
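Steps (5) and (6) above can be sketched in a few lines, assuming scikit-learn is available; the toy corpus and labels below are invented for illustration:

```python
# Vectorize with TF-IDF, then fit a Multinomial Naive Bayes classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["free prize money now", "meeting schedule for monday",
        "win free money today", "project status meeting notes"]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(docs, labels)

pred = model.predict(["free money prize"])[0]
print(pred)  # "spam" on this toy data
```

For Chinese text, the documents would first be segmented (e.g. with jieba) and joined with spaces before being passed to the vectorizer, since `TfidfVectorizer` tokenizes on word boundaries by default.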

3 Use TF-IDF to get text features

If a word or phrase appears with high frequency in one article but rarely in other articles, it is considered to have good power to discriminate between categories and is therefore suitable for classification. TF-IDF evaluates how important a word is to a document within a corpus.

For details, see: https://blog.csdn.net/qwertyuiop0208/article/details/125251521.
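A minimal hand-rolled version of the idea, using the smoothed IDF variant ln((1+N)/(1+df)) + 1 (one of several common formulations; the toy corpus is invented):

```python
import math

# TF = count of the term in the document; IDF grows as the term appears
# in fewer documents, so corpus-wide common words are down-weighted.
docs = [["apple", "banana", "apple"],
        ["banana", "cherry"],
        ["apple", "cherry", "cherry"]]
N = len(docs)

df = {}  # document frequency: in how many docs each term occurs
for d in docs:
    for t in set(d):
        df[t] = df.get(t, 0) + 1

def tfidf(term, doc):
    tf = doc.count(term)
    idf = math.log((1 + N) / (1 + df.get(term, 0))) + 1
    return tf * idf

print(tfidf("apple", docs[0]))   # high: frequent here, rare elsewhere
print(tfidf("cherry", docs[0]))  # 0: term absent from this document
```

A full vectorizer would compute this score for every (document, term) pair and typically L2-normalize each document's vector.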

4 Use of Word2vec

The essence is to build a 3-layer neural network that maps every word to a fixed-length vector; a window around the current word is taken as its context, and the words within the window are predicted. Word2vec contains two algorithms, skip-gram and CBOW. The biggest difference between them is that skip-gram predicts the surrounding words from the center word, while CBOW predicts the center word from the surrounding words.
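The window mechanism described above can be made concrete by generating skip-gram training pairs, where each center word is paired with every word inside its context window (a sketch of the pair-generation step only, not of the network training):

```python
# Generate (center, context) pairs for skip-gram training.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                 # clamp window at sentence edges
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                          # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

CBOW uses the same windows but reverses the roles: the context words jointly form the input and the center word is the prediction target.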


Copyright notice
This article was created by [Sang Zhiwei 0208]; please include a link to the original when reposting, thanks
https://chowdera.com/2022/218/202208060623123417.html
