
Analysis of query intent recognition

2020-11-06 01:32:11 Elementary school students in IT field

Outline

Recently I have been studying search technology, because my work mainly involves implementing information search features. We use the Elasticsearch search engine (see my earlier posts, ES Basics and ES Advanced 1). Since the search feature needs further iteration, I am continuing to study search principles and performance optimization. This post covers the following points:

What is search
Search evaluation metrics
Intent recognition
Query rewriting

What is search

Building a search engine technically involves three main parts:
(1) Understanding the query
(2) Understanding the content (documents)
(3) Matching and ranking the query against the content (documents)

General evaluation metrics for search

Basic metrics:
Recall (R) = number of relevant documents retrieved / total number of relevant documents, R∈[0,1]
Precision (P) = number of relevant documents retrieved / total number of documents retrieved, P∈[0,1]
F value (F1): the harmonic mean of recall R and precision P, i.e. F = 2PR / (P + R)
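A minimal Python sketch of the three metrics for a single query, treating the retrieved results and the relevant (ground-truth) results as sets of document IDs. The function name and the document IDs are made up for illustration.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall and F1 for one query.

    retrieved: document IDs returned by the search engine
    relevant:  document IDs judged relevant (ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: 4 documents retrieved, 3 of them relevant; 6 relevant documents exist.
p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"],
                              ["d1", "d2", "d3", "d5", "d6", "d7"])
print(p, r, f)  # 0.75 0.5 0.6
```

The example shows the usual trade-off: returning more documents tends to raise recall while lowering precision, and F1 summarizes both in one number.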
Stages of search development:

Early stage: keyword-based search
Growth stage: full-text search based on titles and subtitles
Maturity stage: ranking optimization for search
e.g. LTR (learning to rank)
Evolution stage: personalized search
Intent recognition / "a thousand faces for a thousand users" personalization / search suggestions, etc.

Intent recognition

What is it?
It classifies sentences, or what we usually call queries, into their corresponding intent categories.
It belongs to the "query understanding" part of search.
It is essentially a classification problem.
General flow of intent recognition in search:
S1. The user's original query is "michal jrdan".
S2. The Query Correction module outputs: "Michael Jordan".
S3. The Query Suggestion module's drop-down suggestions are "Michael Jordan berkley" and "Michael Jordan NBA". Suppose the user chooses "Michael Jordan berkley".
S4. The Query Expansion module expands the query to "Michael Jordan berkley" and "Michael I. Jordan berkley".
S5. The Query Classification module classifies the query as: academic.
S6. Finally, the Semantic Tagging module performs named entity recognition and attribute recognition, producing: [Michael Jordan: person name][berkley: location]: academic
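Step S6 can be approximated with a dictionary lookup. The sketch below is an assumption rather than the module described above: it tags entities by forward maximum matching against a small hand-written dictionary (`ENTITY_DICT` and `tag_query` are hypothetical names). Production systems usually rely on a trained NER model instead.

```python
# Hypothetical entity dictionary: lowercase surface form -> entity type.
ENTITY_DICT = {
    "michael jordan": "person",
    "michael i. jordan": "person",
    "berkley": "location",
    "nba": "organization",
}
# Longest dictionary entry, measured in whitespace tokens.
MAX_ENTITY_TOKENS = max(len(k.split()) for k in ENTITY_DICT)


def tag_query(query):
    """Tag entities using forward maximum matching over whitespace tokens."""
    tokens = query.lower().split()
    tags, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, shrinking until a dictionary hit.
        for n in range(min(MAX_ENTITY_TOKENS, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in ENTITY_DICT:
                tags.append((span, ENTITY_DICT[span]))
                i += n
                break
        else:
            tags.append((tokens[i], "O"))  # "O" marks a token outside any entity
            i += 1
    return tags


print(tag_query("Michael Jordan berkley"))
# [('michael jordan', 'person'), ('berkley', 'location')]
```

Trying the longest span first is what keeps "michael jordan" together as one person entity instead of two unrelated tokens.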
Prerequisite for intent recognition
Defining the intent taxonomy: by skill / by domain

Classification of user query needs:

(1) Navigational
(2) Informational
(3) Transactional
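One simple way to assign these three need types is with keyword rules. The sketch below assumes hypothetical trigger words, a URL pattern, and "informational" as the fallback class; real rule sets are much larger and language-specific.

```python
import re

# Hypothetical trigger words, checked in order; the first class that matches wins.
RULES = [
    ("navigational", {"homepage", "login", "official"}),
    ("transactional", {"buy", "price", "download", "order"}),
    ("informational", {"what", "how", "why", "who", "tutorial"}),
]
URL_PATTERN = re.compile(r"www\.|https?://|\.com\b")


def classify_need(query: str, default: str = "informational") -> str:
    """Map a query to navigational / informational / transactional with keyword rules."""
    q = query.lower()
    if URL_PATTERN.search(q):  # a URL-like query usually means "take me to that site"
        return "navigational"
    words = set(re.findall(r"[a-z0-9]+", q))
    for label, triggers in RULES:
        if words & triggers:
            return label
    return default  # no rule fired


for q in ["elasticsearch official homepage", "buy jordan sneakers",
          "how does bm25 work", "www.elastic.co", "michael jordan"]:
    print(q, "->", classify_need(q))
```

Matching on whole tokens rather than substrings avoids false hits such as "how" inside "show".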

Concepts:

A complete interaction between a user and the search engine is called a Search Session. The information recorded in a session includes the user's query terms (Query) and the titles of the search results the user clicked (Title). If the user changes the query during the session (for example, from Query1 to Query2), the subsequent searches and clicks are recorded as well, until the user leaves the search, at which point the session ends.
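"Leaving the search" is rarely logged explicitly, so a common approximation is an inactivity timeout. The sketch below assumes a 30-minute timeout and a hypothetical `LogRecord` layout (user, time, query, clicked title); both are illustrative choices, not part of the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class LogRecord:
    user_id: str
    time: datetime
    query: str
    clicked_title: Optional[str] = None  # None when the user clicked nothing


def split_sessions(records: List[LogRecord],
                   timeout: timedelta = timedelta(minutes=30)) -> List[List[LogRecord]]:
    """Group log records into per-user sessions; a gap longer than `timeout` closes a session."""
    sessions: List[List[LogRecord]] = []
    prev: Optional[LogRecord] = None
    for rec in sorted(records, key=lambda r: (r.user_id, r.time)):
        new_user = prev is None or rec.user_id != prev.user_id
        if new_user or rec.time - prev.time > timeout:
            sessions.append([])  # assume the user left: start a new session
        sessions[-1].append(rec)  # query changes within the timeout stay in the same session
        prev = rec
    return sessions


t0 = datetime(2020, 11, 6, 9, 0)
log = [
    LogRecord("u1", t0, "michal jrdan"),
    LogRecord("u1", t0 + timedelta(minutes=1), "Michael Jordan berkley", "Michael I. Jordan homepage"),
    LogRecord("u1", t0 + timedelta(hours=2), "elasticsearch tutorial"),
    LogRecord("u2", t0, "nba scores"),
]
print([[r.query for r in s] for s in split_sessions(log)])
# [['michal jrdan', 'Michael Jordan berkley'], ['elasticsearch tutorial'], ['nba scores']]
```

Keeping the reformulation "michal jrdan" -> "Michael Jordan berkley" inside one session is what makes such logs useful for mining corrections and intents.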

Methods of intent recognition

1. Dictionary / rule-based analysis
2. Query click logs: a search log record generally includes the time, the query string, the clicked URL, and the position of the clicked result in the result list.
3. Machine learning methods (rule mining; classifiers such as Bayes, LR and SVM): treated as a classification problem
Query classification
e.g. identify the attribute of each entity word and match it exactly against the corresponding index field, improving the accuracy of the recalled results
4. Neural network (deep learning) methods, e.g. FastText
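Of the machine learning methods listed, Bayes is the simplest to show. Below is a minimal multinomial Naive Bayes query classifier written from scratch with add-one (Laplace) smoothing; the class name, training queries and intent labels are hypothetical toy data. Swapping in LR, SVM or FastText would change the model, not the overall train/predict shape.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntent:
    """Multinomial Naive Bayes over whitespace tokens with add-one (Laplace) smoothing."""

    def fit(self, queries, labels):
        self.class_counts = Counter(labels)      # training queries per intent
        self.word_counts = defaultdict(Counter)  # intent -> token frequencies
        for query, label in zip(queries, labels):
            self.word_counts[label].update(query.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = len(labels)
        return self

    def predict(self, query):
        tokens = [t for t in query.lower().split() if t in self.vocab]  # ignore unseen words
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
train_queries = [
    "michael jordan machine learning paper",
    "bayesian inference lecture berkley",
    "michael jordan nba finals",
    "bulls championship highlights",
    "cheap basketball shoes",
    "buy jordan sneakers",
]
train_labels = ["academic", "academic", "sports", "sports", "shopping", "shopping"]

model = NaiveBayesIntent().fit(train_queries, train_labels)
print(model.predict("michael jordan berkley lecture"))  # academic
print(model.predict("jordan nba finals tickets"))       # sports
```

Working in log space avoids numeric underflow when multiplying many small probabilities, and smoothing keeps a single unseen class/word pair from zeroing out a whole class.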

Difficulties of intent recognition

1. Non-standard input: as mentioned above, different users express the same need in different ways.
2. Multiple intents: for the query "water", does the user mean mineral water, or a hydrating toner for women?
3. Data cold start: when there is little user behavior data, it is hard to infer intent accurately.
4. No fixed evaluation standard: quantifiable metrics such as PV, IPV, CTR and CVR evaluate the search system as a whole; there is no standard quantitative metric for user intent prediction itself.
Query rewriting
Query rewriting is related to query categorization and named entity recognition.
Query rewriting includes:
Query error correction: if the search engine returns an empty result or too few results, spelling correction should be applied.
Query expansion:
e.g. "Michael Jordan berkley" and "Michael I. Jordan berkley"
(1) A synonym expansion table
(2) Synonym expansion using word vectors
(3) If the query returns no results, expand the original query based on users' historical data
Query term deletion: decide which words to discard (e.g. via entity recognition)
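The three rewriting steps can be chained in a small sketch. Assumptions: spelling correction uses Levenshtein edit distance against a tiny hypothetical vocabulary, expansion uses a hand-written synonym table (method (1) above), and term deletion uses a stopword list as a simple stand-in for the entity-recognition approach. For simplicity the sketch always runs correction, whereas the text above triggers it only when results are empty or too few. `VOCAB`, `SYNONYMS`, `STOPWORDS` and the function names are all illustrative.

```python
# Hypothetical resources; real systems mine these from logs, dictionaries or word vectors.
VOCAB = ["michael", "jordan", "berkley", "nba", "basketball", "lecture"]
SYNONYMS = {"michael jordan": ["michael i. jordan"]}
STOPWORDS = {"the", "of", "please", "find"}


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if chars match)
        prev = cur
    return prev[-1]


def correct(token, max_dist=2):
    """Replace an unknown token with the closest vocabulary word within max_dist edits."""
    if token in VOCAB:
        return token
    best = min(VOCAB, key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= max_dist else token


def rewrite(query):
    tokens = [t for t in query.lower().split() if t not in STOPWORDS]  # term deletion
    corrected = " ".join(correct(t) for t in tokens)                    # error correction
    expansions = [corrected]
    for phrase, alternatives in SYNONYMS.items():                       # synonym expansion
        if phrase in corrected:
            expansions += [corrected.replace(phrase, alt) for alt in alternatives]
    return expansions


print(rewrite("please find michal jrdan berkley"))
# ['michael jordan berkley', 'michael i. jordan berkley']
```

The `max_dist` cap keeps genuinely new words (names, product codes) from being "corrected" into unrelated vocabulary entries.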


Copyright notice
This article was created by [Elementary school students in IT field]. Please include a link to the original when reposting. Thanks.