RecBole ("Bole"): an easy-to-use and powerful open-source PyTorch library for recommender systems
2020-11-10 13:33:40 【osc_cdixgnd】
Source: RUC AI Box
This article is about 3,500 words; recommended reading time is 5 minutes.
53 models in 4 categories and 27 datasets: another powerful tool for recommender systems!
[Reading guide] Are you still doubting yourself because a recommendation model cannot be reproduced? Still hesitating about how to get started? Still at a loss over the complexity of data processing?
Take a look at this newly released recommendation algorithm framework!
RecBole (Chinese name: "Bole", from the saying "only when the world has a Bole can there be a thousand-li horse") was jointly developed by the AI Box team at Renmin University of China and research teams from Beijing University of Posts and Telecommunications and East China Normal University.
The framework implements recommendation models for different tasks in the recommendation domain and provides one-stop management of the whole process, from data processing and model development to algorithm training and scientific evaluation.
With RecBole, users only need to set a few simple configuration parameters (through configuration files, the command line, or runtime parameters) to quickly run various models on different datasets. Its simple development interface also makes it convenient for researchers to extend the framework and add support for new models. The source code and the accompanying paper are publicly available.
Without further ado, let's look at the features. RecBole supports:
53 models (most of them recent deep learning models)
27 datasets (covering the most commonly used experimental datasets for the four tasks)
A variety of evaluation methods (covering all mainstream evaluation protocols, with one-click setup)
Automatic hyperparameter tuning (built-in, practical hyperparameter search algorithms with flexible search ranges)
This toolkit can meet most research needs related to recommendation.
The "Bole" recommender system library is committed to continuous development and maintenance, keeping releases stable while planning more practical and powerful features.
Paper: https://arxiv.org/abs/2011.01731
Project homepage: https://recbole.io
GitHub repository: https://github.com/RUCAIBox/RecBole
Project mailing list: email@example.com
Framework introduction
The RecBole framework has five core features.
1. A unified framework based on PyTorch
RecBole is designed to make developing recommendation models as easy as possible, exposing the most concise, easy-to-use interfaces to users. The whole framework is developed in Python and avoids complicated version dependencies and environment configuration wherever possible, so users can focus on model development and testing. Compared with similar frameworks, it needs only a one-step installation under Python to provide the most complete model and dataset support.
2. Highly flexible and extensible data structures
On the data side, the framework manages everything. Users only need to provide raw data in the prescribed format (or process it with the provided scripts) and set a few simple parameters; the framework then cleans, splits, and prepares the data automatically. The data required at each step is packaged in a user-friendly way, so users only need to focus on the forward data flow in the model to develop and evaluate a new model.
In this framework, the overall data flow is: Raw Input → Atomic Files → Dataset → Dataloader → Algorithms. Raw data management is implemented with Pandas and supports various filtering and splitting methods. Users can work directly with the high-level training data and leave the underlying data processing entirely to the framework. Two new data formats are designed for this data flow; they are described in detail below.
Building on the torch.Tensor data type, the framework designs an internal data structure called Interaction. It can be viewed as a dictionary of tensors: users can fetch a batch of data directly by feature name and feed it to the model for training, and can also call Tensor-like methods on it for custom operations.
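To make the "dictionary of tensors" idea concrete, here is a minimal sketch of such a batch container. `ToyInteraction` and the field names are hypothetical and exist only for illustration; this is not RecBole's actual `Interaction` class, just one way the behaviour described above could look.

```python
import torch


class ToyInteraction:
    """Hypothetical stand-in for a batch stored as a dict of tensors."""

    def __init__(self, fields):
        lengths = {tensor.shape[0] for tensor in fields.values()}
        if len(lengths) != 1:
            raise ValueError("all fields must share the same batch dimension")
        self.fields = dict(fields)

    def __getitem__(self, key):
        # A feature name returns that feature for the whole batch;
        # a slice or index tensor selects the same rows from every field.
        if isinstance(key, str):
            return self.fields[key]
        return ToyInteraction({name: t[key] for name, t in self.fields.items()})

    def __len__(self):
        return next(iter(self.fields.values())).shape[0]

    def to(self, device):
        # Tensor-like method: move every field to the target device.
        return ToyInteraction({name: t.to(device) for name, t in self.fields.items()})


batch = ToyInteraction({
    "user_id": torch.tensor([1, 1, 2, 3]),
    "item_id": torch.tensor([10, 11, 10, 12]),
    "rating": torch.tensor([5.0, 3.0, 4.0, 1.0]),
})
first_two = batch[:2]  # still a ToyInteraction, now with two rows
```

A model's forward pass could then read `batch["user_id"]` and `batch["item_id"]` without knowing how the data was loaded.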
In addition, to manage and use all datasets in a unified way, the framework defines a new data storage format that supports all common datasets with efficient storage and loading. It comprises 6 optional file types and 4 data types. For private datasets, users only need to convert the data into this file format, and the framework handles data management automatically.
RecBole currently supports 6 kinds of atomic files, distinguished by file suffix.
Each atomic file can be viewed as a table with m rows and n columns (not counting the header), meaning the file stores n different features and m records. The first line is the header; each column in the header has the form feat_name:feat_type, giving the name and type of that feature. By convention, each feature is one of four types, all of which can be conveniently converted into tensors organized by batch.
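The header convention above can be illustrated with a small parser. This is a sketch under stated assumptions, not the framework's loader: the article does not list the four type names or the separators, so the names `token`, `token_seq`, `float`, and `float_seq`, the tab column separator, and the space separator inside sequences are all assumptions here, and `read_atomic` is a hypothetical helper.

```python
import csv
import io

# Assumed type names and converters; sequence values are assumed space-separated.
CONVERTERS = {
    "token": str,
    "float": float,
    "token_seq": lambda cell: cell.split(),
    "float_seq": lambda cell: [float(x) for x in cell.split()],
}


def read_atomic(handle, delimiter="\t"):
    """Parse a file whose header cells look like `feat_name:feat_type`."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    names, types = zip(*(cell.rsplit(":", 1) for cell in header))
    columns = {name: [] for name in names}
    for row in reader:
        for name, feat_type, cell in zip(names, types, row):
            columns[name].append(CONVERTERS[feat_type](cell))
    return dict(zip(names, types)), columns


# A made-up file with n = 4 features and m = 2 records.
sample = io.StringIO(
    "user_id:token\titem_id:token\trating:float\ttags:token_seq\n"
    "u1\ti10\t5.0\taction comedy\n"
    "u2\ti11\t3.5\tdrama\n"
)
schema, columns = read_atomic(sample)
```

Because every column carries its type in the header, each one can be turned into a tensor (or a padded tensor, for sequences) without any per-dataset code.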
3. Rich models and datasets
The data structures and storage format above can accommodate the five most actively researched recommendation scenarios so far (social recommendation will be supported in the near future). The framework automatically selects the corresponding data files according to the model category, and users can also develop new combinations to implement their own recommendation models.
The framework currently integrates 53 recommendation models in four categories and 27 evaluation datasets commonly used in academia, providing broad and unified criteria for model evaluation and comparison. The framework currently contains the following models:
The supported datasets are as follows (users need to download a copy of the raw data themselves and process it with the preprocessing scripts provided by the framework, or download the processed datasets directly from the provided address):
4. Efficient GPU-accelerated evaluation
Unlike other libraries that use C++ to accelerate evaluation, this framework is implemented entirely in Python. It makes full use of the GPU for parallel tensor operations and runs almost the whole evaluation process on the GPU, combining simplicity with efficiency.
Most evaluation metrics in recommendation tasks are top-K metrics. Computing them requires taking the top K results for each user, which is quite time-consuming. To address this, the framework parallelizes the process with matrix operations: it uses masking and padding to place all evaluation samples in a single n×m matrix and calls the CUDA topk() method to speed up the selection. The recommendation matrix and the target matrix are then compared using NumPy's broadcasting mechanism to obtain the value of each metric. The most computationally intensive part of the process runs entirely on the GPU, which greatly relieves the evaluation load on the CPU and improves overall evaluation speed. The original article illustrates the acceleration method with a figure.
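The padding-and-mask idea can be sketched as follows. This is an illustrative reconstruction, not the framework's evaluator: `pad_scores` and `recall_at_k` are hypothetical helpers, the scores are made up, and the metric is computed with PyTorch throughout rather than with the NumPy step the article mentions.

```python
import torch


def pad_scores(score_lists, pad_value=float("-inf")):
    """Stack per-user score lists of different lengths into one n x m matrix."""
    n, m = len(score_lists), max(len(s) for s in score_lists)
    scores = torch.full((n, m), pad_value)
    mask = torch.zeros((n, m), dtype=torch.bool)
    for row, user_scores in enumerate(score_lists):
        scores[row, : len(user_scores)] = torch.tensor(user_scores)
        mask[row, : len(user_scores)] = True
    return scores, mask


def recall_at_k(scores, positives, k):
    """positives is a boolean n x m matrix marking the relevant candidates."""
    positives = positives.float()
    _, top_idx = torch.topk(scores, k, dim=1)        # one batched call for all users
    hits = positives.gather(1, top_idx).sum(dim=1)   # relevant items inside the top k
    return hits / positives.sum(dim=1).clamp(min=1)


device = "cuda" if torch.cuda.is_available() else "cpu"
scores, mask = pad_scores([[0.9, 0.1, 0.4], [0.2, 0.8, 0.5], [0.3, 0.5, 0.7, 0.6]])
positives = torch.tensor([
    [True, False, False, False],
    [True, False, False, False],
    [False, True, False, True],
])
# Masking the targets guarantees a padded slot can never count as a hit.
recall = recall_at_k(scores.to(device), (positives & mask).to(device), k=2).cpu()
```

Padding with negative infinity keeps padded slots at the bottom of every ranking, so a single `torch.topk` call replaces a per-user Python loop.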
To address the sampling bias problem raised in many studies, the framework supports an efficient implementation of full-ranking evaluation. The evaluation module includes an algorithm optimized for full ranking and exposes a dedicated interface for models that need full-ranking evaluation. In practical tests, the framework shows a clear speed advantage in full-ranking evaluation (note: the speed measurements reported in the original article were obtained on the development team's local machines).
5. Standard and rich evaluation methods
For advanced users and developers extending the framework, a very flexible evaluation interface is also provided. Users can combine different sampling and data splitting methods with simple code and parameters, and commonly used combinations are packaged as presets for quick configuration. As far as we know, this is the open-source recommendation framework with the most comprehensive support for evaluation methods at present, including different dataset splitting methods and sampling methods. The detailed meaning of each evaluation method and the differences between them are described in our recent CIKM paper.
Easy to get started
After all that, is it hard to get started?
Not at all. Let's walk through a tutorial to see how to use this framework!
RecBole is an open-source library based on Python. Like most commonly used libraries, it supports three installation methods: Conda, Pip, and installation from source. It runs on both Linux and Windows. Users can install it with the following simple commands:
conda install -c aibox recbole
or
pip install recbole
or
git clone https://github.com/RUCAIBox/RecBole.git && cd RecBole
pip install -e . --verbose
2. One-click running
We provide a one-click script on GitHub. If you installed from source, you can run:
python run_recbole.py --model=BPR
This command trains and tests the BPR model on the ml-100k dataset.
To run other models and datasets, pass them on the command line:
python run_recbole.py --dataset=[dataset_name] --model=[model_name]
If you installed with pip or conda, just create a new run.py file and add the following two lines to run the model and dataset of your choice.
from recbole.quick_start import run_recbole
run_recbole(dataset='ml-100k', model='BPR')
RecBole also provides a rich API. New recommendation models keep emerging, but they often lack unified evaluation settings, which makes them hard to compare. Considering the special needs of experienced researchers and developers extending the framework, we designed the EvalSetting class, which provides a convenient API for unified evaluation settings.
A typical workflow is as follows:
dataset = Dataset(config)  # load the original dataset
es = EvalSetting(config)  # declare an evaluation setting
es.group_by_user()  # Group setting
es.temporal_ordering()  # Order setting
es.leave_one_out()  # Split setting
es.neg_sample_by(1000)  # 1:1000 negative sampling during evaluation
builded_datasets = dataset.build(es)  # build the datasets
After declaring an EvalSetting object, call its API to configure Group, Order, Split, and NegSample. The example above is configured as follows: group by user before splitting the data, sort interaction records by timestamp, split the data with leave-one-out, and generate test data with 1:1000 negative sampling. Then call the build method of the Dataset class and pass in the configured EvalSetting object; the dataset is processed according to this evaluation configuration and the processed Dataset objects are returned.
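What this configuration does to the data can be sketched in plain Python. The helpers `leave_one_out` and `sample_negatives` are hypothetical and do not reproduce the framework's internals; the sketch assumes that leave-one-out holds out each user's most recent interaction, that negatives are drawn uniformly from items the user has never interacted with, and it draws 2 negatives per user instead of 1000 to keep the toy data small.

```python
import random
from collections import defaultdict


def leave_one_out(interactions):
    """Group by user, sort by timestamp, hold out each user's last interaction."""
    by_user = defaultdict(list)
    for user, item, timestamp in interactions:
        by_user[user].append((timestamp, item))
    train, test = {}, {}
    for user, events in by_user.items():
        ordered = [item for _, item in sorted(events)]
        train[user], test[user] = ordered[:-1], ordered[-1]
    return train, test


def sample_negatives(user_items, all_items, n, rng):
    """Uniformly draw n items the user has never interacted with."""
    candidates = sorted(set(all_items) - set(user_items))
    return rng.sample(candidates, n)


# Made-up (user, item, timestamp) records.
log = [
    ("u1", "a", 3), ("u1", "b", 1), ("u1", "c", 2),
    ("u2", "b", 5), ("u2", "d", 9),
    ("u3", "e", 4),
]
train, test = leave_one_out(log)
rng = random.Random(0)
items = {item for _, item, _ in log}
negatives = {
    user: sample_negatives(train[user] + [test[user]], items, 2, rng)
    for user in test
}
```

At evaluation time each held-out item would be ranked against that user's sampled negatives.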
Of course, for common evaluation settings, presets can be added with a single call. For example, the EvalSetting configuration above can also be written as:
dataset = Dataset(config)  # load the original dataset
es = EvalSetting(config)  # declare an evaluation setting
es.TO_LS()
es.uni1000()
builded_datasets = dataset.build(es)  # build the datasets
4. Automatic hyperparameter tuning
Finally, a surprise: the framework has a built-in automatic tuning tool that supports hyperparameter search for every model. After setting the search range, it searches automatically and saves the parameters with a single call.
from recbole.trainer import HyperTuning
from recbole.quick_start import objective_function
hp = HyperTuning(objective_function, algo='exhaustive', params_file=params_file, fixed_config_file_list=config_file_list)
hp.run()
hp.export_result(output_file='hyper_example.result')
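The name of the 'exhaustive' option suggests that every combination in the search range is tried. The sketch below shows that general idea with the standard library only. It is not the framework's implementation: `exhaustive_search`, `toy_objective`, and the parameter names are hypothetical, and the toy objective stands in for training a model and returning its validation score.

```python
from itertools import product


def exhaustive_search(objective, search_space):
    """Try every combination in search_space and keep the best-scoring one."""
    names = list(search_space)
    best_params, best_score = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical validation score that peaks at learning_rate=0.01, embedding_size=64.
    return -abs(params["learning_rate"] - 0.01) - abs(params["embedding_size"] - 64) / 1000


space = {"learning_rate": [0.1, 0.01, 0.001], "embedding_size": [32, 64, 128]}
best_params, best_score = exhaustive_search(toy_objective, space)
```

The cost grows with the product of the range sizes (9 runs here), which is why the search range is worth setting carefully.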
No more worrying about the black art of hyperparameter tuning. Give it a try; more powerful features are on the way!
What are you waiting for? Read the original article, get started with the RecBole framework, and reproduce the classic recommendation models!
Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Kaiyuan Li, Yushuo Chen, Yujie Lu, Hui Wang, Changxin Tian, Xingyu Pan, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, and Ji-Rong Wen. 2020. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. arXiv preprint arXiv:2011.01731 (2020).
Wayne Xin Zhao, Junhua Chen, Pengfei Wang, Qi Gu, and Ji-Rong Wen. 2020. Revisiting Alternative Experimental Settings for Evaluating Top-N Item Recommendation Algorithms. In CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, Mathieu d'Aquin, Stefan Dietze, Claudia Hauff, Edward Curry, and Philippe Cudré-Mauroux (Eds.). ACM, 2329-2332.
Editor: Huang Jiyan