
RecBole ("Bole"): an easy-to-use and powerful open-source PyTorch library for recommender systems

2020-11-10 13:33:40 osc_cdixgnd

Source: RUC AI Box

This article is about 3,500 words; recommended reading time is 5 minutes.

53 models in 4 categories and 27 datasets: another powerful tool for recommender systems!

[Introduction] Are you still doubting yourself because a recommendation model cannot be reproduced? Still unsure how to get started? Still overwhelmed by the complexity of data processing?

Let's take a look at this newly released framework for recommendation algorithms!

 

RecBole (Chinese name: "Bole", from the saying "only when the world has a Bole can there be thousand-li horses") is jointly developed by the AI Box team of Renmin University of China and research teams from Beijing University of Posts and Telecommunications and East China Normal University.

The framework implements recommendation models for the different tasks in the recommendation field and provides one-stop management of the whole pipeline, from data processing and model development through algorithm training to rigorous evaluation.

With RecBole, users only need to set a few simple configuration parameters (through a configuration file, the command line, or run-time parameters) to run a variety of models on different datasets. Its simple development interface also makes it convenient for researchers to build on the framework and add support for new models. The code and the accompanying paper have both been released.
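As an illustration of how the three configuration sources might be combined, here is a minimal sketch. The function names `parse_cmd_args` and `merge_config`, the sample keys, and the precedence order (command line over run-time dictionary over file over defaults) are assumptions made for this example, not RecBole's actual implementation.

```python
import ast

def parse_cmd_args(argv):
    """Turn ['--model=BPR', '--epochs=50'] into {'model': 'BPR', 'epochs': 50}."""
    parsed = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue  # ignore anything that is not --key=value
        key, raw = arg[2:].split("=", 1)
        try:
            parsed[key] = ast.literal_eval(raw)  # numbers, lists, booleans
        except (ValueError, SyntaxError):
            parsed[key] = raw                    # fall back to the plain string
    return parsed

def merge_config(defaults, file_cfg, dict_cfg, argv):
    """Later sources override earlier ones (assumed precedence)."""
    merged = dict(defaults)
    for source in (file_cfg, dict_cfg, parse_cmd_args(argv)):
        merged.update(source)
    return merged

# Hypothetical values for each configuration source
defaults = {"model": "BPR", "dataset": "ml-100k", "epochs": 300, "learning_rate": 0.001}
file_cfg = {"epochs": 100}               # e.g. loaded from a configuration file
dict_cfg = {"learning_rate": 0.01}       # passed at run time
argv = ["--model=BPR", "--epochs=50"]    # typed on the command line

config = merge_config(defaults, file_cfg, dict_cfg, argv)
print(config)
```

Layering plain dictionaries keeps each source independent, so a value set on the command line can override a file setting without the file having to change.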

 

Enough talk; let's go straight to the features. We support:

  • 53 models (most of them recent deep learning models)

  • 27 datasets (covering the most commonly used experimental datasets for the four task types)

  • A variety of evaluation methods (covering all mainstream evaluation protocols, each configurable with a single setting)

  • Automatic hyperparameter tuning (built-in hyperparameter search algorithms with flexibly configurable search ranges)

This toolkit can meet most recommendation-related research needs.

The "Bole" recommender system library is committed to continuous development and maintenance: it will keep releases stable while planning more practical and powerful features.

      

Paper: https://arxiv.org/abs/2011.01731

Project home page: https://recbole.io

GitHub repository: https://github.com/RUCAIBox/RecBole

Project contact email: recbole@outlook.com

Framework introduction

The RecBole framework has five core features.

1. Unified implementation based on PyTorch

RecBole is designed to make developing recommendation models as easy as possible by exposing a concise interface that is easy to develop against and to use. The whole framework is written in Python and avoids complicated version dependencies and environment configuration as far as possible, so that users can focus on model development and testing. Compared with similar frameworks, it needs only a one-step install under Python to give access to the most complete model and dataset support.

2. Highly flexible and extensible data structure

On the data side, the framework takes over the whole pipeline. Users only need to supply the raw data in the prescribed format (or convert it with the provided scripts) and set a few simple parameters; the framework then cleans, splits, and prepares the data automatically. The data needed at each step is packaged in a user-friendly way, so users only need to care about the forward data flow inside the model to develop and evaluate a new model.

The overall data flow in the framework is: Raw Input → Atomic Files → Dataset → Dataloader → Algorithms. Raw data is managed with Pandas, which supports a wide range of filtering and splitting methods. Users can work directly on the high-level training data and leave the underlying data processing entirely to the framework. Two new data formats are designed for this data flow; they are described in detail below.

Building on the torch.Tensor data type, the framework defines an internal data structure called Interaction. It can be viewed as a dictionary of Tensors: users can fetch the data of a batch directly by feature name and feed it to the model for training, and they can also call Tensor-style methods on it for customization.
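To make the idea concrete, here is a minimal sketch of a dictionary-of-tensors batch container. The class name `MiniInteraction` and the field names are hypothetical; this is not RecBole's `Interaction` implementation, only an illustration of the behaviour described above.

```python
import torch

class MiniInteraction:
    """Hypothetical stand-in: maps feature names to equally long tensors."""

    def __init__(self, fields):
        lengths = {len(t) for t in fields.values()}
        if len(lengths) != 1:
            raise ValueError("all fields must have the same number of rows")
        self.fields = dict(fields)

    def __len__(self):
        return len(next(iter(self.fields.values())))

    def __getitem__(self, key):
        if isinstance(key, str):
            return self.fields[key]   # one feature column for the whole batch
        # a slice or boolean mask is applied to every field at once
        return MiniInteraction({k: v[key] for k, v in self.fields.items()})

    def to(self, device):
        """Move every field to a device, mirroring Tensor.to."""
        return MiniInteraction({k: v.to(device) for k, v in self.fields.items()})

batch = MiniInteraction({
    "user_id": torch.tensor([1, 1, 2, 3]),
    "item_id": torch.tensor([10, 11, 10, 12]),
    "rating": torch.tensor([5.0, 3.0, 4.0, 2.0]),
})

users = batch["user_id"]   # fetch a feature by name
first_two = batch[:2]      # slice all fields together
```

Keeping all fields in one object means a model's forward pass can ask for exactly the features it needs by name, while batching, slicing, and device placement stay in one place.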

In addition, to manage and use all datasets in a uniform way, the framework introduces a new data storage format. It supports all common datasets with efficient storage and loading, and comprises 6 optional file types and 4 data types. Users with private datasets only need to convert them to this file format for the framework to manage the data automatically.

RecBole currently supports 6 kinds of atomic files, which are distinguished by their suffixes.

Each atomic file can be viewed as a table with m rows and n columns (not counting the header), meaning that the file stores n different features for a total of m records. The first line of the file is the header. Each column in the header has the form feat_name:feat_type, giving the name and type of that feature. By convention every feature has one of four types, all of which can be conveniently converted into tensors organized by batch.
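The header convention can be sketched with a small parser. The tab separator, the four type names (`token`, `token_seq`, `float`, `float_seq`), the space-separated sequence encoding, and the sample records are assumptions for illustration; the article itself only states that each column is declared as feat_name:feat_type and that there are four types.

```python
import csv
import io

# Assumed type names and conversions; not taken from the article.
CASTERS = {
    "token": str,
    "float": float,
    "token_seq": lambda s: s.split(),
    "float_seq": lambda s: [float(x) for x in s.split()],
}

def read_atomic(text, delimiter="\t"):
    """Parse an atomic-file-like table into (schema, list of row dicts)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # each header cell is "feat_name:feat_type"
    schema = [tuple(col.split(":", 1)) for col in header]
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        rows.append({name: CASTERS[ftype](value)
                     for (name, ftype), value in zip(schema, record)})
    return schema, rows

# Hypothetical interaction records
sample = (
    "user_id:token\titem_id:token\trating:float\ttimestamp:float\n"
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
)
schema, rows = read_atomic(sample)
print(schema)
print(rows[0])
```

Because the type travels with the column name, a loader can decide how to convert each feature without any per-dataset code.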

3. Rich models and datasets

The data structure and storage format described above can accommodate the 5 most actively studied recommendation scenarios (social recommendation will be supported in the near future). The framework automatically selects the data files that match the model category, and users can also define new combinations to implement their own recommendation models.

The framework currently integrates 53 recommendation models in four categories and 27 evaluation datasets commonly used in academia for recommendation tasks, providing a broad and unified basis for model evaluation and comparison. The framework currently contains the following models:

The supported datasets are as follows (users need to download a copy of the raw data themselves and then process it with the preprocessing scripts provided by the framework, or download the processed datasets directly from the address provided):

4. Efficient GPU-accelerated evaluation

Unlike other libraries that use C++ to accelerate evaluation, this framework is implemented entirely in Python. It makes full use of the GPU to optimize parallel tensor operations and runs essentially the whole evaluation process on the GPU, combining simplicity with efficiency.

At present, most evaluation metrics for recommendation tasks are top-K metrics. Computing them requires taking the top K results for every user, which is quite time-consuming. To address this, the framework parallelizes the process with matrix operations: it uses masking and padding to place all evaluation samples in a single n×m matrix, applies the CUDA-backed topk() method to speed up the selection, and then uses NumPy's broadcasting mechanism to compare the recommendation matrix with the target matrix and obtain the value of each metric. The most computationally intensive part of the process can thus run entirely on the GPU, which greatly relieves the evaluation load on the CPU and improves overall evaluation speed. The acceleration procedure is illustrated in the figure below.
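A minimal sketch of the padding idea follows, with NumPy standing in for the GPU tensor operations so that it runs without a GPU. The function names `pad_rows` and `recall_at_k` and the toy data are made up for illustration; this is not the framework's evaluator. Padding scores with negative infinity plays the role of the mask here.

```python
import numpy as np

def pad_rows(rows, fill):
    """Stack ragged per-user lists into one n x m matrix, padding with `fill`."""
    width = max(len(r) for r in rows)
    out = np.full((len(rows), width), fill, dtype=float)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r
    return out

def recall_at_k(scores, labels, k):
    """Recall@k for every user, computed on padded matrices in one pass."""
    score_mat = pad_rows(scores, -np.inf)   # padded slots always sort last
    label_mat = pad_rows(labels, 0.0)       # padded slots can never count as hits
    # indices of the k highest scores per row (CPU stand-in for a GPU top-k call)
    topk_idx = np.argsort(-score_mat, axis=1)[:, :k]
    hits = np.take_along_axis(label_mat, topk_idx, axis=1)   # n x k matrix of 0/1
    n_pos = label_mat.sum(axis=1)
    return hits.sum(axis=1) / np.maximum(n_pos, 1)

# Three users with different numbers of candidate items
scores = [[0.9, 0.1, 0.8, 0.3], [0.2, 0.7], [0.5, 0.4, 0.6]]
labels = [[1, 0, 0, 1], [0, 1], [1, 1, 0]]
recall = recall_at_k(scores, labels, k=2)
print(recall)
```

Once every user occupies one row of the same matrix, the top-K selection and the metric computation become whole-matrix operations, which is what makes them suitable for a GPU.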

Many studies have pointed out the bias introduced by sampled evaluation. To eliminate it, the framework supports an efficient implementation of full-ranking evaluation: the evaluation module includes an optimized algorithm for full ranking and exposes a dedicated interface to models that need it. In practical tests, the framework shows a clear advantage in full-ranking evaluation (note: the speed results below were measured by development team members on their own machines).

5. Standard and rich evaluation methods

For advanced users and developers who build on the framework, it also provides a very flexible evaluation interface. With simple code and parameters, users can implement different combinations of sampling and data splitting, and the commonly used combinations are packaged as presets for quick configuration. As far as we know, this is currently the open-source recommendation framework with the most comprehensive support for evaluation methods, including different dataset splitting methods and sampling methods. For the detailed meaning of each evaluation method and the differences between them, see our recent CIKM paper [2].

Easy to get started

After all that, is it hard to get started?

Not at all. Let's start with the tutorial and see how to use the framework!

1. Installation

RecBole is an open-source library based on Python. Like most commonly used libraries, it offers three installation methods: Conda, pip, and installation from source. It supports both Linux and Windows. Users can install it with the following simple commands:



conda install -c aibox recbole


 or 


pip install recbole


 or 


git clone https://github.com/RUCAIBox/RecBole.git && cd RecBole
pip install -e . --verbose

2. One-click run

We provide a one-click script on GitHub. If you installed from source, you can call:

python run_recbole.py  --model=BPR

This command trains and tests the BPR model directly on the ml-100k dataset.

To run other models and datasets, call the script from the command line as follows:

python run_recbole.py --dataset=[dataset_name] --model=[model_name]


If you installed with pip or conda, just create a new run.py file and add the following two lines of code to run the model and dataset of your choice.

from recbole.quick_start import run_recbole
run_recbole(dataset='ml-100k', model='BPR')

3. Evaluation

RecBole also provides a rich API. New recommender system models keep appearing, but they often lack a unified evaluation setting, which makes them hard to compare with one another. With the special needs of senior researchers and downstream developers in mind, we designed the EvalSetting class, which provides a convenient API for unified evaluation settings.

A typical workflow is as follows:



dataset = Dataset(config)                      # Load the original dataset
es = EvalSetting(config)                       # Declare an evaluation setting
es.group_by_user()                             # Group setting
es.temporal_ordering()                         # Order setting
es.leave_one_out()                             # Split setting
es.neg_sample_by(1000)                         # 1:1000 negative sampling at evaluation time

builded_datasets = dataset.build(es)           # Build the processed datasets

After declaring an EvalSetting object, call its API to configure the Group, Order, Split, and NegSample settings. The example above is configured as follows: group the data by user before splitting, sort the interaction records by timestamp, split the data with the leave-one-out method, and generate the test data with 1:1000 negative sampling. Then call the build method of the Dataset class and pass in the configured EvalSetting object; the dataset is processed according to this evaluation configuration, and the processed Dataset objects are returned.
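The four settings can be illustrated in plain Python. This is a simplified sketch of the general technique, not RecBole's code: the function names, the choice to hold out the last item for testing and the second-to-last for validation, and the toy interactions are all assumptions.

```python
import random
from collections import defaultdict

def leave_one_out_split(interactions):
    """Group by user, sort by timestamp, hold out the last two items per user."""
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))          # group setting
    train, valid, test = {}, {}, {}
    for user, events in by_user.items():
        items = [item for _, item in sorted(events)]   # order setting
        if len(items) < 3:
            train[user] = items                   # too short to split
            continue
        # split setting: last item for test, second-to-last for validation
        train[user], valid[user], test[user] = items[:-2], items[-2], items[-1]
    return train, valid, test

def sample_negatives(seen, all_items, n, rng):
    """Uniformly draw up to n items the user has never interacted with."""
    candidates = sorted(set(all_items) - set(seen))
    return rng.sample(candidates, min(n, len(candidates)))

# Hypothetical (user, item, timestamp) records
interactions = [
    ("u1", "a", 3), ("u1", "b", 1), ("u1", "c", 2), ("u1", "d", 4),
    ("u2", "a", 1), ("u2", "c", 2), ("u2", "b", 3),
]
train, valid, test = leave_one_out_split(interactions)

rng = random.Random(0)
seen_u1 = train["u1"] + [valid["u1"], test["u1"]]
negatives = sample_negatives(seen_u1, ["a", "b", "c", "d", "e", "f"], 1000, rng)
```

At evaluation time, the held-out test item of each user would be ranked against that user's sampled negatives.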

For some common evaluation settings, we also support presets that can be added with a single call. For example, the example above can also be written as:

dataset = Dataset(config)                      # Load the original dataset

es = EvalSetting(config)                       # Declare an evaluation setting
es.TO_LS()
es.uni1000()
builded_datasets = dataset.build(es)           # Build the processed datasets

4. Automatic hyperparameter tuning

Finally, there is one more surprise: the framework has a built-in automatic tuning tool that supports hyperparameter search for every model. Once the search range is set, it searches automatically and saves the results.

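from recbole.trainer import HyperTuning
from recbole.quick_start import objective_function

# params_file: path to the file describing the hyperparameter search ranges
# config_file_list: list of config files whose settings stay fixed during the search
hp = HyperTuning(objective_function, algo='exhaustive',
                 params_file=params_file, fixed_config_file_list=config_file_list)

hp.run()                                              # run the search
hp.export_result(output_file='hyper_example.result')  # save the results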
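What an exhaustive search does can be sketched in a few lines of plain Python. The function `exhaustive_search`, the search space, and the toy objective below are hypothetical and independent of RecBole's `HyperTuning` class; a real objective would train a model and return its validation score.

```python
import itertools

def exhaustive_search(space, objective):
    """Try every combination in `space`; return (best_params, best_score, results)."""
    names = list(space)
    results = []
    for values in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        results.append((params, objective(params)))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results

def toy_objective(params):
    """Hypothetical stand-in for 'train a model and return its validation score'."""
    return (-abs(params["learning_rate"] - 0.01)
            - abs(params["embedding_size"] - 64) / 1000)

space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "embedding_size": [32, 64, 128],
}
best_params, best_score, results = exhaustive_search(space, toy_objective)
print(best_params, best_score)
```

Exhaustive search is simple and reproducible, but its cost grows with the product of the range sizes, so it suits small search spaces best.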

No more worrying about the black art of parameter tuning. Come and try it; more powerful features are on the way!

What are you waiting for? Pick up the RecBole framework and reproduce the classic recommendation models!

References:

[1] Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Kaiyuan Li, Yushuo Chen, Yujie Lu, Hui Wang, Changxin Tian, Xingyu Pan, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, and Ji-Rong Wen. 2020. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. arXiv preprint arXiv:2011.01731 (2020).

[2] Wayne Xin Zhao, Junhua Chen, Pengfei Wang, Qi Gu, and Ji-Rong Wen. Revisiting Alternative Experimental Settings for Evaluating Top-N Item Recommendation Algorithms. In CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, Mathieu d'Aquin, Stefan Dietze, Claudia Hauff, Edward Curry, and Philippe Cudré-Mauroux (Eds.). ACM, 2329–2332.

Editor: Huang Jiyan

Copyright notice
This article was created by [osc_cdixgnd]. Please include a link to the original when reposting. Thank you.