
How to run 40 regression models with a few lines of code

2021-05-12 10:40:05 InfoQ

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"This article was originally published on the Towards Data Science blog and is translated and shared by InfoQ with the authorization of the original author, Ismael Araujo."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article shows you how to use Lazy Predict to run more than 40 machine learning models for a regression project."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Suppose you need to carry out a regression machine learning project. You have analyzed your data, done some data cleaning, and created some dummy variables, and now it is time to run machine learning regression models. What are the ten models that come to mind first? Most people probably cannot even name ten regression models. 
If that includes you, don't worry: by the end of this article you will be able to run not just 10 but more than 40 machine learning regression models."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A few weeks ago, I published a blog post called "},{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/how-to-run-30-machine-learning-models-with-2-lines-of-code-d0f94a537e52?fileGuid=2V67vFzHJsUWw1aV","title":"","type":null},"content":[{"type":"text","text":"How to Run 30 Machine Learning Models with a Few Lines of Code"}]},{"type":"text","text":", and it was very well received. In fact, it is my most popular blog post so far. In that post, I built a classification project to try out Lazy Predict. Now I am going to test Lazy Predict on a regression project. To do so, I will use the classic Seattle house price dataset, which you can find on Kaggle."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"What is Lazy Predict?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Without requiring much code, Lazy Predict helps you build dozens of models and see which ones perform better without any parameter tuning. 
The best way to explain how it works is with a small project, so let's get started."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Using Lazy Predict in a regression project"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, to install Lazy Predict, run "},{"type":"codeinline","content":[{"type":"text","text":"pip install lazypredict"}]},{"type":"text","text":" in your terminal. It is that simple. Next, let's import some libraries for this project. The whole notebook is linked from the original article."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Importing important libraries\n# pyforest lazily imports common libraries such as pandas (pd) and numpy (np)\nimport pyforest\nfrom lazypredict.Supervised import LazyRegressor\nfrom pandas.plotting import scatter_matrix\n# Scikit-learn packages\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.tree import DecisionTreeRegressor\nfrom sklearn.ensemble import ExtraTreesRegressor\nfrom sklearn import metrics\nfrom sklearn.metrics import mean_squared_error\n# Hide warnings\nimport warnings\nwarnings.filterwarnings('ignore')\n# Setting up max columns displayed to 100\npd.options.display.max_columns = 100\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"You can see that I imported "},{"type":"codeinline","content":[{"type":"text","text":"pyforest"}]},{"type":"text","text":" instead of Pandas and NumPy. In a notebook, PyForest lets you import all the important libraries very quickly. 
I wrote a blog post about it, which you can find "},{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/how-to-import-all-python-libraries-with-one-line-of-code-2b9e66a5879f?fileGuid=2V67vFzHJsUWw1aV","title":"","type":null},"content":[{"type":"text","text":"here"}]},{"type":"text","text":". Next, let's import the dataset."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import dataset\ndf = pd.read_csv('..\/data\/kc_house_data_train.csv', index_col=0)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let's take a look at the dataset."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/7d\/74\/7d354df35a86b32db5e46cab56c13774.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now let's check the data types."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Checking data types and null values\ndf.info()\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c9\/4c\/c90f2ca4702b58270fdfd1379f229a4c.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A few things caught my attention. First, the "},{"type":"codeinline","content":[{"type":"text","text":"id"}]},{"type":"text","text":" column is irrelevant to this small project. However, if you want to dig deeper into the project, you should check it for duplicates. In addition, the "},{"type":"codeinline","content":[{"type":"text","text":"date"}]},{"type":"text","text":" column is an object type and should be converted to a DateTime type. The "},{"type":"codeinline","content":[{"type":"text","text":"zipcode"}]},{"type":"text","text":", "},{"type":"codeinline","content":[{"type":"text","text":"lat"}]},{"type":"text","text":" and "},{"type":"codeinline","content":[{"type":"text","text":"long"}]},{"type":"text","text":" columns probably have little or no correlation with price. 
However, because the goal of this project is to demonstrate "},{"type":"codeinline","content":[{"type":"text","text":"lazy predict"}]},{"type":"text","text":", I will keep them."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, before running the first model, let's look at some statistics to find out what needs to be changed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/b0\/77\/b074dcfab6b42256c507a9c69c698a77.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I noticed a few interesting things. First, there is a house with 33 bedrooms, which cannot be right. I looked it up online using its "},{"type":"codeinline","content":[{"type":"text","text":"id"}]},{"type":"text","text":" and found that it actually has 3 bedrooms. You can find the house "},{"type":"link","attrs":{"href":"https:\/\/www.zillow.com\/homedetails\/8033-Corliss-Ave-N-Seattle-WA-98103\/48795791_zpid\/?fileGuid=2V67vFzHJsUWw1aV","title":"","type":null},"content":[{"type":"text","text":"here"}]},{"type":"text","text":". Also, some houses appear to have no bathroom. 
I will give those houses at least 1 bathroom, which completes this basic data cleaning."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Fix the house listed with 33 bedrooms (it actually has 3)\ndf.loc[df['bedrooms'] == 33, 'bedrooms'] = 3\n# Give houses without any bathroom at least 1 bathroom\ndf['bathrooms'] = df['bathrooms'].apply(lambda x: 1 if x < 1 else x)\n"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Splitting the training and test sets"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We can now split the data into a training set and a test set. Before that, let's make sure the data contains no "},{"type":"codeinline","content":[{"type":"text","text":"nan"}]},{"type":"text","text":" or "},{"type":"codeinline","content":[{"type":"text","text":"infinite"}]},{"type":"text","text":" values."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Removing nan and infinite values\ndf.replace([np.inf, -np.inf], np.nan, inplace=True)\ndf.dropna(inplace=True)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now let's split the dataset into the X and y variables. 
I will assign 75% of the data to the training set and 25% to the test set."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Creating train test split\nX = df.drop(columns=['price'])\ny = df.price\n# Call train_test_split on the data and capture the results\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3, test_size=0.25)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Time to have some fun! The following code runs more than 40 models and displays the R-squared and RMSE for each. Ready, set, go!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"reg = LazyRegressor(ignore_warnings=False, custom_metric=None)\nmodels, predictions = reg.fit(X_train, X_test, y_train, y_test)\nprint(models)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/47\/f9\/47146d1a8a5b72d05467e11816ed48f9.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wow! These results are very good for the little effort put in. For vanilla (untuned) models, these are great R-squared and RMSE values. As we can see, we ran 41 vanilla models, got the metrics we needed, and can see how long each model took. 
Not bad at all. But how do you know whether these results are correct? We can run one model ourselves and check whether its results are close to what Lazy Predict reported. Shall we test the histogram-based gradient boosting regression tree? If you have never heard of this algorithm, don't worry, because I had never heard of it either. You can find an article about it "},{"type":"link","attrs":{"href":"https:\/\/machinelearningmastery.com\/histogram-based-gradient-boosting-ensembles\/?fileGuid=2V67vFzHJsUWw1aV","title":"","type":null},"content":[{"type":"text","text":"here"}]},{"type":"text","text":"."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Checking the results"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, let's import this model from scikit-learn."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Explicitly enable this experimental feature (only needed on scikit-learn versions before 1.0)\nfrom sklearn.experimental import enable_hist_gradient_boosting\n# Now you can import normally from ensemble\nfrom sklearn.ensemble import HistGradientBoostingRegressor\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also create functions to check the model's metrics and plot its residuals."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Evaluation functions\ndef rmse(model, y_test, y_pred, X_test):\n    # R-squared of the model on the test set\n    r_squared = model.score(X_test, y_test)\n    mse = mean_squared_error(y_test, y_pred)\n    rmse_value = np.sqrt(mse)\n    print('R-squared: ' + str(r_squared))\n    print('Root Mean Squared Error: ' + str(rmse_value))\n\n# Create model line scatter plot\ndef scatter_plot(y_test, y_pred, model_name):\n    plt.figure(figsize=(10, 6))\n    sns.residplot(x=y_test, y=y_pred, lowess=True, color='#4682b4',\n                  line_kws={'lw': 2, 'color': 'r'})\n    plt.title('Price vs Residuals for ' + model_name)\n    plt.xlabel('Price', fontsize=16)\n    plt.xticks(fontsize=13)\n    plt.yticks(fontsize=13)\n    plt.show()\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, let's run the model and see the results."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Histogram-based Gradient Boosting Regression Tree\n# Assumption: X_train and X_test contain only numeric columns (drop or encode date first)\nhist = HistGradientBoostingRegressor()\nhist.fit(X_train, y_train)\ny_pred = hist.predict(X_test)\n# Print R-squared and RMSE for comparison with the Lazy Predict table\nrmse(hist, y_test, y_pred, X_test)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Look! The result is very close to the one we got with Lazy Predict. It seems that it really works."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Final thoughts"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Lazy Predict is an amazing library: it is easy to use, very fast, and needs very little code to run vanilla models. Instead of setting up many vanilla models by hand, you can do it with 2 to 3 lines of code. 
Keep in mind that you should not treat the results as final models. Always double-check the results to make sure the library is working properly. As I have mentioned in other blog posts, data science is a complex field, and Lazy Predict does not replace the expertise of the professionals who optimize models. Please let me know how it works for you."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the author:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ismael Araujo is a data scientist and machine learning engineer working in New York."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https:\/\/towardsdatascience.com\/how-to-run-40-regression-models-with-a-few-lines-of-code-5a24186de7d"}]}]}
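
The article compares models by R-squared and RMSE, so it may help to see how those two numbers are defined. The following is a minimal sketch in plain Python, not the implementation used by Lazy Predict or scikit-learn. The function names `rmse_by_hand` and `r2_by_hand` and the toy price lists are hypothetical and exist only for illustration.

```python
import math


def rmse_by_hand(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_by_hand(y_true, y_pred):
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical house prices and predictions, for illustration only
actual = [300000, 450000, 520000, 610000]
predicted = [310000, 430000, 540000, 600000]

print("RMSE:", rmse_by_hand(actual, predicted))
print("R-squared:", r2_by_hand(actual, predicted))
```

RMSE is expressed in the same unit as the target (here, dollars), while R-squared is unitless, which is why it is useful to read both when checking one model's output against the Lazy Predict table.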

Copyright notice
This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2021/05/20210512103756778A.html