
Practicing the Perceptron Classification Algorithm with Python

Date: 2022-06-30 10:36:02

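The Perceptron is a linear machine learning algorithm for binary classification tasks. It can be considered one of the first and one of the simplest types of artificial neural network. It is definitely not "deep" learning, but it is an important building block. Like logistic regression, it can quickly learn a linear separation in feature space for two-class classification tasks. Unlike logistic regression, it learns using the stochastic gradient descent optimization algorithm and does not predict calibrated probabilities.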

In this tutorial, you will discover the Perceptron classification machine learning algorithm. After completing this tutorial, you will know:

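The Perceptron classifier is a linear algorithm that can be applied to binary classification tasks.
How to fit, evaluate, and make predictions with the Perceptron model using Scikit-Learn.
How to tune the hyperparameters of the Perceptron algorithm on a given dataset.

Tutorial Overview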

This tutorial is divided into three parts. They are:

Perceptron Algorithm
Perceptron With Scikit-Learn
Tune Perceptron Hyperparameters

Perceptron Algorithm

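The Perceptron algorithm is a two-class (binary) classification machine learning algorithm. It is a type of neural network model, perhaps the simplest type. It consists of a single node, or neuron, that takes a row of data as input and predicts a class label. It does this by calculating the weighted sum of the inputs plus a bias (a weight on a constant input set to 1). This weighted sum is called the activation.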

Activation = Weights * Inputs + Bias

If the activation is above 0.0, the model outputs 1.0; otherwise, it outputs 0.0.

Predict 1: If Activation > 0.0

Predict 0: If Activation <= 0.0
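The prediction rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the scikit-learn implementation; the function name `predict` and the weight and bias values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def predict(row, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is above 0.0, else 0."""
    activation = np.dot(weights, row) + bias
    return 1 if activation > 0.0 else 0

# Hypothetical weights and bias, chosen only for illustration.
weights = np.array([0.5, -1.0])
bias = 0.25

# activation = 0.5*2.0 + (-1.0)*0.5 + 0.25 = 0.75, which is above 0.0
print(predict(np.array([2.0, 0.5]), weights, bias))
# activation = 0.5*0.0 + (-1.0)*1.0 + 0.25 = -0.75, which is not above 0.0
print(predict(np.array([0.0, 1.0]), weights, bias))
```

Note that an activation of exactly 0.0 falls on the "predict 0" side of the rule.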

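Because the inputs are multiplied by model coefficients, as in linear regression and logistic regression, it is good practice to normalize or standardize the data before using the model.

The Perceptron is a linear classification algorithm. This means it learns a decision boundary that separates the two classes using a line (called a hyperplane) in the feature space. As such, it is appropriate for problems where the classes can be separated well by a line or linear model, referred to as linearly separable.

The coefficients of the model are called input weights and are trained using the stochastic gradient descent optimization algorithm. Examples from the training dataset are shown to the model one at a time; the model makes a prediction and the error is calculated. The weights of the model are then updated to reduce the error for that example. This is called the Perceptron update rule. The process is repeated for all examples in the training dataset, which is called an epoch, and this process of updating the model is then repeated for many epochs.

Each update uses only a small proportion of the error, and that proportion is controlled by a hyperparameter called the learning rate, which is typically set to a small value. This ensures that learning does not happen too quickly, which could result in a model with lower skill, a problem known as premature convergence of the optimization (search) procedure for the model weights.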

weights(t + 1) = weights(t) + learning_rate * (expected_i - predicted_i) * input_i

Training stops when the error made by the model falls to a low level or no longer improves, or when the maximum number of epochs has been performed.
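To make the update rule and the stopping condition concrete, here is a minimal from-scratch sketch using NumPy. It is not how scikit-learn implements the algorithm; the function name `train_perceptron`, the toy logical-AND dataset, the learning rate of 0.1, and the choice to stop once an epoch produces no errors are all assumptions made for illustration.

```python
import numpy as np

def train_perceptron(X, y, learning_rate=0.1, n_epochs=1000, seed=1):
    """Train perceptron weights with the update rule, one example at a time."""
    rng = np.random.default_rng(seed)
    # Start from small random weights and a zero bias.
    weights = rng.normal(scale=0.01, size=X.shape[1])
    bias = 0.0
    for _ in range(n_epochs):
        n_errors = 0
        # Visit the training rows in a new random order each epoch.
        for i in rng.permutation(len(X)):
            predicted = 1 if np.dot(weights, X[i]) + bias > 0.0 else 0
            error = y[i] - predicted
            if error != 0:
                # weights(t + 1) = weights(t) + learning_rate * error * input
                weights += learning_rate * error * X[i]
                bias += learning_rate * error
                n_errors += 1
        # Stop early once a full epoch makes no mistakes.
        if n_errors == 0:
            break
    return weights, bias

# Toy, linearly separable data (logical AND), used only for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 0, 1])

weights, bias = train_perceptron(X, y)
predictions = [1 if np.dot(weights, row) + bias > 0.0 else 0 for row in X]
print(predictions)
```

Because the toy data is linearly separable, the loop ends as soon as one epoch classifies every row correctly; on data that is not separable it would simply run until `n_epochs` is reached.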

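The initial values of the model weights are set to small random values. In addition, the training dataset is shuffled before each training epoch. This is by design, to accelerate and improve the training process. As a result, the learning algorithm is stochastic and may produce different results each time it is run. It is therefore good practice to summarize the performance of the algorithm on a dataset using repeated evaluation and to report the mean classification accuracy.

The learning rate and the number of training epochs are hyperparameters of the algorithm that can be set using heuristics or hyperparameter tuning.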

Now that we are familiar with the Perceptron algorithm, let's explore how to use it in Python.

Perceptron With Scikit-Learn

The Perceptron algorithm is available in the scikit-learn Python machine learning library via the Perceptron class. The class allows you to configure the learning rate (eta0), which defaults to 1.0.

# define model
model = Perceptron(eta0=1.0)

The implementation also allows you to configure the total number of training epochs (max_iter), which defaults to 1,000.

# define model
model = Perceptron(max_iter=1000)

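The scikit-learn implementation of the Perceptron algorithm also provides other configuration options that you may want to explore, such as early stopping and the use of a penalty (regularization) term.

We can demonstrate the Perceptron classifier with a worked example. First, let's define a synthetic classification dataset. We will use the make_classification() function to create a dataset with 1,000 examples, each with 10 input variables. The example below creates and summarizes the dataset.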

# test classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# summarize the dataset
print(X.shape, y.shape)

Running the example creates the dataset and confirms its number of rows and columns.

(1000, 10) (1000,)
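As a brief aside, the early stopping and penalty options mentioned earlier can be tried on this same dataset. The sketch below uses the `penalty`, `alpha`, `early_stopping`, `validation_fraction`, and `n_iter_no_change` arguments of the Perceptron class; the specific values are assumptions picked for illustration, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

# Same synthetic dataset as above.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10,
                           n_redundant=0, random_state=1)

# Illustrative (untuned) settings: an L2 penalty plus early stopping that
# holds out 10% of the training data and stops after 5 epochs without improvement.
model = Perceptron(penalty='l2', alpha=0.0001, early_stopping=True,
                   validation_fraction=0.1, n_iter_no_change=5, random_state=1)
model.fit(X, y)

# n_iter_ reports how many epochs were actually run before stopping.
print('Epochs run: %d' % model.n_iter_)
```

With early stopping enabled, the number of epochs actually run is usually well below the max_iter limit.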

We can fit and evaluate a Perceptron model using repeated stratified k-fold cross-validation via the RepeatedStratifiedKFold class. We will use 10 folds and three repeats in the test harness.

# create the model
model = Perceptron()

The complete example of evaluating the Perceptron model on the synthetic binary classification task is listed below.

# evaluate a perceptron model on the dataset
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = Perceptron()
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# summarize result
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

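Running the example evaluates the Perceptron algorithm on the synthetic dataset and reports the mean accuracy across the three repeats of 10-fold cross-validation. Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times. In this case, we can see that the model achieved a mean accuracy of about 84.7 percent.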

Mean Accuracy: 0.847 (0.052)
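The earlier advice to standardize inputs can be tried in the same test harness. The sketch below wraps a StandardScaler and the Perceptron in a scikit-learn Pipeline so that the scaling is fit only on the training folds. The step names 'scale' and 'model' are arbitrary labels chosen here, and no accuracy figure is claimed, since it may or may not improve on the result above.

```python
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same synthetic dataset as in the tutorial.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10,
                           n_redundant=0, random_state=1)

# Standardize each feature, then fit the perceptron; the scaler is re-fit
# on the training portion of every fold, which avoids data leakage.
pipeline = Pipeline([('scale', StandardScaler()), ('model', Perceptron())])

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv)
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
```

Comparing this score with the unscaled result gives a quick check of whether standardization matters for a given dataset.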

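We may decide to use the Perceptron classifier as our final model and make predictions on new data. This can be achieved by fitting the model on all available data and calling the predict() function, passing in a new row of data. We can demonstrate this with the complete example listed below.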

# make a prediction with a perceptron model on the dataset
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = Perceptron()
# fit model
model.fit(X, y)
# define new data
row = [0.12777556,-3.64400522,-2.23268854,-1.82114386,1.75466361,0.1243966,1.03397657,2.35822076,1.01001752,0.56768485]
# make a prediction
yhat = model.predict([row])
# summarize prediction (predict() returns an array, so take its first element)
print('Predicted Class: %d' % yhat[0])

Running the example fits the model and makes a class label prediction for the new row of data.

Predicted Class: 1
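To connect this prediction back to the activation formula, the fitted weights and bias can be read from the model's `coef_` and `intercept_` attributes and the activation computed by hand. This is a sketch for illustration; the variable names `activation` and `manual_class` are introduced here and are not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

# Same dataset, model, and new row as in the example above.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10,
                           n_redundant=0, random_state=1)
model = Perceptron()
model.fit(X, y)
row = np.array([0.12777556, -3.64400522, -2.23268854, -1.82114386, 1.75466361,
                0.1243966, 1.03397657, 2.35822076, 1.01001752, 0.56768485])

# Activation = Weights * Inputs + Bias, using the learned parameters.
activation = np.dot(model.coef_[0], row) + model.intercept_[0]
# Predict 1 if the activation is above 0.0, otherwise 0.
manual_class = 1 if activation > 0.0 else 0

print('Activation: %.3f' % activation)
print('Manual class: %d, predict(): %d' % (manual_class, model.predict([row])[0]))
```

The hand-computed class should agree with predict(), and the activation should match the value returned by the model's decision_function() for the same row.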

Next, we can look at configuring the model hyperparameters.

Tune Perceptron Hyperparameters

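The hyperparameters of the Perceptron algorithm must be configured for your specific dataset. Perhaps the most important hyperparameter is the learning rate. A large learning rate can cause the model to learn quickly, but perhaps at the cost of lower skill. A smaller learning rate can result in a better-performing model, but training may take a long time. You can learn more about exploring learning rates in this tutorial: How to Configure the Learning Rate When Training Deep Learning Neural Networks.

It is common to test learning rates on a log scale between a small value such as 1e-4 (or smaller) and 1.0. In this case, we will test the following values: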

# define grid
grid = dict()
grid['eta0'] = [0.0001, 0.001, 0.01, 0.1, 1.0]

The example below demonstrates this using the GridSearchCV class with the grid of values we have defined.

# grid search learning rate for the perceptron
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = Perceptron()
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['eta0'] = [0.0001, 0.001, 0.01, 0.1, 1.0]
# define search
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
# summarize all
means = results.cv_results_['mean_test_score']
params = results.cv_results_['params']
for mean, param in zip(means, params):
    print('>%.3f with: %r' % (mean, param))

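Running the example evaluates each combination of configurations using repeated cross-validation. Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times. In this case, we can see that a learning rate smaller than the default results in better performance: learning rates of 0.0001 and 0.001 both achieve a classification accuracy of about 85.7 percent, compared with about 84.7 percent for the default of 1.0.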

Mean Accuracy: 0.857
Config: {'eta0': 0.0001}
>0.857 with: {'eta0': 0.0001}
>0.857 with: {'eta0': 0.001}
>0.853 with: {'eta0': 0.01}
>0.847 with: {'eta0': 0.1}
>0.847 with: {'eta0': 1.0}

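Another important hyperparameter is the number of epochs used to train the model. This may depend on the training dataset and could vary greatly. Again, we will explore configuration values on a log scale between 1 and 1e+4.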

# define grid
grid = dict()
grid['max_iter'] = [1, 10, 100, 1000, 10000]

We will use the well-performing learning rate of 0.0001 found in the previous search.

# define model
model = Perceptron(eta0=0.0001)

The complete example of grid searching the number of training epochs is listed below.

# grid search total epochs for the perceptron
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = Perceptron(eta0=0.0001)
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['max_iter'] = [1, 10, 100, 1000, 10000]
# define search
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
# summarize all
means = results.cv_results_['mean_test_score']
params = results.cv_results_['params']
for mean, param in zip(means, params):
    print('>%.3f with: %r' % (mean, param))

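Running the example evaluates each combination of configurations using repeated cross-validation. Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times. In this case, we can see that 10 to 10,000 epochs result in about the same classification accuracy. An interesting extension would be to configure the learning rate and the number of training epochs at the same time to see whether better results can be achieved.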

Mean Accuracy: 0.857
Config: {'max_iter': 10}
>0.850 with: {'max_iter': 1}
>0.857 with: {'max_iter': 10}
>0.857 with: {'max_iter': 100}
>0.857 with: {'max_iter': 1000}
>0.857 with: {'max_iter': 10000}
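The extension suggested above, tuning the learning rate and the number of epochs together, could be sketched by putting both keys in one grid. This reuses the value ranges from the two earlier searches and makes no claim about which combination wins; it runs 25 configurations, and scikit-learn may emit convergence warnings for the smallest max_iter values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Same synthetic dataset as in the tutorial.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10,
                           n_redundant=0, random_state=1)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# Search both hyperparameters jointly: 5 learning rates x 5 epoch limits.
grid = {
    'eta0': [0.0001, 0.001, 0.01, 0.1, 1.0],
    'max_iter': [1, 10, 100, 1000, 10000],
}
search = GridSearchCV(Perceptron(), grid, scoring='accuracy', cv=cv)
results = search.fit(X, y)

print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
```

A joint search can reveal interactions that two separate one-dimensional searches would miss, at the cost of evaluating every combination.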
