
A Powerful Tool for Python Crawlers: How to Use the requests Library (Comprehensive Web-Scraping Examples)

Date: 2022-07-02 08:14:28

The requests library

Install with pip: pip install requests

Basic requests

import requests

req = requests.get('https://www.baidu.com/')
req = requests.post('https://www.baidu.com/')
req = requests.put('https://www.baidu.com/')
req = requests.delete('https://www.baidu.com/')
req = requests.head('https://www.baidu.com/')
req = requests.options('https://www.baidu.com/')

1. GET requests

The parameters are given as a dictionary, so we can pass JSON-style key/value pairs:

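import requests
from fake_useragent import UserAgent  # library that generates User-Agent strings

headers = {'User-Agent': UserAgent().random}  # pick a random User-Agent header
url = 'https://www.baidu.com/s'  # target URL
params = {
    'wd': '豆瓣'  # query parameter appended to the URL
}
requests.get(url, headers=headers, params=params)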
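To make the effect of params concrete, the sketch below reproduces the query-string encoding with the standard library only, so it runs without network access. It assumes requests encodes the dictionary the way urllib.parse.urlencode does (UTF-8, percent-encoded); build_url is a hypothetical helper, not part of requests.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append a percent-encoded query string to base_url (hypothetical helper)."""
    query = urlencode(params)  # non-ASCII values are UTF-8 percent-encoded
    return f"{base_url}?{query}" if query else base_url


url = build_url('https://www.baidu.com/s', {'wd': '豆瓣'})
print(url)
```

After a real request, response.url holds the final URL that requests sent, which is a quick way to check the parameters.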


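The call returns a Response object, which displays only its status code (for example <Response [200]>). To get the page content, read its text attribute: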

# GET request
headers = {'User-Agent': UserAgent().random}
url = 'https://www.baidu.com/s'
params = {
    'wd': '豆瓣'
}
response = requests.get(url, headers=headers, params=params)
response.text

2. POST requests

The parameters are again a dictionary, so JSON-style key/value pairs can be passed here too:

import requests
from fake_useragent import UserAgent

headers = {'User-Agent': UserAgent().random}
url = 'https://www.baidu.cn/index/login/login'  # URL that accepts the account and password
params = {
    'user': '1351351335',  # account
    'password': '123456'  # password
}
response = requests.post(url, headers=headers, data=params)
response.text


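This example needs a real login page. I simply used an arbitrary URL here, so the login does not actually succeed. To test the login flow, try it against a real login page.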
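The data= argument used in the POST example sends the dictionary as a form-encoded body, whereas the json= argument of requests sends it as a JSON document. The sketch below shows the two body formats using only the standard library. It approximates what goes over the wire and makes no request; the credentials are the placeholder values from the example.

```python
import json
from urllib.parse import urlencode

payload = {'user': '1351351335', 'password': '123456'}

# Roughly what data=payload sends (Content-Type: application/x-www-form-urlencoded)
form_body = urlencode(payload)

# Roughly what json=payload sends (Content-Type: application/json)
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

Which form to use depends on what the server expects: classic login forms usually take the form-encoded body, while many APIs expect JSON.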

3. IP proxies

To avoid having your IP banned while scraping, it is common to go through a proxy. requests supports this with the proxies parameter.

# IP proxy
import requests
from fake_useragent import UserAgent

headers = {'User-Agent': UserAgent().random}
url = 'http://httpbin.org/get'  # returns the IP the request came from
proxies = {
    'http': 'http://yonghuming:[email protected]:8088'  # http://username:password@IP:port
    # 'http': 'https://182.145.31.211:4224'  # or just IP:port
}
requests.get(url, headers=headers, proxies=proxies)

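Proxy IPs can be found on sites such as Kuaidaili (快代理), or purchased. The URL http://httpbin.org/get returns information about your current request, including the IP it came from.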
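The proxies dictionary maps a URL scheme to a proxy URL. As a minimal sketch, the hypothetical helper build_proxies below assembles such a dictionary and percent-encodes the credentials; the address 203.0.113.10 and the password are made-up illustration values, not a working proxy.

```python
from urllib.parse import quote, urlsplit


def build_proxies(host, port, username=None, password=None):
    """Return a proxies dict in the shape requests expects (hypothetical helper)."""
    auth = ''
    if username and password:
        # Percent-encode credentials so characters like '@' or ':' stay unambiguous
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    proxy_url = f"http://{auth}{host}:{port}"
    # Route both plain-HTTP and HTTPS traffic through the same proxy
    return {'http': proxy_url, 'https': proxy_url}


# 203.0.113.10 is a documentation-only address, not a working proxy
proxies = build_proxies('203.0.113.10', 8088, 'yonghuming', 'p@ss')
parts = urlsplit(proxies['http'])
print(proxies['http'])
print(parts.hostname, parts.port, parts.username)
```

requests chooses the proxy by the scheme of the target URL, so an 'https' entry is needed when the page being fetched is served over https://.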


4. Setting a request timeout

You can set a timeout with the timeout parameter. If no response arrives within that time, requests raises an exception.

import requests

# set the timeout (in seconds)
requests.get('http://baidu.com/', timeout=0.1)
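A timeout usually goes together with a retry. The sketch below is a library-agnostic retry wrapper so that it runs without network access; call_with_retries and flaky_fetch are hypothetical names, and the built-in TimeoutError stands in for the exception a real request would raise.

```python
import time


def call_with_retries(func, retry_on, attempts=3, delay=0.0):
    """Call func(); retry when it raises retry_on, up to `attempts` tries in total."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of tries: let the caller see the timeout
            time.sleep(delay)


# Stand-in for a flaky request: times out twice, then succeeds
calls = {'count': 0}


def flaky_fetch():
    calls['count'] += 1
    if calls['count'] < 3:
        raise TimeoutError('simulated timeout')
    return 'ok'


result = call_with_retries(flaky_fetch, retry_on=TimeoutError)
print(result, calls['count'])
```

With requests, the call would look like call_with_retries(lambda: requests.get(url, timeout=3), retry_on=requests.exceptions.Timeout). The timeout argument also accepts a (connect, read) tuple when the two phases need different limits.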


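5. Certificate problems (SSLError: HTTP)

SSL verification: passing verify=False tells requests to skip checking the site's certificate.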

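import requests
from fake_useragent import UserAgent  # library that generates User-Agent strings

url = 'https://www.12306.cn/index/'  # page whose certificate needs handling
headers = {'User-Agent': UserAgent().random}  # pick a random User-Agent header
requests.packages.urllib3.disable_warnings()  # suppress the insecure-request warning
response = requests.get(url, verify=False, headers=headers)
response.encoding = 'utf-8'  # decode as UTF-8 so Chinese text displays correctly
response.text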
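To show what skipping verification means at a lower level, the sketch below compares two contexts from Python's standard ssl module. It is only an analogy for verify=False, under the assumption that the two settings shown are the relevant ones; it does not reproduce how requests configures TLS internally.

```python
import ssl

# Default context: certificates and hostnames are checked
verified = ssl.create_default_context()

# Context with checks switched off, comparable in spirit to verify=False
unverified = ssl.create_default_context()
unverified.check_hostname = False  # must be disabled before verify_mode
unverified.verify_mode = ssl.CERT_NONE

print(verified.verify_mode, verified.check_hostname)
print(unverified.verify_mode, unverified.check_hostname)
```

Without verification the connection is still encrypted, but the identity of the server is no longer checked. requests also accepts a path to a CA bundle as the verify argument, which keeps the check in place for sites with a non-standard certificate chain.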


6. Using a session to save cookies automatically

import requests
from fake_useragent import UserAgent

headers = {'User-Agent': UserAgent().chrome}
login_url = 'https://www.baidu.cn/index/login/login'  # URL of the login page
params = {
    'user': 'yonghuming',  # username
    'password': '123456'  # password
}
session = requests.Session()  # keeps cookies between requests
# use the session instead of calling requests directly
response = session.post(login_url, headers=headers, data=params)
info_url = 'https://www.baidu.cn/index/user.html'  # page that is available after logging in
resp = session.get(info_url, headers=headers)
resp.text

Because I did not use a page that really requires an account and password here, the response is not a logged-in page.


I also tried this against a Zhihuishu (智慧樹) page:

# cookie
import requests
from fake_useragent import UserAgent

headers = {'User-Agent': UserAgent().chrome}
login_url = 'https://passport.zhihuishu.com/login?service=https://onlineservice.zhihuishu.com/login/gologin'  # URL of the login page
params = {
    'user': '12121212',  # username
    'password': '123456'  # password
}
session = requests.Session()  # keeps cookies between requests
# use the session instead of calling requests directly
response = session.post(login_url, headers=headers, data=params)
info_url = 'https://onlne5.zhhuishu.com/onlinWeb.html#/stdetInex'  # page that is available after logging in
resp = session.get(info_url, headers=headers)
resp.encoding = 'utf-8'
resp.text
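What a session automates can be sketched with the standard library: remember the cookies a server sets, then send them back on the next request. The Set-Cookie values below are invented for illustration, and the code is a simplified model, not how requests.Session is implemented.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header values a login response might carry
set_cookie_headers = [
    'sessionid=abc123; Path=/; HttpOnly',
    'lang=zh; Path=/',
]

jar = SimpleCookie()
for header in set_cookie_headers:
    jar.load(header)  # remember each cookie, as a session would

# Cookie header value to send back on the next request
cookie_header = '; '.join(f'{name}={morsel.value}' for name, morsel in jar.items())
print(cookie_header)
```

With a real session, the stored cookies can be inspected through session.cookies after the login request.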


7. Getting response information

Code                  Meaning
resp.json()           Response body parsed as JSON
resp.text             Response body as a string
resp.content          Response body as bytes
resp.headers          Response headers
resp.url              URL that was requested
resp.encoding         Encoding of the page
resp.request.headers  Request headers
resp.cookies          Cookies
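The relationship between content, text, encoding and json() can be sketched without a network call. The byte string below is a made-up response body; the example also shows why the earlier snippets set the encoding to 'utf-8' before reading text, since decoding with the wrong charset garbles Chinese characters.

```python
import json

# Bytes as they might arrive from a server (what resp.content holds)
content = '{"site": "豆瓣", "ok": true}'.encode('utf-8')

# Decoding with the right charset gives readable text (like resp.text)
text = content.decode('utf-8')

# Decoding with the wrong charset produces garbled characters
garbled = content.decode('iso-8859-1')

# Parsing the JSON text gives Python objects (like resp.json())
data = json.loads(text)

print(text)
print(garbled == text)
print(data['site'], data['ok'])
```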
