Scraping the New Book Recommendations Section of Posts & Telecom Press (人民邮电出版社)
Approach
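1. Find the endpoint that returns the bookTagId values and collect them, so that book information can later be saved in bulk for every category.
2. Switch the book category on the page and find the new endpoint that gets called. It returns the books under that category (preview information only). Note that this call carries a parameter, bookTagId.
3. Open a single book and find the corresponding endpoint. It returns the full details of that book, selected by the bookId parameter sent with the request.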
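The three steps chain together as tags, then book lists, then book details. The following is a minimal sketch of that data flow only, with the HTTP calls replaced by a stand-in function so it runs offline. The names collect_books, fake_fetch, and the TAGS/BOOKS/DETAIL labels are hypothetical, and the sample responses are made up; only the field names (bookTagId, bookId, bookName, author, discountPrice, isbn) come from this article.

```python
from typing import Any, Callable, Dict, List

# Hypothetical labels used only to route the stand-in fetcher below.
TAGS, BOOKS, DETAIL = "tags", "books", "detail"

Fetch = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def collect_books(fetch: Fetch) -> List[List[Any]]:
    """Walk tags -> book lists -> book details; return one row per book."""
    rows = []
    for tag in fetch(TAGS, {})["data"]:                              # step 1
        listing = fetch(BOOKS, {"bookTagId": tag["bookTagId"]})      # step 2
        for item in listing["data"]:
            detail = fetch(DETAIL, {"bookId": item["bookId"]})["data"]  # step 3
            rows.append([
                detail["bookName"],
                detail["author"],
                detail["discountPrice"],
                detail["isbn"],
            ])
    return rows


def fake_fetch(endpoint: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the HTTP calls; returns made-up sample data."""
    if endpoint == TAGS:
        return {"data": [{"bookTagId": "t1"}]}
    if endpoint == BOOKS:
        return {"data": [{"bookId": "b1"}, {"bookId": "b2"}]}
    return {"data": {
        "bookName": "Book " + params["bookId"],
        "author": "Anon",
        "discountPrice": 10.0,
        "isbn": "isbn-" + params["bookId"],
    }}


sample_rows = collect_books(fake_fetch)
```

Passing the fetcher in as an argument keeps the traversal logic separate from the network layer, so the same function could be pointed at real requests later.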
Code Implementation
import requests
import json
from openpyxl import Workbook

# Workbook that will hold one row per book
wb = Workbook()
ws = wb.worksheets[0]

# Step 1: endpoint that lists the recommendation categories (bookTagId)
url = r'https://www.ptpress.com.cn/recommendBook/getRecommendTypeListForPortal'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.51",
    "Accept-Encoding": "gzip, deflate, br"
}
lists = []  # collected rows: [bookName, author, discountPrice, isbn]
res = requests.get(url=url, headers=headers)
res.encoding = 'utf-8'
bookTag = json.loads(res.text)
# Step 2: endpoint that lists the books under one category
url_tag = r'https://www.ptpress.com.cn/recommendBook/getRecommendBookListForPortal'
# Step 3: endpoint that returns the full details of one book
url_Id = r'https://www.ptpress.com.cn/bookinfo/getBookDetailsById'
for bookTag_one in bookTag['data']:
    bookTagId = bookTag_one['bookTagId']
    data = {
        'bookTagId': bookTagId
    }
    # Send the same headers as the first request
    res_Tag = requests.get(url=url_tag, params=data, headers=headers)
    bookIds = json.loads(res_Tag.text)
    for bookId in bookIds['data']:
        bookId_one = bookId['bookId']
        data = {
            'bookId': bookId_one
        }
        res_Id = requests.post(url=url_Id, data=data, headers=headers)
        books = json.loads(res_Id.text)
        detail = books['data']
        # One row per book: name, author, discounted price, ISBN
        list_one = [
            detail['bookName'],
            detail['author'],
            detail['discountPrice'],
            detail['isbn'],
        ]
        lists.append(list_one)
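# Write each record to its own row. enumerate() supplies 1-based positions,
# so identical rows or repeated values cannot land on the same cell.
for row_idx, row in enumerate(lists, start=1):
    for col_idx, value in enumerate(row, start=1):
        ws.cell(row_idx, col_idx, value)
wb.save("人民邮电出版社.xlsx")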
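A note on cell positions when filling a grid like this: list.index() returns the position of the first equal element, so two identical rows (for example, a book that appears under two categories) or a value repeated within a row resolve to the same cell. The sketch below illustrates the difference using a plain dict keyed by (row, column) in place of a worksheet. The function names and sample rows are hypothetical.

```python
def grid_by_index(rows):
    """Place values using list.index() lookups for the row and column."""
    grid = {}
    for row in rows:
        for value in row:
            grid[(rows.index(row) + 1, row.index(value) + 1)] = value
    return grid


def grid_by_enumerate(rows):
    """Place values using positions supplied by enumerate()."""
    grid = {}
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            grid[(r, c)] = value
    return grid


# Two identical rows, each with a repeated value (made-up data).
rows = [["Book A", "Same", "Same"], ["Book A", "Same", "Same"]]
by_index = grid_by_index(rows)
by_enum = grid_by_enumerate(rows)
```

With this input the index-based version fills only two cells, while the enumerate-based version fills all six.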
Resulting spreadsheet (screenshot)