A Worked Example: Scraping Page Content with the requests Module in Python 3

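This article walks through a hands-on example of scraping page content with the requests module in Python 3. It may be a useful reference for anyone interested.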

>>> import requests

>>> # Query-string parameters are passed as a dict through the params argument
>>> payload = {'newwindow': '1', 'q': 'python爬虫', 'oq': 'python爬虫'}

>>> r = requests.get("https://www.google.com/search", params=payload)
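To make it clearer what a `params` dict turns into, here is a minimal sketch that builds the query string with the standard library's `urllib.parse.urlencode` instead of requests itself. For simple string values like these the result should match the URL that requests reports in `r.url`, but treat that as an assumption rather than a guarantee. The `full_url` name is only for illustration.

```python
from urllib.parse import urlencode

payload = {'newwindow': '1', 'q': 'python爬虫', 'oq': 'python爬虫'}

# urlencode percent-encodes non-ASCII values as UTF-8 bytes by default
query = urlencode(payload)

# Illustrative only: join the base URL and the encoded query string by hand
full_url = "https://www.google.com/search?" + query
print(full_url)
```

The Chinese characters end up as percent-encoded UTF-8 bytes, which is why the dict can hold them as plain strings.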

>>> r = requests.get('https://github.com/timeline.json')

>>> r.text

>>> r = requests.get('http://www.cnblogs.com/')

>>> r.encoding

'utf-8'
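As far as I know, requests takes `r.encoding` from the charset declared in the `Content-Type` response header, and falls back to ISO-8859-1 for text content when no charset is declared. The sketch below reproduces that kind of lookup with the standard library only. `charset_from_content_type` and its `default` argument are hypothetical names for illustration and are not part of requests.

```python
from email.message import Message


def charset_from_content_type(content_type, default="ISO-8859-1"):
    """Return the charset declared in a Content-Type header value.

    Hypothetical helper: falls back to `default` when no charset is given.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset, or None if absent
    return msg.get_content_charset() or default


print(charset_from_content_type("text/html; charset=utf-8"))
print(charset_from_content_type("text/html"))
```

If a page declares no charset, or the wrong one, assigning to `r.encoding` before reading `r.text` is the usual way to override it.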

>>> r = requests.get('http://www.cnblogs.com/')

>>> r.status_code

200
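A bare number like 200 is easier to act on once it is classified. The following is a small sketch, using only the standard library's `http.HTTPStatus`, of how a status code could be described and checked. `describe_status` and `is_success` are hypothetical helper names; with requests itself, `r.ok` and `r.raise_for_status()` serve a similar purpose.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a readable label such as '200 OK' for a standard HTTP status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def is_success(code):
    """Treat any 2xx code as success."""
    return 200 <= code < 300


print(describe_status(200), is_success(200))
print(describe_status(404), is_success(404))
```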

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
__author__ = 'GavinHsueh'

import requests
import bs4

# URL of the target page to scrape
url = 'http://www.ranzhi.org/book/ranzhi/about-ranzhi-4.html'

# Fetch the page; requests.get returns a Response object
response = requests.get(url)

# Read the HTTP status code of the response
status_code = response.status_code

# Parse the HTML with BeautifulSoup (lxml parser) and select the elements with id="book"
content = bs4.BeautifulSoup(response.content.decode("utf-8"), "lxml")
element = content.find_all(id='book')

print(status_code)
print(element)
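The parsing step can also be tried offline, without fetching anything. Below is a rough sketch that swaps BeautifulSoup's `find_all(id='book')` for the standard library's `html.parser` and runs on an inline HTML string. `TextById`, `extract_text_by_id`, and the `sample` markup are all made up for illustration. It assumes well-formed HTML with explicit closing tags, which real pages often are not, so BeautifulSoup remains the more forgiving choice.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # 0 means we are outside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text_by_id(html, target_id):
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up markup standing in for a fetched page
sample = ('<html><body><div id="book"><h1>About</h1>'
          '<p>Hello<br>world</p></div><p>footer</p></body></html>')
print(extract_text_by_id(sample, "book"))
```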

Running the program returns the scraped result:

The page was fetched successfully.

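On garbled characters in the scraped result

At first I used the Python 2 that ships with the system, but I spent ages fighting garbled characters in the returned content, and none of the many fixes I found on Google worked. After Python 2 had driven me up the wall, I gave in and switched to Python 3. If you have experience with garbled scraped content under Python 2, please share it so that newcomers like me can avoid the detours.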
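One common cause of garbled output, though not necessarily the one I ran into, is decoding the response bytes with the wrong charset. Here is a minimal sketch of how that happens and how it can be reversed when the wrong codec was ISO-8859-1. The variable names are only for illustration.

```python
text = "python爬虫"

# What a server would send: the text encoded as UTF-8 bytes
raw = text.encode("utf-8")

# Decoding with the wrong codec: every byte becomes one Latin-1 character
garbled = raw.decode("iso-8859-1")

# ISO-8859-1 maps each byte to exactly one character, so the mistake is reversible
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))
print(repaired)
```

With requests, the equivalent fix is usually to set `r.encoding = 'utf-8'` before reading `r.text`, or to decode `r.content` explicitly as the script above does.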

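Postscript

Python has many modules for crawling. Besides requests there are urllib, pycurl, tornado, and others. Of these, I personally find requests comparatively simple and easy to pick up. With this article you should be able to quickly learn how to scrape page content with Python's requests module. My ability is limited, so corrections to any errors in the article are welcome, and if you have questions about scraping page content with Python, I am happy to discuss them.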


Last updated: 2022-10-19