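Six fields are scraped from the detail page: news title, comment count, time, source, body text, and editor in charge.
First, wrap the comment-count lookup in a function: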
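import json
import re

import requests

# Comment API URL template; {} is filled with the news ID.
comments_url = '{}&group=&compress=0&ie=utf-8&oe=utf-8&page=1&page_size=20'

def getCommentsCount(newsURL):
    # The news ID is the part between "doc-i" and ".shtml" in the article URL.
    ID = re.search(r'doc-i(.+)\.shtml', newsURL)
    newsID = ID.group(1)
    response = requests.get(comments_url.format(newsID))
    # The body is JSON prefixed with "var data="; remove only that prefix.
    # (str.strip('var data=') would strip a set of characters, not the prefix.)
    body = response.text.strip()
    if body.startswith('var data='):
        body = body[len('var data='):]
    commentsTotal = json.loads(body)
    return commentsTotal['result']['count']['total']

news = ''  # URL of the news detail page
print(getCommentsCount(news))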
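The two pure steps inside getCommentsCount (pulling the news ID out of the article URL, and decoding a response body that starts with "var data=") can be tried without any network access. The following is a minimal sketch under stated assumptions: the helper names extract_news_id and parse_comment_total, the example.com URL, and the sample payload are all hypothetical and only mirror the shape the function above expects.

```python
import json
import re

PREFIX = 'var data='


def extract_news_id(news_url):
    """Return the text between 'doc-i' and '.shtml', or None if absent."""
    match = re.search(r'doc-i(.+)\.shtml', news_url)
    return match.group(1) if match else None


def parse_comment_total(body):
    """Decode a 'var data={...}' body and return result.count.total."""
    body = body.strip()
    # Remove the literal prefix only; str.strip(PREFIX) would instead
    # strip any of its individual characters from both ends.
    if body.startswith(PREFIX):
        body = body[len(PREFIX):]
    data = json.loads(body)
    return data['result']['count']['total']


# Hypothetical inputs, shaped like the ones the scraper expects.
sample_url = 'https://news.example.com/c/2017-01-01/doc-ifyabc1234567.shtml'
sample_body = 'var data={"result": {"count": {"total": 42}}}'

news_id = extract_news_id(sample_url)
total = parse_comment_total(sample_body)
```

Returning None for a URL that does not match lets the caller decide how to handle a page without a recognizable ID, rather than failing on ID.group(1).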