http://code.google.com/p/pyv8/, PyV8 for use in web crawlers
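import urllib2
from cookielib import CookieJar

# Placeholder URLs (not in the original post); replace with the pages to crawl.
url1 = "http://example.com/"
url2 = "http://example.com/next"


class HTTPRefererProcessor(urllib2.BaseHandler):
    """Send the URL of the previous response as the Referer of the next request."""

    def __init__(self):
        self.referer = None

    def http_request(self, request):
        # Add a Referer only when one is known and the caller has not set one.
        if ((self.referer is not None) and
                not request.has_header("Referer")):
            request.add_unredirected_header("Referer", self.referer)
        return request

    def http_response(self, request, response):
        # Remember the URL of this response for the next request.
        self.referer = response.geturl()
        return response

    # Apply the same hooks to HTTPS traffic.
    https_request = http_request
    https_response = http_response


def main():
    cj = CookieJar()
    opener = urllib2.build_opener(
        urllib2.HTTPCookieProcessor(cj),
        HTTPRefererProcessor(),
    )
    urllib2.install_opener(opener)
    urllib2.urlopen(url1)  # open the first URL
    urllib2.urlopen(url2)  # open the second URL


if "__main__" == __name__:
    main()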
CookieJar.extract_cookies(response, request): Extract cookies from the HTTP response and store them in the CookieJar, where allowed by policy.
The CookieJar will look for allowable Set-Cookie and Set-Cookie2 headers in the response argument, and store cookies as appropriate (subject to the CookiePolicy.set_ok() method’s approval).
The response object (usually the result of a call to urllib2.urlopen(), or similar) should support an info() method, which returns an object with a getallmatchingheaders() method (usually a mimetools.Message instance).
The request object (usually a urllib2.Request instance) must support the methods get_full_url(), get_host(), unverifiable(), and get_origin_req_host(), as documented by urllib2. The request is used to set default values for cookie-attributes as well as for checking that the cookie is allowed to be set.
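The description above can be tried without any network access. The following is only a minimal sketch: FakeHeaders, FakeResponse and extract() are hypothetical helpers made up for this illustration, and the example.com URL and the session=abc123 cookie are invented values. The import fallback is there because Python 3 renamed urllib2 to urllib.request and cookielib to http.cookiejar. The stub provides the header lookups that the standard library appears to call (getheaders() on Python 2, get_all() on Python 3), which is an assumption about internals rather than documented API.

```python
try:  # Python 2, as used in the rest of this post
    import urllib2
    from cookielib import CookieJar, DefaultCookiePolicy
except ImportError:  # Python 3 renamed both modules
    import urllib.request as urllib2
    from http.cookiejar import CookieJar, DefaultCookiePolicy


class FakeHeaders(object):
    """Hypothetical stand-in for the header object returned by response.info()."""

    def __init__(self, set_cookie_values):
        self._set_cookie_values = list(set_cookie_values)

    def get_all(self, name, default=None):
        # Header lookup assumed to be used by Python 3's http.cookiejar.
        if name.lower() == "set-cookie":
            return list(self._set_cookie_values)
        return default

    def getheaders(self, name):
        # Header lookup assumed to be used by Python 2's cookielib.
        return self.get_all(name, [])


class FakeResponse(object):
    """Hypothetical response exposing only the info() method the jar needs."""

    def __init__(self, set_cookie_values):
        self._headers = FakeHeaders(set_cookie_values)

    def info(self):
        return self._headers


def extract(policy=None):
    """Run extract_cookies() on a canned response and return the jar."""
    jar = CookieJar(policy)
    request = urllib2.Request("http://example.com/")  # made-up URL
    response = FakeResponse(["session=abc123; Path=/"])  # made-up cookie
    jar.extract_cookies(response, request)
    return jar


# Default policy: the cookie should be accepted and stored.
accepted = extract()
print([(c.name, c.value, c.domain) for c in accepted])

# A policy that blocks the host: set_ok() should refuse, so nothing is stored.
rejected = extract(DefaultCookiePolicy(blocked_domains=["example.com"]))
print(len(rejected))
```

In the crawler code, urllib2.HTTPCookieProcessor makes this same call for every response, so calling extract_cookies() by hand is mainly useful for testing a policy or for feeding the jar from responses obtained some other way.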
http://fly5.com.cn/p/p-like/python_https.html
http://www.cnblogs.com/xiaoxia/archive/2010/08/04/1792461.html?login=1
http://xiudaima.appspot.com/code/detail/14001