Installing Scrapy on CentOS
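Scrapy is an open-source, single-machine crawler for Python built on the Twisted framework. It bundles most of the tools needed for web scraping, covering both the download side and the extraction side of a crawler.
Environment:
CentOS 5.4, Python 2.7.3
Installation steps:
1. Download Python 2.7: http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz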
[root@zxy-websgs ~]# wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz -P /opt
[root@zxy-websgs opt]# tar xvf Python-2.7.3.tgz
[root@zxy-websgs Python-2.7.3]# ./configure
[root@zxy-websgs Python-2.7.3]# make && make install
Verify the Python 2.7 installation:
[root@zxy-websgs Python-2.7.3]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
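The same check can be scripted instead of done by eye in the interactive prompt. Here is a minimal sketch, assuming the only requirement is that the interpreter is at least 2.7; the helper name `meets_minimum` is my own and not part of any tool used in this post.

```python
import sys


def meets_minimum(version_info, minimum=(2, 7)):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # sys.version_info describes the interpreter that runs this script,
    # so run it with python2.7 to test the freshly built interpreter.
    print(meets_minimum(sys.version_info))
```

Running it with `python2.7` and then with plain `python` shows quickly whether the new build or an older system interpreter is the one being picked up.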
2. Install setuptools: http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz
[root@zxy-websgs ~]# wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz -P /opt/
[root@zxy-websgs opt]# tar zxvf setuptools-0.6c11.tar.gz
[root@zxy-websgs setuptools-0.6c11]# python2.7 setup.py install
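The download, unpack, and `setup.py install` pattern used here comes up again for pyOpenSSL later on. As a rough sketch, the three commands can be generated from the tarball URL. The function `source_install_commands` is a hypothetical helper, and it assumes the archive unpacks into a directory named after the file, which is true for the tarballs in this post but not guaranteed in general.

```python
import posixpath


def source_install_commands(url, dest="/opt", python="python2.7"):
    """Build the shell commands for a download / unpack / setup.py install."""
    archive = posixpath.basename(url)
    # Strip the archive suffix to guess the directory the tarball unpacks to.
    for suffix in (".tar.gz", ".tgz", ".tar.bz2"):
        if archive.endswith(suffix):
            folder = archive[: -len(suffix)]
            break
    else:
        raise ValueError("unrecognised archive type: %s" % archive)
    return [
        "wget %s -P %s" % (url, dest),
        "cd %s && tar xf %s" % (dest, archive),
        "cd %s/%s && %s setup.py install" % (dest, folder, python),
    ]


if __name__ == "__main__":
    url = ("http://pypi.python.org/packages/source/s/setuptools/"
           "setuptools-0.6c11.tar.gz")
    for command in source_install_commands(url):
        print(command)
```

The helper only prints the commands; it does not run them, so the output can be reviewed before pasting it into a root shell.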
3. Install Twisted
[root@zxy-websgs setuptools-0.6c11]# easy_install Twisted
......
Installed /usr/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg
......
Installed /usr/local/lib/python2.7/site-packages/zope.interface-4.0.4-py2.7-linux-x86_64.egg
Twisted requires zope.interface. Both can also be downloaded from the addresses below:
zope.interface: http://pypi.python.org/packages/source/z/zope.interface/zope.interface-4.0.1.tar.gz
Twisted: http://twistedmatrix.com/Releases/Twisted/12.1/Twisted-12.1.0.tar.bz2
5. Install w3lib
[root@zxy-websgs setuptools-0.6c11]# easy_install -U w3lib
Searching for w3lib
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-wm_1BB/w3lib-1.2/egg-dist-tmp-2DQHY_
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file
Installed /usr/local/lib/python2.7/site-packages/w3lib-1.2-py2.7.egg
Processing dependencies for w3lib
Finished processing dependencies for w3lib
w3lib: http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz
6. Install libxml2, or install lxml with easy_install
[root@zxy-websgs lxml-3.1.0]# easy_install lxml
Verify the lxml installation:
[root@zxy-websgs lxml-3.1.0]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>> exit()
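Rather than importing each package by hand, the dependencies installed so far can be checked in one go. This is a small sketch using the standard `importlib.import_module`; the helper name `check_imports` is hypothetical, and the module names listed are the import names I would expect for Twisted, zope.interface, w3lib and lxml.

```python
import importlib


def check_imports(module_names):
    """Map each module name to True if it imports cleanly, else False."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Import names for the packages installed in the steps so far.
    wanted = ["twisted", "zope.interface", "w3lib", "lxml"]
    for name, ok in sorted(check_imports(wanted).items()):
        print("%-15s %s" % (name, "ok" if ok else "MISSING"))
```

Run it with `python2.7` so that it looks in the same site-packages directory that easy_install wrote to.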
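Alternatively, install libxml2. The official site recommends version 2.6.28 or later, but I could not find that version there. I first installed version 2.6.9, and running scrapy then failed with the following error: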
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.14.4', 'scrapy')
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 112, in execute
    cmds = _get_commands_dict(inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 37, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 30, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 21, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/commands/shell.py", line 8, in <module>
    from scrapy.shell import Shell
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/shell.py", line 14, in <module>
    from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/__init__.py", line 30, in <module>
    from scrapy.selector.libxml2sel import *
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/libxml2sel.py", line 12, in <module>
    from .factories import xmlDoc_from_html, xmlDoc_from_xml
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR +
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
Upgrading to version 2.6.21 resolved the problem.
libxml2 2.6.21: ftp://xmlsoft.org/libxml2/python/libxml2-python-2.6.21.tar.gz
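The error came from Scrapy looking up `HTML_PARSE_RECOVER` on the `libxml2` module, so one way to test a libxml2 binding before running scrapy is to check for the constants directly. The sketch below checks only the two names that appear in the traceback; the helper `missing_parser_flags` is hypothetical, and Scrapy may well need other attributes that this does not cover.

```python
def missing_parser_flags(module,
                         required=("HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR")):
    """Return the names in `required` that `module` does not define."""
    return [name for name in required if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        import libxml2  # the libxml2 Python bindings, if installed
    except ImportError:
        print("libxml2 bindings are not installed")
    else:
        missing = missing_parser_flags(libxml2)
        print("missing flags: %s" % (", ".join(missing) or "none"))
```

An empty result for these two names suggests the binding is new enough to get past the import that failed for me with 2.6.9.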
7. Install pyOpenSSL (optional; it mainly lets Scrapy support HTTPS)
Running easy_install pyOpenSSL fetches pyOpenSSL-0.13, which failed to install, so I downloaded version 0.11 manually and installed that instead.
[root@zxy-websgs opt]# wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz -P /opt
[root@zxy-websgs opt]# tar zxvf pyOpenSSL-0.11.tar.gz
[root@zxy-websgs pyOpenSSL-0.11]# python2.7 setup.py install
pyOpenSSL: http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
8. Install Scrapy
[root@zxy-websgs pyOpenSSL-0.11]# easy_install -U Scrapy
Verify the installation:
[root@zxy-websgs pyOpenSSL-0.11]# scrapy
Scrapy 0.16.4 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
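If this check needs to run from a script, the version can be pulled out of the first line of that output. This is only a sketch that assumes the banner has the form shown above ("Scrapy 0.16.4 - no active project"); other Scrapy releases may print something different, and `parse_scrapy_banner` is a hypothetical helper, not part of Scrapy.

```python
import re


def parse_scrapy_banner(line):
    """Extract the version from a 'Scrapy X.Y.Z - ...' banner as a tuple.

    Returns None when the line does not start with such a banner.
    """
    match = re.match(r"Scrapy (\d+(?:\.\d+)*)", line.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


if __name__ == "__main__":
    print(parse_scrapy_banner("Scrapy 0.16.4 - no active project"))
```

Returning a tuple of integers makes it possible to compare versions numerically, for example to tell 0.16.4 apart from the 0.14.4 that appears in the earlier traceback.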
scrapy: http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.4.tar.gz
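Summary:
If installing pyOpenSSL on its own with easy_install fails, download and install pyOpenSSL 0.11 first, then run easy_install -U Scrapy to install everything else in one pass.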
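Tags: scrapy centos
Author: Leo_wl
Source: http://www.cnblogs.com/Leo_wl/
Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be shown prominently on the page; otherwise the right to pursue legal liability is reserved.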