Installing Scrapy on CentOS

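Scrapy is an open-source, single-machine web crawler written in Python on top of the Twisted framework. It bundles most of the tools needed for web scraping, covering both the download side and the extraction side of a crawler.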

Environment:

CentOS 5.4
Python 2.7.3

Installation steps:

1. Download and install Python 2.7: http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz

[root@zxy-websgs ~]# wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz -P /opt

[root@zxy-websgs opt]# tar xvf Python-2.7.3.tgz

[root@zxy-websgs Python-2.7.3]# ./configure

[root@zxy-websgs Python-2.7.3]# make && make install

Verify the Python 2.7 installation:

[root@zxy-websgs Python-2.7.3]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
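
The interactive check above can also be scripted. The following is a minimal sketch, assuming the only requirement is that the interpreter is at least version 2.7; the helper name `meets_minimum` is hypothetical and not part of any tool used in this post.

```python
import sys


def meets_minimum(version_info, minimum=(2, 7)):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Report the version of whichever interpreter is running this script.
    print("running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("meets 2.7 minimum: %s" % meets_minimum(sys.version_info))
```

Running the script with `python2.7` rather than plain `python` helps confirm that the newly built interpreter, and not an older system one, is the one being used.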

2. Install setuptools: http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz

[root@zxy-websgs ~]# wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz -P /opt/
[root@zxy-websgs opt]# tar zxvf setuptools-0.6c11.tar.gz
[root@zxy-websgs setuptools-0.6c11]# python2.7 setup.py install

3. Install Twisted

[root@zxy-websgs setuptools-0.6c11]# easy_install Twisted
......
Installed /usr/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg
......
Installed /usr/local/lib/python2.7/site-packages/zope.interface-4.0.4-py2.7-linux-x86_64.egg

Twisted depends on zope.interface. Both packages can also be downloaded manually from the following addresses:

zope.interface: http://pypi.python.org/packages/source/z/zope.interface/zope.interface-4.0.1.tar.gz

Twisted: http://twistedmatrix.com/Releases/Twisted/12.1/Twisted-12.1.0.tar.bz2

5. Install w3lib

[root@zxy-websgs setuptools-0.6c11]# easy_install -U w3lib
Searching for w3lib
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-wm_1BB/w3lib-1.2/egg-dist-tmp-2DQHY_
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file

Installed /usr/local/lib/python2.7/site-packages/w3lib-1.2-py2.7.egg
Processing dependencies for w3lib
Finished processing dependencies for w3lib

w3lib: http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz

6. Install libxml2, or install lxml with easy_install

[root@zxy-websgs lxml-3.1.0]# easy_install lxml

Verify the lxml installation:

[root@zxy-websgs lxml-3.1.0]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>> exit()
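
The same import test works for every package installed so far, not just lxml. Here is a minimal sketch, assuming the import names `twisted`, `zope.interface`, `w3lib`, and `lxml`; the helper `check_imports` is hypothetical and written only for illustration.

```python
import importlib


def check_imports(names):
    """Try to import each module name and map it to True or False."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Import names assumed for the packages installed in the steps above.
    wanted = ["twisted", "zope.interface", "w3lib", "lxml"]
    for name, ok in sorted(check_imports(wanted).items()):
        print("%-16s %s" % (name, "OK" if ok else "MISSING"))
```

Any module reported as MISSING points to a step that needs to be repeated before installing Scrapy itself.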

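You can also install libxml2 instead. The official site recommends version 2.6.28 or later, but I could not find that version there, so I first installed 2.6.9. Running scrapy then failed with the following error: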

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.14.4', 'scrapy')
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 112, in execute
    cmds = _get_commands_dict(inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 37, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 30, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 21, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/commands/shell.py", line 8, in <module>
    from scrapy.shell import Shell
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/shell.py", line 14, in <module>
    from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/__init__.py", line 30, in <module>
    from scrapy.selector.libxml2sel import *
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/libxml2sel.py", line 12, in <module>
    from .factories import xmlDoc_from_html, xmlDoc_from_xml
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR +
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

Upgrading to version 2.6.21 resolved the problem.

libxml2-python 2.6.21: ftp://xmlsoft.org/libxml2/python/libxml2-python-2.6.21.tar.gz
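
The traceback shows that the old binding simply lacks a constant that Scrapy reads at import time. A quick way to test a libxml2 binding before running scrapy is to check for those names directly. This is a minimal sketch, assuming the two constants named in the traceback are the ones that matter; `missing_attributes` is a hypothetical helper, not part of libxml2 or Scrapy.

```python
# Constant names taken from the traceback above.
REQUIRED = ["HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR"]


def missing_attributes(module, required=REQUIRED):
    """Return the names in `required` that `module` does not define."""
    return [name for name in required if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        import libxml2
    except ImportError:
        print("libxml2 Python binding is not installed")
    else:
        missing = missing_attributes(libxml2)
        print("missing: %s" % (", ".join(missing) or "none"))
```

If the script reports a missing name, the installed binding is likely too old for this Scrapy release and should be upgraded.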

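7. Install pyOpenSSL (optional; mainly needed so that Scrapy can support HTTPS)

Running easy_install pyOpenSSL fetches pyOpenSSL 0.13, which failed to install, so I downloaded version 0.11 manually and installed that instead.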

[root@zxy-websgs opt]# wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz -P /opt
[root@zxy-websgs opt]# tar zxvf pyOpenSSL-0.11.tar.gz
[root@zxy-websgs pyOpenSSL-0.11]# python2.7 setup.py install

pyOpenSSL: http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz

8. Install Scrapy

[root@zxy-websgs pyOpenSSL-0.11]# easy_install -U Scrapy

Verify the installation:

[root@zxy-websgs pyOpenSSL-0.11]# scrapy
Scrapy 0.16.4 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

scrapy: http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.4.tar.gz

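Summary:

If installing pyOpenSSL on its own with easy_install fails, you can download and install pyOpenSSL 0.11 first, then run easy_install -U Scrapy to install Scrapy together with its remaining dependencies.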

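Tags: scrapy centos

Author: Leo_wl

Source: http://www.cnblogs.com/Leo_wl/

The copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the right to pursue legal liability is reserved.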

Last updated: 2022-09-24