

Apache Solr 4.0.0-ALPHA Chinese Analyzer: IKAnalyzer 4.0


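Solr recently released its 4.0 ALPHA version, whose admin UI looks better than the 3.x one. After comparing IKAnalyzer with mmseg and with Lucene's SmartChineseAnalyzer, StandardAnalyzer, and CJKAnalyzer, I found IKAnalyzer the easiest to work with. While configuring IKAnalyzer I discovered that some interfaces had changed, so I patched it based on the errors reported at startup. The result is this 4.0 version, which I have tested and confirmed to work.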

[Figure: new directory structure of IKAnalyzer]

IKAnalyzer 4.0 jar package ==> Download

After unpacking, put IKAnalyzer4.0.jar, IKAnalyzer.cfg, and stopword.dic into the lib directory under the Solr directory.
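
For orientation, stock IKAnalyzer distributions wire in the stopword dictionary through a Java-properties XML file. The fragment below is a minimal sketch of such a file, assuming the config file in this package follows the stock layout (usually named IKAnalyzer.cfg.xml) with its `ext_dict` and `ext_stopwords` keys. `ext.dic` is a hypothetical custom dictionary name, so check the file shipped in the package before relying on any of this.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration (illustrative sketch)</comment>
  <!-- Assumption: ext.dic is a hypothetical custom dictionary, one word per line -->
  <entry key="ext_dict">ext.dic;</entry>
  <!-- The stopword dictionary that is copied into lib alongside the jar -->
  <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
```

Dictionary files referenced this way are generally expected to be UTF-8 encoded and reachable on the classpath, which is why they sit next to the jar in lib.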

Edit solrconfig.xml and add:

  <lib dir="../../dist/" regex="apache-solr-analysis-extras-\d.*\.jar" />
  <lib dir="../../contrib/analysis-extras/lucene-libs" regex=".*\.jar" />

Edit schema.xml and add:

    <!-- IKAnalyzer -->
    <fieldType name="text_ik" class="solr.TextField">
      <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
      <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
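
A field type only takes effect once a field refers to it. As a minimal sketch, a field declaration in the `<fields>` section of schema.xml might look like the following. The field name `content_ik` is hypothetical, and the `indexed`/`stored` settings are assumptions to adjust for your own documents.

```xml
<!-- Hypothetical field that uses the text_ik type defined above -->
<field name="content_ik" type="text_ik" indexed="true" stored="true"/>
```

After restarting Solr, the Analysis page of the admin UI can be used to check how sample Chinese text is tokenized for this field.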

For reference, here is the SmartChineseAnalyzer configuration as well:

    <!-- Chinese -->
    <fieldType name="text_zh-cn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
        <filter class="solr.SmartChineseWordTokenFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PositionFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_zh-cn.txt" enablePositionIncrements="true"/>
      </analyzer>
    </fieldType>

If you spot any problems, please point them out so that we can all learn and improve together!

 

 

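Tags: lucene, solr, IKAnalyzer

Author: Leo_wl

Source: http://www.cnblogs.com/Leo_wl/

Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.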
