Apache Solr 4.0.0-ALPHA Chinese analyzer: IKAnalyzer 4.0
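I recently noticed that Solr has released 4.0 ALPHA, and its admin UI looks nicer than the one in 3.x. After comparing IKAnalyzer with mmseg and with Lucene's SmartChineseAnalyzer, StandardAnalyzer, and CJKAnalyzer, I found IKAnalyzer the most pleasant to use. While configuring IKAnalyzer I found that some interfaces had changed, so I modified it based on the errors reported at startup. The result is this 4.0 build, which I have tested and which works.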
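The new IKAnalyzer directory structure is shown below.
IKAnalyzer 4.0 jar package ==> download
After unpacking, put IKAnalyzer4.0.jar, IKAnalyzer.cfg, and stopword.dic into the lib directory under the solr directory.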
Edit solrconfig.xml and add:
<!-- Paths are relative to the Solr instance directory; adjust "../../" to match your layout. -->
<lib dir="../../dist/" regex="apache-solr-analysis-extras-\d.*\.jar" />
<lib dir="../../contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
Edit schema.xml and add:
<!-- IKAnalyzer -->
<fieldType name="text_ik" class="solr.TextField">
  <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer" />
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
While I'm at it, here is the SmartChineseAnalyzer configuration as well:
<!-- Chinese -->
<fieldType name="text_zh-cn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory" />
    <filter class="solr.SmartChineseWordTokenFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PositionFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_zh-cn.txt" enablePositionIncrements="true" />
  </analyzer>
</fieldType>
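A field type only takes effect once a field in schema.xml refers to it. As a minimal sketch, the field names `content_ik` and `content_zh` below are hypothetical placeholders, not names from the original configuration; substitute whatever fields your documents actually use.

```xml
<!-- Goes inside the <fields> section of schema.xml. -->
<!-- Field names here are illustrative assumptions, not part of the original post. -->

<!-- Text analyzed with the IKAnalyzer-based type defined above -->
<field name="content_ik" type="text_ik" indexed="true" stored="true" />

<!-- Text analyzed with the SmartChineseAnalyzer-based type defined above -->
<field name="content_zh" type="text_zh-cn" indexed="true" stored="true" />
```

After restarting Solr, the Analysis page of the admin UI can be used to type sample Chinese text against either field and compare how the two types tokenize it.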
If you spot any problems, please point them out so we can all learn and improve together!
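Tags: lucene, solr, IKAnalyzer
Author: Leo_wl
Source: http://www.cnblogs.com/Leo_wl/
The copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original must be given in a prominent place on the article page; otherwise the author reserves the right to pursue legal liability.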