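Using elasticsearch-analysis-ik with Elasticsearch
Thanks to Medcl for the IK analysis plugin.
Why IK?
Elasticsearch has no built-in Chinese word segmentation. The default analyzer breaks Chinese text into single characters, which makes searching awkward.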
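To make the problem concrete, here is a small Python sketch. It is not Elasticsearch code. `unigram_tokens` only mimics the one-token-per-character behaviour described above, and the sample texts are made up. Once a query for 地球 (earth) is broken into single characters, it also matches 地上有个球 ("there is a ball on the ground"). That text contains 地 and 球, but not the word 地球.

```python
def unigram_tokens(text: str) -> list[str]:
    """Split text into single characters, skipping whitespace
    (a rough stand-in for per-character tokenization of Chinese)."""
    return [ch for ch in text if not ch.isspace()]


def matches_any(query: str, doc: str) -> bool:
    """OR semantics: the doc matches if it shares at least one query token."""
    return bool(set(unigram_tokens(query)) & set(unigram_tokens(doc)))


def matches_all(query: str, doc: str) -> bool:
    """AND semantics: every query token must appear somewhere in the doc."""
    return set(unigram_tokens(query)) <= set(unigram_tokens(doc))


docs = {
    "doc1": "我们要保护地球",  # contains the word 地球
    "doc2": "地上有个球",      # contains 地 and 球 separately, not the word 地球
}

query = "地球"
for name, text in docs.items():
    print(name, matches_any(query, text), matches_all(query, text))
# doc1 True True
# doc2 True True
```

Even the stricter AND match cannot tell the two texts apart, because the word boundary was lost at tokenization time.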
Download the IK plugin
https://github.com/medcl/elasticsearch-analysis-ik
Build it with Maven to produce the jar
```shell
mvn clean package
```
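Install the plugin
```shell
bin/plugin --install analysis-ik --url file:///#{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-1.4.0.zip
```
In effect, this just copies the jar into the elasticsearch/plugins directory.
Next, copy the elasticsearch-analysis-ik-master\config\ik directory, which holds IK's configuration and dictionaries, into elasticsearch\config.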
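You may prefer to do these two steps by hand, for example when the plugin script is unavailable. The Python sketch below extracts the release zip and copies the `config/ik` directory. It is a hypothetical helper: `install_ik`, `es_home`, `release_zip` and `ik_source` are illustrative names. The `plugins/analysis-ik` target folder is an assumption about where the older `plugin --install` script puts the files, so check your own installation before relying on it.

```python
import shutil
import zipfile
from pathlib import Path


def install_ik(es_home: Path, release_zip: Path, ik_source: Path) -> Path:
    """Hypothetical manual install: unpack the plugin zip, then copy config/ik."""
    # Assumed target folder, mirroring what the plugin script would create.
    plugin_dir = es_home / "plugins" / "analysis-ik"
    plugin_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(release_zip) as zf:
        zf.extractall(plugin_dir)  # the jar(s) from the zip land here

    # Copy IK's configuration and dictionaries next to elasticsearch.yml.
    target_config = es_home / "config" / "ik"
    shutil.copytree(ik_source, target_config, dirs_exist_ok=True)
    return plugin_dir
```

`dirs_exist_ok=True` (Python 3.8+) lets the copy be re-run after you edit the dictionaries.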
Edit elasticsearch/config/elasticsearch.yml and append the following at the bottom:
```yaml
################################## ik ################################
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
        char_filter: html_strip
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
    tokenizer:
      ik_smart:
        type: ik
        use_smart: true
```
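The `ik_max_word` and `ik_smart` analyzers differ only in `use_smart`. Roughly, `ik_max_word` (`use_smart: false`) emits every dictionary word it finds, overlaps included. `ik_smart` (`use_smart: true`) picks a single non-overlapping segmentation. The Python sketch below imitates that difference with a tiny made-up dictionary. It contrasts exhaustive dictionary lookup with greedy forward maximum matching. This is a simplified illustration, not IK's actual algorithm, and on real text it will not match IK token for token.

```python
# Toy dictionary (hypothetical); IK ships a far larger one.
DICTIONARY = {"地球", "如此", "如", "此", "大", "我们", "保护"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def max_word(text: str) -> list[str]:
    """Emit every dictionary word at every position (overlaps allowed),
    similar in spirit to use_smart: false. Characters covered by no
    dictionary word are emitted as single-character tokens."""
    spans = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            if text[start:end] in DICTIONARY:
                spans.append((start, end))
    covered = {i for s, e in spans for i in range(s, e)}
    spans += [(i, i + 1) for i in range(len(text)) if i not in covered]
    # Order by start offset, longer words first at the same offset.
    spans.sort(key=lambda span: (span[0], -(span[1] - span[0])))
    return [text[s:e] for s, e in spans]


def smart(text: str) -> list[str]:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word (or a single character), then jump past it."""
    tokens = []
    pos = 0
    while pos < len(text):
        for end in range(min(len(text), pos + MAX_WORD_LEN), pos, -1):
            if text[pos:end] in DICTIONARY or end == pos + 1:
                tokens.append(text[pos:end])
                pos = end
                break
    return tokens


print(max_word("地球如此大"))  # ['地球', '如此', '如', '此', '大']
print(smart("地球如此大"))     # ['地球', '如此', '大']
```

The overlapping style of output suits indexing, where recall matters. The non-overlapping style is closer to what you usually want for query text.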
Start Elasticsearch. The IK plugin is now installed.
Now let's test it. First, create an index, then a mapping whose `test` field uses the ik analyzer:
```
POST /index                    # create the index

POST /index/iktest/_mapping    # create the mapping
{
  "iktest": {
    "properties": {
      "test": {
        "type": "string",
        "analyzer": "ik"
      }
    }
  }
}
```
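The requests in this post are written in console style: an HTTP method and path, optionally followed by a JSON body. To issue them from a script instead, the following Python sketch builds the same two requests with the standard library's `urllib.request`. The base URL `http://localhost:9200` is an assumption (the default HTTP port). Nothing is sent until you call `urlopen` yourself.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed host and default HTTP port

MAPPING = {
    "iktest": {
        "properties": {
            "test": {"type": "string", "analyzer": "ik"}
        }
    }
}


def es_request(method: str, path: str, body=None) -> urllib.request.Request:
    """Build (but do not send) a request against the Elasticsearch REST API."""
    data = None
    if body is not None:
        # ensure_ascii=False keeps Chinese text readable in the UTF-8 payload
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


create_index = es_request("POST", "/index")
put_mapping = es_request("POST", "/index/iktest/_mapping", MAPPING)

# To actually send them to a running node:
# for req in (create_index, put_mapping):
#     with urllib.request.urlopen(req) as resp:
#         print(json.loads(resp.read()))
```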
Response:
```json
{
  "acknowledged": true
}
```
Test the analyzer with Elasticsearch's analyze API:
```
GET /index/_analyze?analyzer=ik&pretty=true
{
  "地球如此大"
}
```
Response. The offsets apparently start at 8 because this version of the API analyzed the entire raw request body, braces and whitespace included, rather than just the quoted string:
```json
{
  "tokens": [
    {
      "token": "地球",
      "start_offset": 8,
      "end_offset": 10,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "如此",
      "start_offset": 10,
      "end_offset": 12,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "如",
      "start_offset": 10,
      "end_offset": 11,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "此",
      "start_offset": 11,
      "end_offset": 12,
      "type": "CN_CHAR",
      "position": 4
    },
    {
      "token": "大",
      "start_offset": 12,
      "end_offset": 13,
      "type": "CN_WORD",
      "position": 5
    }
  ]
}
```
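When calling the analyze API from code, you usually only need the token strings. Here is a small Python sketch that parses a response like the one above. The embedded JSON is that response, trimmed to the fields the function reads.

```python
import json

# The _analyze response above, trimmed to the fields used here.
response_text = """
{"tokens": [
  {"token": "地球", "start_offset": 8,  "end_offset": 10, "position": 1},
  {"token": "如此", "start_offset": 10, "end_offset": 12, "position": 2},
  {"token": "如",   "start_offset": 10, "end_offset": 11, "position": 3},
  {"token": "此",   "start_offset": 11, "end_offset": 12, "position": 4},
  {"token": "大",   "start_offset": 12, "end_offset": 13, "position": 5}
]}
"""


def token_strings(analyze_json: str) -> list[str]:
    """Return the token strings of an _analyze response in position order."""
    tokens = json.loads(analyze_json)["tokens"]
    return [t["token"] for t in sorted(tokens, key=lambda t: t["position"])]


print(token_strings(response_text))
# ['地球', '如此', '如', '此', '大']
```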
Now index a document:
```
POST /index/iktest/1
{
  "test": "我们要保护地球"
}
```
Search for it:
```
POST /index/iktest/_search
{
  "query": {
    "term": {
      "test": {
        "value": "我们"
      }
    }
  }
}
```
Response:
```json
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.11506981,
    "hits": [
      {
        "_index": "index",
        "_type": "iktest",
        "_id": "1",
        "_score": 0.11506981,
        "_source": {
          "test": "我们要保护地球"
        }
      }
    ]
  }
}
```
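A `term` query for 我们 finds the document because `term` queries are not analyzed. The exact string is looked up among the tokens stored for the field at index time. Because the `test` field uses the ik analyzer, 我们 is one of those tokens. With the default analyzer, the field would hold only 我 and 们 separately, so the same query would find nothing. The Python sketch below models this with a toy inverted index. The ik token list for the document is an assumption; run the analyze API on 我们要保护地球 to see what IK really produces. Scoring is ignored.

```python
from collections import defaultdict

DOC_ID = "1"
# Token lists for document 1 ("我们要保护地球"):
# - "ik": assumed word-level output (verify with the _analyze API)
# - "default": one token per character, like the default analyzer for Chinese
TOKENS = {
    "ik": ["我们", "要", "保护", "地球"],
    "default": list("我们要保护地球"),
}


def build_index(doc_id: str, tokens: list[str]) -> dict[str, set[str]]:
    """Toy inverted index: map each token to the ids of documents containing it."""
    index = defaultdict(set)
    for token in tokens:
        index[token].add(doc_id)
    return index


def term_query(index: dict[str, set[str]], value: str) -> set[str]:
    """Like a term query: the value is NOT analyzed, it is looked up as one token."""
    return index.get(value, set())


for analyzer, tokens in TOKENS.items():
    hits = term_query(build_index(DOC_ID, tokens), "我们")
    print(analyzer, sorted(hits))
# ik ['1']
# default []
```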