ElasticSearch Practical Study Notes

Tags:

Big data, microservices

ElasticSearch

Author: CodingGorit

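Date: October 22, 2020

Note: these study notes follow the Bilibili course by 狂神说 (Kuangshen): ElasticSearch 学习

1. Study outline

Installation
Ecosystem
The IK analyzer
Operating ES through its RESTful API
CRUD
Integrating ElasticSearch with SpringBoot (analyzed from the underlying principles)
Crawling data from JD (京东)
Hands-on: simulating full-text search

ES is the tool of choice for search, especially with large data volumes.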

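> Lucene is an information-retrieval toolkit (a JAR; it does not include a search-engine system). Solr is also built on it.
>
> It contains: the index structure, tools for reading and writing indexes, sorting, search rules, and other utility classes.
>
> Relationship between Lucene and ElasticSearch:
>
> ElasticSearch wraps and extends Lucene.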

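2. ElasticSearch overview

Abbreviated as ES.

An open-source, highly scalable, distributed full-text search engine
Stores and retrieves data in near real time
ES is written in Java and uses Lucene at its core for all indexing and search functionality
Its goal is to hide Lucene's complexity behind a simple RESTful API, making full-text search easy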

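3. Installing ElasticSearch

Prerequisite: JDK 1.8

Download and unpack
Get to know the directory layout:

bin: startup scripts
config: configuration files
	log4j2.properties: logging configuration
	jvm.options: JVM-related settings
	elasticsearch.yml: the ElasticSearch configuration file
lib: bundled JARs
logs: log files
modules: functional modules
plugins: plugins (e.g. IK)

Start ES, which listens on port 9200
Test it by visiting localhost:9200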

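> Installing the visual plugin elasticsearch-head

Download: https://github.com/mobz/elasticsearch-head/
Start it:

npm install
npm run start

Enable cross-origin requests (CORS) in elasticsearch.yml:

http.cors.enabled: true
http.cors.allow-origin: "*"

Installing Kibana

Download and unpack
Localization

Open kibana.yml under config and change the last line to i18n.locale: "zh-CN"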

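4. Core ES concepts

Index
Field types (mapping)
Documents

What are clusters, nodes, indexes, types, documents, shards, and mappings?

> ElasticSearch is document-oriented. Compared side by side with a relational database, the key difference is that everything is JSON:
>
> {
>
> }

Terminology mapping

ElasticSearch	Relational DB
indices	databases
types	tables
documents	rows
fields	columns

An ElasticSearch cluster can contain multiple indexes (databases); each index can contain multiple types (tables); each type holds multiple documents (rows); and each document has multiple fields (columns). Types are deprecated in 7.x, as the deprecation warnings in the responses later in these notes show.

Physical design

A single running ElasticSearch instance is already a cluster (of one node).

Documents

Individual records, for example:

user
	zs: 15
	ls: 22

Types

Field types are detected automatically, e.g. string.

Indexes

The counterpart of a database.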

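5. The IK analyzer plugin

Unpack the downloaded plugin into the plugins directory.

(Skipped here; see episode 8 of the course.)

The elasticsearch-plugin list command shows the plugins that have been loaded
Two modes: ik_smart (coarsest segmentation) and ik_max_word (finest-grained segmentation)
Test in Kibana
Custom segmentation dictionaries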

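6. REST conventions

Basic REST commands

method	URL	description
PUT	localhost:9200/<index>/<type>/<document id>	create a document (with the given id)
POST	localhost:9200/<index>/<type>	create a document (random id)
POST	localhost:9200/<index>/<type>/<document id>/_update	update a document
DELETE	localhost:9200/<index>/<type>/<document id>	delete a document
GET	localhost:9200/<index>/<type>/<document id>	get a document by id
POST	localhost:9200/<index>/<type>/_search	query all data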
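
The URLs in the table include a type segment, which ES 7.x accepts only with a deprecation warning (the responses later in these notes show it). As a minimal sketch, the typeless equivalents can be built like this. The class and method names (`TypelessEndpoints`, `indexDoc`, and so on) are hypothetical helpers for illustration, not part of any ES client; only the path shapes come from ES 7.x.

```java
// Builds typeless ES 7.x REST paths; names here are illustrative only.
final class TypelessEndpoints {

    // PUT: create or overwrite a document with a given id
    static String indexDoc(String index, String id) {
        return "/" + index + "/_doc/" + id;
    }

    // POST: create a document and let ES generate the id
    static String createDocAutoId(String index) {
        return "/" + index + "/_doc";
    }

    // POST: partial update of an existing document
    static String updateDoc(String index, String id) {
        return "/" + index + "/_update/" + id;
    }

    // GET or POST: search the index
    static String search(String index) {
        return "/" + index + "/_search";
    }

    public static void main(String[] args) {
        System.out.println("PUT  " + indexDoc("test", "1"));
        System.out.println("POST " + createDocAutoId("test"));
        System.out.println("POST " + updateDoc("test", "1"));
        System.out.println("GET  " + search("test"));
    }
}
```

The typed URLs used throughout these notes still work on 7.6.2, so this is only a pointer for newer code.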

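> Basic tests

6.1 Creating an index

Create an index by indexing a document, in the form PUT /<index>/<type>/<document id> (the type segment is deprecated):

PUT /test/type1/1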
{
  "name":"Gorit",
  "age": 18,
  "gender": "male"
}

Response: the document was added successfully

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1", 
  "_version" : 1, // version, incremented on every change
  "result" : "created", // status
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Create an index with explicit mapping rules

PUT /test1/
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      },
      "birthday": {
        "type": "date"
      }
    }
  }
}

Response

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test1"
}

If no mapping is supplied, ES assigns default field types automatically.

6.2 Querying

GET test

# Result
{
  "test" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1603203146037",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "q47lWt_4ToOBo1rxQ1pPNw",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test"
      }
    }
  }
}


GET test1

{
  "test1" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "birthday" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1603203453667",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "a-upVXJwR7u7JZztTjyVGg",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test1"
      }
    }
  }
}

Extra: the _cat/ endpoints expose a lot of information about the current state of ES

GET _cat/health

GET _cat/indices?v

6.3 Modifying data in an index

> Submit a PUT again; it overwrites the existing document

Modify the data

PUT /test/type1/1
{
  "name":"Gorit111",
  "age": 18,
  "gender": "male"
}

Result of the modification

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

The newer way: update with POST

POST /test/_doc/1/_update
{
  "doc": {
      "name":"张三"
  }
}

// Result
{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "张三",
    "age" : 18,
    "gender" : "male"
  }
}

6.4 Deleting an index

> Deleting an index

DELETE test

The DELETE command removes either an index or a document, depending on the request path.

7. Working with documents

7.1 Basic operations (review)

Add data (several records)

PUT /gorit/user/1
{
  "name": "CodingGorit",
  "age": 23,
  "desc": "一个独立的个人开发者",
  "tags": ["Python","Java","JavaScript"]
}

PUT /gorit/user/2
{
  "name": "龙",
  "age": 20,
  "desc": "全栈工程师",
  "tags": ["Python","JavaScript"]
}

Result:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "gorit",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Fetch data

GET /gorit/user/_search   # query all data

GET /gorit/user/1 # query a single document

Update data with PUT

PUT /gorit/user/3
{
  "name": "李四222",
  "age": 20,
  "desc": "Java开发工程师",
  "tags": ["Python","Java"]
}

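# A PUT replaces the whole document: any field omitted from the body is lost

POST _update is the recommended way.

# A plain POST to the document URL overwrites it just like PUT, so omitted fields are lost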
POST /gorit/user/1
{
  "doc": {
    "name": "coco"
  }
}

# _update merges only the given fields; the others are kept
POST /gorit/user/1/_update
{
  "doc": {
    "name": "coco"
  }
}
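
To make the difference concrete, here is a minimal in-memory sketch using plain Java maps rather than the ES client. `UpdateSemantics`, `fullReplace` and `partialUpdate` are hypothetical names, and the merge shown is shallow, which is a simplification of what ES does with nested objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrates overwrite vs. partial update on an in-memory "document".
final class UpdateSemantics {

    // PUT (or plain POST) to the document URL: the body becomes the whole document.
    static Map<String, Object> fullReplace(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // POST .../_update with {"doc": {...}}: the given fields are merged into the stored document.
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "CodingGorit");
        stored.put("age", 23);

        Map<String, Object> change = new LinkedHashMap<>();
        change.put("name", "coco");

        System.out.println(fullReplace(stored, change));   // the age field is gone
        System.out.println(partialUpdate(stored, change)); // the age field is kept
    }
}
```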

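Simple searches

# Get one record
GET /gorit/user/1

# Get all records
GET /gorit/user/_search

# Query by condition. With dynamic mapping a string field becomes text plus a keyword sub-field (see the mapping of test above): the text field is analyzed, so matching individual words works, while the keyword sub-field only matches the whole value exactly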
GET /gorit/user/_search?q=name:coco

7.2 Complex searches, the counterpart of SQL SELECT (sorting, paging, highlighting, fuzzy and exact queries)

Filtering the returned fields

GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "李四"
    }
  },
  "_source": ["name","desc"]
}

7.3 Sorting

GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "gorit"
    }
  },
  "sort": [
    {
     "age": {
       "order": "desc"
     }
    }
  ]
}

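7.4 Paging

Paging uses the from and size fields, which work like SQL's LIMIT offset, pageSize.

from: 0-based offset of the first hit to return (an offset, not a page number)
size: how many hits to return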

GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "李四"
    }
  },
  "sort": [
    {
     "age": {
       "order": "desc"
     }
    }
  ],
  "from": 0,
  "size": 1
}
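
Because from is an offset rather than a page number, a 1-based page number has to be converted before it is sent. A minimal sketch of that conversion follows; `PageOffsets` is a hypothetical helper name, not part of the ES API.

```java
// Converts a 1-based page number into the 0-based "from" offset used by ES.
final class PageOffsets {

    static int from(int pageNo, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int page = Math.max(pageNo, 1); // treat anything below 1 as the first page
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 10)); // 0
        System.out.println(from(3, 10)); // 20
    }
}
```

Passing the page number straight into from would skip only pageNo hits instead of whole pages.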

7.5 Range queries with filter

# Query by age range
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "gorit"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 1,
              "lte": 25
            }
          }
        }
      ]
    }
  }
}

gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to
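
The four operators differ only in whether the bound itself is included. The following sketch spells that out as a plain Java predicate; `RangeFilter.matches` is a hypothetical helper that mimics the semantics, and a null bound stands for an operator left out of the query.

```java
// Mimics the inclusive/exclusive semantics of a range query on a numeric field.
final class RangeFilter {

    // Each bound may be null, meaning that operator is not part of the query.
    static boolean matches(long value, Long gt, Long gte, Long lt, Long lte) {
        if (gt != null && !(value > gt)) return false;   // exclusive lower bound
        if (gte != null && !(value >= gte)) return false; // inclusive lower bound
        if (lt != null && !(value < lt)) return false;   // exclusive upper bound
        if (lte != null && !(value <= lte)) return false; // inclusive upper bound
        return true;
    }

    public static void main(String[] args) {
        // Same bounds as the query above: gte 1, lte 25
        System.out.println(matches(25, null, 1L, null, 25L)); // true, 25 is included by lte
        System.out.println(matches(25, null, 1L, 25L, null)); // false, 25 is excluded by lt
    }
}
```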

7.6 Boolean queries

must (AND): every condition has to match, like WHERE id = 1 AND name = xxx

# Boolean query
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "gorit"
          }
        },{
          "match": {
            "age": "16"
          }
        }
      ]
    }
  }
}

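7.7 Matching multiple terms

List all the terms in a single match query.

# Separate the terms with spaces; a document matching any one of them is returned, and the score tells how well it matched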
GET /gorit/user/_search
{
  "query": {
    "match": {
      "tags": "Java Python"
    }
  }
}

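7.7 Exact queries

A term query looks up the given term directly in the inverted index, without analysis.

About analysis

term: exact lookup of the term as given
match: the query text is first run through the analyzer, and the resulting terms are then searched

Two string field types: text and keyword

Conclusion:

text: analyzed, i.e. split into terms
keyword: indexed as a single term, never split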
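
The difference between term and match is easier to see on a toy inverted index. The sketch below is an illustration under strong assumptions, not how Lucene is implemented: `TinyInvertedIndex` is a hypothetical class, and its analyzer merely lowercases and splits on whitespace, whereas real analyzers such as standard or IK do far more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// A toy inverted index: term -> ids of the documents containing it.
final class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in analyzer: lowercase, then split on whitespace.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Index a text field: every analyzed term points back to the document id.
    void addText(int docId, String text) {
        for (String t : analyze(text)) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    // term query: the input is looked up exactly as given, with no analysis.
    Set<Integer> term(String exact) {
        return postings.getOrDefault(exact, Collections.<Integer>emptySet());
    }

    // match query: the input is analyzed first; a document containing any term is a hit.
    Set<Integer> match(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String t : analyze(query)) {
            hits.addAll(term(t));
        }
        return hits;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addText(1, "Coding Gorit");
        index.addText(2, "Gorit Java");
        System.out.println(index.term("Gorit"));         // no hit: the index holds "gorit"
        System.out.println(index.term("gorit"));         // both documents
        System.out.println(index.match("Gorit Python")); // both documents, via "gorit"
    }
}
```

This also shows why a term query against an analyzed text field can miss a value that a match query finds: the indexed terms are the analyzer's output, not the original string.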

7.8 Highlighting

# Highlight query: matches in the results are wrapped in tags, and the tags can be customized
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "Gorit"
    }
  },
  "highlight": {
    "pre_tags": "<h3 class=\"key\" style=\"color:#FF0000;\">", 
    "post_tags": "</h3>", 
    "fields": {
      "name": {}
    }
  }
}

# Response
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.6375021,
    "hits" : [
      {
        "_index" : "gorit",
        "_type" : "user",
        "_id" : "6",
        "_score" : 1.6375021,
        "_source" : {
          "name" : "Gorit",
          "age" : 16,
          "desc" : "运维工程师",
          "tags" : [
            "Linux",
            "c++",
            "python"
          ]
        },
        "highlight" : {
          "name" : [
            "<h3 class=\"key\" style=\"color:#FF0000;\">Gorit</h3>"
          ]
        }
      }
    ]
  }
}
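
Conceptually, highlighting wraps each matched term in the configured pre and post tags. The sketch below imitates that on a plain string; `TinyHighlighter` is a hypothetical helper and does a simple case-insensitive substring match, which is much cruder than the analyzer-aware highlighting ES performs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wraps every case-insensitive occurrence of a term in pre/post tags.
final class TinyHighlighter {

    static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text; // nothing to highlight
        }
        Matcher m = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the tags or the match literal
            m.appendReplacement(out, Matcher.quoteReplacement(preTag + m.group() + postTag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Gorit", "gorit", "<h3>", "</h3>")); // <h3>Gorit</h3>
    }
}
```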

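MySQL can do all of this as well, only less efficiently.

Matching
Matching by condition
Exact matching
Range matching
Filtering the returned fields
Multi-condition queries
Highlighting
Inverted index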

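8. Integrating with SpringBoot

> Consult the official documentation

> Tests covered

Create an index
Check whether an index exists
Delete an index
Create a document
Operate on documents

// Maven dependency
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>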

// Core test code
package cn.gorit;

import cn.gorit.pojo.User;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

/**
 * API tests against ES 7.6.2
 */
@SpringBootTest
class DemoApplicationTests {

	// Injected by bean name
	@Autowired
	@Qualifier("restHighLevelClient")
	private RestHighLevelClient client;

	@Test
	void contextLoads() {

	}

	// Create an index
	@Test
	void testCreateIndex() throws IOException {
		// 1. Build the request; equivalent to PUT /gorit_index
		CreateIndexRequest request = new CreateIndexRequest("gorit_index");
		// 2. Execute it through the IndicesClient and read the response
		CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
		System.out.println(response);
	}

	// Check whether an index exists
	@Test
	void testGetIndexExist() throws IOException {
		GetIndexRequest request = new GetIndexRequest("gorit_index");
		boolean exist = client.indices().exists(request,RequestOptions.DEFAULT);
		System.out.println(exist);
	}

	// Delete an index
	@Test
	void testDeleteIndex() throws IOException {
		DeleteIndexRequest request = new DeleteIndexRequest("gorit_index");
		// Execute the deletion
		AcknowledgedResponse delete	= client.indices().delete(request,RequestOptions.DEFAULT);
		System.out.println(delete.isAcknowledged());
	}

	// Add a document
	@Test
	void testAddDocument() throws IOException {
		// Create the object to store
		User u = new User("Gorit",3);
		// Create the request
		IndexRequest request = new IndexRequest("gorit_index");

		// Equivalent to PUT /gorit_index/_doc/1
		request.id("1");
		// Same effect as request.timeout("1s")
		request.timeout(TimeValue.timeValueSeconds(1));

		// Put the data into the request as JSON
		request.source(JSON.toJSONString(u), XContentType.JSON);
		// Send the request
		IndexResponse response = client.index(request, RequestOptions.DEFAULT);

		System.out.println(response.toString());
		System.out.println(response.status());// status of the operation, CREATED here
	}

	// Check whether a document exists; equivalent to GET /index/_doc/1
	@Test
	void testIsExists() throws IOException {
		GetRequest getRequest = new GetRequest("gorit_index", "1");

		// Do not fetch _source; only existence matters here
		getRequest.fetchSourceContext(new FetchSourceContext(false));
		getRequest.storedFields("_none_");

		boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
		System.out.println(exists);
	}

	// Get a document
	@Test
	void testGetDocument() throws IOException {
		GetRequest getRequest = new GetRequest("gorit_index", "1");
		GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
		// Print the document content
		System.out.println(getResponse.getSourceAsString());
		System.out.println(getResponse); // full response, same as what the console command returns
	}

	// Update a document
	@Test
	void testUpdateDocument() throws IOException {
		UpdateRequest updateRequest = new UpdateRequest("gorit_index", "1");
		updateRequest.timeout("1s");

		User user = new User("CodingGoirt", 18);
		updateRequest.doc(JSON.toJSONString(user),XContentType.JSON);

		UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
		// Print the status and the full response
		System.out.println(updateResponse.status());
		System.out.println(updateResponse);
	}

	// Delete a document
	@Test
	void testDeleteDocument() throws IOException {
		DeleteRequest deleteRequest = new DeleteRequest("gorit_index", "1");
		deleteRequest.timeout("1s");

		DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
		// Print the status and the full response
		System.out.println(deleteResponse.status());
		System.out.println(deleteResponse);
	}

	// In real projects data is usually inserted in bulk
	@Test
	void testBulkRequest() throws IOException {
		BulkRequest bulkRequest = new BulkRequest();
		bulkRequest.timeout("10s");

		ArrayList<User> userList = new ArrayList<>();
		userList.add(new User("张三1",1));
		userList.add(new User("张三2",2));
		userList.add(new User("张三3",3));
		userList.add(new User("张三4",4));
		userList.add(new User("张三5",5));
		userList.add(new User("张三6",6));
		userList.add(new User("张三7",7));

		// Bulk request. The loop body is truncated in the source; what follows
		// is an assumption that mirrors ContentService.parseContent
		for (int i = 0; i < userList.size(); i++) {
			bulkRequest.add(
					new IndexRequest("gorit_index")
					.source(JSON.toJSONString(userList.get(i)), XContentType.JSON));
		}
		BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
		System.out.println(bulkResponse.hasFailures()); // false means every item succeeded
	}
}

Dependencies (pom.xml) of the hands-on project:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.2</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.68</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-thymeleaf</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-devtools</artifactId>
    <scope>runtime</scope>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-configuration-processor</artifactId>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.junit.vintage</groupId>
            <artifactId>junit-vintage-engine</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Crawler

Configuration class

package cn.gorit.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

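/**
 * Steps with Spring:
 * 1. Find the object
 * 2. Register it in the Spring container and use it
 * 3. Read the source to understand it
 *
 * @Classname ElasticSearchConfig
 * @Description TODO
 * @Date 2020/10/21 17:20
 * @Created by CodingGorit
 * @Version 1.0
 */
@Configuration // the counterpart of an XML <bean> definition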
public class ElasticSearchConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")
                )
        );
        return client;
    }

}

> Crawling JD search results

Parsing utility class

package cn.gorit.util;

import cn.gorit.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/**
 * @Classname HtmlParseUtil
 * @Description TODO
 * @Date 2020/10/21 23:17
 * @Created by CodingGorit
 * @Version 1.0
 */
@Component
public class HtmlParseUtil {

//    public static void main(String[] args) throws Exception {
//        new HtmlParseUtil().parseJD("英语").forEach(System.out::println);
//    }

    public List<Content> parseJD(String keyword) throws Exception {
        // Request URL; "wd" is a placeholder that is replaced by the keyword below
        // Requires network access; content loaded via AJAX is not fetched
        String url = "https://search.jd.com/Search?keyword=wd&enc=utf-8";
        // Fetch and parse the page (returns a Document object)
        Document document = Jsoup.parse(new URL(url.replace("wd",keyword)),30000);
        // The element that wraps the product list
        Element element = document.getElementById("J_goodsList");
        // All li elements inside it
        Elements elements = element.getElementsByTag("li");
        // Extract the content of each element
        List<Content> goodsList = new ArrayList<>();
        for (Element e: elements) {
            String img = e.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = e.getElementsByClass("p-price").eq(0).text();
            String title = e.getElementsByClass("p-name").eq(0).text();

            goodsList.add(new Content(title,img,price));
//            System.out.println(img);
//            System.out.println(price);
//            System.out.println(title);
        }
        return goodsList;
    }
}
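
The utility above substitutes the raw keyword into the URL. For non-ASCII keywords or keywords containing spaces, percent-encoding the value first is usually safer. A minimal sketch under that assumption follows; `JdSearchUrl.build` is a hypothetical helper, and the URL pattern is simply the one used above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds the JD search URL with the keyword percent-encoded as UTF-8.
final class JdSearchUrl {

    static String build(String keyword) {
        try {
            // The two-argument overload is available on JDK 1.8
            return "https://search.jd.com/Search?keyword="
                    + URLEncoder.encode(keyword, "UTF-8")
                    + "&enc=utf-8";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("java"));
        System.out.println(build("spring boot")); // the space is encoded as '+'
    }
}
```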

Service

package cn.gorit.service;

import cn.gorit.pojo.Content;
import cn.gorit.util.HtmlParseUtil;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * @Classname ContentService
 * @Description TODO
 * @Date 2020/10/22 18:44
 * @Created by CodingGorit
 * @Version 1.0
 */
@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Cannot be run directly like this: the client is only injected inside the Spring container
    public static void main(String[] args) throws Exception {
        new ContentService().parseContent("java");
    }

    // 1. Parse the crawled data and put it into an ES index
    public Boolean parseContent (String keywords) throws Exception {
        // Fetch the list of crawled items
        List<Content> contents = new HtmlParseUtil().parseJD(keywords);
        // Put the items into ES in one bulk request
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");

        for (int i=0;i < contents.size();++i) {
            bulkRequest.add(
                    new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(contents.get(i)),XContentType.JSON));
        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulkResponse.hasFailures();
    }

    // 2. Search the indexed data, with highlighting
    public List<Map<String, Object>> searchPagehighLight(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1)
            pageNo = 1;

        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");

        SearchSourceBuilder builder = new SearchSourceBuilder();

        // from is a 0-based hit offset, so convert the 1-based page number
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact match
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false);
        highlightBuilder.preTags("<span style='color:#FF0000;'>");
        highlightBuilder.postTags("</span>");
        builder.highlighter(highlightBuilder);

        // Execute the search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the results
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit: searchResponse.getHits().getHits()) {
            // Read the highlighted field
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();// the original source
            // Replace the original title with the highlighted fragments
            if (title != null) {
                Text[] fragments = title.fragments();
                StringBuilder nTitle = new StringBuilder();
                for (Text text:fragments) {
                    nTitle.append(text);
                }
                sourceAsMap.put("title", nTitle.toString());
            }
            list.add(sourceAsMap); // title now holds the highlighted text
        }
        return list;
    }

    // 3. Search the indexed data, without highlighting
    public List<Map<String, Object>> searchPage(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1)
            pageNo = 1;

        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");

        SearchSourceBuilder builder = new SearchSourceBuilder();

        // from is a 0-based hit offset, so convert the 1-based page number
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact match
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));


        // Execute the search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the results
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit: searchResponse.getHits().getHits()) {

            list.add(hit.getSourceAsMap()); // plain source, no highlighting
        }
        return list;
    }
}

Controller

package cn.gorit.controller;

import cn.gorit.pojo.Content;
import cn.gorit.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.io.IOException;
import java.util.List;
import java.util.Map;

/**
 * @Classname ContentController
 * @Description TODO
 * @Date 2020/10/22 18:45
 * @Created by CodingGorit
 * @Version 1.0
 */
@RestController
public class ContentController {

    @Autowired
    private ContentService service;

    /**
     * Crawl the data for a keyword and add it to ES
     * @param keyword
     * @return
     * @throws Exception
     */
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword")  String keyword) throws Exception {
        return service.parseContent(keyword);
    }

    /**
     * Query the data stored in ES
     * @param keyword
     * @param pageNo
     * @param pageSize
     * @return
     * @throws IOException
     */
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,@PathVariable("pageNo") int pageNo, @PathVariable("pageSize") int pageSize) throws IOException {
        if (pageNo == 0) {
            pageNo = 1;
        }
        return service.searchPage(keyword, pageNo, pageSize);
    }
}

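Front end and back end are separated

Testing with Postman

Search highlighting

> One project, usable from multiple clients

10. Summary

Basic ElasticSearch usage
Integrating ES with SpringBoot
Hands-on search

> My open-source project (Coding-With-Java): stars are welcome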

[TOC]

ElasticSearch

Author: CodingGorit

Date: 2020年10月22日

Note：学习笔记记录自 B站狂神说：ElasticSearch 学习

一、学习大纲

安装
生态圈
分词器 lk
RestFul 操作 ES
CRUD
SpringBoot 继承 ElasticSearch （从原理分析！！！）
爬虫爬取数据！！！京东
实战，模拟全文检索

搜索相关使用 ES（大数据量下使用）

二、ElasticSearch 概述

简称 es

一个开源的高扩展的分布式全文检索引擎
近乎实时的存储，检索数据
es使用 java 开发并使用 Licene 作为其核心来实现所有索引和搜索功能
它的目的是通过简单的 RESTFul API，来隐藏 Lucene 的复杂性，从而让全文搜索变得简单

三、ElasticSearch 安装

JDK 1.8

下载，解压
熟悉目录：

bin: 启动文件
	config: 配置文件
	log4j: 日志文件
	jvm.options: java 虚拟机先关的配置
	elasticsearch.xml:	elasticsearch 的配置文件！
lib: 相关 jar 包
logs: 日志
modules: 功能模块
plugins: 插件 ik

启动，访问 9200
访问测试：localhost:9200

> 安装可视化插件 es head 插件

下载地址：https://github.com/mobz/elasticsearch-head/
启动

npm install
npm run start

在 elasticSearch.yml 配置跨域

http.cors.enabled: true
http.cors.allow-origin: "*"

安装 kibana

下载，解压
国际化

找到 config 下的 kibana.yml 文件，修改最后一行为 i18n.locale: “zh-CN”

四、ES 核心概念

索引
字段类型（mapping）
文档（documents）

集群、节点、索引、类型、文档、分片、映射是什么？

> ElasticSearch 是面向文档，关系型数据库和 elasticSearch 客观的对比！一切都是 JSON
>
> {
>
> }

名词对应

ElasticSearch	Relational DB
索引（indices）	数据库（database）
types	表（tables）
documents	行（rows）
fields	字段（columns）

物理设计

elasticSearch 一个就是一个集群

文档

一条条记录

user
	zs: 15
	ls: 22

类型

自动识别， string，

索引

数据库

五、IK 分词器插件

下载好的添加到 plugin 中

跳过，第 8 集

elasticsearch-plugin 可以通过这个命令来查看加载进来的插件
ik_smart（最少切分）和 ik_max_word（最细粒度划分）
kibana 测试
自定义分词

六、 Rest 风格说明

基础 Rest 命令

method	url 地址	描述
PUT	localhost:9200/索引名称/类型名称/文档 id	创建文档（指定文档 id）
POST	localhost:9200/索引名称/类型名称	创建文档（随机文档 id）
POST	localhost:9200/索引名称/类型名称/文档id/_update	修改文档
DELETE	localhost:9200/索引名称/类型名称/文档id	删除文档
GET	localhost:9200/索引名称/类型名称/文档id	查询文档通过文档 id
POST	localhost:9200/索引名称/类型名称/_seaarch	查询所有数据

> 基本测试

6.1 创建索引

创建一个索引

PUT /索引名/~类型名~/文档id
{
  "name":"Gorit",
  "age": 18,
  "gender": "male"
}

返回值，数据成功添加

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1", 
  "_version" : 1, // 修改次数
  "result" : "created", // 状态
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

创建索引规则

PUT /test1/
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      },
      "birthday": {
        "type": "date"
      }
    }
  }
}

返回值

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test1"
}

es 默认配置字段类型！

6.2 查询

GET test

# 结果
{
  "test" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1603203146037",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "q47lWt_4ToOBo1rxQ1pPNw",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test"
      }
    }
  }
}


GET test1

{
  "test1" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "birthday" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1603203453667",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "a-upVXJwR7u7JZztTjyVGg",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test1"
      }
    }
  }
}

扩展：通过 _cat/ 可以获得 es 当前很多的信息

GET _cat/health

GET _cat/indices?v

6.3 修改索引

> 提交 PUT，覆盖即可

修改数据

PUT /test/type1/1
{
  "name":"Gorit111",
  "age": 18,
  "gender": "male"
}

修改结果

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

新的方法 POST 命令更新

POST /test/_doc/1/_update
{
  "doc": {
      "name":"张三"
  }
}

// 结果
{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "张三",
    "age" : 18,
    "gender" : "male"
  }
}

6.4 删除索引

> 删除索引！！！

DELETE test

通过 delete 命令实现删除，根据你的请求来判断删除的是索引还是文档

七、关于文档的操作

7.1 基本操作（复习巩固）

添加数据（添加多条记录）

PUT /gorit/user/1
{
  "name": "CodingGorit",
  "age": 23,
  "desc": "一个独立的个人开发者",
  "tags": ["Python","Java","JavaScript"]
}

PUT /gorit/user/2
{
  "name": "龙",
  "age": 20,
  "desc": "全栈工程师",
  "tags": ["Python","JavaScript"]
}

结果：

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "gorit",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

获取数据

GET /gorit/user/_search   # 查询所有数据

GET /gorit/user/1 # 查询单个数据

更新数据 PUT

PUT /gorit/user/3
{
  "name": "李四222",
  "age": 20,
  "desc": "Java开发工程师",
  "tags": ["Python","Java"]
}

# PUT 更新字段不完整，数据会被滞空

post _update ，推荐使用这种方式！

# 修改方式和 PUT 一样会使数据滞空
POST /gorit/user/1
{
  "doc": {
    "name": "coco"
  }
}

# 修改数据不会滞空, 效率更加高效
POST /gorit/user/1/_update
{
  "doc": {
    "name": "coco"
  }
}

简单的搜索！

# 查询一条记录
GET /gorit/user/1

# 查询所有
GET /gorit/user/_search

# 条件查询 [精确匹配] ，如果我们没有个这个属性设置字段，它会背默认设置为 keyword，这个 keyword 字段就是使用全匹配来匹配的，如果是 text 类型，模糊查询就会起效果
GET /gorit/user/_search?q=name:coco

7.2 复杂的查询搜索：select（排序、分页、高亮、模糊查询、精确查询）！

过滤加指定字段查询

GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "李四"
    }
  },
  "_source": ["name","desc"]
}

7.3 排序

GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "gorit"
    }
  },
  "sort": [
    {
     "age": {
       "order": "desc"
     }
    }
  ]
}

7.4 分页查询

使用字段 from 和 size 进行分页查询，方式和 limit pageSize 是一模一样的

from 从第几页开始
返回多少条数据

GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "李四"
    }
  },
  "sort": [
    {
     "age": {
       "order": "desc"
     }
    }
  ],
  "from": 0,
  "size": 1
}

7.5 filiter 区间查询

# 根据年龄的范围大小查询
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "gorit"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 1,
              "lte": 25
            }
          }
        }
      ]
    }
  }
}

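gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to

7.6 Bool queries

must (AND): every clause has to match, like WHERE id = 1 AND name = xxx in SQL

# Bool query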
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "gorit"
          }
        },{
          "match": {
            "age": "16"
          }
        }
      ]
    }
  }
}
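
The way must clauses and a range filter combine can be sketched with ordinary predicates: a document is a hit only if every must clause and every filter clause accepts it. This is a simplified model with hypothetical names; in Elasticsearch, filter clauses additionally do not contribute to the score, and match works on analyzed terms rather than the case-insensitive equality used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified model of a bool query; not the Elasticsearch API.
class BoolQuerySketch {

    static final class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Stand-in for {"match": {"name": ...}}: case-insensitive equality on the whole name.
    static Predicate<User> nameIs(String name) {
        return u -> u.name.equalsIgnoreCase(name);
    }

    // Stand-in for {"range": {"age": {"gte": ..., "lte": ...}}}: both bounds inclusive.
    static Predicate<User> ageBetween(int gte, int lte) {
        return u -> u.age >= gte && u.age <= lte;
    }

    // A user is a hit only if all must clauses and all filter clauses accept it.
    static List<User> search(List<User> users, List<Predicate<User>> must, List<Predicate<User>> filter) {
        List<Predicate<User>> all = new ArrayList<>(must);
        all.addAll(filter);
        List<User> hits = new ArrayList<>();
        for (User u : users) {
            boolean ok = true;
            for (Predicate<User> clause : all) {
                if (!clause.test(u)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                hits.add(u);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<User> users = new ArrayList<>();
        users.add(new User("Gorit", 16));
        users.add(new User("Gorit", 40));
        users.add(new User("coco", 20));

        List<Predicate<User>> must = new ArrayList<>();
        must.add(nameIs("gorit"));
        List<Predicate<User>> filter = new ArrayList<>();
        filter.add(ageBetween(1, 25));

        for (User u : search(users, must, filter)) {
            System.out.println(u.name + " " + u.age); // prints Gorit 16
        }
    }
}
```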

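7.7 Matching several terms

Put all the terms into one match query

# Separate the terms with spaces; a document is returned if it matches any one of them, and the score shows how well each hit matched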
GET /gorit/user/_search
{
  "query": {
    "match": {
      "tags": "Java Python"
    }
  }
}
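
The "any term matches, better matches rank higher" behaviour can be sketched by counting how many query terms a document contains. This is a deliberate simplification with hypothetical names: Elasticsearch computes a relevance score (BM25 by default), not a simple count.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy scoring for a multi-term match: counts matched terms instead of computing a real relevance score.
class MatchAnySketch {

    // Number of space-separated query terms that equal one of the document's tags (case-insensitive).
    static int score(String query, List<String> tags) {
        Set<String> docTerms = new HashSet<>();
        for (String tag : tags) {
            docTerms.add(tag.toLowerCase(Locale.ROOT));
        }
        int matched = 0;
        for (String term : query.trim().split("\\s+")) {
            if (!term.isEmpty() && docTerms.contains(term.toLowerCase(Locale.ROOT))) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        System.out.println(score("Java Python", Arrays.asList("Python", "Java", "JavaScript"))); // prints 2
        System.out.println(score("Java Python", Arrays.asList("Python", "JavaScript")));         // prints 1
        System.out.println(score("Java Python", Arrays.asList("Go")));                           // prints 0
    }
}
```

A document with a score of 0 would not be returned at all; "JavaScript" does not count as a match for "Java" because whole terms are compared.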

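7.7 Exact (term) queries

A term query looks up the given term directly in the inverted index, without analyzing it first.

About analysis

term: exact lookup, the query string is used as-is
match: the query string is first run through the analyzer, and the resulting terms are then looked up

Two field types: text and keyword

Conclusion:

text: analyzed, so the value is split into terms
keyword: not analyzed, the whole value is indexed as a single term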
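
The interaction between the two query types and the two field types can be sketched with a toy index. Everything here is an assumption for illustration: the analyze method is a crude stand-in for a real analyzer (lowercase, split on non-alphanumerics), and the class is not part of Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of text vs keyword fields and term vs match queries.
class TermVsMatchSketch {

    // Crude stand-in for an analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // text field: the value is analyzed at index time, so the index holds individual tokens.
    static Set<String> indexAsText(String value) {
        return new HashSet<>(analyze(value));
    }

    // keyword field: the whole value is indexed as one term.
    static Set<String> indexAsKeyword(String value) {
        return Collections.singleton(value);
    }

    // term query: looks up the query string exactly as given.
    static boolean term(Set<String> indexedTerms, String query) {
        return indexedTerms.contains(query);
    }

    // match query on a text field: analyzes the query, then hits if any token is indexed.
    static boolean match(Set<String> indexedTerms, String query) {
        for (String token : analyze(query)) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> text = indexAsText("Coding Gorit");
        Set<String> keyword = indexAsKeyword("Coding Gorit");

        System.out.println(term(text, "Gorit"));           // false: the index holds "gorit"
        System.out.println(match(text, "Gorit"));          // true: the query is lowercased first
        System.out.println(term(keyword, "Coding Gorit")); // true: whole value matches
        System.out.println(term(keyword, "Gorit"));        // false: keyword is never split
    }
}
```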

7.8 Highlighting

# Highlight query: matched terms in the results are wrapped in tags, and the tags can be customized
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "Gorit"
    }
  },
  "highlight": {
    "pre_tags": "<h3 class=\"key\" style=\"color:#FF0000;\">", 
    "post_tags": "</h3>", 
    "fields": {
      "name": {}
    }
  }
}

# Response
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.6375021,
    "hits" : [
      {
        "_index" : "gorit",
        "_type" : "user",
        "_id" : "6",
        "_score" : 1.6375021,
        "_source" : {
          "name" : "Gorit",
          "age" : 16,
          "desc" : "运维工程师",
          "tags" : [
            "Linux",
            "c++",
            "python"
          ]
        },
        "highlight" : {
          "name" : [
            "<h3 class=\"key\" style=\"color:#FF0000;\">Gorit</h3>"
          ]
        }
      }
    ]
  }
}
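
The highlight fragment in the response is the original field value with each matched term wrapped in the pre and post tags. The sketch below shows that wrapping step in plain Java. It is an illustration with hypothetical names only: it wraps every case-insensitive occurrence of a literal string, whereas Elasticsearch highlights the analyzed terms that actually matched the query.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wraps every case-insensitive occurrence of a term in the given tags.
class HighlightSketch {

    static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text;
        }
        // Pattern.quote treats the term as a literal, not as a regular expression.
        Matcher matcher = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' in the tags or text from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(preTag + matcher.group() + postTag));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Gorit", "gorit", "<em>", "</em>")); // prints <em>Gorit</em>
    }
}
```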

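MySQL can do all of these too, just less efficiently.

Matching
Matching by condition
Exact matching
Range matching
Filtering the returned fields
Multi-condition queries
Highlighting
Inverted index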

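8. Spring Boot integration

> Start from the official documentation

> Tests covered

Create an index
Check whether an index exists
Delete an index
Create a document
Work with documents

// Maven dependency
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

// Core code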
package cn.gorit;

import cn.gorit.pojo.User;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContent;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

/**
 * Tests against the ES 7.6.2 API
 */
@SpringBootTest
class DemoApplicationTests {

	// Injected by bean name; the name matches the restHighLevelClient() @Bean method in the config class
	@Autowired
	@Qualifier("restHighLevelClient")
	private RestHighLevelClient client;

	@Test
	void contextLoads() {

	}
	// Create an index
	@Test
	void testCreateIndex() throws IOException {
		// 1. Build the request; equivalent to PUT /gorit_index
		CreateIndexRequest request = new CreateIndexRequest("gorit_index");
		// 2. Send it through the IndicesClient and read the response
		CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
		System.out.println(response);
	}

	// Check whether the index exists
	@Test
	void testGetIndexExist() throws IOException {
		GetIndexRequest request = new GetIndexRequest("gorit_index");
		boolean exist = client.indices().exists(request,RequestOptions.DEFAULT);
		System.out.println(exist);
	}

	// Delete the index
	@Test
	void testDeleteIndex() throws IOException {
		DeleteIndexRequest request = new DeleteIndexRequest("gorit_index");
		// Send the delete request
		AcknowledgedResponse delete	= client.indices().delete(request,RequestOptions.DEFAULT);
		System.out.println(delete.isAcknowledged());
	}

	// Add a document
	@Test
	void testAddDocument() throws IOException {
		// Object to store
		User u = new User("Gorit",3);
		// Build the request
		IndexRequest request = new IndexRequest("gorit_index");

		// Equivalent to PUT /gorit_index/_doc/1
		request.id("1");
		// Set the timeout once; request.timeout("1s") is the string form of the same setting
		request.timeout(TimeValue.timeValueSeconds(1));

		// Serialize the object to JSON and attach it to the request
		request.source(JSON.toJSONString(u), XContentType.JSON);
		// Send the request
		IndexResponse response = client.index(request, RequestOptions.DEFAULT);

		System.out.println(response.toString());
		System.out.println(response.status()); // CREATED when the document is new
	}

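	// Check whether a document exists, like GET /gorit_index/_doc/1 without reading the body
	@Test
	void testIsExists() throws IOException {
		GetRequest getRequest = new GetRequest("gorit_index", "1");

		// Skip fetching _source and stored fields; only existence matters here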
		getRequest.fetchSourceContext(new FetchSourceContext(false));
		getRequest.storedFields("_none_");

		boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
		System.out.println(exists);
	}

	// Fetch a document
	@Test
	void testGetDocument() throws IOException {
		GetRequest getRequest = new GetRequest("gorit_index", "1");
		GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
		// Print the document body (_source)
		System.out.println(getResponse.getSourceAsString());
		System.out.println(getResponse); // the full response, same as the REST command returns
	}

	// Update a document
	@Test
	void testUpdateDocument() throws IOException {
		UpdateRequest updateRequest = new UpdateRequest("gorit_index", "1");
		updateRequest.timeout("1s");

		User user = new User("CodingGorit", 18);
		updateRequest.doc(JSON.toJSONString(user),XContentType.JSON);

		UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
		// Print the status of the update
		System.out.println(updateResponse.status());
		System.out.println(updateResponse); // the full response, same as the REST command returns
	}

	// Delete a document
	@Test
	void testDeleteDocument() throws IOException {
		DeleteRequest deleteRequest = new DeleteRequest("gorit_index", "1");
		deleteRequest.timeout("1s");

		DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
		// Print the status of the delete
		System.out.println(deleteResponse.status());
		System.out.println(deleteResponse); // the full response, same as the REST command returns
	}

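	// Real projects usually load data in batches: bulk insert

	@Test
	void testBulkRequest() throws IOException {
		BulkRequest bulkRequest = new BulkRequest();
		bulkRequest.timeout("10s");

		ArrayList<User> userList = new ArrayList<>();
		userList.add(new User("张三1",1));
		userList.add(new User("张三2",2));
		userList.add(new User("张三3",3));
		userList.add(new User("张三4",4));
		userList.add(new User("张三5",5));
		userList.add(new User("张三6",6));
		userList.add(new User("张三7",7));

		// Queue one IndexRequest per user
		// (loop body reconstructed on the pattern of ContentService.parseContent; the original text was truncated here)
		for (int i = 0; i < userList.size(); i++) {
			bulkRequest.add(
					new IndexRequest("gorit_index")
					.source(JSON.toJSONString(userList.get(i)), XContentType.JSON));
		}
		BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
		System.out.println(bulkResponse.hasFailures()); // false when every item succeeded
	}
}

// Maven dependencies (pom.xml)
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.2</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.68</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-thymeleaf</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-devtools</artifactId>
    <scope>runtime</scope>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-configuration-processor</artifactId>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.junit.vintage</groupId>
            <artifactId>junit-vintage-engine</artifactId>
        </exclusion>
    </exclusions>
</dependency>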

Crawler

Configuration class

package cn.gorit.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
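 * Working with Spring:
 * 1. Find the object you need
 * 2. Register it as a bean so Spring can manage and inject it
 * 3. Read the source to see how it works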
 *
 * @Classname ElasticSearchConfig
 * @Description TODO
 * @Date 2020/10/21 17:20
 * @Created by CodingGorit
 * @Version 1.0
 */
@Configuration // plays the role of an XML bean definition file
public class ElasticSearchConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")
                )
        );
        return client;
    }

}

> Scraping JD search results

Utility class (HTML parsing)

package cn.gorit.util;

import cn.gorit.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/**
 * @Classname HtmlParseUtil
 * @Description TODO
 * @Date 2020/10/21 23:17
 * @Created by CodingGorit
 * @Version 1.0
 */
@Component
public class HtmlParseUtil {

//    public static void main(String[] args) throws Exception {
//        new HtmlParseUtil().parseJD("英语").forEach(System.out::println);
//    }

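    public List<Content> parseJD(String keyword) throws Exception {
        // Request URL; "wd" is a placeholder that is replaced with the keyword
        // Needs network access; content the page loads later via AJAX is not in the returned HTML
        String url = "https://search.jd.com/Search?keyword=wd&enc=utf-8";
        // Fetch and parse the page (returns a jsoup Document)
        Document document = Jsoup.parse(new URL(url.replace("wd",keyword)),30000);
        // Container element of the result list
        Element element = document.getElementById("J_goodsList");
        // Every li inside it is one product
        Elements elements = element.getElementsByTag("li");
        // Extract image, price and title from each product
        List<Content> goodsList = new ArrayList<>();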
        for (Element e: elements) {
            String img = e.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = e.getElementsByClass("p-price").eq(0).text();
            String title = e.getElementsByClass("p-name").eq(0).text();

            goodsList.add(new Content(title,img,price));
//            System.out.println(img);
//            System.out.println(price);
//            System.out.println(title);
        }
        return goodsList;
    }
}
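
The utility above builds the request URL by replacing a placeholder with the raw keyword. A variant worth considering is to percent-encode the keyword first, so that spaces, non-ASCII text or characters such as & cannot change the query string. The sketch below uses only the JDK; JdSearchUrl and build are hypothetical names, and the URL layout is simply the one used in the utility class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the search URL with a percent-encoded keyword (hypothetical helper).
class JdSearchUrl {

    static String build(String keyword) {
        try {
            String encoded = URLEncoder.encode(keyword, StandardCharsets.UTF_8.name());
            return "https://search.jd.com/Search?keyword=" + encoded + "&enc=utf-8";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("java"));
        // prints https://search.jd.com/Search?keyword=java&enc=utf-8
        System.out.println(build("spring boot"));
        // prints https://search.jd.com/Search?keyword=spring+boot&enc=utf-8
    }
}
```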

Service

package cn.gorit.service;

import cn.gorit.pojo.Content;
import cn.gorit.util.HtmlParseUtil;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * @Classname ContentService
 * @Description TODO
 * @Date 2020/10/22 18:44
 * @Created by CodingGorit
 * @Version 1.0
 */
@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // This cannot be run as-is: restHighLevelClient is only injected inside the Spring container
    public static void main(String[] args) throws Exception {
        new ContentService().parseContent("java");
    }

    // 1. Scrape the search results and index them into ES
    public Boolean parseContent (String keywords) throws Exception {
        // Scrape the result list for the keyword
        List<Content> contents = new HtmlParseUtil().parseJD(keywords);
        // Queue every item in one bulk request
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");

        for (int i=0;i < contents.size();++i) {
            bulkRequest.add(
                    new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(contents.get(i)),XContentType.JSON));
        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulkResponse.hasFailures();
    }

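    // 2. Search the indexed data, with highlighting
    public List<Map<String, Object>> searchPagehighLight(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1)
            pageNo = 1;

        // Build the search request
        SearchRequest searchRequest = new SearchRequest("jd_goods");

        SearchSourceBuilder builder = new SearchSourceBuilder();

        // from is a zero-based offset, so convert the 1-based page number
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact (term) match on the title field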
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false);
        highlightBuilder.preTags("<span style=\"color:#FF0000;\">");
        highlightBuilder.postTags("</span>");
        builder.highlighter(highlightBuilder);

        // Run the search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Collect the hits
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit: searchResponse.getHits().getHits()) {
            // Highlighted fragments for this hit, keyed by field name
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = hit.getSourceAsMap(); // the original document
            // Replace the plain title with the highlighted one
            if (title != null) {
                Text[] fragments = title.fragments();
                StringBuilder nTitle = new StringBuilder();
                for (Text text:fragments) {
                    nTitle.append(text);
                }
                sourceAsMap.put("title", nTitle.toString());
            }
            list.add(sourceAsMap); // title now carries the highlight tags
        }
        return list;
    }

    // 3. The same search without highlighting
    public List<Map<String, Object>> searchPage(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1)
            pageNo = 1;

        // Build the search request
        SearchRequest searchRequest = new SearchRequest("jd_goods");

        SearchSourceBuilder builder = new SearchSourceBuilder();

        // from is a zero-based offset, so convert the 1-based page number
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact (term) match on the title field
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));


        // Run the search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Collect the hits
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit: searchResponse.getHits().getHits()) {

            list.add(hit.getSourceAsMap()); // the stored document, unchanged
        }
        return list;
    }
}

Controller

package cn.gorit.controller;

import cn.gorit.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;

/**
 * @Classname ContentController
 * @Description TODO
 * @Date 2020/10/22 18:45
 * @Created by CodingGorit
 * @Version 1.0
 */
@RestController
public class ContentController {

    @Autowired
    private ContentService service;

    /**
     * Scrapes JD for the keyword and indexes the results into ES
     * @param keyword
     * @return
     * @throws Exception
     */
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword")  String keyword) throws Exception {
        return service.parseContent(keyword);
    }

    /**
     * Searches the data stored in ES
     * @param keyword
     * @param pageNo
     * @param pageSize
     * @return
     * @throws IOException
     */
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,@PathVariable("pageNo") int pageNo, @PathVariable("pageSize") int pageSize) throws IOException {
        if (pageNo == 0) {
            pageNo = 1;
        }
        return service.searchPage(keyword, pageNo, pageSize);
    }
}

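Front end and back end separation

Testing with Postman

Search highlighting

> One project, usable from multiple clients

10. Summary

Basic ElasticSearch usage
Integrating ES with Spring Boot
Hands-on search project

> Personal open-source project (Coding-With-Java); stars are welcome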
