Searching for unique values with Elasticsearch in Python

DIEA 2022-10-06 19:01:00
I am trying to get the unique values in the "description" column. My data contains many identical descriptions, and I only want the distinct ones.

con.search(index='data', body={
    "aggs": {
        "query": {
            "match": {"description": query_input}
        },
        "size": 30,
        "distinct_description": {
        }
    }
})

However, this does not work at all. Any suggestions?

Example:

{id: 1, state: "OP", description: "hot and humid"}
{id: 2, state: "LO", description: "dry"}
{id: 3, state: "WE", description: "hot and humid"}
{id: 4, state: "OP", description: "green and vegetative"}
{id: 5, state: "HP", description: "dry"}

Result:

{id: 1, state: "OP", description: "hot and humid"}
{id: 2, state: "LO", description: "dry"}
{id: 4, state: "OP", description: "green and vegetative"}

1 Answer

qq_遁去的一_1


You should try a terms aggregation on the description.keyword sub-field:


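body = {
    # Restrict the hits to documents whose state matches the input.
    "query": {
        "match": {"state": query_input}
    },
    # Ask for up to 1000 hits rather than the default number.
    "size": 1000,
    "aggs": {
        "distinct_descriptions": {
            # One bucket per distinct description value.
            "terms": {
                "field": "description.keyword"
            }
        }
    }
}

# con is the Elasticsearch client from the question.
result = con.search(index='data', body=body)

# Build a fresh dict per bucket; appending one shared dict would make
# every list entry point at the same (last) description.
occurrences_list = []
for res in result["aggregations"]["distinct_descriptions"]["buckets"]:
    occurrences_list.append(
        {"description": res["key"], "doc_count": res["doc_count"], "score": None}
    )

# Attach the score of the first hit found for each distinct description.
for res in result["hits"]["hits"]:
    for elem in occurrences_list:
        if res["_source"]["description"] == elem["description"]:
            if elem["score"] is None:
                elem["score"] = res["_score"]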
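
The loops above pair each distinct description with a count and a score. The question's expected result is whole documents, though, with one document kept per description. A minimal sketch of that step, done client-side on the returned hits, could look like the following. The function name first_hit_per_description and the hand-written sample_response are assumptions for illustration only; a real response from con.search would carry more fields.

```python
def first_hit_per_description(response):
    """Return the _source of the first hit seen for each distinct description."""
    seen = set()
    unique_docs = []
    for hit in response["hits"]["hits"]:
        description = hit["_source"]["description"]
        if description not in seen:
            seen.add(description)
            unique_docs.append(hit["_source"])
    return unique_docs


# Hand-written stand-in for a search response, built from the question's sample rows.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"id": 1, "state": "OP", "description": "hot and humid"}},
            {"_source": {"id": 2, "state": "LO", "description": "dry"}},
            {"_source": {"id": 3, "state": "WE", "description": "hot and humid"}},
            {"_source": {"id": 4, "state": "OP", "description": "green and vegetative"}},
            {"_source": {"id": 5, "state": "HP", "description": "dry"}},
        ]
    }
}

unique_docs = first_hit_per_description(sample_response)
for doc in unique_docs:
    print(doc)
```

On the sample rows this keeps the documents with id 1, 2 and 4, which matches the result asked for in the question. It only sees the hits that were actually returned, so it depends on the size parameter being large enough.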

Note that the query now also includes a size parameter; without it, Elasticsearch returns only 10 hits by default.
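
The terms aggregation has its own size setting as well, separate from the top-level one, and it also defaults to 10 buckets. The sketch below shows where each of the two goes in the request body. The helper name build_distinct_query and its default values are hypothetical, chosen only for illustration; the field names are the ones used in the answer.

```python
def build_distinct_query(query_input, hit_size=1000, bucket_size=100):
    """Build a search body with separate limits for hits and for aggregation buckets."""
    return {
        "query": {"match": {"state": query_input}},
        # Top-level size: how many hits come back in result["hits"]["hits"].
        "size": hit_size,
        "aggs": {
            "distinct_descriptions": {
                "terms": {
                    "field": "description.keyword",
                    # Aggregation size: how many distinct values (buckets) come back.
                    "size": bucket_size,
                }
            }
        },
    }


body = build_distinct_query("OP")
print(body["size"], body["aggs"]["distinct_descriptions"]["terms"]["size"])
```

If only the distinct values are needed and not the matching documents, passing hit_size=0 should return the buckets without any hits.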


2022-10-06