2 Answers

Contributed 1806 experience points, received more than 8 likes
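You can filter right away by changing the condition if ($perc != 100) to if ($perc > 20), so that you only keep the similar posts you want to remove. You can then skip storing the similarity percentage altogether, because for each post you already have an array of the post IDs to remove.
So, when you have code like this:
if ($perc > 20) {
    $similarityPercentageArr[$currentPost['ID']][] = $comparePost['ID'];
}
you can then remove all the unwanted posts like this: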
$postsToRemove = [];
$postsToKeep = [];
foreach ($similarityPercentageArr as $postId => $similarPostIds) {
    // this post has already appeared as similar somewhere, so its similar posts have already been added
    if (in_array($postId, $postsToRemove)) {
        continue;
    }
    $postsToKeep[] = $postId;
    $postsToRemove = array_merge($postsToRemove, $similarPostIds);
}
Now you have the IDs of the original posts in $postsToKeep, and the IDs of the posts similar to them in $postsToRemove.
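To see what the loop produces, here is a minimal sketch run on a made-up similarity map. The helper name splitSimilarPosts and the sample IDs are hypothetical, not taken from the question. Note that a post with no similar posts never becomes a key of $similarityPercentageArr, so it lands in neither list and would have to be kept separately.

```php
<?php
// Hypothetical helper wrapping the loop above; the name is an assumption.
function splitSimilarPosts(array $similarityPercentageArr): array
{
    $postsToRemove = [];
    $postsToKeep = [];
    foreach ($similarityPercentageArr as $postId => $similarPostIds) {
        // Already marked as similar to an earlier post: skip it.
        if (in_array($postId, $postsToRemove)) {
            continue;
        }
        $postsToKeep[] = $postId;
        $postsToRemove = array_merge($postsToRemove, $similarPostIds);
    }
    // array_unique drops IDs collected more than once; array_values reindexes.
    return [$postsToKeep, array_values(array_unique($postsToRemove))];
}

// Made-up data: posts 1 and 3 are similar to each other, and so are 2 and 4.
$similar = [
    1 => [3],
    2 => [4],
    3 => [1],
    4 => [2],
];

[$keep, $remove] = splitSimilarPosts($similar);
print_r($keep);   // contains IDs 1 and 2
print_r($remove); // contains IDs 3 and 4
```

The first post of each similar group is kept, which means the result depends on the order of the posts in the map.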
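I would also optimize the code a little so that similar_text is not called at all when you know you are comparing a post with itself. So instead of if (!is_null($comparePost['ID'])) you would have if (!is_null($comparePost['ID']) && $comparePost['ID'] !== $currentPost['ID']).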
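Putting the two suggestions together, the comparison loop could look roughly like the sketch below. It assumes the posts are available as a plain array of ['ID' => ..., 'post_content' => ...] entries rather than a query object; the function name buildSimilarityMap and the $threshold parameter are hypothetical.

```php
<?php
// Hypothetical helper; the name, the plain-array input and $threshold are assumptions.
function buildSimilarityMap(array $posts, float $threshold = 20.0): array
{
    $similarityPercentageArr = [];
    foreach ($posts as $currentPost) {
        foreach ($posts as $comparePost) {
            // Skip the self-comparison before calling similar_text at all.
            if ($comparePost['ID'] === $currentPost['ID']) {
                continue;
            }
            similar_text(
                strip_tags($currentPost['post_content']),
                strip_tags($comparePost['post_content']),
                $perc
            );
            // Store only the IDs of posts that are similar enough.
            if ($perc > $threshold) {
                $similarityPercentageArr[$currentPost['ID']][] = $comparePost['ID'];
            }
        }
    }
    return $similarityPercentageArr;
}

// Made-up sample data: posts 1 and 2 are identical, post 3 shares no characters with them.
$posts = [
    ['ID' => 1, 'post_content' => 'Lorem ipsum dolor sit'],
    ['ID' => 2, 'post_content' => 'Lorem ipsum dolor sit'],
    ['ID' => 3, 'post_content' => 'xyz'],
];
$map = buildSimilarityMap($posts);
print_r($map); // post 1 lists post 2, post 2 lists post 1, post 3 is absent
```

Because the self-comparison is detected by ID rather than by a 100% score, two different posts with identical content are still reported as similar.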

Contributed 1817 experience points, received more than 14 likes
similar_text — Calculate the similarity between two strings
levenshtein — Calculate Levenshtein distance between two strings
soundex — Calculate the soundex key of a string
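As a quick illustration of how the three functions differ, here is a small sketch. The sample strings are arbitrary and not from the question.

```php
<?php
// similar_text returns the number of matching characters and,
// through the optional third argument, a similarity percentage.
$matching = similar_text('World', 'Word', $percent);
echo $matching, ' ', round($percent, 2), PHP_EOL; // 4 88.89

// levenshtein returns the minimum number of single-character
// insertions, deletions and replacements between two strings.
$distance = levenshtein('kitten', 'sitting');
echo $distance, PHP_EOL; // 3

// soundex maps similarly pronounced words to the same key.
$keyA = soundex('Robert');
$keyB = soundex('Rupert');
echo $keyA, ' ', $keyB, PHP_EOL; // R163 R163
```

For comparing whole post bodies, similar_text with its percentage is the one used in the question; levenshtein and soundex are better suited to short strings and single words.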
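As for your question: after reading it, the title does not seem to match what you are actually asking.
Isn't it enough to simply add another condition?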
<?php
$posts = [
    'post_count' => 4, // must match the number of entries in 'posts'
    'posts' => [
        [
            'ID' => 1,
            'post_content' => "Wrong do point avoid by fruit learn or in death. So passage however besides invited comfort elderly be me. Walls began of child civil am heard hoped my. Satisfied pretended mr on do determine by.",
        ],
        [
            'ID' => 2,
            'post_content' => "Lorem ipsum dolor sit"
        ],
        [
            'ID' => 3,
            'post_content' => "Months on ye at by esteem desire warmth former. Sure that that way gave any fond now. His boy middleton sir nor engrossed affection excellent."
        ],
        [
            'ID' => 4,
            'post_content' => "Lorem ipsum dolor sit"
        ],
    ]
];
print_r($posts);

function getNonSimilarTexts($posts)
{
    $similarityPercentageArr = array();
    // use < rather than <= so the loop never reads past the last post
    for ($i = 0; $i < $posts['post_count']; $i++) {
        // $posts->the_post();
        $currentPost = $posts['posts'][$i];
        if (!is_null($currentPost['ID'])) {
            for ($y = 0; $y < $posts['post_count']; $y++) {
                $comparePost = $posts['posts'][$y];
                if (!is_null($comparePost['ID'])) {
                    similar_text(strip_tags($currentPost['post_content']), strip_tags($comparePost['post_content']), $perc);
                    // a self-comparison scores 100, so skip it and keep only pairs above 20
                    // note: two different posts with identical text also score 100 and are skipped
                    if ($perc != 100 && $perc > 20) {
                        array_push($similarityPercentageArr, [$currentPost['ID'], $comparePost['ID'], $perc]);
                    }
                }
            }
        }
    }
    return $similarityPercentageArr;
}

$p = getNonSimilarTexts($posts);
print_r($p);
Output of print_r($p):
Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 3
            [2] => 23.145400593472
        )

)