2 Answers

Contributor with 1868 experience points · 4+ likes
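I would first run the text through array_filter to drop every word that is in the stopword list, then count the occurrences of each remaining word with array_count_values, and finally use array_filter again to discard the words that occur only once. You can then write the remaining words (they are the keys of the output array) to the database. For example: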
$content = "How technology is helping to change the way people think about the food on their plate and the food impact for them. Technology could have a role to play in raising awareness of the impact our diets have on the planet.";
$stopwords = array('how', 'is', 'to', 'the', 'way', 'on', 'and', 'for', 'a', 'in', 'of', 'our', 'have');
// count all words in $content not in the stopwords list
$counts = array_count_values(array_filter(explode(' ', strtolower($content)), function ($w) use ($stopwords) {
    return !in_array($w, $stopwords);
}));
// filter out words only seen once
$counts = array_filter($counts, function ($v) { return $v > 1; });
// write those words to the database
foreach ($counts as $key => $value) {
    $this->db->query("INSERT INTO news (news_id, news_content) VALUES ('$id', '$key')");
}
For your sample data, the final $counts will be:
Array
(
[technology] => 2
[food] => 2
[impact] => 2
)
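One thing to keep in mind with the approach above: explode(' ', ...) leaves punctuation attached, so "plate." and "plate" would be counted as different words. The sample text happens not to be affected, but a minimal sketch that splits on non-word characters could look like the following. The function name repeated_words is hypothetical, and the pattern assumes plain ASCII text.

```php
<?php
// Hypothetical helper (not part of the original answer).
// Returns word => count for every non-stopword that occurs more than once.
function repeated_words(string $content, array $stopwords): array
{
    // Split on anything that is not a letter, digit or apostrophe (ASCII only),
    // so "plate." and "plate," both become "plate".
    $words = preg_split("/[^a-z0-9']+/", strtolower($content), -1, PREG_SPLIT_NO_EMPTY);

    // Flip the stopword list so each lookup is a key check instead of a linear search.
    $stop = array_flip($stopwords);
    $words = array_filter($words, function ($w) use ($stop) {
        return !isset($stop[$w]);
    });

    // Count the remaining words and keep only those seen more than once.
    return array_filter(array_count_values($words), function ($n) {
        return $n > 1;
    });
}
```

Calling repeated_words($content, $stopwords) with the sample data from this answer gives the same three words as above.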

Contributor with 1830 experience points · 9+ likes
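I'm sure there are plenty of options here.
This is my solution: you can use array_search() for it. array_search() returns false if the needle is not found in the array; if the word is found, it returns the key of the first match.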
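As a small illustration of those return values (the $words array here is made up for the example), note that the first key can be 0, which is why the result should be compared with !== false:

```php
<?php
// Illustrative data only; not taken from the question.
$words = ['the', 'food', 'the'];

// Returns the key of the FIRST match, here int 0.
$firstKey = array_search('the', $words, true);

// Returns false when the needle is not in the array.
$missing = array_search('plate', $words, true);

// Key 0 is falsy, so test with !== false instead of relying on truthiness.
$found = ($firstKey !== false);
```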
Depending on your needs, you can use one of the following options.
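// Option 1
// Words that actually appear more than once.
// $exp is assumed to hold the words of the text, e.g. $exp = explode(' ', $content);
$new_arr = array();
foreach ($exp as $key => $e) {
    // Must be exactly this word (hence the strict flag, true)
    $search = array_search($e, $exp, true);
    // array_search returns the first matching key, so every later occurrence is collected
    if ($search !== false && $search != $key) {
        $new_arr[] = $e;
    }
}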
// Option 2
//
// Your question was not totally clear, so I add this code as well.
// Words with asterisks before and after that appear more than once
$new_arr = array();
foreach ($exp as $key => $e) {
    // Two asterisks at the beginning of the string and two at the end
    // strtolower makes **Technology** count as a duplicate of **technology**
    if (substr($e, 0, 2) == "**" && substr($e, -2, 2) == "**") {
        $search = array_search(strtolower($e), $exp);
        if ($search !== false && $search != $key) {
            $new_arr[] = $e;
        }
    }
}
for ($j = 0; $j < count($new_arr); $j++) {
    // The word is a string, so it has to be quoted inside the SQL statement
    $this->db->query("INSERT INTO news (news_id, news_content)
        VALUES ('$id', '{$new_arr[$j]}')");
}
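As someone mentioned in the comments, building the INSERT statement this way leaves it open to SQL injection, and you should guard against that. The question, however, is mainly about finding duplicates in a string in order to do something with them, so I won't go any further into that comment.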
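For readers who do want to follow up on that comment, one common way to avoid interpolating values into the SQL text is a prepared statement. The sketch below uses PDO rather than the $this->db object from the answer, so the connection, the function name insert_words, and the column types are all assumptions. Only the table and column names come from the INSERT above.

```php
<?php
// Hypothetical helper (not part of the original answer).
// $pdo is an assumed, already-open PDO connection to the database holding "news".
function insert_words(PDO $pdo, $newsId, array $words): int
{
    // Placeholders keep the values out of the SQL text, so quotes in a word cannot break the query.
    $stmt = $pdo->prepare('INSERT INTO news (news_id, news_content) VALUES (:id, :word)');

    $inserted = 0;
    foreach ($words as $word) {
        $stmt->execute([':id' => $newsId, ':word' => $word]);
        $inserted += $stmt->rowCount();
    }

    // Number of rows actually written.
    return $inserted;
}
```

It could be called as insert_words($pdo, $id, $new_arr) in place of the loop above.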
The resulting array $new_arr looks like this (Option 1):
array (size=9)
0 => string 'the' (length=3)
1 => string 'the' (length=3)
2 => string '**food**' (length=8)
3 => string 'to' (length=2)
4 => string 'the' (length=3)
5 => string '**impact**' (length=10)
6 => string 'have' (length=4)
7 => string 'on' (length=2)
8 => string 'the' (length=3)
Technology and technology are not treated as the same word here, because one of them starts with a capital T.
The resulting array $new_arr looks like this (Option 2):
array (size=3)
0 => string '**food**' (length=8)
1 => string '**Technology**' (length=14)
2 => string '**impact**' (length=10)
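If only the words wrapped in double asterisks matter, an alternative to Option 2 is to extract them with a regular expression and count them case-insensitively. This is a sketch under the assumption that marked words always have the form **word**. The function name repeated_marked_words is hypothetical, and unlike Option 2 it returns each repeated word once, lowercased and without the asterisks.

```php
<?php
// Hypothetical helper (not part of the original answer).
// Returns the lowercased words that appear as **word** more than once.
function repeated_marked_words(string $content): array
{
    // Capture the text between each pair of double asterisks.
    preg_match_all('/\*\*(.+?)\*\*/', $content, $matches);

    // Lowercase before counting so **Technology** and **technology** are the same word.
    $counts = array_count_values(array_map('strtolower', $matches[1]));

    // Keep only the words counted more than once and return them as a plain list.
    return array_keys(array_filter($counts, function ($n) {
        return $n > 1;
    }));
}
```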