Fast reading of a large table

慕工程0101907 2022-12-24 10:18:03

My CSV file is structured like this:

1,0,2.2,0,0,0,0,1.2,0
0,1,2,4,0,1,0.2,0.1,0
0,0,2,3,0,0,0,1.2,2.1
0,0,0,1,2,1,0,0.2,0.1
0,0,1,0,2.1,0.1,0,1.2
0,0,2,3,0,1.1,0.1,1.2
0,0.2,0,1.2,2,0,3.2,0
0,0,1.2,0,2.2,0,0,1.1

but with 10k columns and 10k rows. I want to read it so that the result is a dictionary whose key is the row index and whose value is a float array containing every value in that row. Right now my code looks like this:

var lines = File.ReadAllLines(filePath).ToList();
var result = lines.AsParallel().AsOrdered().Select((line, index) =>
{
    var values = line?.Split(',').Where(v => !string.IsNullOrEmpty(v))
        .Select(f => f.Replace('.', ','))
        .Select(float.Parse).ToArray();
    return (index, values);
}).ToDictionary(d => d.Item1, d => d.Item2);

It takes up to 30 seconds to finish, which is slow, and I would like to optimize it to make it faster.

3 Answers

一只斗牛犬

1784 experience contributions, over 2 likes

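There are many small optimizations you could make, but what is really killing you is the garbage collector, because of all the allocations.

Your code takes 12 seconds to run on my machine. Reading the file accounts for 2 of those 12 seconds.

Applying all the optimizations mentioned in the comments (using File.ReadLines, using StringSplitOptions.RemoveEmptyEntries, and calling float.Parse(f, CultureInfo.InvariantCulture) instead of string.Replace) brings the time down to 9 seconds. There are still a lot of allocations, especially from File.ReadLines. Can we do better?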

Just enable server GC in app.config:

<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>

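With server GC enabled, execution time drops to 6 seconds with your original code and to 3 seconds with the optimizations above. At that point file I/O accounts for more than 60% of the execution time, so further optimization is not worth it.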
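To check that the setting actually took effect, the process can report which GC mode it is running under. This is only a minimal sketch: the class name GcModeCheck is made up for illustration, while GCSettings.IsServerGC and GCSettings.LatencyMode come from the System.Runtime namespace.

```csharp
using System;
using System.Runtime;

public static class GcModeCheck
{
    // Describes the garbage collector mode of the current process.
    public static string Describe()
    {
        // IsServerGC is true only when server GC was enabled at startup.
        var mode = GCSettings.IsServerGC ? "server" : "workstation";
        return $"GC mode: {mode}, latency mode: {GCSettings.LatencyMode}";
    }
}
```

Printing GcModeCheck.Describe() once at startup is enough to see whether the gcServer element was picked up. Projects on .NET Core or later have no app.config; there the same switch is normally set through the ServerGarbageCollector property in the project file.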

The final version of the code:

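using System;
using System.Globalization;
using System.IO;
using System.Linq;

// ReadLines streams the file lazily instead of loading every line up front.
var lines = File.ReadLines(filePath);
var separator = new[] {','};

var result = lines.AsParallel().AsOrdered().Select((line, index) =>
{
    // RemoveEmptyEntries replaces the Where filter; InvariantCulture replaces string.Replace.
    var values = line?.Split(separator, StringSplitOptions.RemoveEmptyEntries)
        .Select(f => float.Parse(f, CultureInfo.InvariantCulture)).ToArray();
    return (index, values);
}).ToDictionary(d => d.Item1, d => d.Item2);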
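The switch from string.Replace to CultureInfo.InvariantCulture can be illustrated in isolation. In the sketch below, DecimalSeparatorDemo and CommaDecimalFormat are hypothetical names, and the comma-decimal format is built by hand as a stand-in for whatever culture the asker's machine uses, which the question does not state.

```csharp
using System;
using System.Globalization;

public static class DecimalSeparatorDemo
{
    // Hypothetical stand-in for a culture that writes decimals with a comma.
    public static NumberFormatInfo CommaDecimalFormat()
    {
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.NumberDecimalSeparator = ",";
        format.NumberGroupSeparator = ".";
        return format;
    }

    // The approach from the question: rewrite the text to suit the culture.
    // This allocates a new string for every field that contains a period.
    public static float ParseWithReplace(string field, NumberFormatInfo format) =>
        float.Parse(field.Replace('.', ','), format);

    // The approach from this answer: tell the parser which format to expect.
    public static float ParseInvariant(string field) =>
        float.Parse(field, CultureInfo.InvariantCulture);
}
```

Both methods return the same value for a field such as "2.2", but only the second skips the intermediate string, which matters when it runs 100 million times.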

Answered 2022-12-24

料青山看我应如是

1772 experience contributions, over 8 likes

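Replacing Split and Replace with manual parsing, using InvariantInfo so that the period is accepted as the decimal point, and dropping the wasteful ReadAllLines().ToList() so that AsParallel() reads from the file while parsing gave roughly a fourfold speedup on my PC.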

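using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

var lines = File.ReadLines(filePath);

var result = lines.AsParallel().AsOrdered().Select((line, index) => {
    // Pre-size the list for the expected 10k columns to avoid regrowth.
    var values = new List<float>(10000);
    var pos = 0;
    while (pos < line.Length) {
        // Find the end of the current field; the last field ends at the end of the line.
        var commapos = line.IndexOf(',', pos);
        commapos = commapos < 0 ? line.Length : commapos;
        var fs = line.Substring(pos, commapos - pos);
        if (fs != String.Empty) // remove if no value is ever missing
            values.Add(float.Parse(fs, NumberFormatInfo.InvariantInfo));
        pos = commapos + 1;
    }
    return values;
}).ToList();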

I also used a List for values instead of ToArray, since that is usually faster (ToList is preferable to ToArray).
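The manual parser above still allocates one string per field through Substring. A possible variation, which is not part of this answer, slices the line with ReadOnlySpan<char> instead. It assumes .NET Core 2.1 or later, where float.Parse accepts a span; SpanLineParser and ParseLine are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class SpanLineParser
{
    // Parses one comma-separated line into floats, skipping empty fields.
    // expectedColumns only pre-sizes the list; it is not a limit.
    public static List<float> ParseLine(string line, int expectedColumns = 16)
    {
        var values = new List<float>(expectedColumns);
        ReadOnlySpan<char> rest = line.AsSpan();
        while (!rest.IsEmpty)
        {
            int comma = rest.IndexOf(',');
            // The field is a view into the original string, so nothing is copied.
            ReadOnlySpan<char> field = comma < 0 ? rest : rest.Slice(0, comma);
            if (!field.IsEmpty)
                values.Add(float.Parse(field, NumberStyles.Float, CultureInfo.InvariantCulture));
            rest = comma < 0 ? ReadOnlySpan<char>.Empty : rest.Slice(comma + 1);
        }
        return values;
    }
}
```

Spans cannot be captured by a lambda, but calling a static method like this from inside the Select lambda, for example SpanLineParser.ParseLine(line, 10000), works. Whether it improves on the fourfold speedup would have to be measured.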

Answered 2022-12-24

哆啦的时光机

1779 experience contributions, over 6 likes

using System.IO;
using Microsoft.VisualBasic.FileIO;

protected void CSVImport(string importFilePath)
{
    string csvData = System.IO.File.ReadAllText(importFilePath, System.Text.Encoding.GetEncoding("WINDOWS-1250"));
    foreach (string row in csvData.Split('\n'))
    {
        // One parser per row; dispose it so the underlying reader is released.
        using (var parser = new TextFieldParser(new StringReader(row)))
        {
            parser.HasFieldsEnclosedInQuotes = true;
            parser.SetDelimiters(",");
            string[] fields = parser.ReadFields();
            if (fields == null) continue; // blank row, e.g. after the final newline
            // do what you need with the data in the array
        }
    }
}
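TextFieldParser can also consume a whole reader by itself, which avoids splitting the text on '\n' and creating a parser for every row. The following is a sketch under that assumption rather than a measured improvement; CsvFieldReader and ReadAll are hypothetical names, and the fields are returned as strings, so converting them to float is left to the caller.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvFieldReader
{
    // Reads every record from the reader with a single parser instance.
    public static List<string[]> ReadAll(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            // EndOfData becomes true once no non-blank lines remain.
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For a file, the reader could be a StreamReader opened with the same encoding as in the code above, so the file never has to be held in memory as one string.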

Answered 2022-12-24
