3 Answers
Answer by a contributor with 1810 experience points, more than 4 upvotes
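Rather than piping head into tail, you can use sed:

sed 'NUMq;d' file

Here NUM is the number of the line you want to print. For example, sed '10q;d' file prints the 10th line of file.

Explanation: NUMq makes sed quit as soon as the line number is NUM. d deletes each line instead of printing it. On line NUM the q runs first, so that line is printed as sed exits and the d is never reached for it.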
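To see the behaviour described above on something small, here is a minimal sketch. The file name sample.txt and its contents are made up for illustration and are not part of the original answer.

```shell
# Build a five-line sample file (sample.txt is a hypothetical name).
printf '%s\n' alpha beta gamma delta epsilon > sample.txt

# Lines 1 and 2 are discarded by d; on line 3, q prints the line and stops,
# so lines 4 and 5 are never read.
sed '3q;d' sample.txt
```

With this sample file the command should print only the third line, gamma.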
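If NUM is held in a shell variable, use double quotes instead of single quotes so that the variable is expanded:

sed "${NUM}q;d" file

Answer by a contributor with 1878 experience points, more than 4 upvotes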
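Setup

I only need to extract a subset of the rows to do anything useful with the data. Reading every row up to the value I care about is going to take a long time. If a solution reads the row I care about and then keeps reading the rest of the file, it wastes time reading nearly 3 billion irrelevant rows and takes 6 times longer than necessary.

All timings below were measured with time.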
Baseline

First, the head | tail solution:

$ time head -50000000 myfile.ascii | tail -1
pgm_icnt = 0

real    1m15.321s

cut

$ time cut -f50000000 -d$'\n' myfile.ascii
pgm_icnt = 0

real    5m12.156s
awk

The awk command uses exit so that it stops reading once the line has been printed:

$ time awk 'NR == 50000000 {print; exit}' myfile.ascii
pgm_icnt = 0

real    1m16.583s

perl

$ time perl -wnl -e '$.== 50000000 && print && exit;' myfile.ascii
pgm_icnt = 0

real    1m13.146s
sed

$ time sed "50000000q;d" myfile.ascii
pgm_icnt = 0

real    1m12.705s
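Three of the commands timed above can be cross-checked on a small synthetic file before running them against a large one. This is only a sketch: sample_lines.txt, the 100,000-line size, and the variable names are assumptions for illustration, and nothing here reproduces the timings in this answer.

```shell
# Assumption: a small synthetic file stands in for myfile.ascii.
seq 1 100000 > sample_lines.txt
n=50000

# Each command stops at (or shortly after) line n instead of reading the whole file.
by_head_tail=$(head -n "$n" sample_lines.txt | tail -n 1)
by_awk=$(awk -v n="$n" 'NR == n {print; exit}' sample_lines.txt)
by_sed=$(sed "${n}q;d" sample_lines.txt)

# All three should report the same line.
printf 'head|tail: %s\nawk: %s\nsed: %s\n' "$by_head_tail" "$by_awk" "$by_sed"
```

Because line k of the synthetic file contains the number k, all three results should be 50000. Prefixing any one of the commands with time gives a rough local comparison.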
mapfile
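Conclusion

For the most part, it is hard to improve on the head | tail solution: at best, sed is about 3% faster.

(Percentages calculated with the formula % = (runtime/baseline - 1) * 100)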
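As a worked example of the percentage formula, the sketch below plugs in two timings measured earlier: 1m12.705s for sed and 1m15.321s for the head | tail baseline, both converted to seconds. The variable names are illustrative only.

```shell
# Timings in seconds, taken from the runs above.
runtime=72.705    # sed: 1m12.705s
baseline=75.321   # head | tail: 1m15.321s

# % = (runtime/baseline - 1) * 100, printed with an explicit sign.
pct=$(awk -v r="$runtime" -v b="$baseline" 'BEGIN { printf "%+.2f%%", (r / b - 1) * 100 }')
echo "$pct"   # prints -3.47%
```

A negative value means the command finished faster than the baseline.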
Row 50,000,000

00:01:12.705 (-00:00:02.616 = -3.47%)    sed
00:01:13.146 (-00:00:02.175 = -2.89%)    perl
00:01:15.321 (+00:00:00.000 = +0.00%)    head|tail
00:01:16.583 (+00:00:01.262 = +1.68%)    awk
00:05:12.156 (+00:03:56.835 = +314.43%)  cut
Row 500,000,000

00:12:07.050 (-00:00:26.160)  sed
00:12:11.460 (-00:00:21.750)  perl
00:12:33.210 (+00:00:00.000)  head|tail
00:12:45.830 (+00:00:12.620)  awk
00:52:01.560 (+00:40:31.650)  cut
Row 3,338,559,320

01:20:54.599 (-00:03:05.327)  sed
01:21:24.045 (-00:02:25.227)  perl
01:23:49.273 (+00:00:00.000)  head|tail
01:25:13.548 (+00:02:35.735)  awk
05:47:23.026 (+04:24:26.246)  cut
