
[Translation] Apache Hadoop Series, Part 2: HDFS Commands Guide

Tags: Hadoop

Overview

All HDFS commands are invoked by the bin/hdfs script. Running the hdfs script without any arguments prints the description for all commands.

Usage: hdfs [SHELL_OPTIONS] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]

Hadoop has an option parsing framework that employs parsing generic options as well as running classes.

COMMAND_OPTIONS : Description
--config --loglevel : The common set of shell options. These are documented on the Commands Manual page.
GENERIC_OPTIONS : The common set of options supported by multiple commands. See the Hadoop Commands Manual for more information.
COMMAND COMMAND_OPTIONS : Various commands with their options are described in the following sections. The commands have been grouped into User Commands and Administration Commands.

User Commands

Commands useful for users of a Hadoop cluster.

classpath

Usage: hdfs classpath [--glob |--jar <path> |-h |--help]

COMMAND_OPTION : Description
--glob : expand wildcards
--jar path : write classpath as manifest in jar named path
-h, --help : print help

Prints the class path needed to get the Hadoop jar and the required libraries. If called without arguments, it prints the classpath set up by the command scripts, which is likely to contain wildcards in the classpath entries. Additional options print the classpath after wildcard expansion or write the classpath into the manifest of a jar file. The latter is useful in environments where wildcards cannot be used and the expanded classpath exceeds the maximum supported command-line length.
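For example (a minimal sketch; the jar path is hypothetical):

    # Print the classpath with all wildcard entries expanded
    hdfs classpath --glob
    # Write the expanded classpath into the manifest of a jar file
    hdfs classpath --jar /tmp/hdfs-classpath.jar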

dfs

Usage: hdfs dfs [COMMAND [COMMAND_OPTIONS]]
Runs a filesystem command on the file system supported in Hadoop. The various COMMAND_OPTIONS can be found in the File System Shell Guide.
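A few common invocations, as a sketch (the /user/alice paths are hypothetical):

    # Create a directory, upload a local file, then list the result
    hdfs dfs -mkdir -p /user/alice/data
    hdfs dfs -put localfile.txt /user/alice/data/
    hdfs dfs -ls /user/alice/data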

fetchdt

Usage: hdfs fetchdt <opts> <token_file_path>

COMMAND_OPTION : Description
--webservice NN_Url : Url to contact NN on (starts with http or https)
--renewer name : Name of the delegation token renewer
--cancel : Cancel the delegation token
--renew : Renew the delegation token. Delegation token must have been fetched using the --renewer name option.
--print : Print the delegation token
token_file_path : File path to store the token into.

Gets the delegation token from a NameNode. See fetchdt for more info.
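A sketch, assuming a secured (Kerberos) cluster; the NameNode URL, port, and token path are hypothetical:

    # Fetch a delegation token over HTTP and store it in a local file
    hdfs fetchdt --webservice http://nn.example.com:9870 /tmp/alice.token
    # Inspect the token that was just fetched
    hdfs fetchdt --print /tmp/alice.token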

fsck

Usage:

   hdfs fsck <path>
          [-list-corruptfileblocks |
          [-move | -delete | -openforwrite]
          [-files [-blocks [-locations | -racks | -replicaDetails | -upgradedomains]]]
          [-includeSnapshots]
          [-storagepolicies] [-maintenance] [-blockId <blk_Id>]
COMMAND_OPTION : Description
path : Start checking from this path.
-delete : Delete corrupted files.
-files : Print out files being checked.
-files -blocks : Print out the block report.
-files -blocks -locations : Print out locations for every block.
-files -blocks -racks : Print out network topology for data-node locations.
-files -blocks -replicaDetails : Print out each replica details.
-files -blocks -upgradedomains : Print out upgrade domains for every block.
-includeSnapshots : Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it.
-list-corruptfileblocks : Print out list of missing blocks and files they belong to.
-move : Move corrupted files to /lost+found.
-openforwrite : Print out files opened for write.
-storagepolicies : Print out storage policy summary for the blocks.
-maintenance : Print out maintenance state node details.
-blockId : Print out information about the block.

Runs the HDFS filesystem checking utility. See fsck for more info.
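For example (the path is hypothetical; fsck reports problems but does not repair them by itself):

    # Check the whole namespace and show every block with its locations
    hdfs fsck / -files -blocks -locations
    # List missing blocks and the files they belong to under one directory
    hdfs fsck /user/alice -list-corruptfileblocks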

getconf

Usage:

hdfs getconf -namenodes
hdfs getconf -secondaryNameNodes
hdfs getconf -backupNodes
hdfs getconf -includeFile
hdfs getconf -excludeFile
hdfs getconf -nnRpcAddresses
hdfs getconf -confKey [key]
COMMAND_OPTION : Description
-namenodes : gets list of namenodes in the cluster.
-secondaryNameNodes : gets list of secondary namenodes in the cluster.
-backupNodes : gets list of backup nodes in the cluster.
-includeFile : gets the include file path that defines the datanodes that can join the cluster.
-excludeFile : gets the exclude file path that defines the datanodes that need to be decommissioned.
-nnRpcAddresses : gets the namenode rpc addresses
-confKey [key] : gets a specific key from the configuration

Gets configuration information from the configuration directory, post-processing.
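For example:

    # Print the NameNode host(s) for the cluster
    hdfs getconf -namenodes
    # Look up a single configuration key
    hdfs getconf -confKey dfs.replication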

groups

Usage: hdfs groups [username ...]
Returns the group information given one or more usernames.
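For example (usernames hypothetical):

    # Show the groups each user belongs to, as resolved on the NameNode
    hdfs groups alice bob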

lsSnapshottableDir

Usage: hdfs lsSnapshottableDir [-help]

COMMAND_OPTION : Description
-help : print help

Gets the list of snapshottable directories. When this is run as a super user, it returns all snapshottable directories. Otherwise it returns those directories that are owned by the current user.

jmxget

Usage: hdfs jmxget [-localVM ConnectorURL | -port port | -server mbeanserver | -service service]

COMMAND_OPTION : Description
-help : print help
-localVM ConnectorURL : connect to the VM on the same machine
-port mbean server port : specify mbean server port, if missing it will try to connect to MBean Server in the same VM
-server : specify mbean server (localhost by default)
-service NameNode|DataNode : specify jmx service. NameNode by default.

Dumps JMX information from a service.
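A sketch, assuming the NameNode exposes a remote JMX MBean server on port 8004 (both the port and the need for remote JMX to be enabled are assumptions):

    # Dump the NameNode's JMX metrics from a remote MBean server
    hdfs jmxget -server localhost -port 8004 -service NameNode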

oev

Usage: hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE

Required command line arguments:

COMMAND_OPTION : Description
-i,--inputFile arg : edits file to process, xml (case insensitive) extension means XML format, any other filename means binary format
-o,--outputFile arg : Name of output file. If the specified file exists, it will be overwritten, format of the file is determined by -p option

Optional command line arguments:

COMMAND_OPTION : Description
-f,--fix-txids : Renumber the transaction IDs in the input, so that there are no gaps or invalid transaction IDs.
-h,--help : Display usage information and exit
-r,--recover : When reading binary edit logs, use recovery mode. This will give you the chance to skip corrupt parts of the edit log.
-p,--processor arg : Select which type of processor to apply against image file, currently supported processors are: binary (native binary format that Hadoop uses), xml (default, XML format), stats (prints statistics about edits file)
-v,--verbose : More verbose output, prints the input and output filenames, for processors that write to a file, also output to screen. On large image files this will dramatically increase processing time (default is false).

Hadoop offline edits viewer. See the Offline Edits Viewer Guide for more info.
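For example, to round-trip an edit log segment through XML (the segment file name is hypothetical):

    # Binary edits -> XML (xml is the default processor)
    hdfs oev -i edits_0000000000000000001-0000000000000000100 -o edits.xml
    # XML -> binary, e.g. after hand-repairing a damaged segment
    hdfs oev -i edits.xml -o edits.repaired -p binary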

oiv

Usage: hdfs oiv [OPTIONS] -i INPUT_FILE

Required command line arguments:

COMMAND_OPTION : Description
-i,--inputFile input file : Specify the input fsimage file (or XML file, if ReverseXML processor is used) to process.

Optional command line arguments:

COMMAND_OPTION : Description
-o,--outputFile output file : Specify the output filename, if the specified output processor generates one. If the specified file already exists, it is silently overwritten. (output to stdout by default) If the input file is an XML file, it also creates an <outputFile>.md5.
-p,--processor processor : Specify the image processor to apply against the image file. Currently valid options are Web (default), XML, Delimited, FileDistribution and ReverseXML.
-addr address : Specify the address(host:port) to listen. (localhost:5978 by default). This option is used with Web processor.
-maxSize size : Specify the range [0, maxSize] of file sizes to be analyzed in bytes (128GB by default). This option is used with FileDistribution processor.
-step size : Specify the granularity of the distribution in bytes (2MB by default). This option is used with FileDistribution processor.
-format : Format the output result in a human-readable fashion rather than a number of bytes. (false by default). This option is used with FileDistribution processor.
-delimiter arg : Delimiting string to use with Delimited processor.
-t,--temp temporary dir : Use temporary dir to cache intermediate result to generate Delimited outputs. If not set, Delimited processor constructs the namespace in memory before outputting text.
-h,--help : Display the tool usage and help information and exit.

The Hadoop Offline Image Viewer for image files in Hadoop 2.4 or up. See the Offline Image Viewer Guide for more info.
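For example (the fsimage file name is hypothetical; copy it out of the NameNode's metadata directory first):

    # Dump an fsimage to XML
    hdfs oiv -p XML -i fsimage_0000000000000000024 -o fsimage.xml
    # Or serve it with the default Web processor (listens on localhost:5978)
    hdfs oiv -i fsimage_0000000000000000024
    # ...then browse the image read-only from another shell
    hdfs dfs -ls webhdfs://127.0.0.1:5978/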

oiv_legacy

Usage: hdfs oiv_legacy [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE

Required command line arguments:

COMMAND_OPTION : Description
-i,--inputFile input file : Specify the input fsimage file to process.
-o,--outputFile output file : Specify the output filename, if the specified output processor generates one. If the specified file already exists, it is silently overwritten

Optional command line arguments:

COMMAND_OPTION : Description
-p|--processor processor : Specify the image processor to apply against the image file. Valid options are Ls (default), XML, Delimited, Indented, FileDistribution and NameDistribution.
-maxSize size : Specify the range [0, maxSize] of file sizes to be analyzed in bytes (128GB by default). This option is used with FileDistribution processor.
-step size : Specify the granularity of the distribution in bytes (2MB by default). This option is used with FileDistribution processor.
-format : Format the output result in a human-readable fashion rather than a number of bytes. (false by default). This option is used with FileDistribution processor.
-skipBlocks : Do not enumerate individual blocks within files. This may save processing time and outfile file space on namespaces with very large files. The Ls processor reads the blocks to correctly determine file sizes and ignores this option.
-printToScreen : Pipe output of processor to console as well as specified file. On extremely large namespaces, this may increase processing time by an order of magnitude.
-delimiter arg : When used in conjunction with the Delimited processor, replaces the default tab delimiter with the string specified by arg.
-h|--help : Display the tool usage and help information and exit.

The Hadoop offline image viewer for image files of older Hadoop versions. See the oiv_legacy Command section of the Offline Image Viewer Guide for more info.

version

Usage: hdfs version
Prints the version.

Administration Commands

Commands useful for administrators of a Hadoop cluster.

balancer

Usage:

hdfs balancer
          [-policy <policy>]
          [-threshold <threshold>]
          [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
          [-include [-f <hosts-file> | <comma-separated list of hosts>]]
          [-source [-f <hosts-file> | <comma-separated list of hosts>]]
          [-blockpools <comma-separated list of blockpool ids>]
          [-idleiterations <idleiterations>]
          [-runDuringUpgrade]
COMMAND_OPTION : Description
-policy <policy> : datanode (default): Cluster is balanced if each datanode is balanced. blockpool: Cluster is balanced if each block pool in each datanode is balanced.
-threshold <threshold> : Percentage of disk capacity. This overwrites the default threshold.
-exclude -f <hosts-file> | <comma-separated list of hosts> : Excludes the specified datanodes from being balanced by the balancer.
-include -f <hosts-file> | <comma-separated list of hosts> : Includes only the specified datanodes to be balanced by the balancer.
-source -f <hosts-file> | <comma-separated list of hosts> : Pick only the specified datanodes as source nodes.
-blockpools <comma-separated list of blockpool ids> : The balancer will only run on blockpools included in this list.
-idleiterations <iterations> : Maximum number of idle iterations before exit. This overwrites the default idleiterations (5).
-runDuringUpgrade : Whether to run the balancer during an ongoing HDFS upgrade. This is usually not desired since it will not affect used space on over-utilized machines.
-h|--help : Display the tool usage and help information and exit.

Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the rebalancing process. See Balancer for more details.

Note that the blockpool policy is stricter than the datanode policy.

Besides the above command options, a pinning feature was introduced in 2.7.0 to prevent certain replicas from being moved by the balancer/mover. This feature is disabled by default, and can be enabled by the configuration property dfs.datanode.block-pinning.enabled. When enabled, it only affects blocks that are written to favored nodes specified in the create() call. It is useful when we want to maintain data locality for applications such as the HBase regionserver.
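For example (the threshold value is illustrative):

    # Rebalance until every datanode is within 5% of the cluster's mean utilization
    hdfs balancer -threshold 5
    # Or apply the stricter per-block-pool policy
    hdfs balancer -policy blockpool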

cacheadmin

Usage:

hdfs cacheadmin [-addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]]
hdfs cacheadmin [-modifyDirective -id <id> [-path <path>] [-force] [-replication <replication>] [-pool <pool-name>] [-ttl <time-to-live>]]
hdfs cacheadmin [-listDirectives [-stats] [-path <path>] [-pool <pool>] [-id <id>]]
hdfs cacheadmin [-removeDirective <id>]
hdfs cacheadmin [-removeDirectives -path <path>]
hdfs cacheadmin [-addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]]
hdfs cacheadmin [-modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]]
hdfs cacheadmin [-removePool <name>]
hdfs cacheadmin [-listPools [-stats] [<name>]]
hdfs cacheadmin [-help <command-name>]

See the HDFS Cache Administration Documentation for more information.
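A short sketch (pool name, path, limit, and TTL are hypothetical):

    # Create a cache pool with a 1 GB byte limit
    hdfs cacheadmin -addPool sales -owner alice -group hadoop -mode 0755 -limit 1000000000
    # Cache a hot directory in that pool for seven days
    hdfs cacheadmin -addDirective -path /user/alice/hot -pool sales -replication 2 -ttl 7d
    # Inspect the directives and their statistics
    hdfs cacheadmin -listDirectives -stats -pool sales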

crypto

Usage:

hdfs crypto -createZone -keyName <keyName> -path <path>
hdfs crypto -listZones
hdfs crypto -provisionTrash -path <path>
hdfs crypto -help <command-name>

See the HDFS Transparent Encryption Documentation for more information.
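A sketch of creating an encryption zone, assuming a KMS is already configured (the key name and path are hypothetical):

    # Create an encryption key in the configured KMS
    hadoop key create mykey
    # An encryption zone must be rooted at an empty directory
    hdfs dfs -mkdir -p /zones/secure
    hdfs crypto -createZone -keyName mykey -path /zones/secure
    hdfs crypto -listZones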

datanode

Usage: hdfs datanode [-regular | -rollback | -rollingupgrade rollback]

COMMAND_OPTION : Description
-regular : Normal datanode startup (default).
-rollback : Rollback the datanode to the previous version. This should be used after stopping the datanode and distributing the old hadoop version.
-rollingupgrade rollback : Rollback a rolling upgrade operation.

Runs an HDFS datanode.

dfsadmin

Usage:

    hdfs dfsadmin [-report [-live] [-dead] [-decommissioning] [-enteringmaintenance] [-inmaintenance]]
    hdfs dfsadmin [-safemode enter | leave | get | wait | forceExit]
    hdfs dfsadmin [-saveNamespace]
    hdfs dfsadmin [-rollEdits]
    hdfs dfsadmin [-restoreFailedStorage true |false |check]
    hdfs dfsadmin [-refreshNodes]
    hdfs dfsadmin [-setQuota <quota> <dirname>...<dirname>]
    hdfs dfsadmin [-clrQuota <dirname>...<dirname>]
    hdfs dfsadmin [-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>]
    hdfs dfsadmin [-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>]
    hdfs dfsadmin [-finalizeUpgrade]
    hdfs dfsadmin [-rollingUpgrade [<query> |<prepare> |<finalize>]]
    hdfs dfsadmin [-refreshServiceAcl]
    hdfs dfsadmin [-refreshUserToGroupsMappings]
    hdfs dfsadmin [-refreshSuperUserGroupsConfiguration]
    hdfs dfsadmin [-refreshCallQueue]
    hdfs dfsadmin [-refresh <host:ipc_port> <key> [arg1..argn]]
    hdfs dfsadmin [-reconfig <namenode|datanode> <host:ipc_port> <start |status |properties>]
    hdfs dfsadmin [-printTopology]
    hdfs dfsadmin [-refreshNamenodes datanodehost:port]
    hdfs dfsadmin [-getVolumeReport datanodehost:port]
    hdfs dfsadmin [-deleteBlockPool datanode-host:port blockpoolId [force]]
    hdfs dfsadmin [-setBalancerBandwidth <bandwidth in bytes per second>]
    hdfs dfsadmin [-getBalancerBandwidth <datanode_host:ipc_port>]
    hdfs dfsadmin [-fetchImage <local directory>]
    hdfs dfsadmin [-allowSnapshot <snapshotDir>]
    hdfs dfsadmin [-disallowSnapshot <snapshotDir>]
    hdfs dfsadmin [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
    hdfs dfsadmin [-evictWriters <datanode_host:ipc_port>]
    hdfs dfsadmin [-getDatanodeInfo <datanode_host:ipc_port>]
    hdfs dfsadmin [-metasave filename]
    hdfs dfsadmin [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
    hdfs dfsadmin [-listOpenFiles]
    hdfs dfsadmin [-help [cmd]]

Runs an HDFS dfsadmin client.
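A few representative invocations (the quota value and path are hypothetical):

    # Cluster health: capacity, live/dead datanodes, under-replicated blocks
    hdfs dfsadmin -report
    # Query safemode, then leave it
    hdfs dfsadmin -safemode get
    hdfs dfsadmin -safemode leave
    # Cap the space a user directory may consume at 10 GB
    hdfs dfsadmin -setSpaceQuota 10g /user/alice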

dfsrouter

Usage: hdfs dfsrouter

Runs the DFS router. See Router for more information.

dfsrouteradmin

Usage:

hdfs dfsrouteradmin
      [-add <source> <nameservice> <destination> [-readonly] -owner <owner> -group <group> -mode <mode>]
      [-rm <source>]
      [-ls <path>]
      [-safemode enter | leave | get]
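
As a hedged sketch of managing the mount table (the nameservice id and paths are hypothetical):

    # Map /data in the federated namespace to /data on nameservice ns1
    hdfs dfsrouteradmin -add /data ns1 /data
    # List the mount table entries
    hdfs dfsrouteradmin -ls /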

journalnode

Usage: hdfs journalnode
This command starts a journalnode for use with HDFS HA with QJM. See the HDFS HA with QJM guide for details.

namenode

Usage:

hdfs namenode [-backup] |
          [-checkpoint] |
          [-format [-clusterid cid ] [-force] [-nonInteractive] ] |
          [-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] |
          [-upgradeOnly [-clusterid cid] [-renameReserved<k-v pairs>] ] |
          [-rollback] |
          [-rollingUpgrade <rollback|downgrade |started> ] |
          [-finalize] |
          [-importCheckpoint] |
          [-initializeSharedEdits] |
          [-bootstrapStandby [-force] [-nonInteractive] [-skipSharedEditsCheck] ] |
          [-recover [-force] ] |
          [-metadataVersion ]
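
Two common invocations, as a sketch (the cluster id is hypothetical; -format erases existing metadata, so run it only on a brand-new cluster):

    # One-time format of a new NameNode
    hdfs namenode -format -clusterid mycluster
    # On an HA standby host, copy over the active NameNode's current metadata
    hdfs namenode -bootstrapStandby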



Author: Kooola大数据
Source: https://www.jianshu.com/p/0b64608291c9

