The hdfscli command line
# hdfscli --help
HdfsCLI: a command line interface for HDFS.

Usage:
  hdfscli [interactive] [-a ALIAS] [-v...]
  hdfscli download [-fsa ALIAS] [-v...] [-t THREADS] HDFS_PATH LOCAL_PATH
  hdfscli upload [-sa ALIAS] [-v...] [-A | -f] [-t THREADS] LOCAL_PATH HDFS_PATH
  hdfscli -L | -V | -h

Commands:
  download                      Download a file or folder from HDFS. If a
                                single file is downloaded, - can be specified
                                as LOCAL_PATH to stream it to standard out.
  interactive                   Start the client and expose it via the python
                                interpreter (using iPython if available).
  upload                        Upload a file or folder to HDFS. - can be
                                specified as LOCAL_PATH to read from standard
                                in.

Arguments:
  HDFS_PATH                     Remote HDFS path.
  LOCAL_PATH                    Path to local file or directory.

Options:
  -A --append                   Append data to an existing file. Only supported
                                if uploading a single file or from standard in.
  -L --log                      Show path to current log file and exit.
  -V --version                  Show version and exit.
  -a ALIAS --alias=ALIAS        Alias of namenode to connect to.
  -f --force                    Allow overwriting any existing files.
  -s --silent                   Don't display progress status.
  -t THREADS --threads=THREADS  Number of threads to use for parallelization.
                                0 allocates a thread per file. [default: 0]
  -v --verbose                  Enable log output. Can be specified up to three
                                times (increasing verbosity each time).

Examples:
  hdfscli -a prod /user/foo
  hdfscli download features.avro dat/
  hdfscli download logs/1987-03-23 - >>logs
  hdfscli upload -f - data/weights.tsv
To use hdfscli, first create its default configuration file:
# cat ~/.hdfscli.cfg
[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root
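The alias layout above is plain INI syntax, so it can be sanity-checked with the standard library. A minimal sketch (Python 3's configparser; the file contents simply mirror the ~/.hdfscli.cfg example):

```python
from configparser import ConfigParser

# Mirrors the ~/.hdfscli.cfg example above: a [global] section naming the
# default alias, and one "<alias>.alias" section per namenode.
CFG_TEXT = """
[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root
"""

parser = ConfigParser()
parser.read_string(CFG_TEXT)

alias = parser.get("global", "default.alias")  # "dev"
section = "%s.alias" % alias                   # "dev.alias"
print(parser.get(section, "url"))              # http://hadoop:50070
print(parser.get(section, "user"))             # root
```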
Client classes available from Python:
InsecureClient (the default)
TokenClient
Uploading and downloading files
Upload a file or folder with hdfscli (here, upload the hadoop folder to /hdfs):
# hdfscli upload --alias=dev -f /hadoop-2.4.1/etc/hadoop/ /hdfs
Download the /logs directory from HDFS to the local /root/test directory:
# hdfscli download /logs /root/test/
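The same transfers can be driven from Python via the client's upload() and download() methods. A hedged sketch, assuming `client` is an hdfs client created as shown in the Python API section below; the helper names are hypothetical:

```python
# Hedged sketch: `client` is an hdfs client instance (e.g. InsecureClient
# or one obtained from Config). upload()/download() mirror the CLI
# commands above; extra keyword arguments are forwarded by the library.

def push_to_hdfs(client, local_path, hdfs_path):
    """Upload a local file or folder, overwriting like `hdfscli upload -f`."""
    # n_threads=0 allocates one thread per file, matching the CLI default.
    return client.upload(hdfs_path, local_path, overwrite=True, n_threads=0)

def pull_from_hdfs(client, hdfs_path, local_path):
    """Download an HDFS file or folder, like `hdfscli download`."""
    return client.download(hdfs_path, local_path, overwrite=True, n_threads=0)
```

Both methods return the path on the destination side, which makes the helpers easy to chain or log.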
hdfscli interactive mode
[root@hadoop ~]# hdfscli --alias=dev
Welcome to the interactive HDFS python shell.
The HDFS client is available as `CLIENT`.
>>> CLIENT.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
>>> CLIENT.status("/Demo")
{u'group': u'supergroup', u'permission': u'755', u'blockSize': 0, u'accessTime': 0, u'pathSuffix': u'', u'modificationTime': 1495123035501L, u'replication': 0, u'length': 0, u'childrenNum': 1, u'owner': u'root', u'type': u'DIRECTORY', u'fileId': 16389}
>>> CLIENT.delete("logs/install.log")
False
>>> CLIENT.delete("/logs/install.log")
True
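The dict returned by CLIENT.status() is a plain WebHDFS FileStatus record, so it can be unpacked in code. A small sketch (the helper name is hypothetical; the demo dict abbreviates the output shown above):

```python
# Hedged sketch: summarize a WebHDFS status dict like the one returned
# by CLIENT.status() in the session above.

def describe(status):
    """Return (type, owner, length) from a WebHDFS FileStatus dict."""
    return (status["type"], status["owner"], status["length"])

demo_status = {
    "group": "supergroup", "permission": "755", "blockSize": 0,
    "owner": "root", "type": "DIRECTORY", "length": 0, "childrenNum": 1,
}
print(describe(demo_status))  # ('DIRECTORY', 'root', 0)
```

Note also the difference visible in the transcript: relative paths such as "logs/install.log" resolve against the user's home directory, which is why the first delete() returned False.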
Binding to the Python API
Initializing a client
1. Import a client class and call its constructor directly:
>>> from hdfs import InsecureClient
>>> client = InsecureClient("http://172.10.236.21:50070", user='ann')
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
2. Import the Config class, load an existing configuration file, and create a client from one of its aliases. By default the configuration is read from ~/.hdfscli.cfg:
>>> from hdfs import Config
>>> client = Config().get_client("dev")
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
Reading files
The read() method reads a file from HDFS. It must be used inside a with block, which guarantees the connection is closed every time:
>>> with client.read("/logs/yarn-env.sh", encoding="utf-8") as reader:
...     features = reader.read()
...
>>> print features
Passing a chunk_size argument makes read() return a generator, streaming the file's contents chunk by chunk:
>>> with client.read("/logs/yarn-env.sh", chunk_size=1024) as reader:
...     for chunk in reader:
...         print chunk
...
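The chunked-consumption pattern itself is independent of HDFS and can be tried locally. A minimal sketch, with io.BytesIO standing in for the HDFS reader (the function name is hypothetical):

```python
import io

# Stand-in for the HDFS reader: any file-like object can be consumed in
# fixed-size chunks, just like the chunk_size generator above.
def iter_chunks(reader, chunk_size=1024):
    """Yield successive chunks until the stream is exhausted."""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        yield chunk

data = io.BytesIO(b"x" * 2500)
sizes = [len(c) for c in iter_chunks(data, chunk_size=1024)]
print(sizes)  # [1024, 1024, 452]
```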
The delimiter argument likewise returns a generator, yielding the file's contents split on the given delimiter (it must be combined with encoding):
>>> import time
>>> with client.read("/logs/yarn-env.sh", encoding="utf-8", delimiter="\n") as reader:
...     for line in reader:
...         time.sleep(1)
...         print line
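The line-by-line pattern can also be exercised locally before pointing it at HDFS. A sketch with io.StringIO standing in for the delimiter="\n" generator, counting the non-comment lines of a shell script such as yarn-env.sh (the helper name and sample text are hypothetical):

```python
import io

# Stand-in for the delimiter="\n" generator: filter the lines of a
# text stream, as one might when scanning yarn-env.sh.
def count_settings(lines):
    """Count lines that are neither blank nor shell comments."""
    return sum(1 for line in lines
               if line.strip() and not line.lstrip().startswith("#"))

sample = io.StringIO(u"# comment\nexport JAVA_HOME=/usr/java\n\nexport YARN_LOG_DIR=/logs\n")
print(count_settings(sample))  # 2
```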
Writing files
The write() method writes a file to HDFS (here, lines of the local file kong.txt that start with "-" are written to /logs/kongtest.txt on HDFS):
>>> with open("/root/test/kong.txt") as reader, client.write("/logs/kongtest.txt") as writer:
...     for line in reader:
...         if line.startswith("-"):
...             writer.write(line)
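The read-filter-write pattern above can be tested locally with in-memory streams before running it against a cluster. A minimal sketch, with StringIO standing in for both the local file and the HDFS writer (the function name is hypothetical):

```python
import io

# Local stand-in for the example above: copy only the lines starting
# with "-" from one file-like object to another.
def copy_dash_lines(reader, writer):
    for line in reader:
        if line.startswith("-"):
            writer.write(line)

src = io.StringIO(u"-keep me\nskip me\n-also keep\n")
dst = io.StringIO()
copy_dash_lines(src, dst)
print(dst.getvalue())  # "-keep me\n-also keep\n"
```

Because the helper only depends on the file-like protocol, the same function works unchanged when `writer` is the object yielded by client.write().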