HDFS Operations Manual
Published: 2019-06-09


The hdfscli command line

# hdfscli --help
HdfsCLI: a command line interface for HDFS.

Usage:
  hdfscli [interactive] [-a ALIAS] [-v...]
  hdfscli download [-fsa ALIAS] [-v...] [-t THREADS] HDFS_PATH LOCAL_PATH
  hdfscli upload [-sa ALIAS] [-v...] [-A | -f] [-t THREADS] LOCAL_PATH HDFS_PATH
  hdfscli -L | -V | -h

Commands:
  download                      Download a file or folder from HDFS. If a
                                single file is downloaded, - can be
                                specified as LOCAL_PATH to stream it to
                                standard out.
  interactive                   Start the client and expose it via the python
                                interpreter (using iPython if available).
  upload                        Upload a file or folder to HDFS. - can be
                                specified as LOCAL_PATH to read from standard
                                in.

Arguments:
  HDFS_PATH                     Remote HDFS path.
  LOCAL_PATH                    Path to local file or directory.

Options:
  -A --append                   Append data to an existing file. Only supported
                                if uploading a single file or from standard in.
  -L --log                      Show path to current log file and exit.
  -V --version                  Show version and exit.
  -a ALIAS --alias=ALIAS        Alias of namenode to connect to.
  -f --force                    Allow overwriting any existing files.
  -s --silent                   Don't display progress status.
  -t THREADS --threads=THREADS  Number of threads to use for parallelization.
                                0 allocates a thread per file. [default: 0]
  -v --verbose                  Enable log output. Can be specified up to three
                                times (increasing verbosity each time).

Examples:
  hdfscli -a prod /user/foo
  hdfscli download features.avro dat/
  hdfscli download logs/1987-03-23 - >>logs
  hdfscli upload -f - data/weights.tsv

  

Before using hdfscli, set up its default configuration file:

# cat ~/.hdfscli.cfg
[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root
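The same file can also be produced from a script. The sketch below assumes Python 3 and uses only the standard library's configparser to build the configuration text shown above. The helper name build_hdfscli_config is hypothetical and not part of hdfscli.

```python
import configparser
import io


def build_hdfscli_config(alias, url, user):
    """Return the text of an hdfscli config that defines one alias as the default."""
    parser = configparser.ConfigParser()
    # [global] names the alias used when no -a/--alias option is given.
    parser["global"] = {"default.alias": alias}
    # Each alias lives in its own "<name>.alias" section.
    parser[alias + ".alias"] = {"url": url, "user": user}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


config_text = build_hdfscli_config("dev", "http://hadoop:50070", "root")
print(config_text)
```

Writing config_text to ~/.hdfscli.cfg should give the same result as editing the file by hand.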

  Client classes available from Python:

    InsecureClient (default)

    TokenClient

 

 Uploading and downloading files

Upload a file or folder with hdfscli (here the local hadoop folder is uploaded to /hdfs):

  # hdfscli upload --alias=dev -f /hadoop-2.4.1/etc/hadoop/ /hdfs

Download the /logs directory from HDFS into /root/test on the local file system:

  # hdfscli download /logs /root/test/
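The two commands above have counterparts on the Python client. The following is a minimal sketch, assuming Python 3 and the hdfs package. The helper names make_client and upload_and_download are hypothetical, and the URL and user simply mirror the dev alias from the configuration file.

```python
def make_client(url="http://hadoop:50070", user="root"):
    """Create an InsecureClient (URL and user are assumptions taken from the dev alias)."""
    # Imported lazily so the helper below can be exercised without the package installed.
    from hdfs import InsecureClient
    return InsecureClient(url, user=user)


def upload_and_download(client):
    """Mirror the two hdfscli commands above using the client's own methods."""
    # Roughly: hdfscli upload -f /hadoop-2.4.1/etc/hadoop/ /hdfs
    # Note the argument order: the HDFS path comes first, the local path second.
    uploaded = client.upload("/hdfs", "/hadoop-2.4.1/etc/hadoop/", overwrite=True)
    # Roughly: hdfscli download /logs /root/test/
    downloaded = client.download("/logs", "/root/test/")
    return uploaded, downloaded
```

With a reachable namenode, upload_and_download(make_client()) would run both transfers in sequence.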

 

hdfscli interactive mode

[root@hadoop ~]# hdfscli --alias=dev
Welcome to the interactive HDFS python shell.
The HDFS client is available as `CLIENT`.

>>> CLIENT.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
>>> CLIENT.status("/Demo")
{u'group': u'supergroup', u'permission': u'755', u'blockSize': 0, u'accessTime': 0,
 u'pathSuffix': u'', u'modificationTime': 1495123035501L, u'replication': 0,
 u'length': 0, u'childrenNum': 1, u'owner': u'root', u'type': u'DIRECTORY',
 u'fileId': 16389}
>>> CLIENT.delete("logs/install.log")
False
>>> CLIENT.delete("/logs/install.log")
True
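In the session above, deleting the relative path logs/install.log returned False while the absolute path succeeded. A likely explanation is that the client resolves relative paths against a base directory (the user's home directory unless a root is configured), so the two calls pointed at different locations. The sketch below, assuming Python 3, guards against that by insisting on absolute paths and deleting only regular files. The helper name delete_if_file is hypothetical.

```python
import posixpath


def delete_if_file(client, hdfs_path):
    """Delete hdfs_path only when it is an existing regular file.

    Returns True when something was deleted and False otherwise.
    """
    if not posixpath.isabs(hdfs_path):
        # Refuse relative paths so the target never depends on the client's base directory.
        raise ValueError("expected an absolute HDFS path, got %r" % hdfs_path)
    # strict=False asks status() to return None for a missing path instead of raising.
    info = client.status(hdfs_path, strict=False)
    if info is None or info["type"] != "FILE":
        return False
    return client.delete(hdfs_path)
```

In the interactive shell this could be called as delete_if_file(CLIENT, "/logs/install.log").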

  

Python bindings

  Instantiating the client

  1. Import the client class and call its constructor:

>>> from hdfs import InsecureClient
>>> client = InsecureClient("http://172.10.236.21:50070", user='ann')
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']

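  2. Import the Config class, load an existing configuration file, and create a client from one of its aliases. By default the configuration is read from ~/.hdfscli.cfg, the same file used by the command line tool: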

>>> from hdfs import Config
>>> client = Config().get_client("dev")
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
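According to the library's documentation, when Config is created without an explicit path it first consults the HDFSCLI_CONFIG environment variable and only then falls back to ~/.hdfscli.cfg. The sketch below, assuming Python 3, reproduces that lookup order with the standard library so the effective path can be checked before creating a client. It illustrates the documented behaviour rather than the library's actual code, and the function name resolve_config_path is hypothetical.

```python
import os


def resolve_config_path(environ=None):
    """Return the configuration path that the lookup order described above would select."""
    environ = os.environ if environ is None else environ
    # A non-empty HDFSCLI_CONFIG value takes precedence over the default location.
    return environ.get("HDFSCLI_CONFIG") or os.path.expanduser("~/.hdfscli.cfg")


print(resolve_config_path())
```

If the printed path does not exist, Config().get_client("dev") has no alias to load, which is worth ruling out before debugging connection problems.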

  

  Reading files

  The read() method reads a file from HDFS. It must be used inside a with block so that the connection is always closed properly.

>>> with client.read("/logs/yarn-env.sh", encoding="utf-8") as reader:
...     features = reader.read()
...
>>> print(features)

  Passing the chunk_size parameter makes the reader a generator that streams the file's contents in chunks:

>>> with client.read("/logs/yarn-env.sh", chunk_size=1024) as reader:
...     for chunk in reader:
...         print(chunk)
...
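Streaming in chunks is mostly useful when the file should not be held in memory all at once. As one possible use, the sketch below (assuming Python 3) computes a SHA-256 checksum of a remote file chunk by chunk. The function name sha256_of_hdfs_file is hypothetical, and client is assumed to be any object with the read() method shown above.

```python
import hashlib


def sha256_of_hdfs_file(client, hdfs_path, chunk_size=1024):
    """Return the hex SHA-256 digest of a file on HDFS without loading it fully into memory."""
    digest = hashlib.sha256()
    # No encoding is passed, so the chunks are expected to be raw bytes.
    with client.read(hdfs_path, chunk_size=chunk_size) as reader:
        for chunk in reader:
            digest.update(chunk)
    return digest.hexdigest()
```

A call such as sha256_of_hdfs_file(client, "/logs/yarn-env.sh") can then be compared with a checksum of the local copy.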

  Passing the delimiter parameter likewise yields a generator, this time producing the file's contents split on the given delimiter:

>>> import time
>>> with client.read("/logs/yarn-env.sh", encoding="utf-8", delimiter="\n") as reader:
...     for line in reader:
...         time.sleep(1)
...         print(line)
...
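Reading by delimiter makes line-oriented processing straightforward. The sketch below, assuming Python 3, collects the lines of a remote text file that contain a given substring. The function name matching_lines is hypothetical. An encoding is passed together with the delimiter because, as far as the library's documentation indicates, delimiter splitting requires one.

```python
def matching_lines(client, hdfs_path, needle):
    """Return the lines of a UTF-8 text file on HDFS that contain needle."""
    matches = []
    # With encoding and delimiter set, the reader yields one decoded piece per line.
    with client.read(hdfs_path, encoding="utf-8", delimiter="\n") as reader:
        for line in reader:
            if needle in line:
                matches.append(line)
    return matches
```

For example, matching_lines(client, "/logs/yarn-env.sh", "export") would return only the export statements of that script.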

  Writing files

The write() method writes a file to HDFS (here, the lines of the local file kong.txt that start with "-" are written to /logs/kongtest.txt on HDFS):

>>> with open("/root/test/kong.txt") as reader, client.write("/logs/kongtest.txt") as writer:
...     for line in reader:
...         if line.startswith("-"):
...             writer.write(line)
...
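The same filter can be wrapped in a reusable function. The sketch below assumes Python 3, where text must be encoded before it is sent, so it passes encoding="utf-8" to write(). It also passes overwrite=True so that repeated runs replace the target file, which is an assumption about the desired behaviour and not something the original example does. The function name write_dash_lines is hypothetical.

```python
def write_dash_lines(client, lines, hdfs_path):
    """Write the lines that start with "-" to hdfs_path and return how many were written."""
    written = 0
    # encoding lets the writer accept text; overwrite replaces any existing file.
    with client.write(hdfs_path, encoding="utf-8", overwrite=True) as writer:
        for line in lines:
            if line.startswith("-"):
                writer.write(line)
                written += 1
    return written
```

Because an open file object iterates over its lines, it can be passed directly as the lines argument, for example write_dash_lines(client, open("/root/test/kong.txt"), "/logs/kongtest.txt").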

  

 

Reposted from: https://www.cnblogs.com/kongzhagen/p/6877472.html
