
How to move multiple files at once from a local server to HDFS in Python?


Pass the command as a single string and add shell=True, so the shell expands the * wildcard (with a list of arguments, shell=True does not do what you want):

>>> subprocess.call('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/', shell=True)

Read this post: Actual meaning of 'shell=True' in subprocess
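
If you'd rather avoid shell=True, you can expand the wildcard in Python instead and keep the safer list form; a minimal sketch, assuming the same MyDir/* and /path/to/hdfs/ paths from the question:

import glob
import subprocess

# Expand the glob ourselves so no shell is needed;
# -copyFromLocal accepts multiple sources followed by a destination directory
files = glob.glob('MyDir/*')
subprocess.call(['hdfs', 'dfs', '-copyFromLocal'] + files + ['/path/to/hdfs/'])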


I would write a function with subprocess that gives you output and error:

import subprocess

def run_cmd(args_list):
    """
    Run a Linux command and return its exit code, stdout and stderr.
    """
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err

Then:

import os

for file in os.listdir('your-directory'):
    run_cmd(['hadoop', 'fs', '-put', 'your-directory/{0}'.format(file), 'target-directory'])

That loops through all of the files in your directory and puts them into your desired HDFS directory.
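
For example, you can check the status that run_cmd returns to see whether a command succeeded; a small usage sketch (the target-directory path is the placeholder from above, and the captured streams are bytes under Python 3):

ret, out, err = run_cmd(['hadoop', 'fs', '-ls', 'target-directory'])
if ret != 0:
    # decode() because Popen with PIPE returns bytes on Python 3
    print('Command failed: {0}'.format(err.decode()))
else:
    print(out.decode())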


Join everything in the command into a single string and pass shell=True:

subprocess.call('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/', shell=True)
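
On Python 3.5+, subprocess.run does the same thing and hands back the exit status directly; a sketch, assuming the same paths as above:

import subprocess

# The shell expands MyDir/* before hdfs sees the arguments
result = subprocess.run('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/', shell=True)
if result.returncode != 0:
    print('Copy failed')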