How to move multiple files at once from a local server to HDFS in Python?
Add shell=True so the shell expands the * wildcard. Note that with shell=True the command should be passed as a single string, not a list (on POSIX, only the first list element would actually be run):

>>> subprocess.call('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/', shell=True)
Read this post: Actual meaning of 'shell=True' in subprocess
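If you would rather avoid the shell altogether, one option is to expand the wildcard in Python with glob and pass the matched paths as list arguments; a minimal sketch, reusing the example paths from the question:

import glob
import subprocess

# Expand the wildcard in Python instead of relying on the shell;
# 'MyDir' and '/path/to/hdfs/' are the example paths from the question.
local_files = glob.glob('MyDir/*')
subprocess.call(['hdfs', 'dfs', '-copyFromLocal'] + local_files + ['/path/to/hdfs/'])

This works because hdfs dfs -copyFromLocal accepts multiple source paths when the destination is a directory.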
I would write a function with subprocess that returns the exit code, output, and error:
import subprocess

def run_cmd(args_list):
    """Run a system command and return its exit code, stdout and stderr."""
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err
Then:
import os

for file in os.listdir('your-directory'):
    run_cmd(['hadoop', 'fs', '-put', 'your-directory/{0}'.format(file), 'target-directory'])
That loops through all of the files in your directory and puts each one into the target HDFS directory.
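Since run_cmd returns the exit status along with stdout and stderr, you can also check each upload as it happens; a minimal sketch building on the loop above ('your-directory' and 'target-directory' are the same placeholders):

import os

for file in os.listdir('your-directory'):
    (ret, out, err) = run_cmd(['hadoop', 'fs', '-put',
                               'your-directory/{0}'.format(file),
                               'target-directory'])
    if ret != 0:
        # stderr comes back as bytes from proc.communicate()
        print('Failed to upload {0}: {1}'.format(file, err))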
Alternatively, join the whole command into a single string and pass shell=True:
subprocess.call('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/', shell=True)
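On Python 3.5+, a variant of the same one-liner with subprocess.run and check=True will raise CalledProcessError if the copy fails; a quick sketch:

import subprocess

# check=True raises CalledProcessError when the hdfs command exits non-zero
subprocess.run('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/',
               shell=True, check=True)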