How to append a file to a tar file using Python's tarfile module?


From tarfile documentation:

Note that 'a:gz' or 'a:bz2' is not possible. If mode is not suitable to open a certain (compressed) file for reading, ReadError is raised. Use mode 'r' to avoid this. If a compression method is not supported, CompressionError is raised.

So I guess you should decompress the archive with the gzip library, add the files to the resulting tar using tarfile's 'a' mode, and then compress it again with gzip.
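A minimal sketch of that three-step idea, assuming the archive is an ordinary gzip-compressed tar on disk. The helper name `append_to_targz` is made up for illustration and is not part of tarfile:

```python
import gzip
import os
import shutil
import tarfile
import tempfile


def append_to_targz(archive, path, arcname=None):
    """Append `path` to the gzip-compressed tar `archive` (hypothetical helper)."""
    # 1. Decompress the .tar.gz into a plain temporary .tar file.
    fd, tmp_tar = tempfile.mkstemp(suffix=".tar")
    os.close(fd)
    try:
        with gzip.open(archive, "rb") as src, open(tmp_tar, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # 2. An uncompressed tar can be opened in append mode.
        with tarfile.open(tmp_tar, "a") as tar:
            tar.add(path, arcname=arcname)
        # 3. Recompress the result over the original archive.
        with open(tmp_tar, "rb") as src, gzip.open(archive, "wb") as dst:
            shutil.copyfileobj(src, dst)
    finally:
        os.remove(tmp_tar)
```

Called as `append_to_targz("test.tar.gz", "new.png")`, this should add new.png as an extra member. It appends only: an existing member with the same name stays in the archive.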


David Dale asks:

Update. From the documentation, it follows that gz files cannot be opened in 'a' mode. If so, what is the best way to add or update files in an existing archive?

Short answer:

  1. decompress / unpack archive
  2. replace / add file(s)
  3. repack / compress archive

I tried to do it in memory using the file/stream interfaces of gzip and tarfile, but did not manage to get it running. The tarball has to be rewritten anyway, since replacing a file in place is apparently not possible. So it's better to just unpack the whole archive.
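If unpacking to disk is undesirable, the rewrite can probably also be done member by member, copying each entry from the old archive into a new one and skipping the entry to be replaced. This is only a sketch under that assumption; `replace_member` is a hypothetical name, and it handles regular files, directories and links but has not been tried on exotic archives:

```python
import os
import tarfile
import tempfile


def replace_member(archive, arcname, new_file):
    """Rewrite `archive`, substituting member `arcname` with `new_file`."""
    folder = os.path.dirname(os.path.abspath(archive))
    # Write next to the original so the final swap stays on one filesystem.
    fd, tmp = tempfile.mkstemp(suffix=".tar.gz", dir=folder)
    os.close(fd)
    try:
        with tarfile.open(archive, "r:gz") as old, tarfile.open(tmp, "w:gz") as new:
            for member in old:
                if member.name == arcname:
                    continue  # drop the outdated copy
                # Only regular files carry data; dirs and links are header-only.
                data = old.extractfile(member) if member.isfile() else None
                new.addfile(member, data)
            new.add(new_file, arcname=arcname)
        os.replace(tmp, archive)  # swap in the rewritten archive
    except BaseException:
        os.remove(tmp)
        raise
```

For the example files used here, `replace_member("test.tar.gz", "a.png", "new.png")` would leave b.png and c.png untouched and store the contents of new.png under the name a.png.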

Wikipedia on tar, gzip.

The script, if run directly, also tries to generate the test images "a.png", "b.png", "c.png" and "new.png" (requiring Pillow) and the initial archive "test.tar.gz" if they don't exist. It then decompresses the archive into a temporary directory, overwrites "a.png" with the contents of "new.png", and packs all files, overwriting the original archive. Here are the individual files:

a.png b.png c.png
new.png

Of course the script's functions can also be run sequentially in interactive mode, in order to have a chance to look at the files. Assuming the script's filename is "t.py":

    >>> from t import *
    >>> make_images()
    >>> make_archive()
    >>> replace_file()
    Workaround

Here we go (the essential part is in replace_file()):

    #!python3
    #coding=utf-8
    """Replace a file in a .tar.gz archive via temporary files
       https://stackoverflow.com/questions/28361665/how-to-append-a-file-to-a-tar-file-use-python-tarfile-module
    """

    import sys        #
    import pathlib    # https://docs.python.org/3/library/pathlib.html
    import tempfile   # https://docs.python.org/3/library/tempfile.html
    import tarfile    # https://docs.python.org/3/library/tarfile.html
    #import gzip      # https://docs.python.org/3/library/gzip.html

    gfn = "test.tar.gz"
    iext = ".png"

    replace = "a"+iext
    replacement = "new"+iext

    def make_images():
        """Generate 4 test images with Pillow (PIL fork, http://pillow.readthedocs.io/)"""
        try:
            from PIL import Image, ImageDraw, ImageFont
            font = ImageFont.truetype("arial.ttf", 50)
            for k,v in {"a":"red", "b":"green", "c":"blue", "new":"orange"}.items():
                img = Image.new('RGB', (100, 100), color=v)
                d = ImageDraw.Draw(img)
                d.text((0, 0), k, fill=(0, 0, 0), font=font)
                img.save(k+iext)
        except Exception as e:
            print(e, file=sys.stderr)
            print("Could not create image files", file=sys.stderr)
            print("(pip install pillow)", file=sys.stderr)

    def make_archive():
        """Create gzip compressed tar file with the three images"""
        try:
            t = tarfile.open(gfn, 'w:gz')
            for f in 'abc':
                t.add(f+iext)
            t.close()
        except Exception as e:
            print(e, file=sys.stderr)
            print("Could not create archive", file=sys.stderr)

    def make_files():
        """Generate sample images and archive"""
        mi = False
        for f in ['a','b','c','new']:
            p = pathlib.Path(f+iext)
            if not p.is_file():
                mi = True
        if mi:
            make_images()
        if not pathlib.Path(gfn).is_file():
            make_archive()

    def add_file_not():
        """Might even corrupt the existing file?"""
        print("Not possible: tarfile with \"a:gz\" - failing now:", file=sys.stderr)
        try:
            a = tarfile.open(gfn, 'a:gz')  # not possible!
            a.add(replacement, arcname=replace)
            a.close()
        except Exception as e:
            print(e, file=sys.stderr)

    def replace_file():
        """Extract archive to temporary directory, replace file, replace archive """
        print("Workaround", file=sys.stderr)
        # tempdir
        with tempfile.TemporaryDirectory() as td:
            # dirname to Path
            tdp = pathlib.Path(td)
            # extract archive to temporary directory
            with tarfile.open(gfn) as r:
                r.extractall(td)
            # print(list(tdp.iterdir()), file=sys.stderr)
            # replace target in temporary directory
            (tdp/replace).write_bytes( pathlib.Path(replacement).read_bytes() )
            # replace archive, from all files in tempdir
            with tarfile.open(gfn, "w:gz") as w:
                for f in tdp.iterdir():
                    w.add(f, arcname=f.name)
        #done

    def test():
        """as the name suggests, this just runs some tests ;-)"""
        make_files()
        #add_file_not()
        replace_file()

    if __name__ == "__main__":
        test()

If you want to add files instead of replacing them, obviously just omit the line that replaces the temporary file, and copy the additional files into the temp directory. Make sure that pathlib.Path.iterdir then also "sees" the new files to be added to the new archive.


I've put this in a somewhat more useful function:

    def targz_add(targz=None, src=None, dst=None, replace=False):
        """Add <src> file(s) to <targz> file, optionally replacing existing file(s).
        Uses temporary directory to modify archive contents.
        TODO: complete error handling...
        """
        import sys, pathlib, tempfile, tarfile

        # ensure targz exists
        tp = pathlib.Path(targz)
        if not tp.is_file():
            sys.stderr.write("Target '{}' does not exist!\n".format(tp) )
            return 1

        # src path(s)
        if not src:
            sys.stderr.write("No files given.\n")
            return 1
        # ensure iterable of string(s)
        if not isinstance(src, (tuple, list, set)):
            src = [src]
        # ensure path(s) exist
        srcp = []
        for s in src:
            sp = pathlib.Path(s)
            if not sp.is_file():
                sys.stderr.write("Source '{}' does not exist.\n".format(sp) )
            else:
                srcp.append(sp)
        if not srcp:
            sys.stderr.write("None of the files exist.\n")
            return 1

        # dst path(s) (filenames in archive)
        dstp = []
        if not dst:
            # default: use filename only
            dstp = [sp.name for sp in srcp]
        else:
            if callable(dst):
                # map dst to each Path, ensure results are Path
                dstp = [pathlib.Path(c) for c in map(dst, srcp)]
            elif not isinstance(dst, (tuple, list, set)):
                # ensure iterable of string(s)
                dstp = [pathlib.Path(dst).name]
            elif isinstance(dst, (tuple, list, set)):
                # convert each string to Path
                dstp = [pathlib.Path(d) for d in dst]
            else:
                # TODO directly support iterable of (src,dst) tuples
                sys.stderr.write("Please fix me, I cannot handle the destination(s) '{}'\n".format(dst) )
                return 1
        if not dstp:
            sys.stderr.write("None of the files exist.\n")
            return 1

        # combine src and dst paths
        sdp = zip(srcp, dstp) # iterator of tuples

        # temporary directory
        with tempfile.TemporaryDirectory() as tempdir:
            tempdirp = pathlib.Path(tempdir)
            # extract original archive to temporary directory
            with tarfile.open(tp) as r:
                r.extractall(tempdirp)
            # copy source(s) to target in temporary directory, optionally replacing it
            for s,d in sdp:
                dp = tempdirp/d
                # TODO extend to allow flag individually
                # note: is_file must be called; the bare method is always truthy
                if not dp.is_file() or replace:
                    sys.stderr.write("Writing '{1}' (from '{0}')\n".format(s,d) )
                    dp.write_bytes( s.read_bytes() )
                else:
                    sys.stderr.write("Skipping '{1}' (from '{0}')\n".format(s,d) )
            # replace original archive with new archive from all files in tempdir
            with tarfile.open(tp, "w:gz") as w:
                for f in tempdirp.iterdir():
                    w.add(f, arcname=f.name)

        return None

And a few "tests" as examples:

    # targz_add("test.tar.gz", "new.png", "a.png")
    # targz_add("test.tar.gz", "new.png", "a.png", replace=True)
    # targz_add("test.tar.gz", ["new.png"], "a.png")
    # targz_add("test.tar.gz", "new.png", ["a.png"], replace=True)
    targz_add("test.tar.gz", "new.png", lambda x:str(x).replace("new","a"), replace=True)

shutil also supports archives, but not adding files to one:

https://docs.python.org/3/library/shutil.html#archiving-operations

New in version 3.2.
Changed in version 3.5: Added support for the xztar format.
High-level utilities to create and read compressed and archived files are also provided. They rely on the zipfile and tarfile modules.
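Even without an "add" operation, shutil's unpack and make functions can be combined into the same unpack / modify / repack cycle. The following is a rough sketch, assuming a .tar.gz archive; `repack_with_shutil` is a hypothetical name, not a shutil function:

```python
import pathlib
import shutil
import tempfile


def repack_with_shutil(archive, extra_files):
    """Unpack `archive` (a .tar.gz), copy `extra_files` in, and repack it."""
    archive = pathlib.Path(archive)
    with tempfile.TemporaryDirectory() as workdir:
        content = pathlib.Path(workdir) / "content"
        content.mkdir()
        shutil.unpack_archive(str(archive), str(content), format="gztar")
        for f in extra_files:
            # A file with the same name as an existing one overwrites it.
            shutil.copy(str(f), str(content))
        # make_archive appends ".tar.gz" to the base name and returns the path.
        base = pathlib.Path(workdir) / "repacked"
        new_archive = shutil.make_archive(str(base), "gztar", root_dir=str(content))
        shutil.move(new_archive, str(archive))
```

One difference to be aware of: archives written by `shutil.make_archive` with `root_dir` may store member names with a leading "./", so the listing can look slightly different from one produced by `tarfile.add(..., arcname=...)`.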


Here's adding a file by extracting to memory using io.BytesIO, adding, and compressing:

import ioimport gzipimport tarfilegfn = "test.tar.gz"replace = "a.png"replacement = "new.png"print("reading {}".format(gfn))m = io.BytesIO()with gzip.open(gfn) as g:    m.write(g.read())print("opening tar in memory")m.seek(0)with tarfile.open(fileobj=m, mode="a") as t:    t.list()    print("adding {} as {}".format(replacement, replace))    t.add(replacement, arcname=replace)    t.list()print("writing {}".format(gfn))m.seek(0)with gzip.open(gfn, "wb") as g:    g.write(m.read())

It prints:

    reading test.tar.gz
    opening tar in memory
    ?rw-rw-rw- 0/0        877 2018-04-11 07:38:57 a.png
    ?rw-rw-rw- 0/0        827 2018-04-11 07:38:57 b.png
    ?rw-rw-rw- 0/0        787 2018-04-11 07:38:57 c.png
    adding new.png as a.png
    ?rw-rw-rw- 0/0        877 2018-04-11 07:38:57 a.png
    ?rw-rw-rw- 0/0        827 2018-04-11 07:38:57 b.png
    ?rw-rw-rw- 0/0        787 2018-04-11 07:38:57 c.png
    -rw-rw-rw- 0/0       2108 2018-04-11 07:38:57 a.png
    writing test.tar.gz

Optimizations are welcome!
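Note that the in-memory approach appends rather than replaces: the archive ends up with two a.png entries, and the old data stays in the file. According to the tarfile documentation, when a member occurs more than once its last occurrence is taken as the most up-to-date version, so reading by name returns the new content. A small self-contained sketch of that behaviour, using made-up byte payloads instead of real images (`add_bytes` is a hypothetical helper):

```python
import io
import tarfile


def add_bytes(tar, name, payload):
    """Add `payload` (bytes) to the open archive `tar` under `name`."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


# Build a small uncompressed tar in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "a.png", b"old")

# Append a second entry with the same name.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="a") as tar:
    add_bytes(tar, "a.png", b"new")

# Both entries are listed; lookup by name yields the last one.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()
    latest = tar.extractfile("a.png").read()

print(names)   # ['a.png', 'a.png']
print(latest)  # b'new'
```

If the stale copy must not remain in the archive, the tarball has to be rewritten without it.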