
Filter (search and replace) array of bytes in an InputStream


Not sure you have chosen the best approach to solve your problem.

That said, I don't like to (and have as policy not to) answer questions with "don't" so here goes...

Have a look at FilterInputStream.

From the documentation:

A FilterInputStream contains some other input stream, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
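To make "transforming the data along the way" concrete, here is a minimal sketch of a filter. The class name UpperCasingInputStream and its demo helper are made up for illustration, and only the single-byte read() is transformed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class: upper-cases ASCII letters as they are read.
class UpperCasingInputStream extends FilterInputStream {

    UpperCasingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read(); // 0..255, or -1 at end of stream
        return (b >= 'a' && b <= 'z') ? b - ('a' - 'A') : b;
    }

    // Reads the whole text through the filter, one byte at a time.
    static String demo(String text) {
        try (InputStream in = new UpperCasingInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)))) {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = in.read()) != -1) {
                sb.append((char) b);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The wrapped stream stays untouched; the filter only changes what the caller sees.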


It was a fun exercise to write it up. Here's a complete example for you:

import java.io.*;
import java.util.*;

class ReplacingInputStream extends FilterInputStream {

    LinkedList<Integer> inQueue = new LinkedList<Integer>();
    LinkedList<Integer> outQueue = new LinkedList<Integer>();
    final byte[] search, replacement;

    protected ReplacingInputStream(InputStream in,
                                   byte[] search,
                                   byte[] replacement) {
        super(in);
        this.search = search;
        this.replacement = replacement;
    }

    private boolean isMatchFound() {
        Iterator<Integer> inIter = inQueue.iterator();
        for (int i = 0; i < search.length; i++)
            // Mask to 0..255: read() returns unsigned values, but byte is signed.
            if (!inIter.hasNext() || (search[i] & 0xFF) != inIter.next())
                return false;
        return true;
    }

    private void readAhead() throws IOException {
        // Work up some look-ahead.
        while (inQueue.size() < search.length) {
            int next = super.read();
            inQueue.offer(next);
            if (next == -1)
                break;
        }
    }

    @Override
    public int read() throws IOException {
        // Skip this if the next byte is already determined. Loop rather than
        // test once, so an empty replacement cannot leave outQueue empty.
        while (outQueue.isEmpty()) {
            readAhead();
            if (isMatchFound()) {
                for (int i = 0; i < search.length; i++)
                    inQueue.remove();
                for (byte b : replacement)
                    // Mask so bytes >= 0x80 are not returned as negative ints
                    // (0xFF would otherwise look like end of stream).
                    outQueue.offer(b & 0xFF);
            } else
                outQueue.add(inQueue.remove());
        }
        return outQueue.remove();
    }

    // TODO: Override the other read methods.
}

Example Usage

// Assumes the same file as ReplacingInputStream above, so java.io.* is already imported.
class Test {
    public static void main(String[] args) throws Exception {

        byte[] bytes = "hello xyz world.".getBytes("UTF-8");

        ByteArrayInputStream bis = new ByteArrayInputStream(bytes);

        byte[] search = "xyz".getBytes("UTF-8");
        byte[] replacement = "abc".getBytes("UTF-8");

        InputStream ris = new ReplacingInputStream(bis, search, replacement);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        int b;
        while (-1 != (b = ris.read()))
            bos.write(b);

        // Decode with the same charset that was used to encode.
        System.out.println(new String(bos.toByteArray(), "UTF-8"));
    }
}

Given the bytes for the string "hello xyz world." it prints:

hello abc world.
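The TODO in the class above matters in practice: FilterInputStream.read(byte[], int, int) hands the call straight to the wrapped stream, so a bulk read bypasses an overridden read(). One way to close that gap, sketched here on a hypothetical stand-in filter (XToYInputStream, which maps 'x' to 'y'), is to route the bulk read through read(). read(byte[]) needs no override of its own, since FilterInputStream implements it as read(b, 0, b.length).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for any filter whose read() transforms the data.
class XToYInputStream extends FilterInputStream {

    XToYInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        return b == 'x' ? 'y' : b;
    }

    // Route bulk reads through read() so they see the transformed bytes.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len) {
            int next = read();
            if (next == -1) {
                break;
            }
            b[off + count++] = (byte) next;
        }
        return count == 0 ? -1 : count;
    }

    // skip() would otherwise skip raw bytes of the wrapped stream.
    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) {
            skipped++;
        }
        return skipped;
    }

    // mark/reset would not account for state kept inside the filter.
    @Override
    public boolean markSupported() {
        return false;
    }

    // Reads the text through the filter using a small buffer.
    static String bulkRead(String text) {
        try (InputStream in = new XToYInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)))) {
            StringBuilder sb = new StringBuilder();
            byte[] buf = new byte[4];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(new String(buf, 0, n, StandardCharsets.US_ASCII));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading byte by byte is slower than a true bulk read, but it keeps all the replacement logic in one place.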


I needed something like this as well and decided to roll my own solution instead of using the example above by @aioobe. Have a look at the code. You can pull the library from Maven Central, or just copy the source code.

This is how you use it. In this case, I'm using a nested instance to replace two patterns to fix DOS and Mac line endings.

new ReplacingInputStream(new ReplacingInputStream(is, "\r\n", "\n"), "\r", "\n");

Here's the full source code:

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Validate is assumed to come from Apache Commons Lang 3.
import org.apache.commons.lang3.Validate;

/**
 * Simple FilterInputStream that can replace occurrences of bytes with something else.
 */
public class ReplacingInputStream extends FilterInputStream {

    // while matching, this is where the bytes go.
    int[] buf=null;
    int matchedIndex=0;
    int unbufferIndex=0;
    int replacedIndex=0;

    private final byte[] pattern;
    private final byte[] replacement;
    private State state=State.NOT_MATCHED;

    // simple state machine for keeping track of what we are doing
    private enum State {
        NOT_MATCHED,
        MATCHING,
        REPLACING,
        UNBUFFER
    }

    /**
     * @param is input
     * @return nested replacing stream that replaces \r\n (DOS) and \r (MAC) line endings with UNIX ones "\n".
     */
    public static InputStream newLineNormalizingInputStream(InputStream is) {
        return new ReplacingInputStream(new ReplacingInputStream(is, "\r\n", "\n"), "\r", "\n");
    }

    /**
     * Replace occurrences of pattern in the input. Note: input is assumed to be UTF-8 encoded. If not the case use byte[] based pattern and replacement.
     * @param in input
     * @param pattern pattern to replace.
     * @param replacement the replacement or null
     */
    public ReplacingInputStream(InputStream in, String pattern, String replacement) {
        this(in,pattern.getBytes(StandardCharsets.UTF_8), replacement==null ? null : replacement.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Replace occurrences of pattern in the input.
     * @param in input
     * @param pattern pattern to replace
     * @param replacement the replacement or null
     */
    public ReplacingInputStream(InputStream in, byte[] pattern, byte[] replacement) {
        super(in);
        Validate.notNull(pattern);
        Validate.isTrue(pattern.length>0, "pattern length should be > 0", pattern.length);
        this.pattern = pattern;
        this.replacement = replacement;
        // we will never match more than the pattern length
        buf = new int[pattern.length];
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // copy of parent logic; we need to call our own read() instead of super.read(), which delegates instead of calling our read
        if (b == null) {
            throw new NullPointerException();
        } else if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return 0;
        }

        int c = read();
        if (c == -1) {
            return -1;
        }
        b[off] = (byte)c;

        int i = 1;
        try {
            for (; i < len ; i++) {
                c = read();
                if (c == -1) {
                    break;
                }
                b[off + i] = (byte)c;
            }
        } catch (IOException ee) {
            // as in InputStream.read(byte[], int, int): return what was read so far
        }
        return i;
    }

    @Override
    public int read(byte[] b) throws IOException {
        // call our own read
        return read(b, 0, b.length);
    }

    @Override
    public int read() throws IOException {
        // use a simple state machine to figure out what we are doing
        int next;
        switch (state) {
        case NOT_MATCHED:
            // we are not currently matching, replacing, or unbuffering
            next=super.read();
            // mask to 0..255: super.read() returns unsigned values, byte is signed
            if((pattern[0] & 0xFF) == next) {
                // clear whatever was there
                buf=new int[pattern.length];
                // make sure we start at 0
                matchedIndex=0;

                buf[matchedIndex++]=next;
                if(pattern.length == 1) {
                    // edge case: a pattern of length 1 is already fully matched
                    if(replacement==null || replacement.length==0) {
                        // the replacement is empty, stay in NOT_MATCHED
                        matchedIndex=0;
                    } else {
                        // go straight to replacing
                        state=State.REPLACING;
                        // reset replace counter
                        replacedIndex=0;
                    }
                } else {
                    // pattern is longer than 1 byte, keep matching
                    state=State.MATCHING;
                }
                // recurse to continue matching
                return read();
            } else {
                return next;
            }
        case MATCHING:
            // the previous bytes matched part of the pattern
            next=super.read();
            if((pattern[matchedIndex] & 0xFF)==next) {
                buf[matchedIndex++]=next;
                if(matchedIndex==pattern.length) {
                    // we've found a full match!
                    if(replacement==null || replacement.length==0) {
                        // the replacement is empty, go straight to NOT_MATCHED
                        state=State.NOT_MATCHED;
                        matchedIndex=0;
                    } else {
                        // start replacing
                        state=State.REPLACING;
                        replacedIndex=0;
                    }
                }
            } else {
                // mismatch -> unbuffer
                // note: the buffered bytes are not rescanned for a match that starts inside them
                buf[matchedIndex++]=next;
                state=State.UNBUFFER;
                unbufferIndex=0;
            }
            return read();
        case REPLACING:
            // we've fully matched the pattern and are returning bytes from the replacement
            // mask so bytes >= 0x80 are not returned as negative ints
            next=replacement[replacedIndex++] & 0xFF;
            if(replacedIndex==replacement.length) {
                state=State.NOT_MATCHED;
                replacedIndex=0;
            }
            return next;
        case UNBUFFER:
            // we partially matched the pattern before encountering a non matching byte
            // we need to serve up the buffered bytes before we go back to NOT_MATCHED
            next=buf[unbufferIndex++];
            if(unbufferIndex==matchedIndex) {
                state=State.NOT_MATCHED;
                matchedIndex=0;
            }
            return next;
        default:
            throw new IllegalStateException("no such state " + state);
        }
    }

    @Override
    public String toString() {
        return state.name() + " " + matchedIndex + " " + replacedIndex + " " + unbufferIndex;
    }
}


The following approach will work, but I don't know how big the impact on performance is.

  1. Wrap the InputStream with an InputStreamReader,
  2. wrap the InputStreamReader with a FilterReader that replaces the strings, then
  3. wrap the FilterReader with a ReaderInputStream.

It is crucial to choose the appropriate encoding, otherwise the content of the stream will become corrupted.
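A rough sketch of steps 1 and 2, assuming UTF-8 input. ReplacingReader and its replaceInBytes helper are hypothetical names written for this example, not part of java.io. Step 3 is left out because ReaderInputStream is not in the JDK either; Apache Commons IO ships a class by that name, and I'm assuming that is the one meant here.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

// Hypothetical FilterReader that replaces one literal string with another.
class ReplacingReader extends FilterReader {

    private final String search;
    private final String replacement;
    private final ArrayDeque<Integer> lookAhead = new ArrayDeque<>(); // read, not yet emitted
    private final ArrayDeque<Integer> pending = new ArrayDeque<>();   // ready to be returned
    private boolean eof = false;

    ReplacingReader(Reader in, String search, String replacement) {
        super(in);
        if (search.isEmpty()) {
            throw new IllegalArgumentException("search must not be empty");
        }
        this.search = search;
        this.replacement = replacement;
    }

    private boolean lookAheadMatches() {
        if (lookAhead.size() < search.length()) {
            return false;
        }
        int i = 0;
        for (int c : lookAhead) {
            if (c != search.charAt(i++)) {
                return false;
            }
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        while (pending.isEmpty()) {
            // Fill the look-ahead up to the length of the search string.
            while (!eof && lookAhead.size() < search.length()) {
                int next = in.read();
                if (next == -1) {
                    eof = true;
                } else {
                    lookAhead.add(next);
                }
            }
            if (lookAheadMatches()) {
                lookAhead.clear();
                for (char c : replacement.toCharArray()) {
                    pending.add((int) c);
                }
            } else if (!lookAhead.isEmpty()) {
                pending.add(lookAhead.remove());
            } else {
                return -1; // end of input and nothing left to emit
            }
        }
        return pending.remove();
    }

    // FilterReader's bulk read delegates to the wrapped reader, so route it through read().
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            cbuf[off + count++] = (char) c;
        }
        return count == 0 ? -1 : count;
    }

    // Steps 1 and 2: decode the bytes as UTF-8, then filter the characters.
    static String replaceInBytes(byte[] utf8, String search, String replacement) {
        try (Reader r = new ReplacingReader(
                new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8),
                search, replacement)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[8];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the matching happens on decoded characters, a multi-byte character such as "ü" is handled as one unit, which is the main attraction of this route over matching raw bytes.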

If you want to use regular expressions to replace the strings, then you can use Streamflyer, a tool of mine, which is a convenient alternative to FilterReader. You will find an example for byte streams on the webpage of Streamflyer. Hope this helps.