# Ideas about Generating Untraceable Invoice IDs

I don't like the idea of using time. You can run into all sorts of issues - time differences, several events happening in a single second and so on.

If you want something sequential and not easily traceable, how about generating a random number between 1 and some upper bound of your choice (for example 100) for each new Id? Each new Id is then the previous Id plus that random number.

You can also add a constant to your IDs to make them look more impressive. For example you can add 44323 to all your IDs and turn IDs 15, 23 and 27 into 44338, 44346 and 44350.
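A minimal sketch of this idea follows. The class and member names are hypothetical, the step range of 1 to 100 and the offset 44323 come from the examples above, and `System.Random` is used only for brevity. In a real system the previous Id would have to be read and updated atomically, for example inside a database transaction.

```csharp
using System;

public static class GappedIdGenerator
{
    // Hypothetical constants, taken from the examples in the text.
    public const int MaxStep = 100;
    public const long Offset = 44323;

    // Returns the next internal Id: the previous Id plus a random step in [1, MaxStep].
    public static long NextId(long previousId, Random rng)
    {
        // The upper bound of Random.Next(min, max) is exclusive, hence MaxStep + 1.
        return previousId + rng.Next(1, MaxStep + 1);
    }

    // Adds the constant offset before the Id is shown to anyone.
    public static long ToDisplayId(long id) => id + Offset;
}
```

With these assumptions, `GappedIdGenerator.ToDisplayId(15)` gives 44338, matching the example above.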

There are two problems in your question. One is solvable, one isn't (with the constraints you give).

## Solvable: Unguessable numbers

The first one is quite simple: It should be hard for a customer to guess a valid invoice number (or the next valid invoice number), when the customer has access to a set of valid invoice numbers.

You can solve this with your constraint:

Split your invoice number in two parts:

- A 20-bit prefix, taken from a sequence of increasing numbers (e.g. the natural numbers 0, 1, 2, ...)
- A 10-bit suffix that is randomly generated

With this scheme, there are about 1 million valid invoice numbers (2^20 = 1,048,576) spread over 2^30 possible values, so a random guess hits a valid number with a probability of about 1 in 1,024. You can precalculate them and store them in the database. When presented with an invoice number, check whether it is in your database. If it isn't, it's not valid.

Use a SQL sequence for handing out numbers. When issuing a new (i.e. unused) invoice number, increment the sequence and issue the n-th number from the precalculated list (order by value).
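How one such number could be put together is sketched below. The names are hypothetical, and `RandomNumberGenerator.GetInt32` assumes .NET Core 3.0 or later. The sequence value would come from the database, and validation would still be a lookup of the stored number, since the suffix cannot be recomputed.

```csharp
using System;
using System.Security.Cryptography;

public static class InvoiceNumbers
{
    public const int PrefixBits = 20;
    public const int SuffixBits = 10;

    // Builds one invoice number from a sequence value (the 20-bit prefix)
    // and a freshly drawn 10-bit random suffix.
    public static int Create(int sequenceValue)
    {
        if (sequenceValue < 0 || sequenceValue >= (1 << PrefixBits))
            throw new ArgumentOutOfRangeException(nameof(sequenceValue));

        // Cryptographically secure random value in 0..1023.
        int suffix = RandomNumberGenerator.GetInt32(1 << SuffixBits);

        return (sequenceValue << SuffixBits) | suffix;
    }

    // Recovers the sequence value (prefix) from an invoice number.
    public static int PrefixOf(int invoiceNumber) => invoiceNumber >> SuffixBits;
}
```

Because the prefix occupies the high bits, the numbers still sort in issue order, and the largest possible value (2^30 - 1) fits in a signed 32-bit integer.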

## Not solvable: Guessing the number of customers

If you want to prevent a customer who holds a number of valid invoice numbers from guessing how many invoice numbers you have issued so far (and therefore how many customers you have): this is not possible.

What you have here is a variant of the so-called "German tank problem". In the Second World War, the Allies used the serial numbers printed on the gearboxes of German tanks to estimate how many tanks Germany had produced. This worked because the serial numbers increased without gaps.

But even when you increase the numbers with gaps, the solution for the German tank problem still works. It is quite easy:

- You use the method described here to guess the highest issued invoice number
- You guess the mean difference between two successive invoice numbers and divide the estimated highest number by this value
- You can use linear regression to get a stable delta value (if it exists).

Now you have a good guess about the order of magnitude of the number of invoices (200, 15000, half a million, etc.).
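The steps above can be sketched roughly as follows. This assumes the classic German tank estimator m + m/k - 1 (m is the largest observed number, k the sample size) for the highest issued number, and it takes the mean gap as an input that the attacker has guessed somehow. The names are hypothetical and the regression step is left out.

```csharp
using System;
using System.Linq;

public static class TankEstimate
{
    // Classic estimator for the highest issued number:
    // m + m/k - 1, with m the sample maximum and k the sample size.
    public static double EstimateMax(long[] observed)
    {
        double m = observed.Max();
        int k = observed.Length;
        return m + m / k - 1;
    }

    // Divides the estimated highest number by the (guessed) mean gap
    // between two successive invoice numbers.
    public static double EstimateCount(long[] observed, double meanGap)
    {
        return EstimateMax(observed) / meanGap;
    }
}
```

For the observed numbers 19, 40, 42 and 60, `EstimateMax` returns 74. With a guessed mean gap of 2, `EstimateCount` returns 37 issued invoices.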

This works as long as a mean value (theoretically) exists for the gap between two successive invoice numbers. This is usually the case, even when using a random number generator, because most random number generators are designed to have such a mean value.

There is a countermeasure: you have to make sure that no mean value exists for the gap between two successive numbers. A random number generator with this property can be constructed very easily.

Example:

- Start with the last invoice number plus one as current number
- Multiply the current number by a random number >= 2. This is your new current number.
- Get a random bit: If the bit is 0, the result is your current number. Otherwise go back to step 2.

While this will work in theory, you will very soon run out of 32-bit integers.
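To get a feeling for how quickly that happens, here is a rough sketch of the example generator. The names are hypothetical, the random factor is arbitrarily limited to 2 or 3, and `System.Random` is used only to keep the sketch short (it is not cryptographically secure). `BigInteger` is used so that the overflow can be detected instead of silently wrapping around.

```csharp
using System;
using System.Numerics;

public static class HeavyTailedGenerator
{
    // One step of the example: start at last + 1, multiply by a random
    // factor >= 2, and keep multiplying while the random bit is 1.
    public static BigInteger Next(BigInteger last, Random rng)
    {
        BigInteger current = last + 1;
        do
        {
            current *= rng.Next(2, 4);   // random factor: 2 or 3
        } while (rng.Next(2) == 1);      // random bit: 1 means "multiply again"
        return current;
    }

    // Counts how many invoice numbers can be issued, starting from 0,
    // before the next one no longer fits into a signed 32-bit integer.
    public static int CountUntilOverflow(Random rng)
    {
        BigInteger limit = int.MaxValue;
        BigInteger last = 0;
        int issued = 0;
        while (true)
        {
            BigInteger next = Next(last, rng);
            if (next > limit) return issued;
            last = next;
            issued++;
        }
    }
}
```

Every step at least doubles the previous number, so under these assumptions `CountUntilOverflow` can never return more than 30, whatever the random source produces.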

I don't think there is a practical solution for this problem. Either the gap between two successive numbers has a mean value (with little variance) and you can easily guess how many numbers have been issued, or you will run out of 32-bit numbers very quickly.

## Snake oil (non-working solutions)

Don't use any time-based solution. The timestamp is usually easy to guess (an approximately correct timestamp will probably be printed somewhere on the invoice). Using timestamps usually makes it easier for the attacker, not harder.

Don't use insecure random numbers. Most random number generators are not cryptographically secure. They usually have mathematical properties that are good for statistics but bad for your security (e.g. a predictable distribution, a stable mean value, etc.).

One solution may involve Exclusive OR (XOR) binary bitmaps. The resulting function is **reversible**, **may generate non-sequential numbers** (if the first bit of the least significant byte is set to 1), and is extremely easy to implement. And, as long as you use a reliable sequence generator (your database, for example), there is no need for thread safety concerns.

According to MSDN, 'the result [of an exclusive-OR operation] is true if and only if exactly one of its operands is true.' Conversely, equal operands always yield false.

As an example, I just generated a 32-bit sequence on Random.org. This is it:

`11010101111000100101101100111101`

This binary number translates to **3588381501** in decimal, **0xD5E25B3D** in hex. Let's call it your *base key*.

Now, let's generate some values using the *([base key] XOR [ID])* formula. In C#, this is what your encryption function would look like:

` public static long FlipMask(long baseKey, long ID) { return baseKey ^ ID; }`

The following list contains some generated content. Its columns are as follows:

- ID
- Binary representation of ID
- Binary value after XOR operation
- Final, 'encrypted' decimal value

| ID | Binary ID | Binary value after XOR | Final, 'encrypted' decimal value |
|---|---|---|---|
| 0 | 000 | 11010101111000100101101100111101 | 3588381501 |
| 1 | 001 | 11010101111000100101101100111100 | 3588381500 |
| 2 | 010 | 11010101111000100101101100111111 | 3588381503 |
| 3 | 011 | 11010101111000100101101100111110 | 3588381502 |
| 4 | 100 | 11010101111000100101101100111001 | 3588381497 |

In order to reverse the generated key and determine the original value, you only need to do the same XOR operation using the same base key. Let's say we want to obtain the original value of the second row:

` 11010101111000100101101100111101 XOR 11010101111000100101101100111100 = 00000000000000000000000000000001`

Which was indeed your original value.
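The round trip can be checked with a few lines. This is only a sketch: the class name is hypothetical, and the key is the example base key from above.

```csharp
using System;

public static class XorRoundTrip
{
    // The example base key from the text (3588381501 in decimal).
    public const long BaseKey = 0xD5E25B3D;

    // XOR is its own inverse, so the same function masks and unmasks.
    public static long FlipMask(long baseKey, long id) => baseKey ^ id;
}
```

Calling `XorRoundTrip.FlipMask(XorRoundTrip.BaseKey, 4)` gives 3588381497, and passing that result through `FlipMask` again with the same key gives back 4.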

Now, Stefan made very good points, and the first topic is crucial.

In order to cover his concerns, you may reserve, say, the most significant 8 bits for purely random garbage (which I believe is called a nonce), which you generate when encrypting the original ID and ignore when reversing it. That would heavily increase your security at the expense of a generous slice of the possible 32-bit positive integers (16,777,216 instead of 4,294,967,296, or 1/256 of them).

A class to do that would look like this:

```csharp
using System;

public static class int32crypto
{
    // C# follows ECMA 334v4, so Integer Literals have only two possible forms -
    // decimal and hexadecimal.
    // Original key: 0b11010101111000100101101100111101
    public static long baseKey = 0xD5E25B3D;

    public static long encrypt(long value)
    {
        // First we will extract from our baseKey the bits we'll actually use.
        // We do this with an AND mask, indicating the bits to extract.
        // Remember, we'll ignore the most significant 8. So the mask must look like this:
        // Significance mask: 0b00000000111111111111111111111111
        long _sigMask = 0x00FFFFFF;

        // The value must fit into the 24 significant bits. Anything larger
        // would collide with the nonce and be lost when decrypting.
        if (value < 0 || value > _sigMask)
            throw new ArgumentOutOfRangeException(nameof(value));

        // sigKey is our baseKey with only the indicated bits still true.
        long _sigKey = _sigMask & baseKey;

        // nonce generation. First security issue, since Random()
        // is time-based on its first iteration. But that's OK for the sake
        // of explanation, and safe for most circumstances.
        // The bits it will occupy are the lowest eight, like this:
        // OriginalNonce: 0b000000000000000000000000NNNNNNNN
        // Next(256) returns 0..255, so all eight bits can be set.
        long _tempNonce = new Random().Next(256);

        // We now shift them to the most significant byte, like this:
        // finalNonce: 0bNNNNNNNN000000000000000000000000
        _tempNonce = _tempNonce << 0x18;

        // And now we mix both Nonce and sigKey, 'poisoning' the original
        // key, like this:
        long _finalKey = _tempNonce | _sigKey;

        // Phew! Now we apply the final key to the value, and return
        // the encrypted value.
        return _finalKey ^ value;
    }

    public static long decrypt(long value)
    {
        // This is easier than encrypting. We will just ignore the bits
        // we know are used by our nonce.
        long _sigMask = 0x00FFFFFF;
        long _sigKey = _sigMask & baseKey;

        // We will do the same to the informed value:
        long _trueValue = _sigMask & value;

        // Now we decode and return the value:
        return _sigKey ^ _trueValue;
    }
}
```