Variable Length Ints

Assume an arbitrary sized buffer of arbitrary data, how should the length be encoded? Is it small (<256)? Prepend with an 8-bit length field! Will it always be below 16,000? A 16-bit field should suffice. What about 4 billion? 32-bits then. What if we start combining transmissions though? We better be safe and use 64-bits (18 billion billion), but now we're using 8x more space per number. Someone also noted the normal case would have a lot of 0-bytes in each size field providing easier cryptanalysis, but you did remember to compress THEN encrypt, right? Deciding how many bytes to use to describe a field is always a tradeoff between flexibility (large numbers) and resource constraints (small representations). What if we made the length field arbitrary size too?

Smart folks have looked at this problem and found many ways to represent Variable-Length quantities, including 645,000 patents. elicarter (via retracile.net) provided a sample implementation and a good decription of the problem including the risk of non-unique representations. Taking a similar tack, we've developed simple C/C++ and python encoder/decoders to tackle the problem.

Python

def read(vStream):
	rval = 0
	for idx in range(len(vStream)):
		rval = (rval * 128) + ord(vStream[idx])
		if not ord(vStream[idx]) & 128: return (idx+1,rval)
		rval = rval - 128 + 1
	return (0,rval)

def write(num):
	rval = [num%128]
	while num >= 128:
		num = (num/128)-1
		last = (num%128)+128
		rval.insert(0,last)
	return ''.join([chr(i) for i in rval])

C/C++

byte *encode_varlint(u64 num, u64 *elen=NULL)
{
	u32 buflen = 1;
	for(u64 tmp=num/128;tmp > 0;tmp/=128) ++buflen;
	byte *data = new byte[buflen];
	if(elen) *elen = buflen;

	for(int i=buflen;i>0;i--)
	{
		data[i-1] = (num % 128)+128;
		num = (num/128)-1;
	}
	data[buflen-1] -= 128;
	return data;
}

u64 decode_varlint(const byte *buf, u64 len, u64 *shift)
{
	u32 i=0;
	u64 retval = 0;
	for(;i<len;i++)
	{
		retval = (retval * 128) + buf[i];
		if(0 == (buf[i] & 128)) break;
		retval = retval - 128 + 1;
	}
	if(shift) *shift = i + 1;
	return retval;
}

- Kelson (kelson@shysecurity.com)