Notes on Data Representation, Signed and Unsigned Integers, Bytes, etc.

Computers store, present, and help us modify many different types of data, including:
  • Numbers
  • Text
  • Audio
  • Images and graphics 
  • Video

Ultimately, all of this data is stored as binary digits. Each document, picture, and sound bite is somehow represented as strings of 1s and 0s.

Analog and Digital Information

Information can be represented in one of two ways: analog or digital. Analog data is a continuous representation, analogous to the actual information it represents. Digital data is a discrete representation, breaking the information up into separate elements.

Binary Representations

One bit can be either 0 or 1. There are no other possibilities. Therefore, one bit can represent only two things. For example, if we wanted to classify a food as being either sweet or sour, we would need only one bit to do it. We could say that if the bit is 0, the food is sweet, and if the bit is 1, the food is sour. But if we want to have additional classifications (such as spicy), one bit is not sufficient.
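
As a rough sketch of the same idea in code: n bits give 2^n distinct patterns, so the number of bits needed grows with the number of classifications. The class and method names below (BitsNeeded, bitsNeeded) are made up for illustration.

```java
class BitsNeeded {
    // Smallest number of bits whose 2^n patterns cover the given number of categories.
    static int bitsNeeded(int categories) {
        if (categories < 1) {
            throw new IllegalArgumentException("need at least one category");
        }
        // For categories >= 2 this is the bit length of (categories - 1); for 1 it is 0.
        return 32 - Integer.numberOfLeadingZeros(categories - 1);
    }

    public static void main(String[] args) {
        System.out.println(bitsNeeded(2)); // sweet, sour        -> 1
        System.out.println(bitsNeeded(3)); // sweet, sour, spicy -> 2
        System.out.println(bitsNeeded(5)); // five categories    -> 3
    }
}
```

With two bits there are four patterns (00, 01, 10, 11), so adding "spicy" leaves one pattern unused.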

Representing (Negative) Numbers

I have found some very good information on unsigned and signed integers here and here. To summarise:

Consider an 8-bit signed integer: let us begin with binary 00000000 and start counting by repeatedly adding 1. (All bit patterns below are binary; 2^k means 2 to the power k.)

  • When you get to 127, the integer has a value of 01111111; this is easy to see because a 7-bit integer can contain a value between 0 and 2^7 - 1, or 127. What happens when we add 1?
  • If the integer were unsigned, the next value would be 10000000, or 128 (2^7). But since this is a signed integer, 10000000 is a negative value: the sign bit is 1!
  • Since this is the case, we must ask the question: what is the decimal value corresponding to the signed integer 10000000? To answer this question, we must take the 2's complement of that value, by first taking the 1's complement and then adding one.
  • The 1's complement is 01111111, or decimal 127. Adding 1 to that gives 128, so our conclusion is that the signed integer 10000000 must be equivalent to decimal -128!
  • Odd as this may seem, it is in fact the only consistent way to interpret 2's complement signed integers. Let us continue now to "count" by adding 1 to 10000000:
  • 10000000 + 00000001 is 10000001.
  • To find the decimal equivalent of 10000001, we again take the 2's complement: the 1's complement is 01111110 and adding 1 we get 01111111 (127), so 10000001 is equivalent to -127.
  • We see then that once we have accepted the fact that 10000000 is decimal -128, counting by adding one works as we would expect.
  • Note that the most negative number which we can store in an 8-bit signed integer is -128, which is -2^(8-1), and that the largest positive number we can store in an 8-bit signed integer is 127, which is 2^(8-1) - 1.
  • The number of integers between -128 and +127 (inclusive) is 256, which is 2^8; this is the same number of values which an unsigned 8-bit integer can contain (from 0 to 255).
  • Eventually we will count all the way up to 11111111. The 1's complement of this number is 00000000, and adding 1 gives 1, so 11111111 must be the equivalent of decimal -1.
  • Using our deliberations on 8-bit signed integers as a guide, we come to the following observations about signed integer arithmetic in general:
  • If a signed integer has n bits, it can contain a number between -2^(n-1) and +(2^(n-1) - 1).
  • Since both signed and unsigned integers of n bits in length can represent 2^n different values, there is no inherent way to distinguish signed integers from unsigned integers simply by looking at them; the software designer is responsible for using them correctly.
  • No matter what the length, if a signed integer has a binary value of all 1's, it is equal to decimal -1.
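
The counting walk above can be replayed with a Java byte, which is an 8-bit two's complement integer. This is only a sketch: the class name and the helpers toBinary8, negate, minSigned and maxSigned are invented for this note.

```java
class TwosComplementDemo {
    // Show the 8 bits of a byte, padded with leading zeros.
    static String toBinary8(byte value) {
        return String.format("%8s", Integer.toBinaryString(value & 0xFF)).replace(' ', '0');
    }

    // Two's complement negation: take the 1's complement (~), then add one.
    static byte negate(byte value) {
        return (byte) (~value + 1);
    }

    // Range of an n-bit signed integer: -2^(n-1) .. 2^(n-1) - 1.
    static long minSigned(int bits) { return -(1L << (bits - 1)); }
    static long maxSigned(int bits) { return (1L << (bits - 1)) - 1; }

    public static void main(String[] args) {
        byte b = 127;
        System.out.println(toBinary8(b) + " = " + b); // 01111111 = 127
        b++;                                          // adding 1 flips the sign bit
        System.out.println(toBinary8(b) + " = " + b); // 10000000 = -128
        b++;
        System.out.println(toBinary8(b) + " = " + b); // 10000001 = -127

        byte allOnes = (byte) 0xFF;
        System.out.println(toBinary8(allOnes) + " = " + allOnes); // 11111111 = -1

        System.out.println(minSigned(8) + " .. " + maxSigned(8)); // -128 .. 127
    }
}
```

Note that negate((byte) -128) gives -128 again, because +128 does not fit in 8 signed bits.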

Also I will copy a table from Wikipedia:
Keep in mind through all of this that whether a given binary pattern represents a signed or an unsigned value depends on how you choose to use it. The signed nature of a value lies in how you treat the value, and not in the nature of the underlying bit pattern that represents the value.

For example, does the binary number 10101111 represent a signed value or an unsigned value? The question is meaningless without context: if you need to treat the value as a signed value, you treat the high-order bit as the sign bit, and the value is -81. If you need to treat the value as an unsigned value, you treat the high bit as just another digit in a binary number, and the value is 175.
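
A small sketch of that example: the same eight bits are parsed once and then read in two ways. The names BitPatternDemo, asUnsigned and asSigned8 are hypothetical helpers, not part of any library.

```java
class BitPatternDemo {
    // Treat every bit, including the high one, as an ordinary binary digit.
    static int asUnsigned(String eightBits) {
        return Integer.parseInt(eightBits, 2);
    }

    // Treat the high bit as the sign bit by narrowing to a Java byte.
    static int asSigned8(String eightBits) {
        return (byte) Integer.parseInt(eightBits, 2);
    }

    public static void main(String[] args) {
        String bits = "10101111";
        System.out.println(asUnsigned(bits)); // 175
        System.out.println(asSigned8(bits));  // -81
    }
}
```

The two results differ by exactly 256 (2^8), which is what reading the high bit as -128 instead of +128 amounts to.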

Now that we are more familiar with signed and unsigned representations, let's connect this info to Java.

From the Java docs:
byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive).
int: By default, the int data type is a 32-bit signed two's complement integer, which has a minimum value of -2^31 and a maximum value of 2^31 - 1.
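
These limits can be checked against the constants in the standard Byte and Integer classes. The sketch below uses a made-up helper, pow2, and also shows that adding 1 to the largest int wraps around to the smallest, just like the 8-bit case.

```java
class JavaRanges {
    // 2^exponent as a long, so that 2^31 does not overflow.
    static long pow2(int exponent) {
        return 1L << exponent;
    }

    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE);       // -128 .. 127
        System.out.println(Integer.MIN_VALUE == -pow2(31));                 // true
        System.out.println(Integer.MAX_VALUE == pow2(31) - 1);              // true
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);     // true (wraps around)
    }
}
```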

Here is some sample code:
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class ByteDemo {
    public static void main(String[] args) throws Exception {
        File file = new File("/Users/koraytugay/Desktop/test.byt");

        // Write a single byte with the bit pattern 11111111 (-1 as a signed Java byte).
        byte b = -1;
        try (DataOutputStream dataOutputStream = new DataOutputStream(new FileOutputStream(file))) {
            dataOutputStream.writeByte(b);
        }

        // read() returns the next byte as an int in the range 0..255.
        try (FileInputStream fileInputStream = new FileInputStream(file)) {
            int read = fileInputStream.read();
            System.out.println(b);
            System.out.println(read);
            System.out.println((byte) read);
        }
    }
}
The output will be:
-1
255
-1

And when you open the file with an application such as Hex Viewer Pro, you will see the value FF, which is 255 in decimal.

So what happened here? We wrote the bit pattern 11111111 to a file. That is one byte, or eight bits of information. When we read it back with FileInputStream.read(), we see the value 255. Why? Because read() returns the byte as an int in the range 0 to 255, and a 32-bit int has plenty of room to hold 255 as a positive number. But when we cast it to a byte, only the low 8 bits are kept, and since Java does not offer unsigned bytes, the result is represented as -1. The byte still has the value 11111111 and nothing else. So an application that treats 11111111 as an unsigned byte will read it as (+)255.
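
The same round trip can be sketched without touching the file system by using in-memory streams. This is an illustration only: RoundTripDemo and roundTrip are made-up names, and ByteArrayOutputStream / ByteArrayInputStream stand in for the file streams used above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class RoundTripDemo {
    // Writes one byte, reads it back, and returns three views of the same 8 bits:
    // [0] what read() returns, [1] the value after casting to byte,
    // [2] the byte converted back to an unsigned int.
    static int[] roundTrip(byte value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value); // write(int) stores only the low 8 bits

        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        int read = in.read(); // 0..255, or -1 at end of stream

        byte asByte = (byte) read;
        return new int[] { read, asByte, Byte.toUnsignedInt(asByte) };
    }

    public static void main(String[] args) {
        int[] views = roundTrip((byte) -1);
        System.out.println(views[0]); // 255
        System.out.println(views[1]); // -1
        System.out.println(views[2]); // 255
    }
}
```

Byte.toUnsignedInt (available since Java 8) is one way to get the unsigned reading of a byte when the stream is no longer at hand.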

A good answer by Alexei is provided here. He says:
Bytes are not signed or unsigned by themselves. They are interpreted as such when some operations are applied (say, compare to 0 to determine sign). The operations can be signed and unsigned, and in Java, only signed byte operations are available. So the code you cited is useless for the question - it sends bytes but does not do any operation. Better show the code which receives bytes. One correct and handy method is to use java.io.InputStream.read() method which returns byte as an integer in the range 0...255. 
This is also taken from a good answer provided by Adamski here:
The fact that primitives are signed in Java is irrelevant to how they're represented in memory / transit - A byte is merely 8 bits and whether you interpret that as a signed range or not is up to you. There is no magic flag to say "this is signed" or "this is unsigned".
I would also suggest taking a look at this question. My favorite answer is:
for (int i=0; i <= 255; i++) {
    byte b = (byte) i;    // cast int values 0 to 255 to corresponding byte values
    int neg = b;     // neg will take on values 0..127, -128, -127, ..., -1
    int pos = (int) (b & 0xFF);  // pos will take on values 0..255
}
The conversion of a byte that contains a value bigger than 127 (i.e., values 0x80 through 0xFF) to an int results in sign extension of the high-order bit of the byte value (i.e., bit 0x80). To remove the 'extra' one bits, use x & 0xFF; this forces bits higher than 0x80 (i.e., bits 0x100, 0x200, 0x400, ...) to zero but leaves the lower 8 bits as is.
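
Printing the values in hexadecimal makes the 'extra' one bits visible. A minimal sketch, with made-up names SignExtensionDemo, signExtendedHex and maskedHex:

```java
class SignExtensionDemo {
    // Plain widening: the sign bit of the byte is copied into the upper 24 bits of the int.
    static String signExtendedHex(byte b) {
        int widened = b;
        return Integer.toHexString(widened);
    }

    // Masking with 0xFF clears the upper 24 bits and keeps the lower 8 as they are.
    static String maskedHex(byte b) {
        return Integer.toHexString(b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;
        System.out.println(signExtendedHex(b)); // ffffff80
        System.out.println(maskedHex(b));       // 80

        byte small = 0x7F;                       // high bit is 0, so nothing to extend
        System.out.println(signExtendedHex(small)); // 7f
        System.out.println(maskedHex(small));       // 7f
    }
}
```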