<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"><title>Binary Numbers
</title>
</head><body bgcolor="#f0f0f0">
<font face="Verdana,Tahoma,sans-serif,Arial,Lucida Sans,Gill Sans">
<div style="width:45em;background-color:white;border-style:solid;border-width:1px;padding:1em;">
<table cellspacing="8">
<tbody><tr><td><img src="./Binary Numbers_files/cowhead2.gif"></td>
<td> </td>
<td><b><font size="5">
<font face="Verdana,Tahoma,sans-serif,Arial,Lucida Sans,Gill Sans">
Binary Numbers
</font></font></b></td>
</tr>
</tbody></table>
<h4>Representing Binary Numbers</h4>
<p>
Computers operate on 1's and 0's; these are called 'bits'. A byte
is a group of 8 bits, like this example: 00110101. A computer
word on a 32-bit computer ('int') is 4 bytes, 32 bits:
10011010110101011010001010101011. Other computers have different
word sizes; over time, 64-bit integers will become more common.
</p><p>
As you can see, 32 ones and zeroes is a bit cumbersome to write
down (or even to read). Thus, people conventionally break these
large numbers of digits down into groups of 3 or 4 bits:
</p><pre> 1001.1010.1101.0101.1010.0010.1010.1011 <-- four bit groups
10.011.010.110.101.011.010.001.010.101.011 <-- three bit groups
(note that the count of 3 starts on the right)
</pre>
<p>
These grouped sets of bits are then mapped onto digits, either four
bits per hexadecimal (base 16) digit or three bits per octal (base
8) digit. Obviously, hexadecimal needs some new digits (since decimal
digits only go 0..9 and 6 more are needed). These days, the letters
'A'..'F' are used for the 'digits' that represent 10..15. Here's
the map; the correspondence is obvious:
</p><pre> OCTAL: HEXADECIMAL:
000 -> 0 100 -> 4 0000 -> 0 0100 -> 4 1000 -> 8 1100 -> C
001 -> 1 101 -> 5 0001 -> 1 0101 -> 5 1001 -> 9 1101 -> D
010 -> 2 110 -> 6 0010 -> 2 0110 -> 6 1010 -> A 1110 -> E
011 -> 3 111 -> 7 0011 -> 3 0111 -> 7 1011 -> B 1111 -> F
(both upper and lower case A-F are used across different computers and
operating systems).
</pre>
<p>
The hex and octal representations of those integers above are easy
to translate from the binary counterparts; add 0x to the front for
a C-style-language hexadecimal number that the compiler will accept:
</p><pre> 1001.1010.1101.0101.1010.0010.1010.1011
-> 9 A D 5 A 2 A B --> 0x9AD5A2AB
(that's 0x in front of the hex number)
</pre>
and
<pre> 10.011.010.110.101.011.010.001.010.101.011
2 3 2 6 5 3 2 1 2 5 3 -> 023265321253
(that's a numeric '0' in front)
</pre>
<p>
Octal is easier to write down quickly, but hexadecimal has the nice
properties of breaking easily into bytes (which are pairs of
hexadecimal digits).
</p><p>
Some aids for remembering the correspondence of hexadecimal (often
called 'hex') digits and their decimal digit counterpart:
</p><ul>
<li>hex 0-9 are the same as decimal digits 0-9
</li><li>A is the first one past 9 and is easy to remember as 10
</li><li>F is the last one and thus easy to remember as 15
</li><li>C is decimal 12 (the only one that you sort of have to memorize)
</li><li>All the rest are close to A, C, or F (B is just A+1, D is C+1,
E is F-1)
</li></ul>
<p>
If someone mentions the "third bit" of a number, it's best to find
out whether they mean the third bit from the left or from the right,
and whether they start counting from 0 or from 1. A miscommunication
can result in real problems later on (e.g., reversed strings of bits!).
<a href="http://www.codecademy.com/forum_questions/50a7a1dcab02dc5683001280">
This page</a>, for example, starts counting with the rightmost bit being
number 1. <a href="http://www.allinterview.com/showanswers/102265.html">This
page</a> is a gem in that answer 5 counts from the right starting at
1, while answer 8 counts from the right starting at 0.
</p><p>
Almost all modern computers use the left-most bit as the 'sign bit'
which signifies a negative integer if its value is 1. Note that
identifying the location of the sign bit requires knowledge of
precisely how many bits are in a given data type. This number can
change over time (i.e., when you recompile on a different computer).
Generally, the sizeof() operator will tell you the number of
<b>bytes</b> (which are generally 8 bits) its argument contains,
e.g., sizeof(int) or sizeof (100) will yield 4 on a 32-bit machine.
Sometimes one can identify a system-based include file that contains
the proper constants if a program must depend on a certain word
(integer) length.
</p><h4>Operating on Binary Numbers in Programs</h4>
<p>
Sometimes it is handy to work with the bits stored in numbers rather
than just treating them as integers. Examples of such times include
remembering choices (each bit slot can be a 'yes'/'no' indicator),
keeping track of option flags (same idea, really, each bit slot is
a 'yes'/'no' indicator for a flag's presence), or keeping track of
a number of small integers (e.g., successive pairs of bit slots can
remember numbers from 0..3). Of course, occasionally programming tasks
actually contain 'bit strings'.
</p><p>
In C/C++ and others, assigning a binary number is easy if you know
its octal or hexadecimal representation:
</p><pre> i = 0x9AD5A2AB; /* hexadecimal: 0x */
</pre>
or
<pre> i = 023265321253; /* octal: start with 0 */
</pre>
<p>
More often, a pair of single-bit valued integers is combined to
create an integer of interest. One might think the statement below
would do that:
</p><pre> i = 0x10000 + 0x100;
</pre>
and it will -- until the sign bit enters the picture or the same
bit is combined twice:
<pre> i = 0x100 + 0x100;
</pre>
<p>
In that case, a 'carry' occurs (0x100 + 0x100 = 0x200 which is
probably not the result you want) and then i contains 0x200 instead
of 0x100 as probably desired. The 'or' operation -- denoted as '|'
in C/C++ and others -- does the right thing. It combines corresponding
bits in its two operands using these four rules:
</p><pre> 0 | 0 -> 0
0 | 1 -> 1
1 | 0 -> 1
1 | 1 -> 1
</pre>
The '|' operation is called 'bitwise or' in C so as not to be
confused with its cousin '||', called 'logical or' or 'orif'. The
'||' operator evaluates the arithmetic value of its left side operand
and, only if that value is false (exactly 0), evaluates its right
side operand. Its result is also different from bitwise or: if either
operand is nonzero, '||' evaluates to true (exactly 1 in C).
<p>
It is "1 | 1 = 1" that distinguishes the '|' operator from '+'.
Sometimes operators like this are displayed as a 'truth table':
</p><pre>                   right operand
     operator '|' |  0  1
              ----+-------
      left      0 |  0  1   <-- results
    operand     1 |  1  1   <-- results
</pre>
<p>
It's easy to see that the 'bitwise or' operation is a way to set
bits inside an integer. A '1' results when either or both of the
input bits are '1'.
</p><p>
The easy way to query bits is using the 'bitwise and' operator,
which is denoted as '&' and has this
truth table:
</p><pre> & | 0 1
---+------
0 | 0 0
1 | 0 1
</pre>
Do not confuse the single '&' operator with its partner named 'andif'
with two ampersands ('&&'). The 'andif' operator will evaluate its
left side and yield 0 if the left side is false, without evaluating
its right side operand. Only if the left side is true will the right
side be evaluated, and the result of the operator is the logical 'and'
of their truth values (just as above) and is evaluated to either the
integer 0 or the integer 1 (vs. '&' which would yield 4 when
evaluating binary 100100 & 000111).
<p>
A '1' results only when *both* input bits are '1'. So, if a program
wishes to know if the 0x100 bit is '1' in an integer, the if statement
is simple:
</p><pre> if (a & 0x100) { printf("yes, 0x100 is on\n"); }
</pre>
C/C++ (and others) contain additional operators, including 'exclusive
or' (denoted '^') with this truth table:
<pre> ^ | 0 1
---+------
0 | 0 1
1 | 1 0
</pre>
<p>
The 'exclusive or' operator is sometimes called 'xor', for ease of
typing. Xor yields a '1' when exactly *one* of its inputs is
one: either one or the other, but not both. This operator is very
handy for 'toggling' (flipping) bits, changing them from '1' to '0'
or vice-versa. Consider this statement:
</p><pre> a = a ^ 0x100; /* same as a ^= 0x100; */
</pre>
The 0x100 bit will be changed from 0->1 or 1->0, depending on its
current value.
<p>
Switching off a bit requires two operators. The new one is the unary
operator that toggles every bit in a word, creating what is called
the 'bitwise complement' or just 'complement' of a word. Sometimes
this is called 'bit inversion' or just 'inversion' and is denoted
by the tilde: '~'. Here's a quick example:
</p><pre>  unsigned char a, b; /* eight bits, not 32 or 64; unsigned so the
                         value below prints without sign-extension */
  a = 0x4A;           /* binary 0100.1010 */
  b = ~a;             /* flip every bit: 1011.0101 */
  printf("b is 0x%X\n", b);
</pre>
which yields:
<pre> b is 0xB5
</pre>
<p>
Thus, if we have a single bit switched on (e.g., 0x100) then ~0x100
has all but one bit switched on: 0xFFFFFEFF (note the 'E' as the
third 'digit' from the right) (this example shows a 32-bit value;
a 64-bit value would have a lot more F's on the front).
</p><p>
These two operators combine to create a scheme for switching off
bits:
</p><pre>  a = a & (~0x100); /* switch off the 0x100 bit */
                    /* same as a &= ~0x100; */
</pre>
since all but one bit in ~0x100 is on, all the bits except the 0x100
bit appear in the result. Since the 0x100 bit is 'off' in ~0x100,
that bit is guaranteed to be '0' in the result. This operation is
universally called 'masking', as in 'mask off the 0x100 bit'.
<h4>Summary</h4>
<p>
In summary, these operators enable setting, clearing, toggling, and
testing any bit or combination of bits in an integer:
</p><pre>  a |= 0x20;   /* turn on bit 0x20 */
  a &= ~0x20;  /* turn off bit 0x20 */
  a ^= 0x20;   /* toggle bit 0x20 */
  if (a & 0x20) {
      /* then the 0x20 bit is on */
  }
</pre>
<h4>Shifting</h4>
<p>
Moving bits to the left or right is called 'shifting'. Consider a
five-bit binary number like 00110. If that number is shifted one
bit left, it becomes 01100. On the other hand, if 00110 is shifted
one bit to the right, it becomes 00011. Mathematically inclined
users will realize that shifting to the left one bit is the same
as multiplying by 2, while shifting to the right one bit is usually
the same as an integer divide by 2 (i.e., one discards any remainder).
Why usually? Shifting -1 right by one yields an unusual result (i.e.,
0xFFFF.FFFF >> 1 == 0xFFFF.FFFF, no change at all).
</p><p>
Generally, one can specify a shift by more than one bit: Shifting
000001 to the left by three bits yields 001000. The shift operators
are:
</p><pre>  a << n   /* shift a left n bits */
  a >> n   /* shift a right n bits */
</pre>
One generally shifts integers (instead of floating-point numbers),
although nothing prevents shifting floating point numbers in some
languages. The results are generally very difficult to interpret as a
floating point number, of course.
<p>
When shifting to the left, 0's are inserted at the lower end.
<b>When shifting a signed integer to the right, the high order bit is
typically duplicated and inserted for the newly-needed bit</b> (thus
preserving the number's sign); unsigned values shift in 0's instead.
This means that (-1)>>1 yields -1 instead of 0!
</p><p>
Another type of shift, unavailable natively in most programming
languages, is the 'circular' shift, where bits shifted off one end
are inserted at the other end instead of the default 0. Modern
uses of this operation are rare but could appear occasionally. Some
machines (e.g., the x86 architecture) have assembly-language
instructions for this, but the prudent C or Java programmer will
spend a few more machine cycles (billionths of a second) to execute
both a shift and then a bit-extract-shift-or combination to move
the bits themselves.
</p><p>
A note on optimization: some zealous optimizing coders like to change
a/4 to '(a>>2)' in order to save time ("since divides can be
slow"). Modern compilers know all about this and perform such
substitutions automatically, thus leaving the better programmer to
write a/4 when that is what's meant.
</p><h4>Very Advanced Bit Manipulation</h4>
<p>
<b>Skip this section if this is your first time dealing with bits.
Read it at your leisure in the future after you've written some bit
manipulation code. Really.</b>
</p><p>
It turns out that the way 2's complement machines represent integers
and the way they implement subtraction (the standard on virtually
all modern machines) yields some very interesting possibilities for
bit manipulation.
</p><p>
Two's complement machines represent positive integers as the binary
representation of that integer with a 0 in the sign bit. A negative
integer is represented as the complement of the positive integer
(including turning on the sign bit) plus 1. Thus, the absolute value
of the most negative representable integer is one more than the
most positive representable integer. Thus:
</p><pre> x +x in 8-bit binary -x in 8-bit binary
0 0000 0000 0000 0000
1 0000 0001 1111 1111
2 0000 0010 1111 1110
3 0000 0011 1111 1101
64 0100 0000 1100 0000
65 0100 0001 1011 1111
126 0111 1110 1000 0010
127 0111 1111 1000 0001
128 [no representation] 1000 0000
</pre>
<p>
Given this representation, addition can proceed just as on pencil
and paper, discarding any extra high order bits that exceed the length
of the representation. Subtracting b from a (a-b) proceeds as adding
a and the quantity (-b).
</p><p>
This means that certain bit operations can exploit these definitions
of negation and subtraction to yield interesting results, the proofs
of which are left to the reader (see <a href="http://realtimecollisiondetection.net/blog/?p=78">a table of
these</a> offsite):
</p><pre> Binary
Value Sample Meaning
x 00101100 the original x value
x & -x 00000100 extract lowest bit set
x | -x 11111100 create mask for lowest-set-bit & bits to its left
x ^ -x 11111000 create mask bits to left of lowest bit set
x & (x-1) 00101000 strip off lowest bit set
--> useful to process words in O(bits set)
instead of O(nbits in a word)
x | (x-1) 00101111 fill in all bits below lowest bit set
x ^ (x-1) 00000111 create mask for lowest-set-bit & bits to its right
~x & (x-1) 00000011 create mask for bits to right of lowest bit set
x | (x+1) 00101101 toggle lowest zero bit
x / (x&-x) 00001011 shift number right so lowest set bit is at bit 0
</pre>
There's no reason to memorize these expressions; rather, remember
what's possible and refer back to this page to save time when you
are processing bits.
</div><br>
<center>
<a href="http://train.usaco.org/usacogate?a=zWju1un286A">USACO Gateway</a> | <a href="mailto:rob.kolstad@gmail.com">Comment or Question</a>
</center>
</font></body></html>