Fun fact: There is more than one kind of entropy out there.
If you've been through high school chemistry or physics, you might have learned about thermodynamic entropy, which is (roughly speaking) the amount of disorder in a closed system. Alternatively, and a little more precisely, thermodynamic entropy can be thought of as a measure of how evenly the heat in a volume of space has spread out through that volume. But that's not the kind of entropy that I'm talking about.
Information theory has its own concept of entropy. One way of explaining information theory is that it's the mathematical study of messages as they travel through a communications system (which you won't need to know anything about for the purposes of this article). In the year 1948.ev, Claude Shannon (the father of information theory) wrote a paper called "A Mathematical Theory of Communication" in which he proposed that the amount of raw information in a message could be thought of as the amount of uncertainty (or perhaps novelty) in a given volume of bits (a message) in a transmission. So, Shannon entropy could be thought of as asking the question "How much meaningful information is present in this message?" Flip a coin and the result carries only one bit: heads or tails, zero or one. Look at a more complex message and it's not quite so simple. However, let's consider a computational building block, if you will:
One bit has two states, zero or one, or 2^1 states. Two bits have four possible states: 00, 01, 10, and 11, or 2^2 possible states. n bits have 2^n possible states, which means that they can store up to n bits of information. Now we bring in logarithms, which we can think of in this case as asking "what number foo would we need in 2^foo to get the number of possible states a message can be in?" That foo, the base-2 logarithm of the number of states, is the number of bits.
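If it helps to see the counting argument run, here is a minimal Python sketch of it. The function names (count_states, enumerate_states, bits_needed) are made up for this illustration and don't come from Shannon's paper or any library; the only real library calls are math.log2 and itertools.product from the standard library.

```python
import math
from itertools import product


def count_states(n_bits: int) -> int:
    """How many distinct values n_bits can take: 2 to the power n_bits."""
    return 2 ** n_bits


def enumerate_states(n_bits: int) -> list[str]:
    """List every possible pattern of n_bits as a string of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


def bits_needed(n_states: int) -> float:
    """The 'foo' in 2^foo = n_states, i.e. the base-2 logarithm."""
    return math.log2(n_states)


if __name__ == "__main__":
    for n in (1, 2, 3):
        states = enumerate_states(n)
        # The number of patterns we listed should match 2^n,
        # and taking log2 of that count should give n back.
        print(n, "bit(s):", states)
        print("  states:", count_states(n), "log2:", bits_needed(len(states)))
```

The logarithm is just the counting run backwards: going from bits to states is exponentiation, and going from states back to bits is log base 2.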