Calculating entropy with Python.

29 September 2020

Fun fact: There is more than one kind of entropy out there.

If you've been through high school chemistry or physics, you might have learned about thermodynamic entropy, which is (roughly speaking) the amount of disorder in a closed system.  Alternatively, and a little more precisely, thermodynamic entropy can be defined as the heat in a volume of space equalizing throughout the volume.  But that's not the kind of entropy that I'm talking about.

Information theory has its own concept of entropy.  One way of explaining information theory is that it's the mathematical study of messages as they travel through a communications system (which you won't need to know anything about for the purposes of this article).  In the year 1948.ev Claude Shannon (the father of information theory) wrote a paper called A Mathematical Theory of Communication in which he proposed that the amount of raw information in a message could be thought of as the amount of uncertainty (or perhaps novelty) in a given volume of bits (a message) in a transmission.  So, Shannon entropy could be thought of as asking the question "How much meaningful information is present in this message?"  Flip a coin and there's only one bit - heads or tails, zero or one.  Look at a more complex message and it's not quite so simple.  However, let's consider a computational building block, if you will:

One bit has two states, zero or one, or 2^1 states.  Two bits have four possible states: 00, 01, 10, and 11, or 2^2 possible states.  n bits have 2^n possible states, which means that they can store up to n bits of information.  Now we bring in logarithms, which we can think of in this case as asking "what number foo would we need in 2^foo to get the number of possible states?"  That number foo is the number of bits.
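The relationship between bits, states, and logarithms is easy to check by letting Python do the arithmetic.  This is only an illustrative sketch; the loop bounds and the variable names are made up for the example.

```python
import math

# For n bits, count the possible states, then use log2 to get n back.
for n in range(1, 5):
    states = 2 ** n
    bits = math.log2(states)
    print(n, "bits ->", states, "states -> log2 gives", bits)

# A fair coin has two equally likely states, which works out to one bit.
coin_bits = math.log2(2)
```

math.log2() is the "what number foo" question from above, asked of the number of states.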

That said, Claude Shannon came up with a nifty bit of math to calculate the entropy in a message:

H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)

(Image credit: Wikipedia, probably CC-BY-SA 3.0 Unported.)

It sorta-kinda makes sense if you know math, sorta-kinda doesn't if math isn't your strong suit (math isn't mine, so please don't feel bad).  Let's break the equation down to figure out what it means.

H(X) is the amount of entropy in the message.  The - sign in front of the equation looks odd at first, because it doesn't make sense to worry about the amount of negative entropy in a message, but it's there for exactly that reason: a probability is never greater than 1, so its logarithm is never greater than zero, and the sum by itself comes out negative or zero.  The - sign flips the result so that the entropy is a positive number.  P(x_i) is the probability mass function, or the probability that a discrete random variable (like a single bit) has a particular value.  log is the logarithm to the base 2 (so that included image should probably say log_2, but we'll worry about that in a couple of paragraphs).  The Σ sign means "See the expression after the capital sigma?  Evaluate it once for every value of i, starting at the value below the sigma and stopping when you hit the value above the sigma.  Then add all of those answers together to get the final result."
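Read as code, the sigma is a loop and the expression after it is the body of the loop.  Here is a minimal sketch of that idea, assuming the probabilities are handed over as a plain Python list.  The function name shannon_entropy and its parameter are made up for illustration, they don't come from Shannon's paper or from any library.

```python
import math

def shannon_entropy(probabilities):
    """Add up -p * log2(p) for every probability in the list."""
    total = 0.0
    for p in probabilities:
        # Outcomes that never happen contribute nothing, and log2(0) is undefined.
        if p > 0:
            # Subtracting applies the - sign from the front of the equation.
            total = total - p * math.log2(p)
    return total

fair_coin = shannon_entropy([0.5, 0.5])   # heads or tails, equally likely
sure_thing = shannon_entropy([1.0])       # only one possible outcome
print(fair_coin, sure_thing)
```

The fair coin comes out to one bit and the sure thing to zero bits, which matches the coin flip example earlier.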

If you've messed around with programming you've probably noticed that I tried to break the explanation down in such a way that you could write some code in the language of your choice to implement this math more easily by implying loops and function calls.  When I was thinking about this a while back I got it into my head to not use any special-purpose libraries to do it, because it seemed.. a bit excessive, to be honest.  Looking at Shannon's equation, it seemed to me that you shouldn't need to install a couple of score of megabytes of NumPy or SciPy libraries to accomplish this.  I posted about this on the Fediverse and heard back from @pra about a page at Rosetta Code that explains the equation in a much more clear fashion.  The thing that really helped is that it replaces the abstract probability mass function P(x_i) with something that makes a little more sense, count_i/N, or "the number of times some character appears in the message divided by the number of characters in the message."  That gives us enough information to implement the calculation of entropy.  I used as my sample message the sentence "Now is the time for all good men to come to the aid of their country." because that was the first touch typing drill sentence I learned from my mom.
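To make count_i/N concrete, here is a small sketch that tallies the characters of a short word with collections.Counter from the Python standard library.  The word "hello" and the variable names are only there for illustration.

```python
from collections import Counter

message = "hello"
counts = Counter(message)    # how many times each character appears
total = len(message)         # N, the number of characters in the message

# count_i / N for every distinct character in the message.
frequencies = {char: count / total for char, count in counts.items()}
print(frequencies)
```

The letter l appears twice in five characters, so its frequency is 0.4, and all of the frequencies together add up to 1.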

Here's how I did it in Python:

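import math

string = "Now is the time for all good men to come to the aid of their country."
entropy_in_bits = 0

# Every distinct character in the message, joined into one string.
letters_in_string = "".join(set(string))

for i in letters_in_string:
    x = string.count(i)        # how many times this character appears
    y = x = x / len(string)    # count_i / N, the character's probability
    x = math.log2(x)           # zero or negative, because the probability is at most 1
    x = x * y
    entropy_in_bits = entropy_in_bits + x

# The running total is negative at this point.  The - sign in front of
# Shannon's equation flips it to a positive number of bits.
entropy_in_bits = -entropy_in_bits
print(entropy_in_bits)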

The math module is the basic Python mathematics library, which implements all of the math functions in the ANSI C standard.  If you like, this is the bare minimum math library you need for a programming language.  string is my sample sentence and entropy_in_bits is just what it sounds like, a count of the number of bits of entropy in string.  The statement letters_in_string = "".join(set(string)) means, "First, build a set from string, which is a collection that holds each distinct character exactly once, in no particular order.  (Yes, I linked to the documentation for an old version of Python because I felt it explains what a set is more concisely.)  Then turn that set into a string containing all of the unique characters in the value string."
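If sets are new to you, this short sketch shows what set() and "".join() do to a small word.  The word "banana" and the variable names are just for illustration.

```python
word = "banana"

unique = set(word)           # each distinct character once, in arbitrary order
letters = "".join(unique)    # the same characters glued into one string

print(sorted(unique))        # sorted only to get a stable order for display
print(len(letters))
```

Six characters go in and three come out, because "banana" only uses the letters a, b, and n.  The order of the characters in letters can change from run to run, which doesn't matter for entropy because every character gets visited once either way.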

Now, for every character in letters_in_string, count the number of times it appears in the sample text in string.  Divide that count by the number of characters in the entire text.  Store the result into two variables, x and y.  Take the logarithm to the base 2 of x and store it back into x.  Then multiply the log2 of x by the value in y.  Add that value to the running total in entropy_in_bits.  Move on to the next character.

The amount of Shannon entropy (in bits) I got with the above code and sample text was 3.839559153700935.  To sanity check myself I plugged the sample text into GCHQ's online copy of CyberChef at GitHub and also got the answer 3.839559153700935.  This double-check strongly implies that the above code I wrote works the same way and does the same thing as the entropy calculation module in CyberChef, written by people a lot smarter than I am.
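Another way to sanity check an entropy calculation is to feed it messages whose answers can be worked out by hand.  The sketch below wraps the same count_i/N steps in a function so that it can be called more than once.  The function name entropy_of and the test strings are made up for this example.

```python
import math

def entropy_of(text):
    """Shannon entropy of text in bits, using count_i / N for each character."""
    total = 0.0
    for char in set(text):
        p = text.count(char) / len(text)
        # Subtracting applies the - sign from the front of the equation.
        total = total - p * math.log2(p)
    return total

print(entropy_of("aaaa"))   # one symbol, no uncertainty
print(entropy_of("abab"))   # two equally likely symbols
print(entropy_of("abcd"))   # four equally likely symbols
```

Those three calls should give 0, 1, and 2 bits, which lines up with the 2^n building block from earlier: one, two, and four equally likely symbols.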

There you have it.  I sincerely hope that I was able to explain Shannon entropy in a way that made sense (and if not, let me know through the usual channels and I'll do my best to fix that) and I hope that I got a few people curious about what this means.  Maybe the Wikipedia page will make more sense to you (it does to me after writing this hamster).  If nothing else, it was a fun exercise to take something big and scary and rewrite it in a form that's tiny, compact, and easy to dissect and understand.

Go forth and make shit happen.