What Is Information?

For the end of the year, something a little different: let’s talk about information. Information is one of those concepts that everyone feels familiar with and few people examine carefully, so let’s take a step back and think about what it is. A book, an email, or even a science blog could each be said to contain information, and in fact to have the primary purpose of conveying information from one person or place to another. The physical properties of an object aren’t classically considered to be information, meaning that the words inside the book are information while the fact that the cover is red leather is a property rather than information. How do we make this distinction? Well, the properties of the book cover have to be separately measured by each observer, but the contents of the book constitute a message: a sequence of symbols or signals with a known meaning. That meaning constitutes information!

And the question of how information is transmitted from one place to another is deeply relevant for all of human communication. This is even more true with the rise of the Internet and the struggle to understand how to compress information without losing important pieces. Data compression is a problem of information, as is data storage. But even earlier, what constitutes information was being studied as part of the World War II cryptography effort, because of the importance during wartime of sending and receiving messages. It was just after the war, in 1948, that Claude Shannon wrote a paper which effectively founded the research field of information theory, which focused on how to encode a message to pass between two people. Shannon’s important insight was to think of information probabilistically, for example by looking at the set of all word lengths to understand their distribution, and thus the difference between trying to encode a short word and a long word.

Initially word length might seem like a rather mechanical way to classify words, but it turns out to be deeply related to the information carried within the word itself. In English, many of the most common words are short words (see for example the word list for Basic English). But as you’ll find out when you start trying to write in Basic English, the longer words you can’t use often require a lot of short words to explain, which means that long words tend to, on average, carry more information than short words! And if we return to the problem of coded messages, longer messages can carry more information than shorter messages. Put that way, it sounds like common sense, but it’s a key insight into how information works: the more symbols you can use in your message, the greater the potential information content of that message. Shannon’s paper introduced the term ‘bit’ (which he credited to John Tukey) for a unit of information within a message. That may sound familiar, as there are eight bits in a byte, just over a million (2^20 or 1048576) bytes in a megabyte under the binary convention, and so on in the computer world.
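To make the "more symbols, more potential information" idea concrete, here is a minimal Python sketch. It assumes every symbol in the alphabet is equally likely, which is not true of real English, so the result is an upper bound rather than a measurement. The function name `max_bits` is made up for this illustration.

```python
import math


def max_bits(num_symbols: int, alphabet_size: int) -> float:
    """Upper bound, in bits, on the information in a message.

    Assumes each of the num_symbols positions is filled independently
    with one of alphabet_size equally likely symbols.
    """
    return num_symbols * math.log2(alphabet_size)


# A 3-letter word versus a 10-letter word, using a 26-letter alphabet.
print(max_bits(3, 26))
print(max_bits(10, 26))

# The unit arithmetic from the text: 8 bits per byte, 2**20 bytes per
# (binary-convention) megabyte.
print(8 * 2**20, "bits in a megabyte")
```

The longer word has room for more than three times as many bits, simply because there are more positions to fill.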

As for the uncertainty in a word or message, which relates directly to the number of symbols or bits, Shannon decided to call that information-entropy. A longer message has more potential combinations of symbols, more uncertainty, and is thus, on average, likely to contain more information than a shorter message. Information-entropy directly measures that information potential, in units of bits. And if you recognize the word entropy, well, information-entropy is indeed related to thermodynamic entropy, which we’ll explore further in the New Year!
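For the curious, Shannon’s entropy has a compact standard formula: H = -Σ p·log2(p), summed over the symbols, where p is the probability of each symbol. The sketch below estimates those probabilities from the character counts of a single message, which is a simplifying assumption (a real analysis would use statistics from a much larger body of text). The function name is again hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message: str) -> float:
    """Estimate Shannon entropy in bits per symbol.

    Symbol probabilities are approximated by the relative frequency of
    each character within this one message.
    """
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# No uncertainty at all: every symbol is the same.
print(entropy_bits_per_symbol("aaaa"))

# Two equally likely symbols: one bit of uncertainty per symbol.
print(entropy_bits_per_symbol("abab"))

# Four equally likely symbols: two bits per symbol.
print(entropy_bits_per_symbol("abcd"))
```

Multiplying the per-symbol figure by the message length gives an estimate for the whole message, which is why a longer message tends to carry more information than a shorter one drawn from the same mix of symbols.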