What Is Information?

For the end of the year, something a little different: let’s talk about information. Information is one of those concepts that everyone feels familiar with but few people examine carefully, so it’s worth taking a step back to ask what it actually is. A book, an email, or even a science blog could each be said to contain information, and in fact to have the primary purpose of conveying information from one person or place to another. But the physical properties of an object aren’t classically considered to be information: the words inside the book are information, while the fact that the cover is red leather is merely a property. How do we make this distinction? Well, the properties of the book cover have to be separately measured by each observer, but the contents of the book constitute a message: a sequence of symbols or signals with an agreed-upon meaning. That meaning constitutes information!

And the question of how information is transmitted from one place to another is deeply relevant for all of human communication. This is even more true with the rise of the Internet and the struggle to understand how to compress information without losing important pieces. Data compression is a problem of information, as is data storage. But even earlier, what constitutes information was being studied as part of the World War II cryptography effort, because of the importance during wartime of sending and receiving messages. It was just after the war, in 1948, that Claude Shannon published “A Mathematical Theory of Communication”, the paper that effectively founded the research field of information theory, which focused on how to encode a message to pass between two people. Shannon’s important insight was to think of information probabilistically, for example by looking at the set of all word lengths to understand their distribution and thus the difference between trying to encode a short word and a long word.
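To see what thinking probabilistically about word lengths looks like in practice, here’s a minimal sketch in Python. The toy sentence is just an illustration; a real analysis would use a large corpus of English text.

```python
from collections import Counter

# A toy corpus; a real study would use a large sample of English text.
text = "the quick brown fox jumps over the lazy dog and the cat"

# Tally how many words there are of each length.
lengths = Counter(len(word) for word in text.split())

# Even in this tiny sample, short words dominate the distribution:
# three-letter words ('the', 'fox', 'dog', 'and', 'cat') appear 7 times.
print(lengths.most_common(3))  # [(3, 7), (5, 3), (4, 2)]
```

A distribution like this is exactly what an efficient code exploits: the more common a word (or symbol), the shorter the code you want to assign it.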

Initially word length might seem like a rather mechanical way to classify words, but it turns out to be deeply related to the information carried within the word itself. In English, many of the most common words are short words (see for example the word list for Basic English). But as you’ll find out when you start trying to write in Basic English, the longer words you can’t use often require a lot of short words to explain, which means that long words tend to, on average, carry more information than short words! And if we return to the problem of coded messages, longer messages will carry more information than shorter messages. Put that way, it sounds like common sense, but it’s a key insight into how information works: the more symbols you can use in your message, the greater the information content of that message. Shannon popularized the term ‘bit’ (short for binary digit, a coinage he credited to John Tukey) for a unit of information within a message, which may sound familiar from the computer world: there are eight bits in a byte, just over a million (2^20, or 1,048,576) bytes in a megabyte, and so on.
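The counting here is worth making concrete: each extra bit doubles the number of distinct messages you can send, which is why message length and information capacity go hand in hand. A quick arithmetic check:

```python
import math

# With n bits you can distinguish 2**n different messages,
# so every added bit doubles the number of possibilities.
assert 2 ** 3 == 8            # three bits distinguish eight messages
assert 2 ** 20 == 1048576     # bytes in a megabyte (binary convention)

# Conversely, picking one of N equally likely messages
# requires log2(N) bits of information.
print(math.log2(1048576))  # 20.0
```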

As for the uncertainty in a word or message, which relates directly to the number of symbols or bits, Shannon decided to call that information-entropy.  A longer message has more potential combinations of symbols, more uncertainty, and is thus, on average, likely to contain more information than a shorter message. Information-entropy directly measures that information potential, just by counting bits. And if you recognize the word entropy, well, information-entropy is indeed related to thermodynamic entropy, which we’ll explore further in the New Year!
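The entropy formula itself is short enough to sketch directly. This is the standard Shannon entropy over symbol frequencies, H = −Σ p·log2(p), applied per symbol of a message; the example strings are just illustrations.

```python
import math
from collections import Counter

def shannon_entropy(message: str) -> float:
    """Average information per symbol, in bits: H = -sum(p * log2(p))."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A repetitive message is perfectly predictable, so it carries
# zero bits of information per symbol...
print(shannon_entropy("aaaaaaaa"))  # 0.0

# ...while a message that uses four symbols equally often carries
# log2(4) = 2 bits per symbol.
print(shannon_entropy("abcdabcd"))  # 2.0
```

The uniform four-symbol case hits the maximum for a four-letter alphabet, which matches the intuition above: more uncertainty about the next symbol means more information when you learn it.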


9 responses to “What Is Information?”

  1. When you get a chance, please talk about the ambiguity of a book’s contents so that different readers get different information from the same book. As to length implying more information, I will take a Robert Frost poem over most long scientific papers any time. 😉
    Beyond Shannon, it seems that there should be a measure of signal to noise in information transfer and also a measure of what implicit information a receiver brings to a message. Are there such measures?

    • Jessamyn Fairfield

      Length implies more information on average. There’s obviously a distribution of information at any length, with some strings being well-crafted and information-dense and others being padded out with fluff. And the question of what the reader can extract is independent of this measure of information; if I send you a message in Navajo, but you don’t understand Navajo, that doesn’t mean the message has no information in it.

      Noise level in information transfer is very relevant in lossy compression, so if you’re interested in learning about how to measure noise I’d start there.

      • Does information content and transfer depend on the existence of an accurate reader of the message? If so, should not the information content of this reader be included in the information content of the message. A series of 1s and 0s is just 1s and 0s unless there is a reader, I think.

      • Jessamyn Fairfield

        That’s the difference between the information theory definition of information and a more colloquial definition. The definition I use above does not depend on the existence of a reader and can be applied to any message. I’ll get more into the limitations and some consequences in the next post.

  2. Pingback: What is Entropy? | letstalkaboutscience

  3. I don’t like being picky (but I do lie), but it’s not actually a million bytes in a megabyte, because everything goes up as powers of two, so there’s 1024 bytes in a kilobyte, and 1024 kilobytes in a megabyte.
    Which makes 1048576 bytes in a megabyte
    just sayin’
    😉

    • Jessamyn Fairfield

      Oh dear, is my physical science background showing? Of course you’re right; thanks for the correction!

  4. Pingback: Topic Index | letstalkaboutscience

  5. Pingback: Ignite: Entropy | letstalkaboutscience
