|
Post by abacus9900 on Feb 4, 2011 0:12:13 GMT 1
Well, we've dealt with logs so now we can go on to examine Shannon's mathematical ideas regarding the probabilities inherent in information transmission.
UKIP? Colourful.
|
|
|
Post by speakertoanimals on Feb 4, 2011 2:44:28 GMT 1
The basic idea is fairly straightforward. Suppose you and the receiver have agreed a codebook beforehand. What you want to send is one of some agreed set of messages. Suppose the possible messages MEAN:
tea, coffee, cereal, or nothing.
It's your breakfast order! The point is, you want to send an order every day, but you usually pick tea only 1/8 of the time, coffee 3/8 of the time, cereal 1/4, and nothing 1/4.
Suppose I now ask what you want, but only using yes/no questions.
If the questions were:
do you want tea? do you want coffee? cereal?
That's 3 questions. If the answer to the first is yes, I don't have to ask any more -- I can just send 1 for yes. So tea is encoded as just 1. And that happens 1/8 of the time.
If coffee, the answers are no then yes, hence 01, 3/8 of the time.
Cereal is no, no, yes -- 001, 1/4 of the time.
And nothing takes no no no, 000, 1/4 of the time.
So we can work out the average length message I have to send:
1x1/8 + 2x3/8 + 3x1/4 + 3x1/4 = 19/8 = 2 3/8 bits on average.
Except I can do better than that!
1) Is it one of coffee or tea? 2) If yes to that, do you want coffee, or if no, do you want cereal?
More complicated questions, but now I can do it with just the answers to these 2 questions.
So, for tea, I answer yes no. For coffee, yes, yes.
For cereal, no, yes, and for nothing no, no.
Hence I only ever have to send 2 answers -- 10, 11, 01, or 00. Hence on average 2 bits.
What Shannon did was work out the shortest possible average message length (how many answers I had to send to order), based on the probabilities of each choice. Hamming worked out how to calculate the best possible code.
Shannon said that the way to structure the questions depends on the probabilities of each choice. So, if I had tea 99% of the time, the best first question would just be -- do you want tea, because 99% of the time I would say, yes, and just have to send 1. Only 1% of the time would I have to go onto further questions, and send a longer string.
The EXACT way that best-possible codeword length depends on the probability of each choice is given by the log -- minus log p to be exact. So more probable choices get SHORTER codewords, with the longer codewords reserved for choices I don't make very often.
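In case anyone wants to check the numbers, here's a quick Python sketch -- my own illustration, using the breakfast probabilities from above, nothing more official than that:

import math

# Breakfast choices and how often each gets ordered
probs = {"tea": 1/8, "coffee": 3/8, "cereal": 1/4, "nothing": 1/4}

# -log2(p) is the ideal codeword length for a choice of probability p
for choice, p in probs.items():
    print(f"{choice:8s} p = {p:.3f}  ideal length = {-math.log2(p):.3f} bits")

# Averaging those lengths, weighted by probability, gives the entropy
H = -sum(p * math.log2(p) for p in probs.values())
print(f"best possible average = {H:.3f} bits per order")

Notice that coffee, the most likely order, gets the shortest ideal length, and the average comes out just under the 2 bits of my two-question scheme.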
|
|
|
Post by Progenitor A on Feb 4, 2011 8:47:32 GMT 1
Well, we've dealt with logs so now we can go on to examine Shannon's mathematical ideas regarding the probabilities inherent in information transmission. UKIP? Colourful.
Hi Abacus. Do you wish to continue with the thread we were following, or divert toward the interjected explanation? I do not mind. There is simply no point in trying to follow 2 separate explanations simultaneously, especially when one explanation will arrive at the conclusion that for information transmission there is as much information in a 20GB string of zeros as there is in a 20GB string of encoded Encyclopaedia Britannica.
(Perhaps you can see some difficulty in the 'simplified' interjected explanation in comprehending exactly what information is to be sent [some might say impossibility -- why on earth would you send a daily breakfast order specifying coffee 3/8 of the time, for example?], not to mention the errors? At least that indicates it is not copied from a book.)
|
|
|
Post by abacus9900 on Feb 4, 2011 11:41:03 GMT 1
I think I get what STA is saying. It's kind of like playing a game of twenty questions in the most efficient way possible based on the probabilities of the answers. It also reminds me of the way a doctor would ask questions of a patient in order to reach a diagnosis. The doctor would know the most probable answers based on experience. That right, STA?
Don't you agree, naymissus?
|
|
|
Post by Progenitor A on Feb 4, 2011 13:10:26 GMT 1
I think I get what STA is saying. It's kind of like playing a game of twenty questions in the most efficient way possible based on the probabilities of the answers. It also reminds me of the way a doctor would ask questions of a patient in order to reach a diagnosis. The doctor would know the most probable answers based on experience. That right, STA? Don't you agree, naymissus?
The principle is fine. The questions to be asked in the example do not make sense, though, the number of bits is not minimised, and the result has, of course, no practical use. And Hamming codes add redundancy (they do not minimise the number of bits but add more). She is probably thinking of the Huffman code, which guarantees to encode a string to within one bit of its entropy, but only generally for very long strings of digits.
As you can see for this example, the minimum number of bits required is 2, but that was self-evident as there were 4 states and 2^2 = 4. In theory we could transmit the info with about 1.9 bits, but there is no such thing as 0.9 of a bit -- which is why the example has no practical utility. In fact, to get the full minimisation in practice we would need a substantial string length.
|
|
|
Post by speakertoanimals on Feb 4, 2011 13:56:17 GMT 1
I think I get what STA is saying. It's kind of like playing a game of twenty questions in the most efficient way possible based on the probabilities of the answers. It also reminds me of the way a doctor would ask questions of a patient in order to reach a diagnosis. The doctor would know the most probable answers based on experience. That right, STA?
That's EXACTLY right! You and the questioner know before you start WHAT the questions are, and have checked that they cover all possibilities in this case. You have agreed how the answers to the questions will be sent -- that is what your codebook is, in effect -- 001 means cereal, or whatever it was.
|
|
|
Post by speakertoanimals on Feb 4, 2011 14:06:17 GMT 1
The questions to be asked in the example do not make sense, though
Rubbish! They may sound long and complicated, BUT there is a unique sequence of answers for each choice, which is all that is required.
...the number of bits is not minimised, and the result has, of course, no practical use... In theory we could transmit the info with about 1.9 bits, but there is no such thing as 0.9 of a bit -- which is why the example has no practical utility. In fact, to get the full minimisation in practice we would need a substantial string length.
A strawman. ANY mathematical expression used in information theory has to deal with the fact that the maths used is continuous, whilst answers in the real world (such as the length of a codeword) have to be an integer. Hence your supposed objection is NONSENSE.
Well, it was nearly 3am; I misremembered Huffman as Hamming -- both begin with H. Lucky I didn't think of Top Gear and type Hammond...
Point is, the example IS useful, because it helps with how we go from 'I want tea' to a string of binary, the fact that codewords can be related to probability, and how we decide on a set of unambiguous codewords. The twenty-questions scenario is, of course, just a friendly version of a binary tree. So, I think just sour grapes on your part...
And as for the 20GB string of zeros -- that is precisely what the twenty-questions case IS telling you! 000 is as valid and as informative a set of answers as 001; BOTH enable the receiver to send the right breakfast! Just add more choices and more questions, and you inevitably end up with a long string of zeros that contains information.
Come on NM, if it was so EASY to PROVE it was nonsense, you would have done so. Except you can't, and instead just resort to a favourite tactic of creationists -- argument from incredulity. It SOUNDS daft that a string of zeros can carry information, hence I'll use that. Because you are still unable to admit that your specific background has led you to a mistaken interpretation of the basics of information theory, as my many posts have demonstrated.
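And since the twenty-questions tree came up: here's a tiny Python sketch (mine, using the two-question codebook from my earlier post) that decodes a string of answer bits -- including a string that is all zeros:

codebook = {"10": "tea", "11": "coffee", "01": "cereal", "00": "nothing"}

def decode(bits):
    # Walk the bits, emitting a breakfast each time a codeword completes --
    # exactly the walk down the yes/no question tree.
    orders, word = [], ""
    for b in bits:
        word += b
        if word in codebook:
            orders.append(codebook[word])
            word = ""
    return orders

# Four days of answering no to everything: a decodable, informative message
print(decode("00000000"))   # ['nothing', 'nothing', 'nothing', 'nothing']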
|
|
|
Post by Progenitor A on Feb 4, 2011 15:02:03 GMT 1
Well Abacus, the start of this thread was that absurd proposition that a 20GB string of 0's contains as much information as a 20GB string of encoded Encyclopaedia Britannica.
The only way to show that this is false is to look at the concept of entropy, which we started to do, and an essential part of that was the manipulation of logs to base 2.
We were going along that path
Now you have STA giving her take on it. It is no good mixing the two. The only way to proceed is to build the picture step by step.
It is fruitless trying to consider 2 opposing views at once when learning a subject. You must decide to go with STA's explanation or mine. I do not mind. Come to me after STA has finished if you wish.
I am not interested in getting into squabbles with STA. I can build an irrefutable picture in small understandable steps. That is the art of getting across complex concepts.
However you must choose.
I really do not mind
|
|
|
Post by abacus9900 on Feb 4, 2011 15:17:03 GMT 1
Please continue, naymissus. I'm sorry if I appeared to 'go off' - was simply trying to appear open to all ideas.
|
|
|
Post by Progenitor A on Feb 4, 2011 15:33:49 GMT 1
Please continue, naymissus. I'm sorry if I appeared to 'go off' - was simply trying to appear open to all ideas.
No problem, I can understand that. Unfortunately I am off now -- an early meal and then a concert in town.
We will come back together tomorrow and consider how the entropy of a source is related to the amount of information that is transmitted over a channel of capacity C, and how we can guarantee (through Huffman coding) that a source of a certain H can send its information over a channel C with the minimum number of information bits being sent. We will see how the amount of information transferred depends upon the changes in a string.
|
|
|
Post by abacus9900 on Feb 4, 2011 15:55:30 GMT 1
|
|
|
Post by speakertoanimals on Feb 4, 2011 16:45:28 GMT 1
I suspect he's gone off to look at his books again, although today he DOES seem to be correctly talking about entropy OF THE SOURCE, rather than in previous posts where he was using entropy of a single message.
As I keep saying, the entropy of the SOURCE involves how often the various events occur (like tosses of a coin). The entropy of a message involves how often particular symbols (like H or T) occur in that particular message, which NEED NOT be the same frequency as they occur at the source, although they can be used as an estimate of it.
I think NM is also wrong to push logs -- you don't NEED to know what a logarithm is exactly to get this, just simple things such as: the more frequent an event, the shorter the codeword (or the fewer questions in the 20-questions set-up). You don't NEED to know that the exact relation is a log...
As for the claim that a string of zeros carries no information -- I can directly contradict that with my breakfast scenario. One breakfast involved all no answers (000 in binary). That certainly contained information, enough to get me the correct breakfast! If I'd had a wider menu, there would be more questions and longer strings -- BUT I could still have strings of no answers to all the questions, and I'd still get the right breakfast...
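Here's a little sketch of that source-versus-message distinction -- the fair coin and both sample messages are my own made-up illustration. The SOURCE entropy is fixed (1 bit per toss for a fair coin), but the frequencies in any one message can be quite different:

from collections import Counter
import math

def empirical_entropy(msg):
    # Entropy computed from the symbol frequencies in this one message
    counts = Counter(msg)
    n = len(msg)
    return -sum(c/n * math.log2(c/n) for c in counts.values())

print(empirical_entropy("HTHTHTHTHT"))  # 1.0 -- matches the fair-coin source
print(empirical_entropy("HHHHHHHHHT"))  # ~0.47 -- same source, unusual message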
|
|
|
Post by abacus9900 on Feb 4, 2011 17:14:36 GMT 1
Ok, STA, I get the general principle of what you are saying but it might be helpful if you could provide an actual 'real-world' example of the processes used.
|
|
|
Post by Progenitor A on Feb 5, 2011 14:43:50 GMT 1
Well Abacus, I was going to look at the entropy of a source and apply it to some examples, but STA has come up with the basis of Huffman coding, so I will stick with her example (even though it is not at all clear what message she wishes to send to the kitchen!)
Huffman coding followed upon Shannon's statement that a source of entropy H can be accommodated by a channel of capacity C given that HN <= C, where N is the number of digits in the channel.
So the race started to get an encoding system that would suit that purpose, and Huffman came up with the near-perfect coding system that guarantees to meet that requirement of Shannon's.
All this sounds a bit abstruse, and there is no need to take it all in, but investigating Huffman coding tells us lots of lovely things.
Now, although she did not apparently realise it, STA was using the basis of Huffman coding to predict a minimum number of bits for each encoded word -- the words being Coffee, Cereal, Tea, and Nothing. It is self-evident that 2 bits will suffice, as there are 4 states and 2^2 = 4, but is that the minimum number of information bits that must be transmitted per word?
She gave probabilities for including these words in an unspecified message: Coffee 3/8, Cereal 1/4, Nothing 1/4, Tea 1/8.
Now from our discussions of old we know that the more probable (or predictable) an event, the less information is needed to convey it. (Weather tomorrow, India: same as today.)
So in the context of this messaging, Coffee needs less information to transmit than Nothing (i.e. 'nothing is required for breakfast').
Indeed the Huffman coding for Coffee is 1 and for Nothing is 001
So the very probabilities of the words STA used are already telling us something about the information content necessary to transmit them, and Huffman simply makes a code for each probability.
This is not really surprising, as Huffman was using Shannon's mathematics to derive his code.
Because Shannon tells us that if we know the probabilities of a set of words (or symbol set, or letters) then we can calculate a brand new quantity which he called H, the entropy.
And entropy, H, simply quantifies the information content necessary to be transmitted to send a word or symbol set or letter.
With the probabilities in this example,
H = -(3/8 log2 3/8 + 1/4 log2 1/4 + 1/4 log2 1/4 + 1/8 log2 1/8) = about 1.906 bits per symbol.
(Note that this is very close to the common-sense conclusion that we need 2 bits per word to send 4 words, as 2^2 = 4.
The reason that Shannon's result is smaller is that some words are sent more often than others and hence we have unequal probabilities of words; if all the words had equal probabilities (1/4), then we would require 2 bits per word on average.)
In other words, the average number of bits we need to transmit per word to get any of these words, with these probabilities, across a communication link is about 1.9 bits per word. (Some words require more bits, others fewer.)
Now Huffman coding tells us nearly exactly the same thing, and works out a coding that achieves close to the average number of bits per word that Shannon's entropy tells us is necessary, BUT ONLY IF THE CODE WORDS ARE LONG. In our case, as the code words have a maximum length of 3 bits, the Huffman approximation is not that close to the Shannon prediction. If we work out the Huffman coding for these four words, it gives an average number of bits per word somewhat higher than the Shannon average (2 bits against about 1.906), but as we make code words longer and longer, Huffman approaches Shannon on the information we need to transmit to convey a word or a selection of words.
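If you want to see this in action before tomorrow, here is a small Python sketch -- my own illustration, the probabilities being STA's -- that builds one Huffman code for the four words and compares its average length with the entropy:

import heapq, math

probs = {"coffee": 3/8, "cereal": 1/4, "nothing": 1/4, "tea": 1/8}

# Each heap entry is (probability, tie-breaker, {word: code-so-far})
heap = [(p, i, {w: ""}) for i, (w, p) in enumerate(probs.items())]
heapq.heapify(heap)
tick = len(heap)
while len(heap) > 1:
    # Merge the two least probable nodes, prefixing 0/1 onto their codes
    p1, _, c1 = heapq.heappop(heap)
    p2, _, c2 = heapq.heappop(heap)
    merged = {w: "0" + c for w, c in c1.items()}
    merged.update({w: "1" + c for w, c in c2.items()})
    heapq.heappush(heap, (p1 + p2, tick, merged))
    tick += 1

codes = heap[0][2]
avg = sum(probs[w] * len(c) for w, c in codes.items())
H = -sum(p * math.log2(p) for p in probs.values())
print(codes)
print(f"average {avg:.3f} bits/word against entropy {H:.3f}")
# Ties mean several equally good trees exist -- e.g. Coffee=1, Cereal=01,
# Nothing=001, Tea=000 -- but every one of them averages 2 bits/word,
# above the 1.906-bit entropy, just as argued above.

The tie-breaking determines which optimal tree you get; the Coffee=1, Nothing=001 code I quoted earlier is one of them, and all of them average 2 bits per word for this little alphabet.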
We are in quite interesting territory here
However, Huffman coding does not always give us the most efficient coding for information transfer, as we shall see.
Enough for today, I think. Some practical examples of Huffman coding tomorrow.
The essential point is: the less probable an event, the more information is needed to transmit it, and the transmitted information is the number of changes in a transmitted signal.
|
|
|
Post by abacus9900 on Feb 5, 2011 18:44:03 GMT 1
|
|