|
Post by abacus9900 on Feb 2, 2011 23:15:25 GMT 1
Ah, ok, I had that slightly wrong. The log is an expression of the power to which the base has to be raised to equal some quantity. Let it sink in overnight, ok, naymissus? And thanks for your time!
|
|
|
Post by speakertoanimals on Feb 3, 2011 3:44:22 GMT 1
To those STILL doubting my opening statement about the information content of a repeated string from a random source: homepages.cwi.nl/~paulv/papers/info.pdf
A few notable quotes make the same point -- that the information content is a property of the source, NOT the characteristics of any ONE message. The paper is about Kolmogorov complexity, an alternative to Shannon information, which tries to address one of the problems with Shannon -- that it is defined over the ensemble of all possible messages, rather than over the single message being sent. Hence the confusing thing about Shannon that I pointed out above -- the string HHHHH from a fair coin contains 5 bits of information, whilst HHHHH from a double-headed coin contains none -- not because the message itself is different, but because the ensemble of all possible messages is different in each case.
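To make the fair-coin vs double-header comparison concrete, here is a small Python sketch. The function name and the example source probabilities (0.5 for the fair coin, 1.0 for the double-header) are my own choices, purely for illustration:

```python
import math

def message_info_bits(message, p_heads):
    """Self-information -log2 P(message) of one specific message,
    given the source's per-toss probability of heads."""
    p = 1.0
    for symbol in message:
        p *= p_heads if symbol == "H" else (1.0 - p_heads)
    return float("inf") if p == 0.0 else -math.log2(p)

# The SAME message, two different sources:
print(message_info_bits("HHHHH", 0.5))  # fair coin: 5.0 bits
print(message_info_bits("HHHHH", 1.0))  # double-header: 0.0 bits
```

The message is identical in both calls; only the ensemble of messages the source could have produced differs, and that is what changes the answer.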
|
|
|
Post by Progenitor A on Feb 3, 2011 9:23:48 GMT 1
Ah, ok, I had that slightly wrong. The log is an expression of the power to which the base has to be raised to equal some quantity. Let it sink in overnight, ok, naymissus? And thanks for your time!
No problem -- my pleasure.
|
|
|
Post by abacus9900 on Feb 3, 2011 13:48:10 GMT 1
I can't really follow the logic here.
|
|
|
Post by speakertoanimals on Feb 3, 2011 14:03:38 GMT 1
Continuing my quote-mining quest, from the American Journal of Physics:
Entropy, information, and computation
J. Machta
My emphasis -- you can't compute the Shannon information for a single message.
If you do, what you are in effect doing is using the message itself as an estimate of the unknown probabilities of the source that produced it. Hence Naymissus' mistaken belief that a string of zeros (which would give an estimate of P(one) = 0) has zero information content. In general (unless you really did have the totally trivial case where a source produced only zeros!), your estimate of P(one) based on a string of zeros will be wrong, and hence so is your estimate of the information content of the message itself. Plus you can compute what the likely error on your probability estimate is.
To see this, consider the following -- I have a fair coin, and I record 2 tosses. For a fair coin, I will get HH, HT, TH and TT with equal probability.
Hence I have a one in 4 chance of getting HH (or 00), and concluding the coin is a double-header, based on NM's reasoning.
How to improve? Easy -- take a longer sample! If I take N tosses of a fair coin, then there is only a (1/2)^N chance that I get ALL heads, and hence mistakenly assume that tails never occur. For every other possible sequence, I get at least SOME tails, so that although I may get P(tail) wrong, I at least say it is non-zero. And if P(tail) is non-zero, SO IS the information content of a string of all heads!
This is why NM has got it wrong in this simple random case -- if I had a long enough string of all heads from a coin, it becomes more and more unlikely that the absence of any tails is due to a random fair coin, and more and more likely that I instead have an unfair coin. Which is why, if I only had a very long string of zeros, I would end up assuming I had a double-header.
Except in NM's case, we KNOW we don't have that, because at some point we get SOMETHING else in our message! Hence the swapping-coins example I introduced in another message.
The point about Shannon and the optimum codeword length is that it gives the optimum coding for the whole ensemble of messages from a particular source, where you keep your coding and decoding scheme FIXED. It then reduces to the simple rule that the more likely a message, the shorter the codeword you assign to it, whereas rare events can have long codewords.
The problem with sequences of tosses from a fair coin is that ANY sequence of heads and tails is equally likely, hence codeword length for ANY such sequence (even HHHHHHH......HHHHHHHHH) is the same as the codeword length for any other. In short, N tosses of a fair coin contain N bits of information, and that is it, even if the N tosses I am sending at the moment are ALL heads.
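A quick Python sketch of the mistake described above -- using the message itself to estimate the source probabilities (the function name is mine, for illustration only):

```python
import math

def estimated_info_bits(message):
    """Estimate the information in a message by (wrongly) using the
    message itself to estimate the source's probability of heads."""
    p_h = message.count("H") / len(message)
    bits = 0.0
    for s in message:
        p = p_h if s == "H" else 1.0 - p_h
        bits += float("inf") if p == 0.0 else -math.log2(p)
    return bits

# Two tosses of a genuinely fair coin that happened to land HH:
print(estimated_info_bits("HH"))  # 0.0 -- wrongly concludes "no information"
print(0.5 ** 2)                   # 0.25 -- yet a fair coin gives HH 1 time in 4
```

With only two tosses, the estimate P(tail) = 0 is wrong a quarter of the time; longer samples make that error exponentially rarer, which is the (1/2)^N point above.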
|
|
|
Post by speakertoanimals on Feb 3, 2011 14:28:59 GMT 1
Let's go back to our repeated sequence from a fair coin. If I have received a sequence of 10 million heads, what INFORMATION did I receive from the 10 millionth head? NM would say NONE.
Except we DID receive some -- since we have a fair coin, the information we received was that the toss was not a tail, but it COULD HAVE BEEN.
Whereas with a double-header, we receive no information ever, because it could only have ever been a head.
Which is the thing about Shannon information that some find unacceptable, and that some get totally WRONG (as NM did) -- that the information content refers to the SOURCE, not any one particular message. Hence HHHHHHH on its own can have ANY amount of information, from 0 to 7 bits (assuming just 2 letters, H and T), depending on the source that produced it, and what the other messages that could have been produced but weren't are.
So, for a fair coin, HH contains 2 bits of information, not because there isn't a PATTERN (all heads!), but because it happens to be as likely as any other 2 toss results (TT, HT, or TH). For a biased coin (low probability of tails), the SAME sequence HH contains LESS information, because it is MORE likely in this case than in the fair case.
What NM keeps getting wrong -- the probability here is the probability of the whole message compared to the probabilities of all other messages of the same length. It is NOT the probability of zeros and ones in this PARTICULAR message, as I explained before.
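As a numeric illustration of the fair vs biased comparison (the 0.9 bias figure is an assumption of mine, chosen just to show the effect):

```python
import math

# Probability of the WHOLE message HH under each source, then -log2 of it:
fair = 0.5 * 0.5    # P(HH) from a fair coin = 0.25
biased = 0.9 * 0.9  # P(HH) when P(heads) = 0.9 (an assumed bias)

print(-math.log2(fair))              # 2.0 bits
print(round(-math.log2(biased), 2))  # ~0.3 bits: more likely, so less information
```

Note that both calculations use the probability of the whole two-toss message under the source, not the frequencies of H and T within the message itself.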
Which some find unsatisfactory -- they want HHHHHHHHH (which is highly patterned) to have low information content, whereas things like HTTHHTHHTT to have higher information content. Which is where Kolmogorov complexity comes in -- it asks instead, what is the shortest way of producing this one PARTICULAR message, rather than worrying about the source that happened to produce it this time. Which is a different beastie to Shannon information!
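One rough way to see the Kolmogorov idea in practice is to use a general-purpose compressor as a crude stand-in for "shortest description of this particular message". zlib here is only an illustrative proxy, not Kolmogorov complexity proper:

```python
import random
import zlib

# Compressed length as a crude proxy for description length.
patterned = b"H" * 10000                                     # HHHH...H
random.seed(0)
jumbled = bytes(random.choice(b"HT") for _ in range(10000))  # random H/T mix

print(len(zlib.compress(patterned)))  # tiny: "10000 heads" is a short description
print(len(zlib.compress(jumbled)))    # much larger: no pattern to exploit
```

The all-heads string compresses to almost nothing while the random mix does not, even though a fair coin assigns both exactly the same Shannon probability.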
Shannon wanted to know how to produce the best coding for ALL POSSIBLE messages that you might want to send, not just how to send one particular message that happens to be highly patterned (or not). That is where and why the ensemble of all possible messages comes in -- he wanted the most efficient scheme for messages of a known type, NOT the best scheme for the message you want to send today, which might then be useless for the one you want to send tomorrow.
|
|
|
Post by Progenitor A on Feb 3, 2011 14:48:43 GMT 1
I can't really follow the logic here.
Sorry, not making things clear. Start with powers:
4 = 2^2, so 1/4 = 1/2^2
Now, 1/x can be written x^-1, and similarly we can rewrite x^-1 as 1/x. Taking this further, 1/x^n = x^-n, so we can write 1/4 = 1/2^2 = 2^-2.
So, going to our logs:
log2(1/2) = log2(1/2^1) = log2(2^-1)
Now, log2(a^y) = y log2(a), so log2(2^-1) is of the form log2(a^y) where a = 2 and y = -1, so:
log2(2^-1) = -1 x log2(2) = -1 x 1 = -1
For any other fraction above (indeed for any fraction of this form), log2(1/2^n) = log2(2^-n), so:
log2(1/64) = log2(1/2^6) = log2(2^-6) = -6 x log2(2) = -6
Hope that makes things clearer.
Finding log2 of any number using your desktop calculator: to find log2(x) where x is any number, simply open your desktop calculator and use it in scientific mode. For example, what is log2(70)? We will start at the bottom: enter 2, press the x^y function, then enter 2 again. The function you have entered is 2^2 = 4, so we know that log2(4) = 2. But we want log2(70), so clear all entries. Type in 2, press x^y, type in 6: we have entered 2^6 = 64, so log2(64) = 6. That is very close to log2(70), so try 6 plus a fraction: enter 2, press x^y, enter 6.5. 2^6.5 = 90.5..., so 6.5 = log2(90.5). Try a smaller fraction: type 2, press x^y, type 6.2. 2^6.2 = 73.51, so we are getting close. Continue until you nearly get there!
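The trial-and-improve procedure on the calculator is easy to automate. Here is a Python sketch of the same idea, narrowing the guessed exponent by bisection (the function name, tolerance, and the 0-to-64 search bounds are my own choices):

```python
def log2_by_trial(x, tol=1e-9):
    """Find log2(x) the way described above: guess an exponent y,
    compute 2**y, and narrow the guess until 2**y is close enough to x.
    Assumes x >= 1 (so the exponent lies between 0 and 64)."""
    lo, hi = 0.0, 64.0
    while hi - lo > tol:
        y = (lo + hi) / 2
        if 2 ** y < x:
            lo = y  # 2**y too small: the answer is above y
        else:
            hi = y  # 2**y too big: the answer is at or below y
    return (lo + hi) / 2

print(round(log2_by_trial(64), 4))  # 6.0
print(round(log2_by_trial(70), 4))  # ~6.1293
```

Each loop iteration is one "enter 2, press x^y, compare" step on the calculator; the computer just does a few dozen of them for you.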
|
|
|
Post by abacus9900 on Feb 3, 2011 15:24:11 GMT 1
Brilliant naymissus! Thank you for taking the time to spell things out. Did you say you were an engineer or professor?
I just hope that STA is taking note of all this, because this is the correct way to teach!
Anyway, to continue our study of Shannon's ideas....
|
|
|
Post by speakertoanimals on Feb 3, 2011 16:59:08 GMT 1
So, you think correct teaching is by imparting incorrect information?
Plus I wasn't trying to teach YOU information theory, especially since you had a bit of a wibble at logs -- but NM.
Perhaps NM could learn from you how to learn -- you first have to admit what you don't know -- NM seems incapable of this, so keeps presenting the same mangled version of information theory.
|
|
|
Post by abacus9900 on Feb 3, 2011 17:53:31 GMT 1
How do I know what you are saying is correct?
|
|
|
Post by speakertoanimals on Feb 3, 2011 18:00:28 GMT 1
How do you know what NM is saying is correct?
What sounds sensible at first sight might not be the case. What sounds intuitive might not be the case either.
You could try following up references, when they are given, to see if what the poster says agrees with what you find online. That doesn't make it right though, just means if it is a misapprehension or a mistake, it's a common one!
The references might be beyond your level of understanding at the moment.
All of which means -- you are pretty much stumped if you are relying on the internet and message boards for an education on an obscure, or disputed, or generally misunderstood topic.
Why should you doubt me more than NM, apart from the fact you don't seem to like me very much? Is personal antipathy a good basis for judging factual accuracy?
|
|
|
Post by abacus9900 on Feb 3, 2011 19:06:45 GMT 1
It's not really a question of personal antipathy, STA; we all have our own personalities and I suppose we have to respect each other's idiosyncratic oddities. No, the thing with you, STA, is that while you appear to know what you are talking about, you cannot seem to put it in a way that meets people half way. We know effort is required for learning any new material, but you have to remember that some people here have not been in formal education for some years, so what may be obvious to you isn't necessarily so for us. Also, trying to convey too much material too quickly can be counterproductive: it will be much harder to digest than a more 'piecemeal' approach, and will probably lead to confusion. Taking things in stages makes it more likely you will retain people's interest and whet their curiosity for further information. When kids, or even college students, learn subjects in the classroom, they learn in stages and are required to master the more basic ideas before moving on to more advanced ones.
|
|
|
Post by speakertoanimals on Feb 3, 2011 21:12:16 GMT 1
Except this is a discussion board, not a formal learning environment.
I DO try to pitch material at the appropriate level, IF people are polite. Except in some cases ANYONE can ask a seemingly simple question, yet be unable to understand the actual answer without a lot more work. I can't help that; I give what I hope is a correct answer.
So, to give an example -- anyone could ask 'what's quantum gravity about then', but unless they first understand:
quantum theory
relativity
why quantum is different from classical
then they aren't going to get much out of any answer. I can't help that, I just have to assume that someone who asks about quantum gravity may have some vague inkling about what 'quantum' means and what 'gravity' means, just not the two together. Else every answer would have to be -- physics from GCSE to graduate level (condensed)...
Anyway, the point as regards NM is that I don't expect anyone else to get much out of the exchange -- I just don't want people coming away with the mistaken idea that what he is saying is totally correct (since he does seem able to do logs, at least).
After all, there MAY be more reading this than people who actually comment, and we don't know what level they are at.
|
|
|
Post by abacus9900 on Feb 3, 2011 22:18:22 GMT 1
I thought you were a teacher STA. Perhaps I was wrong.
|
|
|
Post by Progenitor A on Feb 3, 2011 23:04:07 GMT 1
Now Abacus, where were we? Just got in -UKIP meeting
|
|