|
Post by Progenitor A on Feb 1, 2011 20:26:58 GMT 1
The trouble is STA is no help in providing a basic framework onto which ordinary people can build in order to develop their understanding. Again, you cannot expect people to begin learning any topic if you are always going to start at an advanced level. Why does she keep doing this? It really does not serve much purpose. She is simply regurgitating half-understood texts that she cannot possibly make others understand, because she does not understand them herself. We have seen that so many times.
|
|
|
Post by abacus9900 on Feb 1, 2011 20:35:13 GMT 1
You may be right, because if someone has no proper understanding of a given topic they will be unable to put it over in their own words in a way a beginner would understand. If you do possess a thorough understanding of a topic, it should be fairly easy to use analogies and language in a way that does not confuse too much. I don't want to seem to be bullying STA - I'm not, and at least she tries to contribute - but unless she alters the way she 'communicates', most of her efforts will be a waste of energy.
|
|
|
Post by speakertoanimals on Feb 1, 2011 20:59:32 GMT 1
Except the point of this discussion ISN'T providing a basic framework -- Naymissus may claim he is doing that, but he is getting it WRONG.
And I note yet again his utter refusal (or perhaps inability?) to actually engage in a discussion of the detailed points. Instead, he just keeps repeating his particular, engineering-biased usage of entropy to refer to the entropy of a MESSAGE, rather than to the entropy of the source.
He's misunderstood, plain as a pikestaff as far as I'm concerned. I can now see WHY he has misunderstood, and I can find other academics who have noted a similar misunderstanding/difference of usage as regards entropy. Yet still he refuses to engage, just blindly maintaining that HE knows, that he has taught it for years, and that I'm a moron.
Yeah, well, people can go look for themselves, read the first page of Shannon's paper, and judge for themselves. The intuitive (but incorrect) versus the mathematically precise (but non-intuitive) -- I don't need a crystal ball to predict which one some posters will plump for. They'll be wrong, but I know that there will be no way I can convince them, not with NM leading the inglorious way. The inept leading the deluded...
I quoted Shannon to you -- you ignored it!
I have looked very closely at what I think is the case, I have looked very closely at why you probably think you are right, and I have tried to show my reasoning. You have failed to provide any similar level of analysis. Plus (what REALLY riles me!), you KEEP IGNORING the quote from page 1 of the Shannon paper, plus the quotes where he introduces the entropy. You just ASSUME you know what he is talking about because you know how it is applied, yet you are WRONG. And you refuse to learn, you refuse to even attempt to discuss the question, you refuse to try to see it from my point of view -- you just keep name-calling and trying to spread your misunderstanding to others -- which really, REALLY makes me mad!
Seems you can show an engineer the water, but you can't take him there, or even persuade him to at least sample the water -- because he already knows all that he will ever need to know.
A blatant refusal to go back over the basics of what you think you already know -- I learnt something from this; shame no one else seems to have learnt anything.
Same ole nonsense. And you couldn't bully me if you tried... Delusions of grandeur, methinks...
|
|
|
Post by abacus9900 on Feb 1, 2011 22:15:01 GMT 1
It's your job to teach - so teach!
|
|
|
Post by Progenitor A on Feb 1, 2011 22:53:02 GMT 1
Shannon, 'A Mathematical Theory of Communication':
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities.
These semantic aspects of communication are irrelevant to the engineering problem.
The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.”
Professor Pierce (Caltech), on what Shannon shows: ....If a message source produces one among n symbols with probabilities which are independent of previous choices, the entropy of the source is defined.......
H = -Σ (i = 1 to n) p_i log2 p_i   bits per symbol
When n=2, as in a binary source
H = -(p0 log2 p0 + p1 log2 p1) bits per symbol
So if p0 = 1, that is, the symbol '0' has a probability of 1, then p1 = 1 - p0 = 0; that is, the probability of the symbol '1' is 0.
Therefore for a string of ‘0’s
000000000000000000 for example
H = -(p0 log2 p0 + p1 log2 p1) bits per symbol
H = -(1 x 0 + 0 x log2 p1) bits per symbol
H = 0
The entropy for a string of '0's from the source is 0.
The number of bits required to transmit this string of '0's = H x the number of symbols in the message = 0.
"....now suppose that we have a communication channel which is capable of transmitting 10,000 bits per sec. If the channel is used to transmit a repetitive string of message bits (such as all '0's), then we must say that the rate of transmission of information is 0 bits per second." - J. Pierce, Professor of Engineering, Caltech
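For anyone who wants to see the numbers, here is a small Python sketch of that formula (nothing clever, just H = -Σ p_i log2 p_i; the 18-symbol string length is only an example):

import math

def source_entropy(probs):
    # H = -sum(p * log2 p) in bits per symbol; terms with p == 0 contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(source_entropy([0.5, 0.5]))   # fair binary source: 1.0 bit per symbol
print(source_entropy([1.0, 0.0]))   # source that only ever emits '0': 0.0 bits per symbol
# Bits needed for an 18-symbol string of '0's from that second source: H x 18 = 0
print(source_entropy([1.0, 0.0]) * 18)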
|
|
|
Post by speakertoanimals on Feb 2, 2011 2:08:10 GMT 1
You can QUOTE, but you don't understand!
Suppose I have two DIFFERENT sources, one that produces random 0's and 1's, another that produces only zeros.
The first has entropy 2(-1/2 log 1/2) bits per symbol, which is 1 bit per symbol (no shit sherlock!).
The second has zero bits per symbol.
BUT a message string such as 00000000 CAN be produced by either source. In the first case, it DOES contain information -- the information it contains in that case is that it is 00000000, rather than any one of the other 8-bit messages that COULD be produced.
Hence the probability used when computing the Shannon information content of this message is p = (1/2)^8, with -log2 p = 8 bits of information. Which is in accord with the entropy result, which says this source has an average of 1 bit per symbol, hence 8 bits for this message of all zeros.
In the second case, the SAME message occurs with probability 1 (DIFFERENT ensemble, with only one member), hence the information content of the same message from the second source is now 0 bits.
Which is what I have been saying all along -- IF we assume a random source, the string 00000000 contains just as much information ACCORDING TO SHANNON as any other string of 8 bits -- 8 bits.
Whereas it is only in the second case, where we KNOW that prob(1) = 0, that 00000000 contains NO information.
The confusion arises when you use the message itself (we assume we have just one) to ESTIMATE the symbol probabilities for the source. Then if we had a million zeros, we would estimate that no ones ever occur, hence zero information. But that is not the same as saying that 00000........ from the first source contains zero information -- it still contains N bits because of the nature of the source, NOT the nature of the individual message.
You have to be careful to distinguish between the entropy of the source, and the entropy of the source ESTIMATED from the entropy of the message. Which you didn't. The entropy of the message, since it involves replacing the actual probabilities of the various symbols with probabilities estimated from the message, is only ever an estimate of the source entropy, and hence only ever an estimate of the information content of the message.
So, if your sample message (which happened to be all zeros, say), was too short -- just 00 say, then it is a totally LOUSY estimate of prob 0, and just stating that prob 1 is zero from this is nonsense, as is stating that it contains no information. Same goes for longer strings of zeros, although the longer the string of repeated zeros gets, the better your estimate that prob 1 is zero becomes.
All of which I have said before. But now I expect Naymissus to just repeatedly quote:
which, as I have said, is NOT the same as the entropy of the source, and not the same as saying a string of repeated zeros has zero information content... etc etc etc.
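If anyone wants to check the arithmetic rather than take my word for it, here is a quick Python sketch of exactly that distinction -- the information content of one specific message under each of the two sources described above:

import math

def info_content(message, p_symbol):
    # Shannon information of this specific message: -log2 of its probability under the source.
    prob = 1.0
    for s in message:
        prob *= p_symbol[s]
    return -math.log2(prob) if prob > 0 else float('inf')

msg = "00000000"
print(info_content(msg, {'0': 0.5, '1': 0.5}))   # fair random source: 8.0 bits
print(info_content(msg, {'0': 1.0, '1': 0.0}))   # all-zeros source: zero bits (Python prints -0.0)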
|
|
|
Post by Progenitor A on Feb 2, 2011 9:19:37 GMT 1
So, if your sample message (which happened to be all zeros, say), was too short -- just 00 say, then it is a totally LOUSY estimate of prob 0, and just stating that prob 1 is zero from this is nonsense, as is stating that it contains no information. Same goes for longer strings of zeros, although the longer the string of repeated zeros gets, the better your estimate that prob 1 is zero becomes. Which I have all said before. But now expecting Naymissus to just repeatedly quote: which as I have said is NOT the same as the entropy of the source, not the same as saying a string of repeated zeros has zero information content................etc etc etc. ;D radio4scienceboards.proboards.com/index.cgi?action=gotopost&board=gensci&thread=490&post=6673
|
|
|
Post by speakertoanimals on Feb 2, 2011 14:07:25 GMT 1
And Naymissus descends into meaningless quote mode. No attempt to actually address the points I raised -- at least I didn't get yet another claim that whatever I write is meaningless, incomprehensible nonsense................
|
|
|
Post by abacus9900 on Feb 2, 2011 16:12:33 GMT 1
naymissus, according to one source I have found on the net, the more probable a set of bits is, the less information you have to provide about them on the channel. Conversely, the less probable a set of bits is, the more information you have to encode for them. How does this work? They give, as an analogy, the weather. In India the chances of tomorrow being a sunny day are very high, so you are not providing much info by saying it is going to be sunny. On the other hand, if you say it is going to rain tomorrow, which is an unlikely event, you are providing a lot of information.
How does this work in information theory?
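Here is my attempt to put rough numbers on that weather analogy, in a few lines of Python -- the 0.9 / 0.1 probabilities are just my own guesses for illustration, not anything from the source I read:

import math

def surprisal_bits(p):
    # Information conveyed by learning that an event of probability p happened: -log2 p.
    return -math.log2(p)

print(surprisal_bits(0.9))   # 'sunny in India tomorrow': about 0.15 bits - hardly news
print(surprisal_bits(0.1))   # 'rain in India tomorrow': about 3.32 bits - much more informative

Is that the right way to think about it?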
|
|
|
Post by abacus9900 on Feb 2, 2011 16:21:14 GMT 1
STA, you simply provide far too much information for people to be able to digest in one go. You should give out key concepts first and then elaborate a bit on them in subsequent posts and then perhaps take things a bit further. You are creating an information overload in people which simply has the effect of making them shut down. It is impossible to turn people into 'instant experts.' Education does not work that way because attempting to overwhelm people with too much material too quickly makes them think the subject is just too difficult.
|
|
|
Post by Progenitor A on Feb 2, 2011 17:06:39 GMT 1
"naymissus, according to one source I have found on the net, the more probable a set of bits is, the less information you have to provide about them on the channel. Conversely, the less probable a set of bits is, the more information you have to encode for them. How does this work? They give, as an analogy, the weather. In India the chances of tomorrow being a sunny day are very high, so you are not providing much info by saying it is going to be sunny. On the other hand, if you say it is going to rain tomorrow, which is an unlikely event, you are providing a lot of information. How does this work in information theory?"

Well, yes, that accords with Shannon's idea of information entropy. In general, information entropy is concerned with information 'degenerating' into a 'sameness', where no changes occur and everywhere we look we see much the same information. Shannon says (not as simply as I am saying - he was speaking to his peers in mathematical language) that if information is unchanging, remains much the same, then there is little information content, and the ultimate degeneration is when things are exactly the same - then there is no information content at all.

If information remains exactly the same, then the probability of it being the same is 1. Thus in a message of 8 bits, 00000000, the probability of 0 is 1 (and the probability of 1 is zero); similarly, in 11111111 the probability of 1 is 1 (and the probability of 0 is zero). The entropy is 0 - there is no information content. This final entropy is very boring. There is no information! If information has a probability of 1, then you can bet your last pound that you are not going to get any new information! Just as with the 'information' that tomorrow will be a sunny day in India - TELL ME SOMETHING NEW (for Gawd's sake) - GIVE ME SOME INFORMATION!

But in information entropy, when things are changing, then we have excitement - we cannot predict what is coming next - there is lots of NEW information coming our way! The probability of a particular event happening is low - the bookies would not give you high odds on a particular event occurring, because we just do not know what is going to happen next. Thus in the message 10110001 there are many changes that have occurred - there is lots of information. The probability of 1 occurring is 1/2, and the probability of 0 occurring is 1/2. The entropy is 1 - we have lots of information there!

Entropy is a measure of the information content of a message:
Entropy = 0: no information.
Entropy = 1: maximum information.
Entropy is derived from probabilities. If the probability of new information (a change) is low - the repeated symbol has p = 1 - entropy is at its minimum. If the probability of new information is high - p = 1/8, say - then entropy goes high.

And as we have seen, it takes far fewer bits to send a message that has little information than it does to send a message that has a lot of information. Thus, with control bits, the 8-bit message
0 0 0 0 0 0 0 0  1 1
can be sent with just the control bits (where 0 1 means 'all 0's):
0 1
whereas to send the 8-bit message
1 0 1 0 1 0 1 0  X X
we would probably send
1 0 1 0 1 0 1 0  X X

This picture is made clearer, perhaps, when we look at a message and calculate Shannon's entropy for it. We can do that if you wish.
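We can even let the computer do the arithmetic. Here is a small Python sketch of the same calculation, with the probabilities taken from the symbol counts of the message itself (the two test messages are the ones above):

import math
from collections import Counter

def message_entropy(message):
    # Probabilities estimated from the message's own symbol counts; result in bits per symbol.
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

for msg in ["00000000", "10110001"]:
    h = message_entropy(msg)
    print(msg, "entropy =", h, "bits/symbol, so", h * len(msg), "bits to send")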
|
|
|
Post by speakertoanimals on Feb 2, 2011 20:43:40 GMT 1
Unfortunately, Naymissus gets it wrong AGAIN, with his wittering on about change..................
Let's take a simple source -- a tossed coin, heads and tails with equal probability.
If I write down a sequence of coin tosses, say 10 of them -- EACH sequence of coin tosses contains the SAME information, whether the sequence of 10 is HHHHHHHHHH or HTTHHTHTTT.
Naymissus gets it wrong by mixing up change in a sequence with predictability. For the coin-toss case with a fair coin, EVERY toss is unpredictable, given previous tosses -- thinking otherwise is the gambler's fallacy.
So, if I have a run of 10 heads with a fair coin, that still tells me NOTHING about the next toss, heads or tails. Hence every toss is always totally unpredictable, and every sequence of tosses contains the same information -- including the ones which are ALL heads! Because these WILL occur with a fair coin, and with such a fair coin, after a sequence of 100 heads you STILL can't predict the next toss. Or even after a sequence of a million heads (although I don't like to guess exactly HOW long you'd have to toss a coin to randomly generate such a sequence -- although I believe Derren Brown did it for a slightly shorter sequence...)
Now suppose I have a totally unfair coin, both sides are heads. Now every sequence is totally predictable, since all sequences are HHHHHHH..... and the information content is ZERO.
SO, the information doesn't depend on the SEQUENCE, but the coin. HHHHHH from a fair coin contains information, whereas HHHHHH from a double-headed coin contains none.
The place where Naymissus got confused is what you do when you don't know before you start whether or not the coin is fair.
So, what would you do? You would take a long sequence of tosses, and count how many heads, how many tails.
If in a sequence of 100 you got no tails, you'd probably decide that it wasn't a fair coin but a double-header, hence HHHHH contains no information.
If in a sequence of 100, you got roughly equal numbers of heads and tails, you'd decide probably a fair coin. If you then tossed it again and got a sequence of 100 heads (not impossible, just rather unlikely), then that sequence DOES contain information, and just as much information as any other sequence that coin could produce.
That's the key issue -- a sequence compared to all possible other sequences that the same source could produce, NOT the sequence on its own.
I have to say, what Naymissus is producing MAY be the way some communications engineers 'understand' information theory, but going back to the chaps who actually developed it, it's just plain WRONG.
As I've explained before (go read Shannon), the information content of a SEQUENCE is -log p, where p is the probability of that sequence. The entropy is of the source, NOT the message. You can TRY to estimate the entropy of the source by using the frequency of symbols in one message, and this is the entropy of that message, and the ESTIMATE of the entropy of the source. But it ISN'T the information content OF THAT MESSAGE, just (as an estimate of the entropy of the source) an ESTIMATE of the AVERAGE information content of messages produced by that source.
If you don't believe me, go look it up on the internet. If you want to believe Naymissus, because you're his chum, and think his stuff sounds reasonable, go ahead, your pigeon, but anyone else who does know what they are talking about will laugh at you -- he's got it wrong.
Let me have another go -- something completely predictable contains no information. Something random contains MAXIMUM information; you can't predict anything.
Yet the randomness or predictability is of the SOURCE -- it's the COIN that is fair (random), or unfair (double-header). Either coin can produce a sequence of ALL heads, hence the information content of a string of tosses depends on which source it came from, and on what it is compared to -- what other sequences MIGHT have occurred.
Now suppose we don't know which COIN we are using -- the 'information' as to which coin we have is contained in whether or not the sequence contains heads AND tails -- changes in NMs words. BUT this isn't the information content of the MESSAGE.
So, if my test message is all heads, I will plump for unfair coin and zero information. I MIGHT get it wrong, it might have actually come from the fair coin, in which case my guess as to its information content will be WRONG. Suppose my test message contains heads and tails -- then I know it is the fair coin. BUT even knowing it is the fair coin, I could still toss it again and get a sequence of ALL heads. And that DOES contain information, not because there are no changes in the sequence, but because since it was a fair coin, there COULD have been. Hence if a thing COULD have changed, knowing it didn't is information.
Let's take Naymissus' case to the absolute limit -- suppose I toss a coin ONCE. Whether I get a head or a tail doesn't tell me anything other than the coin does contain that symbol.
Toss it again -- even for a fair coin, it is quite likely that I get two heads or two tails. Do the sequences HH and TT contain information? Of COURSE they do! They contain the information that I didn't get HT or TH, or TT/HH as appropriate. Of the 4 possible results, HH is just one.
When I toss a real coin, in 2 tosses I get 4 possible results with equal probability. Each of those pairs of tosses contains as much information as any other -- 2 bits, giving me the actual result of those 2 tosses.
This DOESN'T change if we extend the sequence to a hundred, or a million coin tosses -- ANY sequence contains just as much information as any other, because I know I have a fair coin.
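For the doubters, here is a quick Python sketch of that coin arithmetic -- fair coin p(heads) = 1/2, double-header p(heads) = 1, sequences as above:

import math

def sequence_information_bits(seq, p_heads):
    # -log2 of the probability of this exact sequence of 'H'/'T' tosses.
    prob = 1.0
    for toss in seq:
        prob *= p_heads if toss == 'H' else (1.0 - p_heads)
    return -math.log2(prob) if prob > 0 else float('inf')

# Fair coin: every 10-toss sequence is equally likely, so each carries the same 10 bits.
print(sequence_information_bits("HHHHHHHHHH", 0.5))   # 10.0
print(sequence_information_bits("HTTHHTHTTT", 0.5))   # 10.0
# Double-headed coin: the all-heads sequence has probability 1, so it carries zero bits.
print(sequence_information_bits("HHHHHHHHHH", 1.0))   # zero (Python prints -0.0)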
|
|
|
Post by speakertoanimals on Feb 2, 2011 20:59:20 GMT 1
Okay, one more go at showing WHY Naymissus gets it wrong!
Suppose I have a message sender who has TWO possible sources, a fair coin, and a double-header. Every so often, he switches coins.
He then sends me the sequence of heads and tails.
Now I have a DIFFERENT task to the sort of thing considered by Shannon. From the sequence, I have to GUESS where he has switched coins. If I get this right, the repeated runs of heads from the double-header contain no information in the Shannon sense, until he switches and they end. BUT I have to face up to the fact that I will sometimes get it wrong, and runs of heads can come from the FAIR coin as well!
So, according to Shannon, the actual information content of a string depends on which coin he was using at the time -- IF I knew that exactly, I could just throw away whatever he sent using the double-header. Then I would be left with runs of heads that DID contain information, produced by the fair coin. Except I don't know, I have to guess, so my first stab is that any run of heads longer than some set length means the coin switched, and that it switched back as soon as a tail occurs.
It won't be right, but it's a decent stab -- BUT it ISN'T the actual information content according to Shannon, and I would have to know the times of the switches to get that right.
So, Naymissus, without realising it, is actually talking about what you can do in a situation like this when you are ignorant. You have a best stab at ESTIMATING the information.
And we can see WHY he has focused on what to us may seem a slightly odd situation -- phone calls, where you have periods of silence (the double-headed coin) punctuated with periods of talk. The best-stab estimate of when the coins switch, and of what the information content is, gives you the 'changes' result he keeps referring to. But it gives the WRONG answer when you apply it to the case of JUST a fair coin, where ANY string contains as much information as any other.
And NM doesn't actually know enough about the basis of information theory to see that that is what he is doing. Good enough for engineers perhaps, but mathematicians and computer scientists laugh at it!
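And here, very roughly, is the kind of 'best stab' guessing I mean. To be clear, this sketch is my own illustration, not anything from Shannon, and the run-length threshold is an arbitrary assumption -- it just shows that you are ESTIMATING which coin was in use rather than reading off the true information content:

def guess_double_header_runs(seq, run_threshold=8):
    # Crude guess: flag a toss as 'double-header in use' if it sits inside a run of
    # heads at least run_threshold long. Purely illustrative; the threshold is arbitrary.
    flags = [False] * len(seq)
    i = 0
    while i < len(seq):
        if seq[i] == 'H':
            j = i
            while j < len(seq) and seq[j] == 'H':
                j += 1
            if j - i >= run_threshold:
                for k in range(i, j):
                    flags[k] = True
            i = j
        else:
            i += 1
    return flags

seq = "HTHHTH" + "H" * 12 + "THTT"
flags = guess_double_header_runs(seq)
print(sum(flags), "of", len(seq), "tosses guessed to be from the double-headed coin")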
|
|
|
Post by abacus9900 on Feb 2, 2011 21:14:34 GMT 1
So, it looks like you are saying that events with the same probability of happening generate the same amount of information, yes? Presumably, therefore, unequal events do NOT generate the same amount of information, so that a relatively rare event - say an event with a probability of 1/100 - would convey a great deal more information. If so, I'm still a bit hazy about how this is represented mathematically.
BTW, as preparation, allow me to review logs.
Now, taking the binary system as our base, saying log 16 = 4 is equivalent to saying 2^4 = 16, right?
|
|
|
Post by Progenitor A on Feb 2, 2011 22:32:01 GMT 1
"So, it looks like you are saying that events with the same probability of happening generate the same amount of information, yes?"
That is exactly what Shannon tells us. Yes, that is right.
"Presumably, therefore, unequal events do NOT generate the same amount of information..."
Exactly.
"....so that a relatively rare event, say an event with a probability of 1/100, would convey a great deal more information."
Precisely.
"If so, I'm still a bit hazy about how this is represented mathematically."
We will clear that up at the right time.
"BTW, as preparation, allow me to review logs. Now, taking the binary system as our base, saying log 16 = 4 is equivalent to saying 2^4 = 16, right?"
Right. The logarithm of a number x is the power to which the base (in our case, base 2) must be raised to produce that number.
log2 16: the base is 2; what power must 2 be raised to, to produce 16? 2^4 = 16, therefore log2 16 = 4.
Here are some useful numbers and their log2 results:
log2 2 = 1 (2^1 = 2)
log2 4 = 2 (2^2 = 4)
log2 8 = 3 (2^3 = 8)
log2 16 = 4 (2^4 = 16)
log2 64 = 6 (2^6 = 64)
log2 256 = 8 (2^8 = 256)
As we will be looking at probabilities, we must also consider the log2 of fractions:
log2(1/2) = log2(2^-1) = -1 x log2 2 = -1 x 1 = -1
log2(1/4) = log2(2^-2) = -2 x log2 2 = -2 x 1 = -2
....so it follows that
log2(1/8) = -3
log2(1/16) = -4
log2(1/64) = -6
We can do the log2 of more complex fractions later.
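And if you want to check these log2 values (and the 1/100 example mentioned above) for yourself, a few lines of Python will do it:

import math

for x in [2, 4, 8, 16, 64, 256]:
    print("log2(", x, ") =", math.log2(x))
for frac in [1/2, 1/4, 1/8, 1/16, 1/64]:
    print("log2(", frac, ") =", math.log2(frac))
# The information in an event with probability 1/100:
print("-log2(1/100) =", -math.log2(1/100), "bits")   # about 6.64 bits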
|
|