Wednesday, March 08, 2006

The Mythology of Internet Multimedia Statistics

Just the other day I was reading articles on diggdot.us, a cool content aggregation website that consolidates links from Digg, Del.icio.us, and Slashdot, and I found an article about a singer in the UK who is webcasting prerecorded shows of her playing music in her basement. The article itself was a short piece describing what she's doing and how successful she's been at it. I say good for her, but that's not the most important issue here. The most important issue is that the article is a total sham.

Let me explain what I mean. According to the 'Times Online UK' article, she started webcasting shows on February 24th. That's ten days before the Times article's publication on March 5th. The article goes on to say that 70 people watched her first show and 62,138 people watched her show ten days later. That means she gained at least 62,068 fans over ten days, or about 6,207 per day on average.
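For anyone who wants to check the arithmetic, here is a quick sketch. The only inputs are the figures quoted from the article and the ten-day span used above; the variable names are mine.

```python
# Figures quoted from the Times article, as cited above.
first_show_viewers = 70
latest_show_viewers = 62_138
days_elapsed = 10  # the span used in the text

# Growth between the first show and the latest one.
gained = latest_show_viewers - first_show_viewers
average_gain_per_day = gained / days_elapsed

print(gained)                       # 62068
print(round(average_gain_per_day))  # 6207
```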

Did you know that Jacobs Field, the stadium of Major League Baseball's Cleveland Indians, only holds 43,368 people?

I would say that this is a pretty remarkable feat: to go from a nobody to more than selling out an entire baseball stadium in ten days. I think it is so remarkable that I'm willing to call it total crap. No, I'm not accusing the woman or the Times journalist of lying. I just believe that they, like most people, have no idea how to interpret the statistics given to them about their online media events.

So who am I and why do I think this story is pure shenanigans? Well, I work for a company that provides multimedia streaming of live and prerecorded content on the Internet. We provide about 15,000 live audio broadcasts per year, and we rebroadcast all of these events on demand for our customers. We also broadcast live video, and we are expecting to do at least 500 events over the next year. My job for the company is to manage some of the equipment used to broadcast these events, and to write and maintain the software that provides our customers with accurate and understandable reports on their event usage.

Typical Statistics

As a service provider, my company provides and manages the equipment and bandwidth to broadcast multimedia events over the Internet for our customers. The technical details of how all this is done are outside the scope of this rant. What will be discussed, though, is the statistics and reporting process.

No matter how you distribute your media, if you do it on the Internet, the process will leave you with detailed log files. In most cases these log files contain information describing the content receiver, such as IP address, date, time, duration, and bandwidth. The logs are very similar to logs written by a web server, except they contain slightly more information.
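To make that concrete, here is a minimal sketch of reading such a log. The line format below is made up for illustration (real media servers each have their own layout), and the names Connection, parse_line, and bytes_sent are assumptions of mine, not anything a particular server produces.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Connection:
    ip: str
    start: datetime
    duration_s: int   # seconds the player stayed connected
    bytes_sent: int   # data delivered to this player


def parse_line(line: str) -> Connection:
    """Parse one line of the hypothetical format:
    <ip> <YYYY-MM-DD> <HH:MM:SS> <duration_s> <bytes_sent>
    """
    ip, day, clock, duration_s, bytes_sent = line.split()
    start = datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S")
    return Connection(ip, start, int(duration_s), int(bytes_sent))


# Invented sample data using documentation-only IP addresses.
sample_log = """\
203.0.113.5 2006-03-05 20:00:00 1800 28800000
203.0.113.5 2006-03-05 20:31:10 45 720000
198.51.100.7 2006-03-05 20:02:30 3300 52800000
"""

connections = [parse_line(line) for line in sample_log.splitlines() if line.strip()]
for c in connections:
    print(c.ip, c.start, c.duration_s, c.bytes_sent)
```

Every report discussed below is just some summary over records like these.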

The information in the log files is then used to provide useful reports to the service provider's customers. Most service providers will give you the following information on a per day or per event basis: connections, unique IP addresses, bandwidth used, peak bandwidth, average duration, maximum duration, minimum duration, and many other things. So what do these stats really mean?

Connections

A connection results from every media player that successfully receives your event for some period of time. I find that this statistic is the single largest point of confusion for content providers today. The confusion stems from service providers historically, and to some extent still today, reporting this statistic or some multiple of it to their customers as an indicator of the number of actual people who have received a content provider's event. This claim is undoubtedly false, as it would require you to believe that every media player that connected to your event listened for the entire duration of that event. For proof that this is false, take a look at the average duration statistic reported by your service provider. For this claim to be true, the average duration would have to be equal to the entire length of your event.
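Here is a small sketch of that check. The durations are invented for a hypothetical 60-minute event, and the "full-event equivalents" figure is just my own way of putting the total viewing time next to the connection count; it is not a standard report.

```python
# Hypothetical durations, in minutes, one per connection to a 60-minute event.
event_length_min = 60
durations_min = [60, 2, 1, 45, 3, 60, 5, 1, 30, 3]

connections = len(durations_min)
average_duration = sum(durations_min) / connections

# Total viewing time expressed as the number of people who would have had
# to watch the whole event to produce it.
full_event_equivalents = sum(durations_min) / event_length_min

print(connections)             # 10
print(average_duration)        # 21.0
print(full_event_equivalents)  # 3.5
```

Ten connections, but an average stay of 21 minutes out of 60. Reporting that as ten people who received the event overstates things considerably.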

So what good is this statistic really? I find that it can be used to find problems with your broadcasts, as it will peak very sharply when your broadcast is having trouble and players reconnect over and over. If you are familiar with web site statistics, this statistic is comparable in meaning to 'Pages' or 'Page Views'.

Unique IP Addresses

Every computer that connects to your event has an IP address, and the service provider's media server logs these addresses. The unique IP address report, generated by most service providers, gives you the set of distinct IP addresses that received your event for some period of time.

I find that because of the use of network address translation (NAT), where many computers share a single public address, this statistic loses some of its meaning. In spite of this, I usually consider this statistic to be a good indicator of the actual number of people who have received an event. If you are familiar with web site statistics, this statistic is comparable in meaning to 'Unique Visitors'.
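A rough sketch of the difference between connections and unique IP addresses, using invented log entries (the addresses come from the ranges reserved for documentation):

```python
# Hypothetical (ip, duration_seconds) pairs, one per connection.
log = [
    ("203.0.113.5", 1800),
    ("203.0.113.5", 45),    # same address reconnecting
    ("198.51.100.7", 3300),
    ("192.0.2.44", 12),
    ("198.51.100.7", 20),   # same address reconnecting
]

connections = len(log)
unique_ips = {ip for ip, _ in log}  # a set drops the repeats

print(connections)      # 5
print(len(unique_ips))  # 3
```

Keep in mind that an office full of people behind one NAT gateway would still show up here as a single address.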

Bandwidth

Bandwidth is a very basic statistic that is reported directly from the log files written by the server. Typically a service provider will summarize the bandwidth you've used over your billing period and use this to charge you. For more explanation, see the Wikipedia entry for Bandwidth.

Duration

Every media player that receives your event does so for a specific period of time. This amount of time is the duration. The duration by itself is not very helpful. It is usually shown in conjunction with the connections statistic. This allows your service provider to give you more meaningful statistics such as the average duration, minimum duration, maximum duration, and median duration.

These statistics can be used for a wide range of things, but are usually used to better understand the habits of your listeners or viewers.
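As a sketch of how those summaries fall out of the raw durations, here is one way to compute them with Python's standard statistics module. The durations are invented.

```python
import statistics

# Hypothetical connection durations in seconds, one per connection.
durations_s = [30, 1800, 3300, 12, 18, 900]

summary = {
    "average": statistics.mean(durations_s),
    "median": statistics.median(durations_s),
    "minimum": min(durations_s),
    "maximum": max(durations_s),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Notice how far apart the average and the median are in this sample. A few long sessions pull the average up, which is why it helps to have both.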

The Holy Grail

If you're still reading and I've done my job, then you should have a better understanding of the statistics given by most service providers. What you are still missing, and what you really need, is what I consider 'The Holy Grail' of Internet media statistics. I did not just make these up either. They come from many conversations with our marketing department and our customers.

Actual Number of People

Most people do not care about any of the statistics listed above; all they want to know is 'How many people received my event?' Unfortunately there is no definite answer to this question. Why, you might be wondering? Well, it has to do with the fact that none of the information recorded in log files is actually about people. It is all about the computer and media player that connected to the event. All hope is not lost though. I can recommend some pointers for creating approximations.

The first step to knowing your users is to get some information about them. Don't give away access to your event until your users have logged into your website. This way you can have email addresses, phone numbers, names, or whatever other information you require. Now that still doesn't give us an actual person count, but it gets us closer.

The second step, now attainable with the data from step one, is to survey your listeners. You can ask them whatever you want, but the most important thing to ask them is 'When you watched or listened to my event, how many people watched or listened with you?'. This is what you need to know to estimate the number of people who have received your event.

Once you receive a good sampling from your survey, take the average number you receive from this question and multiply it by the number of unique IP addresses given to you by your service provider. This will get you a fairly accurate estimate for the actual number of people who have received your event.
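A minimal sketch of that estimate follows. The survey answers and the unique IP count are invented, and I am assuming that respondents answered with the number of other people present, so one is added to count the respondent too. If your survey asks for the total group size instead, drop the +1.

```python
import statistics

# Hypothetical answers to "how many people watched or listened with you?"
companions_reported = [0, 1, 0, 2, 0, 0, 1, 0]

# Hypothetical figure from the service provider's unique IP address report.
unique_ips = 1200

# Assumption: answers count companions only, so add the respondent.
average_group_size = statistics.mean(companions_reported) + 1
estimated_people = round(average_group_size * unique_ips)

print(average_group_size)  # 1.5
print(estimated_people)    # 1800
```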

Advertising Statistics

If you're looking for advertising statistics, then I would recommend using the same metric used by TV and radio stations: listeners per quarter hour. Fortunately for you, this statistic can be easily calculated from the log files, and it is an exact count of connected players rather than an estimate. Our company provides this report to our clients in the form of a convenient graph, and I'm sure that your service provider should be able to provide you with this report as well. I highly recommend asking for this report, as I find it not only interesting but very useful as a performance metric. It would also be vital for selling any ad spots during your event.
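Here is one way such a count could be sketched. This is an illustration under my own assumptions, not how any particular provider builds the report: it counts each distinct IP address once in every 15-minute slot its connection overlaps, however briefly, and the session data is invented. Broadcast ratings usually require a minimum listening time within the quarter hour, which this sketch does not enforce.

```python
from collections import defaultdict
from datetime import datetime, timedelta

QUARTER = timedelta(minutes=15)


def quarter_start(t: datetime) -> datetime:
    """Round a timestamp down to the start of its quarter hour."""
    return t.replace(minute=t.minute - t.minute % 15, second=0, microsecond=0)


def listeners_per_quarter_hour(sessions):
    """sessions: iterable of (ip, start, duration_seconds) tuples.
    Returns {slot_start: number of distinct IPs connected during the slot}."""
    buckets = defaultdict(set)
    for ip, start, duration_s in sessions:
        if duration_s <= 0:
            continue  # ignore connections that never received anything
        end = start + timedelta(seconds=duration_s)
        slot = quarter_start(start)
        while slot < end:
            buckets[slot].add(ip)
            slot += QUARTER
    return {slot: len(ips) for slot, ips in sorted(buckets.items())}


# Hypothetical sessions for a one-hour event.
sessions = [
    ("203.0.113.5", datetime(2006, 3, 5, 20, 0, 0), 1800),
    ("203.0.113.5", datetime(2006, 3, 5, 20, 31, 10), 45),
    ("198.51.100.7", datetime(2006, 3, 5, 20, 2, 30), 3300),
    ("192.0.2.44", datetime(2006, 3, 5, 20, 14, 50), 20),
]

report = listeners_per_quarter_hour(sessions)
for slot, count in report.items():
    print(slot.strftime("%H:%M"), count)
```

Plotting the counts slot by slot gives the kind of graph described above, and the shape of that curve tells you when your audience shows up and when it leaves.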

Conclusion

We've come a long way from my initial rant, and hopefully you have all learned a thing or two about Internet media statistics. I feel that it is very important for people to understand these statistics, because if they don't, they will end up believing the hype and not knowing the truth. To the average person this isn't the end of the world, but to content providers and advertisers this equates to actual dollars and cents.

I hope that after reading my article you have a better grasp of web media statistics, and that you can now see through the hype and bullshit surrounding them on the web.

As always, leave me your comments, and let the world know your opinion.