What about missing values?


There is no perfect method for ranking social media except trial and error, along with a realization that perfection is an impossibility and that good enough is all one can possibly expect in the real world.

Still, one must deal with missing values. Data on the site come from a number of outside as well as inside sources, and values in these data sets are sometimes missing. The main reason is:

– no data available – for instance, if the blog is not registered with Technorati, we end up with missing data (i.e. neither a Technorati Authority nor a Technorati Ranking score is available).

replacements for missing values

In some cases we display a replacement value when values are missing. The goal is to give you an idea of the typical magnitude of the missing value. These replacement values are marked with an asterisk (*) and are shown in gray text to distinguish them from known values. Here is how the replacement values are computed:

We calculate replacements for missing values using Google PageRank as the main indicator. For instance:

AVG = average value of all blogs having the same PageRank as the current blog with the missing value (e.g., Technorati Ranking)

We calculate the average Technorati Ranking for the group of blogs that have the same PageRank as the current blog with the missing rank. We then replace the missing Technorati Ranking with the average score we obtained for those blogs with the same Google PageRank.

So how does this work in practice? Suppose the Technorati Ranking is not known for a corporate blog that has a Google PageRank of 4. We compute the average Technorati Ranking of those blogs in the index that have a Google PageRank of 4 and a known Technorati Ranking score. We then display and use this average in place of the missing value.
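The replacement step described above can be sketched in a few lines of Python. This is only an illustration, not the site's actual code: the function name `impute_by_pagerank`, the dictionary keys (`pagerank`, `technorati_ranking`) and the `imputed` flag (standing in for the asterisk and gray text) are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def impute_by_pagerank(blogs, field="technorati_ranking"):
    """Fill missing `field` values with the average of blogs sharing the same PageRank.

    `blogs` is a list of dicts; a missing value is represented by None.
    Returns new dicts and leaves the input untouched.
    """
    # Collect the known values for each PageRank group.
    known_by_pagerank = defaultdict(list)
    for blog in blogs:
        value = blog.get(field)
        if value is not None:
            known_by_pagerank[blog["pagerank"]].append(value)

    # Average per PageRank group (only groups with at least one known value).
    group_average = {pr: mean(vals) for pr, vals in known_by_pagerank.items()}

    result = []
    for blog in blogs:
        filled = dict(blog)
        if filled.get(field) is None and filled["pagerank"] in group_average:
            filled[field] = group_average[filled["pagerank"]]
            filled["imputed"] = True   # would be shown with an asterisk, in gray
        else:
            filled["imputed"] = False  # known value, or no group average available
        result.append(filled)
    return result


# Hypothetical sample data: three blogs with PageRank 4, one value missing.
sample = [
    {"name": "a", "pagerank": 4, "technorati_ranking": 100},
    {"name": "b", "pagerank": 4, "technorati_ranking": 300},
    {"name": "c", "pagerank": 4, "technorati_ranking": None},
    {"name": "d", "pagerank": 5, "technorati_ranking": None},
]
filled_sample = impute_by_pagerank(sample)
```

In this sketch, a blog whose PageRank group has no known values at all (blog "d") simply stays missing; how the site treats that case is not stated here.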

missing values in rankings
Missing values are handled in three different ways in the rankings. You decide which method will be used when you set your priorities.

Use averages – when a value is missing, we use in its place a replacement value generated via the process described above.

Skip and reweight – when a value is missing, we skip it and increase the weights you assign to the other items.

Do not rank – when a value is missing, we do not rank the blog and instead display it in a separate, alphabetically sorted list.

To explain the skip and reweight option, suppose, for example, that you give a weight of 3 to “Google PageRank,” a weight of 1 to “Technorati Ranking,” a weight of 1 to “Technorati Authority,” a weight of 3 to “Yahoo! InLinks,” and a weight of 2 to “Google BlogSearch count.”

Suppose further that the Google PageRank for one blog is not available. We compute the score for the blog with the missing value by skipping over the missing value and increasing the weights for the other items.

In this case, the sum of the magnitudes of all your weights is 3 + 1 + 1 + 3 + 2 = 10, and the sum of the magnitudes of the weights for the known values is 1 + 1 + 3 + 2 = 7. We skip the item with the missing value and increase the weights for the remaining items by a factor of 10 / 7 = 1.4286.

Thus, for the blog with the missing item, we multiply each remaining weight by 1.4286 and use weights of 1.4286 for “Technorati Ranking,” 1.4286 for “Technorati Authority,” 4.2857 for “Yahoo! InLinks,” and 2.8571 for “Google BlogSearch count.” The adjusted weights again add up to 10.
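For readers who prefer code, here is a minimal sketch of skip and reweight. It assumes that "increase by a factor of 10 / 7" means multiplying each remaining weight by that factor, so the adjusted weights keep the original total. The function name `reweight` and the dictionary keys are made up for the example.

```python
def reweight(weights, values):
    """Drop items whose value is missing (None) and scale the remaining weights.

    The scale factor is (sum of |all weights|) / (sum of |weights of known items|),
    so the adjusted weights keep the same total magnitude as the original ones.
    """
    total = sum(abs(w) for w in weights.values())
    known = {k: w for k, w in weights.items() if values.get(k) is not None}
    known_total = sum(abs(w) for w in known.values())
    if known_total == 0:
        raise ValueError("no known values left to carry the weights")
    factor = total / known_total
    return {k: w * factor for k, w in known.items()}


# Weights from the example in the text.
weights = {
    "google_pagerank": 3,
    "technorati_ranking": 1,
    "technorati_authority": 1,
    "yahoo_inlinks": 3,
    "google_blogsearch_count": 2,
}

# Hypothetical values for one blog; Google PageRank is not available.
values = {
    "google_pagerank": None,
    "technorati_ranking": 52000,
    "technorati_authority": 120,
    "yahoo_inlinks": 4300,
    "google_blogsearch_count": 87,
}

adjusted = reweight(weights, values)
for item, weight in adjusted.items():
    print(f"{item}: {weight:.4f}")
```

Taking magnitudes with `abs` follows the "sum of the magnitudes" wording, which suggests weights may also be negative.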

default method for missing values
By default, we use averages as outlined above.

Find more interesting information also here: composite indicators – never ending debate, how are raw data normalized (here you are), how we calculate the rankings, what we do with missing values, time series, weighting – default is

If you are interested in the topic of statistics, then I suggest you check out this blog:

Karen Grace-Martin – writes The Analysis Factor blog – you should subscribe – it is refreshing and very helpful indeed