02 March 2008

Everything in the Digg, Reddit & Netscape Algorithms

At work today, Matt noted that he found Digg's algorithm far more interesting than Google's. I was shocked. After all, Digg isn't nearly as complex or widely used as Google. Still, given its rising popularity in the tech space, I could at least understand why he might feel that way. I also took it as a challenge to list all the elements that might be in an algorithm at Digg, Reddit, Netscape, Shoutwire or other social news voting sites. Let's see how I do:

BTW - I'm going to use a lot of Digg-specific terminology, despite the fact that I'm referring to all of the sites above.

1. Number of votes over time

  • Uses a floating target based on relative levels of popularity (as mentioned in timing below)

  • A given number of votes in a very short period (if not manipulative) is a stronger signal than the same number of votes spread over a longer period.
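
To make the "votes over time" idea concrete, here is a minimal sketch of one way a site could weight votes by recency. It is purely my guess at the general shape, not anything Digg or Reddit has confirmed: the function name `velocity_score`, the exponential decay, and the 6-hour half-life are all assumptions chosen for illustration.

```python
import math

def velocity_score(vote_times, now, half_life_hours=6.0):
    """Sum the votes, weighting each by an exponential decay of its age.

    vote_times and now are Unix timestamps in seconds. The half-life is a
    hypothetical tuning knob: a vote that old counts for half a fresh vote.
    """
    decay = math.log(2) / (half_life_hours * 3600.0)
    return sum(math.exp(-decay * (now - t)) for t in vote_times if t <= now)

now = 1_204_416_000  # an arbitrary "current" timestamp
burst = [now - 60 * i for i in range(10)]     # 10 votes in the last 10 minutes
spread = [now - 8640 * i for i in range(10)]  # 10 votes spread over about a day
print(velocity_score(burst, now) > velocity_score(spread, now))
```

Under this weighting, ten votes in ten minutes score higher than ten votes over a day, which matches the intuition in the bullets above.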

2. Domain of link

  • Has it previously had content submitted? If so, did that content receive votes, get marked as spam/lame, make the front page, etc?

  • Has the domain been manually or automatically flagged as manipulative?

3. Profile of submitter

  • Have they submitted high quality stories in the past?

  • Have they submitted spam/lame stories in the past?

  • How many friends do they have? This could make it harder or easier to get a story Dugg (harder if they have thousands of friends, but possibly easier if they have at least a few)

  • How many submissions have they made? What is their success rate?

  • How long has the member been around? A brand-new account could be a sign of spam
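
As a rough illustration of how the submitter questions above might combine into one number, here is a sketch of a trust score. Everything in it is an assumption on my part: the function `submitter_trust`, the smoothing prior (a 5% success rate worth 20 imaginary submissions), and the 30-day "probation" window are hypothetical values, not known parameters of any of these sites.

```python
def submitter_trust(promoted, flagged, total, account_age_days,
                    prior_rate=0.05, prior_weight=20, min_age_days=30):
    """Combine success rate, spam/lame flags and account age into a score.

    The success rate is smoothed toward prior_rate so that one lucky
    submission from a brand-new account does not look like a 100% record.
    """
    smoothed_success = (promoted + prior_rate * prior_weight) / (total + prior_weight)
    flag_rate = flagged / total if total else 0.0
    # Accounts younger than min_age_days are scaled down linearly.
    age_factor = min(1.0, account_age_days / min_age_days)
    return max(0.0, smoothed_success * (1.0 - flag_rate) * age_factor)

veteran = submitter_trust(promoted=30, flagged=2, total=100, account_age_days=400)
newcomer = submitter_trust(promoted=1, flagged=0, total=1, account_age_days=2)
print(veteran > newcomer)
```

The same function could plausibly be applied to each voter as well as to the submitter, which is how I read item 4 below.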

4. Profiles of voters (as above)

5. Timing of submission

  • If a low number of stories have recently made the front page in a given sector or overall, a story is more likely to reach the front page with fewer votes

  • If a high number of stories have recently been submitted, the opposite may be true

  • Time of day - if 50 people all tag a site at 3:00am, that might be a red flag
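
The "floating target" from the first two bullets above could look something like the sketch below. It is only one possible shape for it, and I am assuming all the names and numbers (`promotion_threshold`, a base of 50 votes, a target of 10 promotions and 500 typical submissions per window) for the sake of the example.

```python
def promotion_threshold(recent_promotions, recent_submissions,
                        base_votes=50, target_promotions=10,
                        typical_submissions=500):
    """Votes needed to reach the front page, floating with recent activity.

    Fewer recent promotions than the target lowers the bar; more recent
    submissions than usual raises it. Square roots soften both effects.
    """
    scarcity = (recent_promotions + 1) / (target_promotions + 1)
    crowding = (recent_submissions + 1) / (typical_submissions + 1)
    return max(1, round(base_votes * scarcity ** 0.5 * crowding ** 0.5))

print(promotion_threshold(2, 500))    # quiet front page: lower bar
print(promotion_threshold(20, 500))   # busy front page: higher bar
print(promotion_threshold(10, 2000))  # flood of submissions: higher bar
```

A real system would presumably track this per category ("sector") as well as site-wide, but the idea is the same.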

6. Similarity to other links (duplicate)
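
One simple way to catch duplicates is to reduce every submitted URL to a canonical form before comparing. The sketch below is an assumption about how that might be done, not a description of any site's actual check; the helper `canonical_url` and the `TRACKING_PARAMS` list are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters that do not change the content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

def canonical_url(url):
    """Normalize a URL so trivially different links compare equal."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""          # hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    # Drop tracking parameters and sort the rest for a stable order.
    query = sorted((k, v) for k, v in parse_qsl(parts.query)
                   if k not in TRACKING_PARAMS)
    path = parts.path.rstrip("/") or "/"
    # Treat http and https as the same story, and drop the fragment.
    return urlunsplit(("http", host, path, urlencode(query), ""))

a = canonical_url("http://www.Example.com/story/?utm_source=digg&id=7#top")
b = canonical_url("https://example.com/story?id=7")
print(a == b)
```

This only catches the same page under different addresses. Spotting two different pages that cover the same story would need some form of text similarity, which is a much harder problem.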

7. Source of votes

  • From the same IP address or IP block

  • From the same geographic region (that's not a hotspot for Digg users)

  • From the same group as has voted on previous content from a domain or string of domains

  • From a group of users who aren't regular participants/voters
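
For the IP block check in particular, a minimal sketch might measure how concentrated a story's votes are in a single block. I am assuming IPv4 addresses and a /24 block here, and the helper names `ip_block` and `block_concentration` are made up for the example.

```python
from collections import Counter

def ip_block(ip):
    """Return the /24 block of a dotted IPv4 address, e.g. '10.0.0'."""
    return ".".join(ip.split(".")[:3])

def block_concentration(voter_ips):
    """Share of a story's votes that come from its most common /24 block."""
    if not voter_ips:
        return 0.0
    counts = Counter(ip_block(ip) for ip in voter_ips)
    return counts.most_common(1)[0][1] / len(voter_ips)

votes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "192.168.1.5"]
print(block_concentration(votes))  # 0.75: three of four votes share a block
```

A high value on its own proves nothing (think of an office or a university), so I would expect it to be one input among several rather than an automatic penalty.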

8. Manual review as it hits the homepage

  • Many Digg users may not realize it, but every story that hits the front page gets a manual, editorial review that may result in the story being pulled. This often happens with content the editors feel is marketing-focused, is driven by marketing dollars or has a marketing agenda.

  • Reddit does this, too, but it's not instantaneous

  • Netscape used to do it, but some have speculated that the level of oversight fluctuates

  • As a quick example, Brian Clark (of Copyblogger) had this post hit Digg's homepage last week for a scant minute or so before the editors pulled it.

9. Number of comments

  • Potentially could be used to detect patterns, though I've seen a lot of Dugg stories that had very few comments, so this might not be a great signal

10. Number of views

  • An abnormally high ratio of views to Diggs could mean that people aren't fans of the content

  • In my opinion, this is a weak signal, and down votes or lame/spam flags would carry more weight in bringing down a story

11. Down votes

  • Although Digg doesn't specifically have them, Reddit does and surely uses them as an influential factor

  • Digg, Netscape and Shoutwire all use flag systems which could be similarly interpreted

12. Source of visits

  • I suspect that Digg would follow how users normally reach pages (through friends, via direct links, via email/type-in, etc.)

  • If an abnormally high number of folks came via an uncommon method to a Digg page (for example, with no referring URL, possibly signifying a mass email or IM link), Digg might want to discount the value of those votes
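
Here is a sketch of what that kind of discounting could look like. It assumes the site records the referring URL alongside each vote, and the function `weighted_votes`, the 20% "typical" share of no-referrer visits, and the 0.25 discount are hypothetical numbers I picked for illustration.

```python
def weighted_votes(referrers, typical_no_referrer_share=0.2, discount=0.25):
    """Count votes, discounting the excess that arrived with no referrer.

    referrers holds one referring URL per vote; an empty string means the
    voter arrived with no referring URL (type-in, email, IM link, etc.).
    """
    total = len(referrers)
    if total == 0:
        return 0.0
    no_ref = sum(1 for r in referrers if not r)
    with_ref = total - no_ref
    # No-referrer votes up to the typical share count in full.
    allowed = typical_no_referrer_share * total
    excess = max(0.0, no_ref - allowed)
    return with_ref + (no_ref - excess) + excess * discount

normal = ["http://digg.com/"] * 9 + [""]
suspicious = ["http://digg.com/"] * 2 + [""] * 8
print(weighted_votes(normal))      # 10.0: nothing unusual, full credit
print(weighted_votes(suspicious))  # 5.5: the excess no-referrer votes count for less
```

Only the votes above the normal level get discounted, so a story that happens to spread by email a little is not punished for it.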

In a wonderful irony, the Digg website appears to have crashed tonight (a likely cause could be the new re-design, which Neil details at SELand).

So, what do you think? Are there other elements you'd consider having in your own social media voting site? Any obvious ones I neglected to mention?
