Scaling issues to keep in mind while developing a social network feed
Paul Keith:
Ok...testing how much I can get away with anonymous quoting of Quora while linking back to the source:
http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
You want to minimize the number of disk seeks that need to happen when loading your home page. The number of seeks could be 0 or 1 but definitely not O(num friends). You also can't store all the data on one machine if you're concerned about scaling, so you've got a couple of options...
If you're willing to tolerate one disk seek, or if your graph has low fan-out (small number of people following any given person), you can de-normalize the data such that the metadata about every piece of activity is propagated to each of the followers of that activity at the time the action occurs. You might think of this as a "push" model. You'd still probably only store one copy of the actual activity data, but you'd push pointers to it (along with whatever other metadata is needed if you're supporting any ranking/filtering) to all the subscribers at the time it is created. Generally the first thing to break in this model will be the process of propagating the activity to all the subscribers, particularly if you have users that have large numbers of followers (celebrities). When this fails, the feed will start to get backed up. This can also be complicated in that you may need to write code that properly updates all of the subscribers whenever the important metadata about the content is updated, and you may want to also add code to update things when someone changes their list of subscriptions.
The alternative is to keep all the recent activity data in memory and not propagate the updates to the subscribers at write time, instead fanning out at the time of loading the home page. This way you avoid all disk seeks. It's also nice in that your fan out size is limited to the number of people a user follows rather than the number of people who follow a user (most people don't have enough time on their hands to follow millions of people, so you don't have the inverse of the celebrity problem). It's also easier to keep things up-to-date, since you don't have to worry about propagating updates to all of the subscribers. The downside of this approach is that the failure scenario is more catastrophic - instead of just delaying updates, you may potentially fail to generate a user's feed. Having some kind of fallback mechanism that approximates the feed (eg by querying only a subset of your friends) is handy to avoid having to show an error page.
Probably the theoretically best approach would be a hybrid of the above two options, but either of these options can be made to work reasonably well even at very large scales.
--- End quote ---
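To make the push/pull trade-off in that answer a bit more concrete, here's a minimal in-memory Python sketch. It's only my illustration under my own assumptions, not how Quora or any real site does it. The `FeedStore` class, `celeb_threshold` and `FEED_LEN` are hypothetical names. Everything lives in one process's memory, so there are no disk seeks or sharding, and "celebrity" simply means "more than `celeb_threshold` followers". Posts by ordinary authors are pushed as pointers into followers' inboxes at write time. Posts by celebrities are pulled and merged at read time, which is roughly the hybrid the answer calls theoretically best. Setting the threshold to `float("inf")` gives pure push, and `0` gives pure pull.

```python
from collections import defaultdict, deque
import heapq
import itertools

FEED_LEN = 50  # hypothetical cap on how many entries each inbox / recent list keeps


class FeedStore:
    """Toy in-memory feed: push for ordinary authors, pull for 'celebrities'."""

    def __init__(self, celeb_threshold=1000):
        self.celeb_threshold = celeb_threshold  # hypothetical cutoff for switching to pull
        self.followers = defaultdict(set)       # author -> users following them
        self.following = defaultdict(set)       # user -> authors they follow
        self.activities = {}                    # activity_id -> (author, text); the one stored copy
        # author -> newest-first (timestamp, activity_id) entries for their own posts
        self.recent = defaultdict(lambda: deque(maxlen=FEED_LEN))
        # user -> newest-first (timestamp, activity_id) pointers pushed at write time
        self.inbox = defaultdict(lambda: deque(maxlen=FEED_LEN))
        self._clock = itertools.count()         # stand-in for a real timestamp

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > self.celeb_threshold

    def post(self, author, text):
        ts = next(self._clock)
        aid = f"{author}:{ts}"
        self.activities[aid] = (author, text)
        self.recent[author].appendleft((ts, aid))
        # Push (fan-out on write): copy a pointer, not the data, to every follower.
        # Cost grows with the author's follower count, so celebrities are skipped.
        if not self.is_celebrity(author):
            for user in self.followers[author]:
                self.inbox[user].appendleft((ts, aid))
        return aid

    def load_feed(self, user, limit=10):
        # Pull (fan-out on read): merge in recent posts from followed celebrities.
        # Cost grows with how many people this user follows, not with follower counts.
        sources = [self.inbox[user]]
        sources += [self.recent[a] for a in self.following[user] if self.is_celebrity(a)]
        # Every source is already sorted newest-first, so a k-way merge is enough.
        merged = heapq.merge(*sources, key=lambda entry: entry[0], reverse=True)
        feed, seen = [], set()
        for _, aid in merged:
            if len(feed) >= limit:
                break
            if aid in seen:  # an author may cross the threshold after some posts were pushed
                continue
            seen.add(aid)
            feed.append(self.activities[aid][1])
        return feed
```

For example, with `FeedStore(celeb_threshold=1)`, an author with one follower gets pushed, while an author with two followers is merged in at read time. Note that someone who follows an author late sees none of the posts pushed before they followed; backfilling on subscription changes is exactly the extra bookkeeping the answer warns about for the push model. A fallback that merges only some of the celebrity sources would be one way to approximate the feed when the pull side is overloaded.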
40hz:
No fear! ;) I think you're within the scope of Quora's terms and conditions with what you're doing:
Subject to these Terms, Quora gives you a worldwide, royalty-free, non-assignable and non-exclusive license to re-post any of the Content on Quora anywhere on the rest of the web provided that the Content was added to the Service after April 22, 2010, and provided that the user who created the content has not explicitly marked the content as not for reproduction, and provided that you: (a) do not modify the Content; (b) attribute Quora with a human and machine-followable link (an A tag) linking back to the page displaying the original source of the content on quora.com (c) upon request, either by Quora or a user, remove the user's name from Content which the user has subsequently made anonymous; (d) upon request, either by Quora or by a user who contributed to the Content, make a reasonable effort to update a particular piece of Content to the latest version on quora.com; and (e) upon request, either by Quora or by a user who contributed to the Content, make a reasonable attempt to delete Content that has been deleted on quora.com.
--- End quote ---
Quote away. :Thmbsup:
However...I can't help thinking it would be better if there were also some original content included with the quote. A summary, an introduction, an argument for or against the point being made - in short, anything that might avoid the impression we're scraping somebody else's content. Including a link back to the original is a step in the right direction. But even with that, just having quotes out there all by themselves can still make a forum thread start to feel like one of those dreaded "list of links" after a while.
But maybe that's just my personal weird. Feel free to ignore. :)
Paul Keith:
Well, it's kind of a catch-22.
If I could provide better, original content, I wouldn't need to worry about pasting a quoted answer, because I could answer the question directly and claim the answer as my own.
Then there's the issue that if I know enough to have original content, I probably won't share it on a forum for discussion, because I'd think... hmm... these details are too vague, or this is wrong, etc. etc. Instead I'd probably share a blog post I wrote, where the details can be better laid out and come across as more authoritative.
Btw thanks, it's nice to have two opinions affirming Quora's TOS (I got the other one via PM).
40hz:
^Very valid points you're making. I guess it really is a Catch-22 of sorts.
Best ignore my hangup and continue to do as you think best. :)
They're certainly interesting* posts! :Thmbsup:
------------------
* I'm still trying to get my head completely around the one about conversational structure!