FRIHOST FORUMS SEARCH FAQ TOS BLOGS COMPETITIONS
You are invited to Log in or Register a free Frihost Account!


Storing Massive Amounts of Data





polly-gone
Currently, I am looking at ways to store massive amounts of data, and I have some questions.

Let's look at Facebook for example.

Dozens of comments can be posted on every status, fan page update, photo, video, link, post etc. How would this information be stored?

Would it be efficient to have one insanely massive table that stores a comment id, user id, the id of what it was posted on, and the comment text in a row for every single comment? Would it be more efficient to have a table for each post type (photos, statuses, etc.) And post the same information to there? Is there another method that I am not considering?

Thank you very much,

-Nick SmileSmileSmile
badai
if your one massive table is already normalize, why not?

otherwise, you have to break it up to have at least 3rd normal form

and don't forget to index your tables to speed up query.
jmraker
I think facebook could have a table named "post" and have fields like
post_id, post_parent, post_userid, post_body, post_date, post_link, post_type

and other tables like video, photo, link, etc
and do an SQL outer join on all of them

Code:
SELECT * FROM post
LEFT JOIN video WHERE post_link=video_id AND post_type="video"
LEFT JOIN photo WHERE post_link=photo_id AND post_type="photo"
...
WHERE post_parent=100
ORDER BY post_date;

That might work.

and if they comment with a video it adds a record to "post" and "video"
rvec
I don't think they'd ever do a query like that. Joining those tables would probably make the system very unhappy.
jmraker
The next least likely way they do it is if they generate the comment/posts's HTML code and save it in a generic field (that stores HTML) in a "post" table so if they reply with a video it puts the HTML for the video's <embed> code in the same field it puts images, links, etc

There's any number of ways they could do it.

I wouldn't suggest dumping all the files into one directory because that directory will eventually become unmanagable with FTP because some or all FTP servers don't handle directories with more than 2000 files and subdirectories very well.
rvec
I am absolutely sure they wouldn't do that either. The html code is the same for all videos, the only difference is the url (and of that probably even only the id). I think they have one big table with comments, and when those comments need to be fetched it'd just do something like:
Code:
SELECT c.message, u.uid, u.name from comments c LEFT JOIN users u ON c.uid=u.uid WHERE c.parent={id};


Because you want to be able to show all the posts of one user, all the posts made on one topic and never all posts made on one content type.

To get the content itself they probably only do one query to fetch a whole row from the content table (probably one big main content table and multiple add-on content tables for content-type specific stuff, like length for video).
The id for the content type is probably used to get the url to the video for the embed code and stuff.

Also most stuff will probably be cached to the hard disk, and not to a database. So what jmraker said probably will be done something like that, but as a .html file in a cache directory on the hard-disk (or with them maybe a cache-server (proxy)).

Also I don't think they'd ever come in that directory over ftp. Everything can probably be done on a webinterface, and if that somehow gets stuck I think they ssh in there.
dwxco
most databases support a datatype (like BLOB) which is stored in a secondary place, not the primary table, so that searches can still be done quickly.

Storing strings with a link does make a lot of sense to reduce the space and not have to store a video in the table.
polly-gone
Well, I think this question was sufficiently answered A YEAR AGO, but thanks anyway.
Related topics
I got a question about the webspace!!
energy
Muslims killed in stampede
Think it's safe to trust your data on DVDs?
When Jesus was born...
Looking For A good Databasing Program
Google's Philosopy
Is this piece of PHP code efficent?
Storage of Massive Amounts of Data
Americans want universal health care. Why can't we get it?
10 Reasons To Buy Windows 7 Professional (!)
Obamacare Fiasco
Do you have any faith that there is a "soul"
Storing MySQL connection data securely and conveniently
Reply to topic    Frihost Forum Index -> Scripting -> Php and MySQL

FRIHOST HOME | FAQ | TOS | ABOUT US | CONTACT US | SITE MAP
© 2005-2011 Frihost, forums powered by phpBB.