Best way to store large amount of data of users Best way to store large amount of data of users database database

Best way to store large amount of data of users


Could you create relational MySQL tables? e.g.:

A users table and a files table.

Your users table would keep track of everything you are (I assume) already tracking:

id, name, email, etc.

Then the files table would store something like:

id, fileExtension, fileSize, userID <---- userID would be the foreign key pointing to the id field in the files table.

then when you save your file you could save it as it's id.fileExtension and use a query to pull the user associated with that file, or all files associated with a user.

e.g.:

SELECT users.name, files.id, files.extensionFROM `users`INNER JOIN `files` on users.id = files.userID;


I handle file metadata on the database and retrive the files with a UUID. What i do is:

  1. Content based identification
    1. MD5 from file's content
    2. Namespaced UUID:v5 to generate unique identifier based on user's uuid and file's md5.
    3. Custom function to generate path based on 'realname'.
    4. Save on the database: uuid, originalname (the uploaded name), realname (the generated name), filesize, and mime. (optional dateAdded, and md5)
  2. File retrival.
    1. UUID to retrive metadata.
    2. regenerate filepath based on realname.
    3. Originalname is used to show a familiar name to the user that downloads the file.

I process the file's name assigning it a namespaced UUID as the database primary key, and Generate the path based on User and filename. The precondition is that your user has a uuid assigned to him. The following code will help you avoid id collisions on the database, and help you identify files by its contents (If you ever need to have a way to spot duplicate content and not necesarily filenames).

$fileInfo = pathinfo($_FILE['file']['name']);$extension = (isset($fileInfo['extension']))?".".$fileInfo['extension']:"";$md5Name = md5_file($_FILE['file']['tmp_name']); //you could use other hash algorithms if you are so inclined.$realName = UUID::v5($user->uuid, $md5Name) . $extension; //UUID::v5(namespace, value).

I use a function to generate the filepath based on some custom parameteres, you could use $username and $realname. This is helpful if you implement a distributed folder structure which you might have partitioned on file naming scheme, or any custom scheme.

function generateBasePath($realname, $customArgsArray){    //Process Args as your requirements.    //might as well be  "$FirstThreeCharsFromRealname/"    //or a checksum that helps you decide which drive/volume/mountpoint to use.    //like some files on the local disk and some other from an Amazon::S3 mountpoint.    return $mountpoint.'/'.$generatedPath; }

As an added bonus this also:

  1. helps you maintain a versioned file repository if you add an attribute on the file's record of which file (uuid) it has replaced.
  2. create a application Access Control List if you add an attributes of 'owner' and/or 'group'
  3. also works on a single folder structure.

Note: I used php's $_FILE as an example of the file source based on this question's tags. It can be from any file source or generated content.


Since you already use MongoDB, I would suggest checking out GridFS. It's a specification that allows you to store files(even if they are larger than 16mb) into MongoDB collections.

It is scalable, so you'll have no problems if you add another server, it also stores metadata, it is possible to read files in chunks and it also has built in backup functions.