Calculating the relevance of a User based on Specific data


This problem is a good candidate for machine learning. An introductory book should be enough to get started, because I don't think the problem is very complex and you could probably do it yourself. If not, and depending on the income your website makes, you might consider hiring someone to do it for you.

If you prefer to do it "manually", you will build your own model, assigning specific weights to the different factors. Be aware that our brains deceive us very often, and what you think is a perfect model might be far from optimal.

I would suggest that you start storing data right away on which users each user interacts with most, so you can compare your results with real data. In the future, that data will also give you a foundation for building a proper machine learning system.

Having said that, here is my proposal:

In the end, you want a list like this (with 3 users):

A->B: relevance
----------------
User1->User2: 0.59
User1->User3: 0.17
User2->User1: 0.78
User2->User3: 0.63
User3->User1: 0.76
User3->User2: 0.45

1) For each user

1.1) Compute and cache the age of every user's 'last_seen', in whole days, rounded down (floor).

1.2) Store max(age(last_seen)); let's just call it max. This is a single value, not one per user, and you can only compute it once you have computed the age of every user.

1.3) For each user, replace the stored age with the result of (max-age)/max to get a value between 0 and 1. The most recently seen users end up closest to 1. (If max is 0, every user was seen today, so treat the value as 1 to avoid dividing by zero.)

1.4) Also compute and cache the age of every object's 'created_at', in days.
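Steps 1.1 to 1.3 could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `recency_scores`, the dictionary layout, and the sample dates are made up for the example, and in practice the 'last_seen' values would come from your database.

```python
from datetime import date

def recency_scores(last_seen, today):
    """Map each user to a value in [0, 1]; the most recently seen user is closest to 1.

    last_seen: dict of user name -> date of last visit (hypothetical layout).
    """
    # Step 1.1: age of 'last_seen' in whole days.
    ages = {user: (today - seen).days for user, seen in last_seen.items()}
    # Step 1.2: one maximum for the whole user base, not one per user.
    max_age = max(ages.values())
    if max_age == 0:
        # Assumption: if everyone was seen today, give everyone the top value.
        return {user: 1.0 for user in ages}
    # Step 1.3: (max - age) / max.
    return {user: (max_age - age) / max_age for user, age in ages.items()}

# Made-up sample data.
last_seen = {
    "User1": date(2024, 1, 10),
    "User2": date(2024, 1, 6),
    "User3": date(2023, 12, 31),
}
scores = recency_scores(last_seen, today=date(2024, 1, 10))
```

With these sample dates the ages are 0, 4 and 10 days, so User1 gets the highest value and User3, the least recently seen, gets 0.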

2) For each user, comparing with every other user

2.1) Regarding mutual connections, think of this: if A has 100 connections, 10 of them shared with B, and C has 500 connections, 10 of them shared with D, do you really want to use 10 as the value in both cases? I would take the percentage instead. For A->B it would be 10% and for C->D it would be 2%. Then divide by 100 to get a value between 0 and 1 (0.10 and 0.02).

2.2) Pick a maximum age for mutual objects to be relevant. Let's take 365 days.

2.3) For user A, ignore objects older than 365 days. Do not actually delete them; just filter them out for the sake of these calculations.

2.4) From the remaining objects, compute the percentage of mutual objects with each of the other users, again expressed as a value between 0 and 1.

2.5) For each one of these other users, compute the average age of the objects in common from the previous step. Take the maximum age (365), subtract the computed average, and divide by 365 to get a value between 0 and 1.

2.6) Retrieve the age value of the other user (the one computed in step 1.3).
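As a rough sketch of steps 2.1 to 2.5, the pairwise values could be computed like this in Python. The data layout is an assumption made for the example: connections are sets of user ids, and objects are dictionaries mapping a hypothetical object id to its age in days. The function names are also invented here.

```python
MAX_OBJECT_AGE = 365  # step 2.2: maximum age for an object to count

def mutual_connections(conn_a, conn_b):
    """Step 2.1: fraction of A's connections that are shared with B."""
    if not conn_a:
        return 0.0  # assumption: no connections means no overlap
    return len(conn_a & conn_b) / len(conn_a)

def mutual_objects(objs_a, objs_b, max_age=MAX_OBJECT_AGE):
    """Steps 2.3 to 2.5: return (MO, OA) for the direction A->B.

    objs_a, objs_b: dict of object id -> age in days (hypothetical layout).
    """
    # Step 2.3: filter out A's old objects without deleting anything.
    recent_a = {obj: age for obj, age in objs_a.items() if age <= max_age}
    # Ages of the recent objects that B also has.
    shared_ages = [age for obj, age in recent_a.items() if obj in objs_b]
    if not shared_ages:
        return 0.0, 0.0  # assumption: nothing in common scores 0 on both
    # Step 2.4: fraction of A's recent objects shared with B.
    mo = len(shared_ages) / len(recent_a)
    # Step 2.5: (365 - average age) / 365.
    avg_age = sum(shared_ages) / len(shared_ages)
    oa = (max_age - avg_age) / max_age
    return mo, oa

# The A->B case from step 2.1: 100 connections, 10 of them shared.
mc = mutual_connections(set(range(100)), set(range(90, 120)))

# Made-up objects: "o3" is older than 365 days and is ignored.
objs_a = {"o1": 10, "o2": 100, "o3": 400, "o4": 200}
objs_b = {"o1": 10, "o2": 100, "o3": 400}
mo, oa = mutual_objects(objs_a, objs_b)
```

Note that the values are directional: A->B divides by A's totals, so B->A will generally give different numbers, which matches the asymmetric list at the top of this answer.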

So, for each combination of A->B, you have four values between 0 and 1:

  • MC: mutual connections A-B
  • MO: mutual objects A-B
  • OA: avg mutual object age A-B
  • BA: age of B

Now you have to assign a weight to each of them in order to find the optimal solution. Use percentages that sum to 100 to make your life easier:

Relevance = 40 * MC + 30 * MO + 10 * OA + 20 * BA

In this case, since OA is so closely related to MO, you can combine them:

Relevance = 40 * MC + 20 * MO + 20 * MO * OA + 20 * BA
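To make the two formulas concrete, here is a small Python sketch. The function names and the input values are made up for illustration, and the weights are the example weights from above, not tuned ones.

```python
def relevance(mc, mo, oa, ba):
    """First formula: four independent weighted terms."""
    return 40 * mc + 30 * mo + 10 * oa + 20 * ba

def relevance_mixed(mc, mo, oa, ba):
    """Second formula: OA only counts in proportion to MO."""
    return 40 * mc + 20 * mo + 20 * mo * oa + 20 * ba

# Made-up values for one A->B pair, each between 0 and 1.
mc, mo, oa, ba = 0.1, 0.5, 0.8, 0.6

plain = relevance(mc, mo, oa, ba)
mixed = relevance_mixed(mc, mo, oa, ba)
```

Because the weights sum to 100 and every input is between 0 and 1, both results fall between 0 and 100. Dividing by 100 would give values on the same 0 to 1 scale as the sample list at the top. In the second formula, a pair with no mutual objects gets nothing from OA, which is the point of combining the two terms.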

I would suggest running this overnight, every day. There are many ways to improve and optimize the process... have fun!