How to check against all joins when generating a score using MYSQL
As advised by Sam Dufel, you probably do not need full text search, especially since you are using exact string comparison in your WHERE
clauses.
Moreover, because of the many-to-many relationship between results
and categories
(assumed from the HAVING COUNT(c_id) = 3
clause), I think in no way can you join both categories
and tags
in the same query.
Without the GROUP BY
clause, for one given result
, you would get one row for each matching category
. For each matching pair (result
, category
), you would then get one row for each matching tag.name
. I don't think there is a way to deal with such a result.
What I would suggest is:
Step 1: getting results
present in all three categories
SELECT results.IDFROM resultsJOIN categories ON results.id = categories.result_idWHERE categories.name IN ('purchase', 'condo', 'va')GROUP BY results.IDHAVING COUNT(DISTINCT c.c_id) = 3
Step 2: computing score of any results
matching at least one search string
SELECT DISTINCT results.*, -- DISTINCT is redundant because of the GROUP BY clause ( 5*(COUNT(tags.result_id)) + -- you actually want to count the number of matches! 1*usefulness + -- warning, see below 10*shares -- warning, see below ) AS score FROM resultsINNER JOIN tags ON results.id = tags.result_idWHERE tags.name = 'self employed' OR tags.name = 'rental income' OR tags.name = 'commission income' OR tags.name = 'bankruptcy' OR tags.name = 'condo approval'GROUP BY results.ID
Step 3: putting it all together
SELECT results.*, ( 5*(COUNT(tags.result_id)) + 1*usefulness + -- warning, see below 10*shares -- warning, see below ) AS score FROM ( SELECT results.id FROM results JOIN categories ON results.id = categories.result_id WHERE categories.name IN ('purchase', 'condo', 'va') AND ( results.scope = 'all' OR results.scope = 'hi' ) AND published = 1 GROUP BY results.id HAVING COUNT(DISTINCT categories.c_id) = 3) AS results_subsetJOIN results ON results_subset.id = results.idJOIN tags ON results.id = tags.result_idWHERE tags.name = 'self employed' OR tags.name = 'rental income' OR tags.name = 'commission income' OR tags.name = 'bankruptcy' OR tags.name = 'condo approval'GROUP BY results.ID
Notice where I chose to include the conditions WHERE on scope
and published
. This choice is based on the principle that filters should be stated as early as possible. You may get better performance if you place them in the outer query but it really depends on cardinalities.
A word of warning: fields usefulness
and shares
are neither part of the GROUP BY
function not included in an aggregation function. This is allowed by MySQL but highly dangerous. If usefulness
and shares
belong to a table other than result
(the table being GROUP'ed BY), the values returned in your query are undefined.
write it as follows:
"sum((5*(MATCH(tags.name) AGAINST('"self employed"' IN BOOLEAN MODE))), (5*(MATCH(tags.name) AGAINST('"rental income"' IN BOOLEAN MODE))) , (5*(MATCH(tags.name) AGAINST('"commission income"' IN BOOLEAN MODE))), (5*(MATCH(tags.name) AGAINST('"bankruptcy"' IN BOOLEAN MODE))), (5*(MATCH(tags.name) AGAINST('"condo approval"' IN BOOLEAN MODE))), (1*usefulness), (10*shares)) as score"
You need to SUM() the Score because ONE line only matches ONE Tag.
In your Query selected multiple Rows and grouped them by ID, so you are getting the Result for ONE Row only and that would always be 5 in your case.