
GraphQL Dataloader vs Mongoose Populate


It's important to note that dataloaders are not just an interface for your data models. While dataloaders are touted as a "simplified and consistent API over various remote data sources", their main benefit when coupled with GraphQL comes from being able to implement caching and batching within the context of a single request. This sort of functionality is important in APIs that deal with potentially redundant data: think about querying users and each user's friends, where there's a good chance of fetching the same user multiple times.
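To make the batching and caching point concrete, here is a minimal sketch. It does not use the actual dataloader package: createLoader is a simplified stand-in written for illustration, and users, batchGetUsers and resolveFriendsOfAll are hypothetical names backed by an in-memory map instead of MongoDB. It only shows the idea of collecting keys requested in the same tick and caching them for the lifetime of one request.

```javascript
// Hypothetical in-memory "database". batchGetUsers stands in for a single
// query that fetches many users at once (for example, an $in query).
const users = new Map([
  ['1', { id: '1', name: 'Ann', friends: ['2', '3'] }],
  ['2', { id: '2', name: 'Bob', friends: ['1', '3'] }],
  ['3', { id: '3', name: 'Cy', friends: ['1', '2'] }],
]);
const batchCalls = []; // records the keys of every batch, for inspection

async function batchGetUsers(ids) {
  batchCalls.push(ids);
  return ids.map((id) => users.get(id) ?? null);
}

// Simplified loader (an assumption, not the real library): keys requested in
// the same tick are queued and resolved with one call to batchFn, and each
// key's promise is cached so repeated loads never hit batchFn again.
function createLoader(batchFn) {
  const cache = new Map(); // key -> Promise
  let queue = [];

  function dispatch() {
    const batch = queue;
    queue = [];
    batchFn(batch.map((item) => item.key)).then(
      (values) => batch.forEach((item, i) => item.resolve(values[i])),
      (err) => batch.forEach((item) => item.reject(err))
    );
  }

  return {
    load(key) {
      if (cache.has(key)) return cache.get(key);
      const promise = new Promise((resolve, reject) => {
        // Schedule one dispatch for the first key queued in this tick.
        if (queue.length === 0) process.nextTick(dispatch);
        queue.push({ key, resolve, reject });
      });
      cache.set(key, promise);
      return promise;
    },
  };
}

// One loader per request, so the cache never outlives the request.
async function resolveFriendsOfAll() {
  const loader = createLoader(batchGetUsers);
  const everyone = await Promise.all(['1', '2', '3'].map((id) => loader.load(id)));
  const friendLists = await Promise.all(
    everyone.map((user) => Promise.all(user.friends.map((id) => loader.load(id))))
  );
  return friendLists.map((list) => list.map((friend) => friend.name));
}
```

Here the three users are fetched in one batch, and every friend lookup afterwards is served from the cache. With the real library you would construct the loader with new DataLoader(batchFn) inside the per-request context and call loader.load(id) in the same way.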

On the other hand, Mongoose's populate method is really just a way of aggregating multiple MongoDB requests. In that sense, comparing the two is like comparing apples and oranges.

A fairer comparison might be using populate as illustrated in your question versus adding a resolver for activities along the lines of:

activities: (task, _, context) => Activity.find().where('_id').in(task.activities)

Either way, the question comes down to whether you load all the data in the parent resolver, or let the resolvers further down do some of the work. Because resolvers are only called for fields that are included in the request, the choice between these two approaches can have a major impact on performance.

If the activities field is requested, both approaches will make the same number of roundtrips between the server and the database -- the difference in performance will probably be marginal. However, your request might not include the activities field at all. In that case, the activities resolver will never be called and we can save one or more database requests by creating a separate activities resolver and doing the work there.
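A rough way to see the difference is to count database calls. The sketch below is only an illustration under stated assumptions: findTask and findActivities are hypothetical stand-ins for Mongoose queries against in-memory data, and runTaskQuery is a tiny imitation of how GraphQL resolves only the requested fields, not the real executor.

```javascript
// Hypothetical stand-ins for Mongoose models; every "query" bumps a counter.
let dbCalls = 0;
const tasks = { t1: { id: 't1', title: 'Write docs', activities: ['a1', 'a2'] } };
const activities = {
  a1: { id: 'a1', kind: 'comment' },
  a2: { id: 'a2', kind: 'edit' },
};

async function findTask(id) {
  dbCalls += 1;
  return { ...tasks[id] };
}

async function findActivities(ids) {
  dbCalls += 1;
  return ids.map((id) => activities[id]);
}

// Approach A: the parent resolver eagerly loads activities (like populate).
const eagerResolvers = {
  Query: {
    task: async (_, { id }) => {
      const task = await findTask(id);
      task.activities = await findActivities(task.activities);
      return task;
    },
  },
  Task: {},
};

// Approach B: the parent returns ids only; a field resolver loads on demand.
const lazyResolvers = {
  Query: { task: (_, { id }) => findTask(id) },
  Task: { activities: (task) => findActivities(task.activities) },
};

// Simplified imitation of GraphQL execution: a field resolver runs only if
// its field was requested; otherwise the property is read from the parent.
async function runTaskQuery(resolvers, id, requestedFields) {
  const task = await resolvers.Query.task(null, { id });
  const result = {};
  for (const field of requestedFields) {
    const resolver = resolvers.Task[field];
    result[field] = resolver ? await resolver(task, {}, {}) : task[field];
  }
  return result;
}

// Returns how many database calls one query made.
async function countCalls(resolvers, fields) {
  dbCalls = 0;
  await runTaskQuery(resolvers, 't1', fields);
  return dbCalls;
}
```

Requesting only title costs two calls with the eager resolvers and one with the lazy ones; requesting title and activities costs two calls either way.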

On a related note...

From what I understand, aggregating queries in MongoDB using something like $lookup is generally less performant than just using populate (some conversation on that point can be found here). In the context of relational databases, however, there are additional considerations when weighing the above approaches. That's because your initial fetch in the parent resolver could be done using joins, which will generally be much faster than making separate db requests. That means that, at the expense of making the queries without the activities field slower, you can make the other queries significantly faster.