Nodejs & Mongo pagination random order
I think the only way to guarantee that users see unique users every time is to store the list of users that have already been seen. Even in the RAND example that you linked to, there is a possibility of intersection with a previous user list, because RAND won't necessarily exclude previously returned users.
Random Sampling
If you do want to go with random sampling, consider Random record from MongoDB, which suggests using an aggregation with the $sample operator. The implementation would look something like this:
    const { MongoClient } = require("mongodb");

    const DB_NAME = "weather",
      COLLECTION_NAME = "readings",
      MONGO_DOMAIN = "localhost",
      MONGO_PORT = "32768",
      MONGO_URL = `mongodb://${MONGO_DOMAIN}:${MONGO_PORT}`;

    (async function () {
      const client = await MongoClient.connect(MONGO_URL),
        db = client.db(DB_NAME),
        collection = db.collection(COLLECTION_NAME);

      // Pull 5 random documents and keep only the fields we need
      const randomDocs = await collection
        .aggregate([{ $sample: { size: 5 } }])
        .map(doc => {
          return { id: doc._id, temperature: doc.main.temp };
        })
        .toArray(); // Resolve the cursor to an array before iterating

      randomDocs.forEach(doc => console.log(`ID: ${doc.id} | Temperature: ${doc.temperature}`));

      await client.close();
    }());
Cache of Previous Users
If you go with maintaining a list of previously viewed users, you could write an implementation that stores the _id of each previously viewed user and excludes them with the $nin filter.
Here is an example using a weather database that I have. It returns entries 5 at a time until all have been printed:
    const { MongoClient } = require("mongodb");

    const DB_NAME = "weather",
      COLLECTION_NAME = "readings",
      MONGO_DOMAIN = "localhost",
      MONGO_PORT = "32768",
      MONGO_URL = `mongodb://${MONGO_DOMAIN}:${MONGO_PORT}`;

    (async function () {
      const client = await MongoClient.connect(MONGO_URL),
        db = client.db(DB_NAME),
        collection = db.collection(COLLECTION_NAME);

      let previousEntries = [], // Track ids of things we have seen
        empty = false;

      while (!empty) {
        const findFilter = {};
        if (previousEntries.length) {
          findFilter._id = { $nin: previousEntries };
        }

        // Get items 5 at a time
        const docs = await collection
          .find(findFilter, { limit: 5, projection: { main: 1 } })
          .map(doc => {
            return { id: doc._id, temperature: doc.main.temp };
          })
          .toArray();

        // Keep track of already seen items
        previousEntries = previousEntries.concat(docs.map(doc => doc.id));

        // Are we still getting items?
        console.log(docs.length);
        empty = !docs.length;

        // Print out the docs
        docs.forEach(doc => console.log(`ID: ${doc.id} | Temperature: ${doc.temperature}`));
      }

      await client.close();
    }());
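The two ideas above can also be combined: exclude the ids that were already seen, then sample randomly from what is left. The following is only a sketch of that combination. The helper name buildUnseenSamplePipeline is hypothetical (it is not part of the MongoDB driver), and it assumes the caller keeps the list of seen ids the same way as in the previous example.

```javascript
// Hypothetical helper (an assumption for illustration, not a driver API):
// builds an aggregation pipeline that first filters out already-seen _ids
// with $nin and then picks `pageSize` random documents with $sample.
function buildUnseenSamplePipeline(seenIds, pageSize) {
  const pipeline = [];
  if (seenIds.length > 0) {
    pipeline.push({ $match: { _id: { $nin: seenIds } } });
  }
  pipeline.push({ $sample: { size: pageSize } });
  return pipeline;
}

// Usage sketch, assuming `collection` is a connected driver collection and
// `previousEntries` is the array of seen ids from the example above:
//   const docs = await collection
//     .aggregate(buildUnseenSamplePipeline(previousEntries, 5))
//     .toArray();
//   previousEntries = previousEntries.concat(docs.map(doc => doc._id));

console.log(JSON.stringify(buildUnseenSamplePipeline(["a", "b"], 5)));
```

As with the plain $nin version, the list of seen ids grows with every page, so this is probably best suited to collections of modest size.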
I have encountered the same issue and can suggest an alternate solution.
TL;DR: Grab all the Object IDs of the collection on first landing, randomize them using NodeJS, and use that list later on.
- Disadvantage: slow first landing if you have millions of records
- Advantage: subsequent executions are probably quicker than with the other solution
Let's get into the details :)
To explain this better, I will make the following assumptions.
Assumptions:
- Assume the programming language used is NodeJS
- The solution works for other programming languages as well
- Assume you have 4 objects in total in your collection
- Assume the pagination limit is 2
Steps:
On first execution:
- Grab all Object IDs
Note: I have considered performance; this execution takes a split second for a collection of 10,000 documents. If you are dealing with millions of records, then maybe use some form of partitioning logic first, or use the other solution listed.
db.getCollection('my_collection').find({}, {_id:1}).map(function(item){ return item._id; });
OR
db.getCollection('my_collection').find({}, {_id:1}).map(function(item){ return item._id.valueOf(); });
Result:
ObjectId("FirstObjectID"),ObjectId("SecondObjectID"),ObjectId("ThirdObjectID"),ObjectId("ForthObjectID"),
- Randomize the retrieved array using NodeJS
Result:
ObjectId("ThirdObjectID"),ObjectId("SecondObjectID"),ObjectId("ForthObjectID"),ObjectId("FirstObjectID"),
- Store this randomized array:
- If this is a server-side script that randomizes pagination for each user, consider storing it in a Cookie / Session
- I suggest a Cookie (with its expiry tied to the browser closing) for scaling purposes
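The randomizing step above does not name a particular algorithm. One common choice is a Fisher-Yates shuffle, sketched here under that assumption. The function name shuffle and the plain-string ids are placeholders standing in for the ObjectIds fetched from the collection.

```javascript
// Fisher-Yates shuffle: returns a shuffled copy and leaves the input untouched.
// (Choosing Fisher-Yates is an assumption; the answer only says "randomize".)
function shuffle(items) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    // Pick a random index in [0, i] and swap it into position i
    const j = Math.floor(Math.random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Placeholder ids standing in for the ObjectIds retrieved above
const allIds = ["FirstObjectID", "SecondObjectID", "ThirdObjectID", "ForthObjectID"];
const randomizedIds = shuffle(allIds);
console.log(randomizedIds); // the same four ids, in a random order
```

If the ids are kept in a cookie or session, they would need to be serialized first, for example as hex strings via the second shell command above.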
On each retrieval:
- Retrieve the stored array
- Grab the IDs for the current page (e.g. the first 2 items)
- Find the objects for those items using find with $in.
db.getCollection('my_collection').find({"_id" : {"$in" : [ObjectId("ThirdObjectID"), ObjectId("SecondObjectID")]}});
- Using NodeJS, sort the retrieved objects to match the order of the page's IDs
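The retrieval steps can be sketched as two small helpers. This is a minimal illustration, not a definitive implementation: getPageIds and sortByIdOrder are hypothetical names, pages are assumed to be numbered from 1, and plain strings stand in for real ObjectIds. The final sort is needed because a $in query does not return documents in the order of the array passed to it.

```javascript
// Hypothetical helper: slice the ids for one page out of the stored array.
// Assumes pages are numbered starting at 1.
function getPageIds(storedIds, page, limit) {
  const start = (page - 1) * limit;
  return storedIds.slice(start, start + limit);
}

// Hypothetical helper: reorder query results to match the order of pageIds.
// Ids are compared as strings, so this works for ObjectIds or hex strings.
function sortByIdOrder(docs, pageIds) {
  const position = new Map(pageIds.map((id, index) => [String(id), index]));
  return docs
    .slice()
    .sort((a, b) => position.get(String(a._id)) - position.get(String(b._id)));
}

// Placeholder for the randomized array that was stored on first execution
const storedIds = ["ThirdObjectID", "SecondObjectID", "ForthObjectID", "FirstObjectID"];
const pageIds = getPageIds(storedIds, 1, 2);

// Stand-in for the result of find({ _id: { $in: pageIds } }),
// deliberately in a different order than pageIds
const docs = [{ _id: "SecondObjectID" }, { _id: "ThirdObjectID" }];

console.log(sortByIdOrder(docs, pageIds).map(doc => doc._id));
// prints [ 'ThirdObjectID', 'SecondObjectID' ]
```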
There you go! A randomized MongoDB query for pagination :)