First time with MongoDB + Docker - Set up from docker compose
So here is an attempt at helping. For the most part, the original docker compose YAML file is pretty close, with the exception of some minor port and binding parameters. The expectation is that initialization will require some additional commands. Example:
- docker-compose up the environment
- run some scripts to init the environment
... but this was already part of the original post.
So here is a docker compose file
docker-compose.yml
version: '3'
services:
  # mongo config servers
  mongocfg1:
    container_name: mongocfg1
    hostname: mongocfg1
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27019 --bind_ip_all
    volumes:
      - ~/mongo_cluster/config1:/data/db
  mongocfg2:
    container_name: mongocfg2
    hostname: mongocfg2
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27019 --bind_ip_all
    volumes:
      - ~/mongo_cluster/config2:/data/db
  mongocfg3:
    container_name: mongocfg3
    hostname: mongocfg3
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27019 --bind_ip_all
    volumes:
      - ~/mongo_cluster/config3:/data/db
  # replica set 1
  mongors1n1:
    container_name: mongors1n1
    hostname: mongors1n1
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data1:/data/db
  mongors1n2:
    container_name: mongors1n2
    hostname: mongors1n2
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data2:/data/db
  mongors1n3:
    container_name: mongors1n3
    hostname: mongors1n3
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data3:/data/db
  # replica set 2
  mongors2n1:
    container_name: mongors2n1
    hostname: mongors2n1
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data4:/data/db
  mongors2n2:
    container_name: mongors2n2
    hostname: mongors2n2
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data5:/data/db
  mongors2n3:
    container_name: mongors2n3
    hostname: mongors2n3
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data6:/data/db
  # mongos routers
  mongos1:
    container_name: mongos1
    hostname: mongos1
    image: mongo
    depends_on:
      - mongocfg1
      - mongocfg2
    command: mongos --configdb mongors1conf/mongocfg1:27019,mongocfg2:27019,mongocfg3:27019 --port 27017 --bind_ip_all
    ports:
      - 27017:27017
  mongos2:
    container_name: mongos2
    hostname: mongos2
    image: mongo
    depends_on:
      - mongocfg1
      - mongocfg2
    command: mongos --configdb mongors1conf/mongocfg1:27019,mongocfg2:27019,mongocfg3:27019 --port 27017 --bind_ip_all
    ports:
      - 27016:27017
... and some scripts to finalize the initialization...
docker-compose up -d
... Give it a few seconds to spin up, then issue...
# Init the replica sets (use the MONGOS host)
docker exec -it mongos1 bash -c "echo 'rs.initiate({_id: \"mongors1conf\", configsvr: true, members: [{ _id : 0, host : \"mongocfg1:27019\", priority: 2 },{ _id : 1, host : \"mongocfg2:27019\" },{ _id : 2, host : \"mongocfg3:27019\" }]})' | mongo --host mongocfg1:27019"
docker exec -it mongos1 bash -c "echo 'rs.initiate({_id : \"mongors1\", members: [{ _id : 0, host : \"mongors1n1:27018\", priority: 2 },{ _id : 1, host : \"mongors1n2:27018\" },{ _id : 2, host : \"mongors1n3:27018\" }]})' | mongo --host mongors1n1:27018"
docker exec -it mongos1 bash -c "echo 'rs.initiate({_id : \"mongors2\", members: [{ _id : 0, host : \"mongors2n1:27018\", priority: 2 },{ _id : 1, host : \"mongors2n2:27018\" },{ _id : 2, host : \"mongors2n3:27018\" }]})' | mongo --host mongors2n1:27018"
... again, give 10-15 seconds to allow the system to adjust to recent commands...
# ADD TWO SHARDS (mongors1 and mongors2)
docker exec -it mongos1 bash -c "echo 'sh.addShard(\"mongors1/mongors1n1:27018,mongors1n2:27018,mongors1n3:27018\")' | mongo"
docker exec -it mongos1 bash -c "echo 'sh.addShard(\"mongors2/mongors2n1:27018,mongors2n2:27018,mongors2n3:27018\")' | mongo"
Now, try to connect to the mongos from the host running Docker (this assumes you have the mongo shell installed on that host). Use both mongos hosts as the seed list.
mongo --host "localhost:27017,localhost:27016"
Comments
Notice how node 0 is given a priority of 2 in each rs.initiate() call? The higher priority makes it the preferred primary during elections.
Notice how the config servers all use port 27019 - this follows MongoDB's recommendations.
Notice how the shard servers all use port 27018 - again, following MongoDB's recommendations.
The mongos routers expose two host ports: 27017 (the natural MongoDB port, mapped to mongos1) and 27016 (mapped to mongos2, a second mongos for high availability).
The config servers and the shard servers do not expose their respective ports outside the Docker network - for security reasons. You should be going through the mongos to get to the data. If you need these ports open for administrative purposes, simply add the mappings to the docker compose file.
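For example, to open up the first config server for administrative access, a port mapping could be added to its service definition (the host-side port choice here is just one option):

```yaml
  mongocfg1:
    # ...existing settings as above...
    ports:
      - 27019:27019   # admin access only; omit this in a locked-down deployment
```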
The replica-set intercommunication is not using authentication. This is a security no-no. You need to decide which auth mechanism is best for your scenario - you can use a keyfile (just a text file that is identical among the members of the replica set) or x509 certs. If going with x509, you need to include the CA cert in each docker container, along with an individual cert per server with proper hostname alignment. You would then add the startup configuration item telling the mongod processes to use whichever auth method was selected.
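For the keyfile route, here is a sketch of generating one; the filename and the mount path in the comment are assumptions, but the generation pattern matches MongoDB's documentation:

```shell
# Generate a keyfile for internal replica-set authentication:
# 756 random bytes, base64-encoded, readable only by the owner.
openssl rand -base64 756 > mongo-keyfile
chmod 400 mongo-keyfile
# Each mongod/mongos would then mount this file (e.g., via a volumes: entry)
# and start with:  --keyFile /data/keyfile/mongo-keyfile
```

Enabling the keyfile also implicitly enables client authentication, so you would want a superuser in place (see below) before locking things down.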
Logging is not specified. It probably makes sense to set the logging output of the mongod and mongos processes to the default locations of /var/log/mongodb/mongod.log and /var/log/mongodb/mongos.log. Without a logging strategy, mongo logs to standard out, which you will not see when running docker-compose up -d (though docker logs <container> can still retrieve it).
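If you do stick with stdout logging, Docker's json-file log driver can grow unbounded; a cap can be set per service in the compose file (the size and file-count values here are arbitrary choices):

```yaml
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
```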
Superuser: No users are created on the system yet. For every replica set I stand up, before adding it to a sharded cluster, I like to add a superuser account - one having root access - so that I can make administrative changes at the replica-set level if needed. With the docker-compose approach you can create a superuser from the mongos perspective and perform almost all operations needed on a sharded cluster, but still - I like having the replica-set user available.
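As a sketch, a root user could be created through the mongos with the same echo-pipe pattern used above. The user name and password here are placeholders, not values from the original setup:

```shell
# Placeholder credentials -- replace before use.
CREATE_USER_JS='db.getSiblingDB("admin").createUser({ user: "superuser", pwd: "changeme", roles: [ { role: "root", db: "admin" } ] })'
# Pipe it through the mongos, exactly like the rs.initiate()/sh.addShard() calls above:
#   docker exec -it mongos1 bash -c "echo '...' | mongo"
echo "$CREATE_USER_JS"
```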
OS tunables - Mongo likes to take up all the system resources. For a shared ecosystem where one physical host runs a bunch of mongo processes, you might want to consider specifying the WiredTiger cache size, etc. WiredTiger by default wants (System Memory Size - 1 GB) / 2. You would also benefit from setting ulimits to proper values - e.g., 64000 file handles per user is a good start, since mongo potentially likes to use a lot of files. Also, the filesystem holding the data should be XFS. This strategy uses the host user's home directory for the database data directories; a more thoughtful approach would be welcome here.
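Both tunables can be expressed per service in the compose file. The 1 GB cache value below is an illustrative choice, not a recommendation:

```yaml
  mongors1n1:
    # ...existing settings as above, with the cache size appended to the command...
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all --wiredTigerCacheSizeGB 1
    ulimits:
      nofile:
        soft: 64000
        hard: 64000
```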
Anything else?
I am sure I am missing something. If you have any questions, please leave a comment and I will reply.
Update 1
The above docker-compose.yml file was missing the hostname attribute for some of the hosts, and this was causing balancer issues, so I have edited the docker-compose.yml to include hostname on all hosts.
Also, the addShard() method only referred to one host of the replica set. For completeness I added the other hosts to the addShard() method described above.
Following these steps will result in a brand new sharded cluster, but there are no user databases yet. As such, no user databases are sharded. So let's take a moment to add a database and shard it, then view the shard distributions (A.K.A., balancer results).
We must connect to the database via the mongos (as described above). This example assumes the use of the mongo shell.
mongo --host "localhost:27017,localhost:27016"
Databases in Mongo can be created a variety of ways. While there is no explicit database create command, there is an explicit create collection command (db.createCollection()). We must first set the database context using a 'use <database>' command...
use mydatabase
db.createCollection("mycollection")
... but rather than use this command we can create a database and collection by creating an index on a non-existing collection. (If you already created the collection, no worries, this next command should still be successful).
use mydatabase
db.mycollection.createIndex({lastName: 1, creationDate: 1})
In this example, I created a compound index on two fields...
- lastName
- creationDate
... on a collection that does not yet exist, on a database that does not yet exist. Once I issue this command, both the database and the collection will be created. Furthermore, I now have the basis for a shard key - the key to which sharding distribution will be based. This shard key will be based on this new index having these two fields.
Shard the database
Assuming I have issued the createIndex command, I can now enable sharding on the database and issue the shardCollection command...
sh.enableSharding("mydatabase")
sh.shardCollection("mydatabase.mycollection", { "lastName": 1, "creationDate": 1 })
Notice how the shardCollection() command refers to the indexed fields we created earlier? Assuming sharding has been successfully applied, we can now view the distribution of data by issuing the sh.status() command.
sh.status()
Example of output: (new collection, no data yet, thus no real distribution of data - need to insert more than 64MB of data such that there is more than one chunk to distribute)
mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "minCompatibleVersion" : 5, "currentVersion" : 6, "clusterId" : ObjectId("6101c030a98b2cc106034695") }
  shards:
    { "_id" : "mongors1", "host" : "mongors1/mongors1n1:27018,mongors1n2:27018,mongors1n3:27018", "state" : 1, "topologyTime" : Timestamp(1627504744, 1) }
    { "_id" : "mongors2", "host" : "mongors2/mongors2n1:27018,mongors2n2:27018,mongors2n3:27018", "state" : 1, "topologyTime" : Timestamp(1627504753, 1) }
  active mongoses:
    "5.0.1" : 2
  autosplit:
    Currently enabled: yes
  balancer:
    Currently enabled: yes
    Currently running: no
    Failed balancer rounds in last 5 attempts: 0
    Migration results for the last 24 hours:
      No recent migrations
  databases:
    { "_id" : "config", "primary" : "config", "partitioned" : true }
    { "_id" : "mydatabase", "primary" : "mongors2", "partitioned" : true, "version" : { "uuid" : UUID("bc890722-00c6-4cbe-a3e1-eab9692faf93"), "timestamp" : Timestamp(1627504768, 2), "lastMod" : 1 } }
      mydatabase.mycollection
        shard key: { "lastName" : 1, "creationDate" : 1 }
        unique: false
        balancing: true
        chunks:
          mongors2  1
        { "lastName" : { "$minKey" : 1 }, "creationDate" : { "$minKey" : 1 } } -->> { "lastName" : { "$maxKey" : 1 }, "creationDate" : { "$maxKey" : 1 } } on : mongors2 Timestamp(1, 0)
Insert some data
To test out the sharding we can add some test data. Again, we want to distribute by lastName and creationDate.
In the mongo shell we can run JavaScript. Here is a script that creates test records such that the data will be split and balanced. It inserts 500,000 fake records; we need more than 64MB of data before there is more than one chunk to balance, and 500,000 records make approximately 5 chunks. It takes a couple of minutes to run and complete.
use mydatabase
function randomInteger(min, max) {
  return Math.floor(Math.random() * (max - min) + min);
}
function randomAlphaNumeric(length) {
  var result = [];
  var characters = 'abcdef0123456789';
  var charactersLength = characters.length;
  for (var i = 0; i < length; i++) {
    result.push(characters.charAt(Math.floor(Math.random() * charactersLength)));
  }
  return result.join('');
}
function generateDocument() {
  return {
    lastName: randomAlphaNumeric(8),
    creationDate: new Date(),
    stringFixedLength: randomAlphaNumeric(8),
    stringVariableLength: randomAlphaNumeric(randomInteger(5, 50)),
    integer1: NumberInt(randomInteger(0, 2000000)),
    long1: NumberLong(randomInteger(0, 100000000)),
    date1: new Date(),
    guid1: new UUID()
  };
}
for (var j = 0; j < 500; j++) {
  var batch = [];
  for (var i = 0; i < 1000; i++) {
    batch.push({ insertOne: { document: generateDocument() } });
  }
  db.mycollection.bulkWrite(batch, { ordered: false });
}
Give it a few minutes, then review in the mongo shell. If we now look at the shard status, we should see chunks distributed across both shards...
sh.status()
... we should see something similar to ...
mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "minCompatibleVersion" : 5, "currentVersion" : 6, "clusterId" : ObjectId("6101c030a98b2cc106034695") }
  shards:
    { "_id" : "mongors1", "host" : "mongors1/mongors1n1:27018,mongors1n2:27018,mongors1n3:27018", "state" : 1, "topologyTime" : Timestamp(1627504744, 1) }
    { "_id" : "mongors2", "host" : "mongors2/mongors2n1:27018,mongors2n2:27018,mongors2n3:27018", "state" : 1, "topologyTime" : Timestamp(1627504753, 1) }
  active mongoses:
    "5.0.1" : 2
  autosplit:
    Currently enabled: yes
  balancer:
    Currently enabled: yes
    Currently running: yes
    Collections with active migrations:
      config.system.sessions started at Wed Jul 28 2021 20:44:25 GMT+0000 (UTC)
    Failed balancer rounds in last 5 attempts: 0
    Migration results for the last 24 hours:
      60 : Success
  databases:
    { "_id" : "config", "primary" : "config", "partitioned" : true }
      config.system.sessions
        shard key: { "_id" : 1 }
        unique: false
        balancing: true
        chunks:
          mongors1  965
          mongors2  59
        too many chunks to print, use verbose if you want to force print
    { "_id" : "mydatabase", "primary" : "mongors2", "partitioned" : true, "version" : { "uuid" : UUID("bc890722-00c6-4cbe-a3e1-eab9692faf93"), "timestamp" : Timestamp(1627504768, 2), "lastMod" : 1 } }
      mydatabase.mycollection
        shard key: { "lastName" : 1, "creationDate" : 1 }
        unique: false
        balancing: true
        chunks:
          mongors1  2
          mongors2  3
        { "lastName" : { "$minKey" : 1 }, "creationDate" : { "$minKey" : 1 } } -->> { "lastName" : "00001276", "creationDate" : ISODate("2021-07-28T20:42:00.867Z") } on : mongors1 Timestamp(2, 0)
        { "lastName" : "00001276", "creationDate" : ISODate("2021-07-28T20:42:00.867Z") } -->> { "lastName" : "623292c2", "creationDate" : ISODate("2021-07-28T20:42:01.046Z") } on : mongors1 Timestamp(3, 0)
        { "lastName" : "623292c2", "creationDate" : ISODate("2021-07-28T20:42:01.046Z") } -->> { "lastName" : "c3f2a99a", "creationDate" : ISODate("2021-07-28T20:42:06.474Z") } on : mongors2 Timestamp(3, 1)
        { "lastName" : "c3f2a99a", "creationDate" : ISODate("2021-07-28T20:42:06.474Z") } -->> { "lastName" : "ed75c36c", "creationDate" : ISODate("2021-07-28T20:42:03.984Z") } on : mongors2 Timestamp(1, 6)
        { "lastName" : "ed75c36c", "creationDate" : ISODate("2021-07-28T20:42:03.984Z") } -->> { "lastName" : { "$maxKey" : 1 }, "creationDate" : { "$maxKey" : 1 } } on : mongors2 Timestamp(2, 1)
... Here we can see evidence of balancing activities - see the "chunks" counts for mongors1 and mongors2. While it is balancing our test collection, it is also pre-splitting and balancing a different collection for session data (config.system.sessions). I believe this is a one-time system automation.
I hope these details help. Please let me know if you have any other questions.