
First time with MongoDB + Docker - Set up from docker compose


So here is an attempt at helping. For the most part, the original docker compose YAML file is pretty close, with the exception of some minor port and binding parameters. The expectation is that initialization happens through additional commands after the containers are up. For example:

  1. docker-compose up the environment
  2. run some scripts to init the environment

... but this was already part of the original post.

So here is a docker compose file

docker-compose.yml

version: '3'
services:
  # mongo config servers
  mongocfg1:
    container_name: mongocfg1
    hostname: mongocfg1
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27019 --bind_ip_all
    volumes:
      - ~/mongo_cluster/config1:/data/db
  mongocfg2:
    container_name: mongocfg2
    hostname: mongocfg2
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27019 --bind_ip_all
    volumes:
      - ~/mongo_cluster/config2:/data/db
  mongocfg3:
    container_name: mongocfg3
    hostname: mongocfg3
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27019 --bind_ip_all
    volumes:
      - ~/mongo_cluster/config3:/data/db
  # replica set 1
  mongors1n1:
    container_name: mongors1n1
    hostname: mongors1n1
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data1:/data/db
  mongors1n2:
    container_name: mongors1n2
    hostname: mongors1n2
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data2:/data/db
  mongors1n3:
    container_name: mongors1n3
    hostname: mongors1n3
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data3:/data/db
  # replica set 2
  mongors2n1:
    container_name: mongors2n1
    hostname: mongors2n1
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data4:/data/db
  mongors2n2:
    container_name: mongors2n2
    hostname: mongors2n2
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data5:/data/db
  mongors2n3:
    container_name: mongors2n3
    hostname: mongors2n3
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data6:/data/db
  # mongos routers
  mongos1:
    container_name: mongos1
    hostname: mongos1
    image: mongo
    depends_on:
      - mongocfg1
      - mongocfg2
    command: mongos --configdb mongors1conf/mongocfg1:27019,mongocfg2:27019,mongocfg3:27019 --port 27017 --bind_ip_all
    ports:
      - 27017:27017
  mongos2:
    container_name: mongos2
    hostname: mongos2
    image: mongo
    depends_on:
      - mongocfg1
      - mongocfg2
    command: mongos --configdb mongors1conf/mongocfg1:27019,mongocfg2:27019,mongocfg3:27019 --port 27017 --bind_ip_all
    ports:
      - 27016:27017

... and some scripts to finalize the initialization...

docker-compose up -d

... Give it a few seconds to spin up, then issue...

# Init the replica sets (use the MONGOS host)
docker exec -it mongos1 bash -c "echo 'rs.initiate({_id: \"mongors1conf\", configsvr: true, members: [{ _id : 0, host : \"mongocfg1:27019\", priority: 2 }, { _id : 1, host : \"mongocfg2:27019\" }, { _id : 2, host : \"mongocfg3:27019\" }]})' | mongo --host mongocfg1:27019"
docker exec -it mongos1 bash -c "echo 'rs.initiate({_id : \"mongors1\", members: [{ _id : 0, host : \"mongors1n1:27018\", priority: 2 }, { _id : 1, host : \"mongors1n2:27018\" }, { _id : 2, host : \"mongors1n3:27018\" }]})' | mongo --host mongors1n1:27018"
docker exec -it mongos1 bash -c "echo 'rs.initiate({_id : \"mongors2\", members: [{ _id : 0, host : \"mongors2n1:27018\", priority: 2 }, { _id : 1, host : \"mongors2n2:27018\" }, { _id : 2, host : \"mongors2n3:27018\" }]})' | mongo --host mongors2n1:27018"

... again, give 10-15 seconds to allow the system to adjust to recent commands...
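Optionally, before adding the shards, you can verify that each replica set initiated cleanly by asking one of its members for rs.status(), using the same echo-pipe pattern as above. For example, for the config server replica set (the same pattern works against mongors1n1:27018 and mongors2n1:27018 for the two shard replica sets):

docker exec -it mongos1 bash -c "echo 'rs.status()' | mongo --host mongocfg1:27019"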

# ADD TWO SHARDS (mongors1, and mongors2)
docker exec -it mongos1 bash -c "echo 'sh.addShard(\"mongors1/mongors1n1:27018,mongors1n2:27018,mongors1n3:27018\")' | mongo"
docker exec -it mongos1 bash -c "echo 'sh.addShard(\"mongors2/mongors2n1:27018,mongors2n2:27018,mongors2n3:27018\")' | mongo"

Now, try to connect to the mongos from the host running docker (this assumes you have the mongo shell installed on that host). Use the 2 mongos hosts as the seed list.

mongo --host "localhost:27017,localhost:27016"

Comments

Notice how node 0 is given a priority of 2 in each rs.initiate() call? This makes it the preferred primary for its replica set.

Notice how the config servers are all port 27019 - this follows recommendations by MongoDB.

Notice how the shard servers are all port 27018 - again, following mongo recommendations.

The mongos routers expose 2 host ports: 27017 (the natural port for MongoDB, mapped to mongos1) and 27016 (mapped to mongos2, a second mongos for high availability).

The config servers and the shard servers do not expose their respective ports to the host, for security reasons. You should be using the mongos to get to the data. If you need these ports open for administrative purposes, simply add a ports mapping to the docker compose file, as sketched below.
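For example, a minimal sketch of such an addition for one shard member (the host-side port is an arbitrary choice, and each member you publish would need its own distinct host port):

# in docker-compose.yml, under the existing mongors1n1 service definition:
    ports:
      - 27018:27018   # admin access only; give each published member its own host-side port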

The replica-set intercommunication is not using authentication. This is a security no-no. You need to decide which auth mechanism is best for your scenario - you can use a keyfile (just a text file that is identical among the members of the replica set) or x509 certificates. If going with x509, you need to include the CA certificate in each docker container along with an individual certificate per server with proper host name alignment. You would also need to add the startup option for the mongod and mongos processes to use whichever auth method was selected.
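As a rough sketch of the keyfile approach (the keyfile path and mount point below are assumptions, not part of the compose file above; note also that the keyfile must be readable by the mongodb user inside the container, which can take extra permission handling with bind mounts):

# generate one keyfile on the host and lock down its permissions
openssl rand -base64 756 > ~/mongo_cluster/mongo-keyfile
chmod 400 ~/mongo_cluster/mongo-keyfile

# then, for every mongod and mongos service in docker-compose.yml:
#   volumes:
#     - ~/mongo_cluster/mongo-keyfile:/data/keyfile
#   command: mongod ... --keyFile /data/keyfile    (likewise for the mongos command lines)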

Logging is not specified. It probably makes sense to set the logging output of the mongod and mongos processes to the default locations of /var/log/mongodb/mongod.log and /var/log/mongodb/mongos.log. Without specifying a logging strategy, I believe mongo logs to standard out, which is not shown in the terminal when running docker-compose up -d (though it remains available via docker logs).
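A minimal sketch of what that could look like for one of the data nodes, assuming you also want the log files on the host (the host-side log directory is an assumption):

# in docker-compose.yml, for the mongors1n1 service for example:
#   command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all --logpath /var/log/mongodb/mongod.log --logappend
#   volumes:
#     - ~/mongo_cluster/data1:/data/db
#     - ~/mongo_cluster/logs1:/var/log/mongodb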

Superuser: No users are created on the system yet. Usually, for every replica set I stand up before adding it to a sharded cluster, I like to add a superuser account - one having root access - so that I can make administrative changes at the replica set level if needed. With the docker-compose approach you can create a superuser from the mongos perspective and perform most of the operations needed on a sharded cluster, but still - I like having the replica-set-level user available.
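For reference, here is a sketch of creating such a user through the mongos, following the same echo-pipe pattern used above (the user name and password are placeholders; this only becomes meaningful once authentication is actually enabled):

docker exec -it mongos1 bash -c "echo 'db.getSiblingDB(\"admin\").createUser({user: \"clusterRoot\", pwd: \"changeMe\", roles: [{role: \"root\", db: \"admin\"}]})' | mongo"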

OS tunables - Mongo likes to take up all the system resources. For a shared ecosystem where one physical host is hosting a bunch of mongo processes, you might want to consider specifying the WiredTiger cache size, etc. WiredTiger by default wants (System Memory Size - 1 GB) / 2, and each containerized mongod sees the host's full memory. Also, you would benefit from setting ulimits to proper values - e.g., 64000 file handles per user is a good start - mongo potentially likes to use a lot of files. Also, the data filesystem should ideally be XFS. This strategy uses the host user's home directory for the database data directories; a more thoughtful approach would be welcome here.
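As a sketch of both ideas in docker-compose terms (the 0.5 GB cache and the 64000 file-handle limit are illustrative values, not sized for your hardware):

# per mongod service in docker-compose.yml:
#   command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all --wiredTigerCacheSizeGB 0.5
#   ulimits:
#     nofile:
#       soft: 64000
#       hard: 64000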

Anything else?

I am sure I am missing something. If you have any questions, please leave a comment and I will reply.

Update 1

The above docker-compose.yml file was missing the hostname attribute for some of the hosts, and this was causing balancer issues, so I have edited the docker-compose.yml to include hostname on all hosts.

Also, the addShard() method only referred to one host of the replica set. For completeness I added the other hosts to the addShard() method described above.

Following these steps will result in a brand new sharded cluster, but there are no user databases yet. As such, no user databases are sharded. So let's take a moment to add a database and shard it, then view the shard distributions (A.K.A., balancer results).

We must connect to the database via the mongos (as described above). This example assumes the use of the mongo shell.

mongo --host "localhost:27017,localhost:27016"

Databases in Mongo can be created in a variety of ways. While there is no explicit database-create command, there is an explicit create-collection command (db.createCollection()). We must first set the database context using a 'use <database>' command...

use mydatabase
db.createCollection("mycollection")

... but rather than use this command we can create a database and collection by creating an index on a non-existing collection. (If you already created the collection, no worries, this next command should still be successful).

use mydatabase
db.mycollection.createIndex({lastName: 1, creationDate: 1})

In this example, I created a compound index on two fields...

  • lastName
  • creationDate

... on a collection that does not yet exist, in a database that does not yet exist. Once I issue this command, both the database and the collection will be created. Furthermore, I now have the basis for a shard key - the key on which sharding distribution will be based - backed by this new index on these two fields.
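If you want to double-check the index before sharding, you can list the collection's indexes from the mongo shell:

use mydatabase
db.mycollection.getIndexes()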

Shard the database

Assuming I have issued the createIndex command, I can now turn on sharding at the database level and issue the shardCollection command...

sh.enableSharding("mydatabase")
sh.shardCollection("mydatabase.mycollection", { "lastName": 1, "creationDate": 1 })

Notice how the command 'shardCollection()' refers to our indexed fields created earlier? Assuming sharding has been successfully applied, we can now view the distribution of data by issuing the sh.status() command

sh.status()

Example of output: (new collection, no data yet, thus no real distribution of data - we need to insert more than 64 MB of data, the default chunk size, so that there is more than one chunk to distribute)

mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("6101c030a98b2cc106034695")
  }
  shards:
        {  "_id" : "mongors1",  "host" : "mongors1/mongors1n1:27018,mongors1n2:27018,mongors1n3:27018",  "state" : 1,  "topologyTime" : Timestamp(1627504744, 1) }
        {  "_id" : "mongors2",  "host" : "mongors2/mongors2n1:27018,mongors2n2:27018,mongors2n3:27018",  "state" : 1,  "topologyTime" : Timestamp(1627504753, 1) }
  active mongoses:
        "5.0.1" : 2
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: no
        Failed balancer rounds in last 5 attempts: 0
        Migration results for the last 24 hours:
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
        {  "_id" : "mydatabase",  "primary" : "mongors2",  "partitioned" : true,  "version" : {  "uuid" : UUID("bc890722-00c6-4cbe-a3e1-eab9692faf93"),  "timestamp" : Timestamp(1627504768, 2),  "lastMod" : 1 } }
                mydatabase.mycollection
                        shard key: { "lastName" : 1, "creationDate" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                mongors2    1
                        { "lastName" : { "$minKey" : 1 }, "creationDate" : { "$minKey" : 1 } } -->> { "lastName" : { "$maxKey" : 1 }, "creationDate" : { "$maxKey" : 1 } } on : mongors2 Timestamp(1, 0)

Insert some data

To test out the sharding we can add some test data. Again, we want to distribute by lastName, and creationDate.

In the mongo shell we can run JavaScript. Here is a script that will create test records such that the data will be split and balanced. It creates 500,000 fake records; we need more than 64 MB of data to create another chunk to balance, and 500,000 records make approximately 5 chunks. This takes a couple of minutes to run to completion.

use mydatabase

// helper: random integer in [min, max)
function randomInteger(min, max) {
  return Math.floor(Math.random() * (max - min) + min);
}

// helper: random hex-style string of the given length
function randomAlphaNumeric(length) {
  var result = [];
  var characters = 'abcdef0123456789';
  var charactersLength = characters.length;
  for (var i = 0; i < length; i++) {
    result.push(characters.charAt(Math.floor(Math.random() * charactersLength)));
  }
  return result.join('');
}

// one fake document with a mix of field types
function generateDocument() {
  return {
    lastName: randomAlphaNumeric(8),
    creationDate: new Date(),
    stringFixedLength: randomAlphaNumeric(8),
    stringVariableLength: randomAlphaNumeric(randomInteger(5, 50)),
    integer1: NumberInt(randomInteger(0, 2000000)),
    long1: NumberLong(randomInteger(0, 100000000)),
    date1: new Date(),
    guid1: new UUID()
  };
}

// 500 batches x 1,000 documents = 500,000 documents total
for (var j = 0; j < 500; j++) {
  var batch = [];
  for (var i = 0; i < 1000; i++) {
    batch.push(
      { insertOne: {
          document: generateDocument()
        }
      }
    );
  }
  db.mycollection.bulkWrite(batch, {ordered: false});
}

Give it a few minutes, then review in the mongo shell. If we now look at the shard status we should see chunks distributed across both shards...

sh.status()

... we should see something similar to ...

mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("6101c030a98b2cc106034695")
  }
  shards:
        {  "_id" : "mongors1",  "host" : "mongors1/mongors1n1:27018,mongors1n2:27018,mongors1n3:27018",  "state" : 1,  "topologyTime" : Timestamp(1627504744, 1) }
        {  "_id" : "mongors2",  "host" : "mongors2/mongors2n1:27018,mongors2n2:27018,mongors2n3:27018",  "state" : 1,  "topologyTime" : Timestamp(1627504753, 1) }
  active mongoses:
        "5.0.1" : 2
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: yes
        Collections with active migrations:
                config.system.sessions started at Wed Jul 28 2021 20:44:25 GMT+0000 (UTC)
        Failed balancer rounds in last 5 attempts: 0
        Migration results for the last 24 hours:
                60 : Success
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                mongors1    965
                                mongors2    59
                        too many chunks to print, use verbose if you want to force print
        {  "_id" : "mydatabase",  "primary" : "mongors2",  "partitioned" : true,  "version" : {  "uuid" : UUID("bc890722-00c6-4cbe-a3e1-eab9692faf93"),  "timestamp" : Timestamp(1627504768, 2),  "lastMod" : 1 } }
                mydatabase.mycollection
                        shard key: { "lastName" : 1, "creationDate" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                mongors1    2
                                mongors2    3
                        { "lastName" : { "$minKey" : 1 }, "creationDate" : { "$minKey" : 1 } } -->> { "lastName" : "00001276", "creationDate" : ISODate("2021-07-28T20:42:00.867Z") } on : mongors1 Timestamp(2, 0)
                        { "lastName" : "00001276", "creationDate" : ISODate("2021-07-28T20:42:00.867Z") } -->> { "lastName" : "623292c2", "creationDate" : ISODate("2021-07-28T20:42:01.046Z") } on : mongors1 Timestamp(3, 0)
                        { "lastName" : "623292c2", "creationDate" : ISODate("2021-07-28T20:42:01.046Z") } -->> { "lastName" : "c3f2a99a", "creationDate" : ISODate("2021-07-28T20:42:06.474Z") } on : mongors2 Timestamp(3, 1)
                        { "lastName" : "c3f2a99a", "creationDate" : ISODate("2021-07-28T20:42:06.474Z") } -->> { "lastName" : "ed75c36c", "creationDate" : ISODate("2021-07-28T20:42:03.984Z") } on : mongors2 Timestamp(1, 6)
                        { "lastName" : "ed75c36c", "creationDate" : ISODate("2021-07-28T20:42:03.984Z") } -->> { "lastName" : { "$maxKey" : 1 }, "creationDate" : { "$maxKey" : 1 } } on : mongors2 Timestamp(2, 1)

... Here we can see evidence of balancing activities. See the "chunks" label for mongors1 and mongors2. While it is balancing our test collection, it is also pre-splitting and balancing a different collection for session data (config.system.sessions). I believe this is a one-time system automation.
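If you want a per-shard summary of document counts and data size rather than the chunk listing, the mongo shell also offers a shard distribution helper (run against the mongos connection):

use mydatabase
db.mycollection.getShardDistribution()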

I hope these details help. Please let me know if you have any other questions.