MongoDB Schema for ecommerce products that have variations with different SKUs, prices and stock MongoDB Schema for ecommerce products that have variations with different SKUs, prices and stock mongodb mongodb

MongoDB Schema for ecommerce products that have variations with different SKUs, prices and stock


Data modelling of Product information is more an art than a science.

It is very common to define Products as the entities sales thinks about. Let's say a car model or a cable. E.g. a "cat 5e Ethernet cable".

Such a product has attributes and dimensions. Attributes might be

  • standard / norm (e.g. EIA/TIA-586)
  • Manufacturer (Kabelwerk Eupen)
  • Number of Wires (8)
  • Packaging (Plastic Bag)
  • Tags (Network, Ethernet, Cabling, Infrastructure)
  • RHoS (Compliant)
  • etc.

Attributes tend to vary between industries and even between different product categories in the same company.

Dimensions distinguish between different variants of a Product. One or more Dimensions can define a concrete Product. Typical Dimensions are size and colour. For cables, we might have:

  • size / length (0.5, 1, 2, 5, 10 meters)
  • colour (green, red, blue)
  • flame retardant (yes, no)

So Products are the concept of one of your merchandise. In a paper catalogue a Product usually is described as a single thing (maybe on a single page). E.g. a jacket available in blue and brown in sizes S, M, L and XL.

What defines a single Product is somewhat fuzzy. The blue and green sneaker might be the same Product, but the orange and the golden might not be seen as the same product.

Before E-Commerce, we tended to expect the same price for all Dimensions of a Product - not long ago, people were scandalized if a size 8 shoe would be more expensive than a size 9 shoe.

Along some dimensions - colour mostly - users usually except pictures. Along other dimensions - e.g. shoe size - usually there is no expectation of specific pictures.

With some products the manufacturer might be considered an Dimension (cables), for others it might be considered irrelevant (cable ties) and for others two identical looking goods from different manufacturers might be considered completely different Products (e.g. sneakers).

The other concept are SKUs - the stock keeping units are the stuff which is actually in the warehouse. Usually per Product we have Dimensions multiplied with each other SKUs. So 5 sizes x 3 colours x 2 fire retardant variants - so there could be 30 SKUs. Usually each SKU has a distinct GTIN/UPC/EAN ("Barcode" 4423456789012).

Keeping SKUs and Products separate is best practice because they are related to different concerns: Products are of importance for marketing and sales. SKUs concern auditing, bookkeeping and logistics. Products represent concepts, SKUs represent physical goods. Amount of stock usually should be kept in or near the SKU - because on large commerce applications it might get updated several times per second. I would never design a system where transaction data - amount of stock - is mixed up with master data - product description, etc.

Pricing information has been historically attached to the product because product and price data is somewhat static but dynamic pricing might change that.

You seem to be asking for a Product Database. Schemaless Databases work nicely for this because it is very hard to anticipate all needed dimensions for the next few years. Normalizing the whole thing for a Relational Database can certainly be done, but tents to result in less than elegant, and slowish code.

{  name: "Cat 5e Cable",  …  dimensions: {    color: {       title: "Color",        red: {          title: "Red",          images: [            "http://myserver.com/images/product_12345_3",            "http://myserver.com/images/product_12345_4",          ],        },        green: { … }    },    size: {      title: "Size"      s05: {        title: "0.5 m",        images: [],      },      s1: {...},    fireretardant:                 title: "Size"      yes: {        title: "fire retardant",        images: [],      },      no: {        title: "not fire retardant",        images: [],      }  }  // here you need a stable way of generating keys from dimension data      variations: [    {      dimensions: {color: red, size: s1, fireretardant: no}      SKU: "98765"      price: 10,    },    {      dimensions: {color: red, size: s1, fireretardant: yes}      SKU: "98765"      price: 10,    },  },  …]

I have implemented applications with such a schema. Usually you want to limit available dimensions and valid values per dimension in the Admin GUI so staff does not come up with obscure new dimensions all the time. But this should be an administrative restriction, not one of the underlying schema.

Non-existent combinations ("fire retardant is only available in green and not in 0.5 m), conflicting instructions ("make all 5 m cables 10 € and all red ones 8 €"), differing and inconsistent ideas what e.g. needs a image, what images should be shared between Dimensions, inconsistent definitions, what considered a separate product ("USB C) or just a Variant ("3.5 mm or 5.5 mm headphone jack"), translation and conversion (don't get me started with shoe sizes) makes real life database design and maintenance interesting …this is what the so-called "domain knowledge" is about. You need to involve a shoe salesman to design a good shoe database.