
Create a Test Database copy of Production with only a sample set of Data


If you have enough space, I would suggest copying the entire production database to test. It will be much easier to administer, and it is also a good opportunity to test your backups (restore from backup to a new instance).

From a developer point of view, you won't be able to test the performance of your application reliably without a representative set of data. This data set should have the same properties as the production data (data volume, physical distribution...). The easiest way to achieve this is to have the same data in test as in production.

If you can afford downtime, you could stop the production DB, copy the files to the test server and mount both databases. If you can't afford downtime, it might be a good idea to pick up some DBA skills (and perhaps learn about hot backups and restoring to a new instance); a small sketch of one such approach is below.
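If you go the hot backup route, one possibility on reasonably recent Oracle versions (11g and later) is RMAN active duplication. This is only a minimal sketch: the instance names and connect strings are placeholders, and a real duplication needs a prepared auxiliary instance on the test server.

    # connect to production (TARGET) and to the prepared test instance (AUXILIARY)
    rman TARGET sys@prod AUXILIARY sys@testdb

    # inside RMAN: clone the running production database over the network
    DUPLICATE TARGET DATABASE TO testdb
      FROM ACTIVE DATABASE
      NOFILENAMECHECK;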


Update: if physically copying your database is not feasible, you should look into bulk copying the data with expdp and impdp (or the older exp/imp). You can either copy whole schemas or filter the data on export; in the latter case you would write an appropriate WHERE clause by hand. Exporting and importing in bulk will be orders of magnitude faster than copying data row by row.
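As a rough illustration of a filtered export, here is a sketch using a Data Pump parameter file. The schema (app), directory object (dp_dir), file names and the orders table are all made up for this example; SAMPLE, QUERY and REMAP_SCHEMA are standard Data Pump parameters.

    # contents of expdp_sample.par (all names are placeholders)
    SCHEMAS=app
    DIRECTORY=dp_dir
    DUMPFILE=app_sample.dmp
    # SAMPLE exports roughly the given percentage of rows from each table
    SAMPLE=10
    # or, instead of SAMPLE, hand-pick the rows of a specific table:
    # QUERY=app.orders:"WHERE order_date > SYSDATE - 30"

    # export from production, then import into the test database
    expdp system PARFILE=expdp_sample.par
    impdp system DIRECTORY=dp_dir DUMPFILE=app_sample.dmp REMAP_SCHEMA=app:app_test

Bear in mind that a random SAMPLE does not preserve parent/child relationships between tables, which is one of the reasons the test data management tools mentioned in another answer exist.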


I just want to point out the industry scenario and products; this is not a solution, but an input for reference, since @aintnoprophet mentioned that:

"I am not a DBA so I don't know much of that realm. I tried to Google some info about Database Links but couldn't find very good examples."

This use case is related to test data management systems. The concept deals with the fact that we need a way to manage the test data that goes into specific test cases for specific test scenarios. It is very difficult to manually create a subset of data from large amounts of data, and doing so significantly increases the chance of human error, so repeated data subsetting for multiple test iterations quickly becomes unmanageable (see the small sketch below for what even a single-table manual copy looks like).
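For reference, here is a tiny sketch of what a manual, database-link based copy of a single table could look like; the link name, TNS alias, credentials and the orders table are invented for illustration. Multiply this by every table and every test iteration and the maintenance problem becomes obvious.

    -- on the test database: create a link pointing at production
    CREATE DATABASE LINK prod_link
      CONNECT TO app IDENTIFIED BY app_password
      USING 'PROD_TNS_ALIAS';

    -- pull a filtered subset of one table across the link
    INSERT INTO orders
      SELECT *
        FROM orders@prod_link
       WHERE order_date > SYSDATE - 30;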

There are many use cases, especially in the enterprise; for example:

http://www.compuware.com/resources/mainframe/Fileaid_clientserver.pdf

http://www.erphome.net/wdb/upload/forum1_f_8007_1227768589.pdf

These examples deal with huge volumes of enterprise data.

There is an open source TDM tool, http://sourceforge.net/projects/opentdm/, but it is not in Java (I am not sure whether it is good or not).

If you are looking for products (just for reference), some of them are:


A test database that is a full copy of production will work much better than a subset for flushing out bugs. Most people subset because they don't have the time and resources to make full copies, but all that has changed with virtual data, aka copy data management. In such systems you can ingest one copy of production and make thin clone copies in minutes for almost no extra storage. These systems also generally compress the data, so the ingested copy is about a third of the size of the original, and they can be refreshed in minutes, which is great for destructive testing. They also tend to come integrated with masking, since we probably don't want sensitive data coming out of the production environment into dev and test.

There are a number of vendors in this area. Check out NetApp SMO, Actifio, Oracle Snap Clone and Delphix for starters. Most of these require some specialized hardware, which can be an obstacle. Delphix is fully software and even has a free version called Delphix Express that you can download. Similar technology is being used by Rubrik and Cohesity for VM images.

Best wishes, Kyle Hailey