Subsetting

Key benefits of using subsetting


Reduce infrastructure and computational costs

Excessive data volumes lead to high infrastructure and computational costs, which are unnecessary for test data in non-production environments. With subsetting, you can create smaller subsets of your data to reduce these costs.

Manageable test data for testers and developers

Managing huge data volumes in non-production environments poses challenges for testers and developers. Smaller test data is more manageable, which streamlines testing and development and saves time and resources across the entire cycle.

Faster test data setup and maintenance

Smaller data volumes facilitate faster and more straightforward setup and maintenance of non-production test environments. This is particularly relevant in complex IT landscapes and when frequent changes in data structures require regular updates and refreshes to ensure the representativeness of test data.

User documentation

Explore the Syntho user documentation


Why subsetting is more advanced

Subsetting is not as simple as “just deleting data”

Subsetting is not as easy as simply deleting data: all upstream and downstream linked tables must be subsetted proportionally to preserve referential integrity.

Subsetting ensures that not only data in a target table is deleted, but also that any data in any other linked table related to the deleted data from the target table is deleted.

This ensures that referential integrity across tables, databases and systems is preserved as part of data deletion.

When the data volume is reduced by removing “Person X” from “Table Y”, all records related to “Person X” in “Table Y” should be deleted, along with all records related to “Person X” in any other upstream or downstream related table (table A, B, C etc.).

For example, when “Richard” is removed from the “Customers” table, all records related to “Richard” in the “Customers” table should be deleted, along with all records related to “Richard” in any other upstream or downstream related table (Payments table, Incidents table, Insurance Coverage table etc.).
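The “Richard” example can be sketched with SQLite foreign keys declared as ON DELETE CASCADE. This is only a minimal illustration of deleting related records together, not a description of how the Syntho Engine works; the table names, column names and sample rows are assumptions made up for the sketch.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database (all names and rows are illustrative)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE payments (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                REFERENCES customers(id) ON DELETE CASCADE,
            amount REAL
        );
        CREATE TABLE incidents (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                REFERENCES customers(id) ON DELETE CASCADE,
            description TEXT
        );
        INSERT INTO customers VALUES (1, 'Richard'), (2, 'Anna');
        INSERT INTO payments VALUES (1, 1, 50.0), (2, 1, 20.0), (3, 2, 75.0);
        INSERT INTO incidents VALUES (1, 1, 'claim'), (2, 2, 'complaint');
    """)
    return conn

def remove_customer(conn, name):
    """Delete a customer; the cascade removes the linked payments and incidents."""
    conn.execute("DELETE FROM customers WHERE name = ?", (name,))
    conn.commit()

def row_counts(conn):
    """Return the number of rows left in each demo table."""
    tables = ("customers", "payments", "incidents")
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}

conn = build_demo_db()
remove_customer(conn, "Richard")
counts = row_counts(conn)
print(counts)
```

After the delete, only the rows that belong to the remaining customer are left in every table, so no payment or incident points at a customer that no longer exists.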

Across tables

Subsetting works across tables

Across databases

Subsetting works across databases

Across systems

Subsetting works across systems

Product Demo

Create synthetic data that enhances the volume and diversity of your data

Proportional subsetting

You can configure the Syntho Engine to subset a relational database and to ensure that all “linked tables” are subsetted based on the “Target Table”.

Target table

This is the table on which the subset is defined; all linked tables are subsetted based on it.

Linked tables

These are all tables directly or indirectly connected to the “Target Table”. Links between tables may be direct, such as a target table listing allergies that references a patients table through a foreign key relationship, or indirect, such as a target table referencing a patients table, which in turn references a hospitals table.
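The relationship between a target table and its linked tables can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Syntho Engine's algorithm: the rows, the 30% fraction, the fixed seed and the helper names subset_proportionally and keep_referenced are all hypothetical, and the sketch only follows references from the target table to the tables it points at.

```python
import random

# Illustrative data: allergies (target) -> patients (direct) -> hospitals (indirect).
allergies = [{"id": i, "patient_id": (i % 5) + 1} for i in range(1, 11)]
patients = [{"id": p, "hospital_id": 1 if p <= 3 else 2} for p in range(1, 6)]
hospitals = [{"id": 1}, {"id": 2}, {"id": 3}]

def subset_proportionally(target_rows, fraction, seed=0):
    """Sample roughly `fraction` of the target table (at least one row)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    k = max(1, round(len(target_rows) * fraction))
    return rng.sample(target_rows, k)

def keep_referenced(parent_rows, child_rows, fk):
    """Keep only the parent rows that a kept child row still references."""
    wanted = {row[fk] for row in child_rows}
    return [row for row in parent_rows if row["id"] in wanted]

allergy_subset = subset_proportionally(allergies, 0.3)
patient_subset = keep_referenced(patients, allergy_subset, "patient_id")
hospital_subset = keep_referenced(hospitals, patient_subset, "hospital_id")

print(len(allergy_subset), len(patient_subset), len(hospital_subset))
```

Every kept allergy still finds its patient and every kept patient still finds its hospital, while a hospital that no kept patient refers to is dropped from the subset.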

Subsetting based on business rules

In addition to proportional subsetting, where you specify a percentage for data extraction, our advanced capabilities allow you to precisely define the target group for subsetting. For instance, you can specify criteria to include or exclude specific subsets, providing greater flexibility and control over the data extraction process.

For example: customers who are older than 30 years and younger than 60 years, and who are also male.
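A business rule such as the one above can be read as a filter on the target table, after which the linked tables are reduced to match. The sketch below assumes the two criteria are combined with "and"; the field names, the sample rows and the in_target_group helper are hypothetical and do not represent Syntho configuration syntax.

```python
# Illustrative rows; field names and values are assumptions for this sketch.
customers = [
    {"id": 1, "name": "Richard", "age": 45, "gender": "male"},
    {"id": 2, "name": "Anna", "age": 38, "gender": "female"},
    {"id": 3, "name": "Tom", "age": 67, "gender": "male"},
    {"id": 4, "name": "Sam", "age": 31, "gender": "male"},
]
payments = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 2},
    {"id": 12, "customer_id": 3},
    {"id": 13, "customer_id": 4},
    {"id": 14, "customer_id": 4},
]

def in_target_group(customer):
    """Business rule: older than 30, younger than 60, and male."""
    return 30 < customer["age"] < 60 and customer["gender"] == "male"

# Apply the rule to the target table first.
customer_subset = [c for c in customers if in_target_group(c)]

# Then keep only the linked rows that belong to the selected customers.
kept_ids = {c["id"] for c in customer_subset}
payment_subset = [p for p in payments if p["customer_id"] in kept_ids]

print([c["name"] for c in customer_subset])
print([p["id"] for p in payment_subset])
```

Here only Richard and Sam satisfy the rule, so only their payments remain in the linked table.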

Trusted by enterprise companies

Mimic (sensitive) data with AI to generate synthetic data twins

Case studies

Synthetic data for the National Statistical Office, Statistics Netherlands (CBS)

Empower CBS’s statistical excellence with secure synthetic data solutions and learn how they are shaping the future of statistical

Synthetic test and development data with a leading EMR and healthcare solutions

About the client: the company specializes in developing and supporting a proprietary electronic medical record (EMR) software

Synthetic data for academic research at the Erasmus University

Revolutionize academic research at Erasmus University with synthetic data. Explore its power by reading our case study.

Synthetic data for the Netherlands Chamber of Commerce (KVK)

Discover how synthetic data for a Dutch governmental organization enables fast, secure, and actionable initiatives.

Synthetic data for advanced analytics and testing with a leading international bank

Unlock the potential of synthetic data for AI/ML modeling, advanced analytics, and testing with a renowned International Dutch Bank.

Synthetic test and development data with a leading Dutch insurance company

Explore the innovative world of synthetic test and development data in collaboration with a prominent Dutch insurance company.

Synthetic data for software development and testing with a leading Dutch Bank

See how synthetic data for software development and testing can help solve the privacy issues of a leading Dutch bank.

Synthetic patient EHR data for advanced analytics with Erasmus MC

The company specializes in developing and supporting a proprietary electronic medical record (EMR) software application widely recognized

Synthetic data generation for data sharing with Lifelines

Curious how realistic synthetic biobank data is when generated for data sharing? Learn more about it from our case study with a

Synthetic healthcare data for a leading US hospital

Curious how synthetic healthcare data works for a leading US hospital? Learn more about it from our case study


Frequently Asked Questions

What is subsetting?

Many organizations have production environments with massive amounts of data and do not want the same volume in non-production test environments. Database subsetting creates a smaller, representative subset of a larger relational database while preserving referential integrity. Organizations use subsetting for test data to reduce costs, to keep the data manageable, and to speed up setup and maintenance.

What is referential integrity and why is it important?

Referential integrity is a concept in database management that ensures consistency and accuracy between tables in a relational database. It ensures that every value that corresponds to “Person 1” in “Table 1” corresponds to the correct value of “Person 1” in “Table 2” and in any other linked table.

Enforcing referential integrity is crucial for maintaining the reliability of test data in a relational database as part of non-production environments. It prevents data inconsistencies and ensures that relationships between tables are meaningful and reliable for proper testing and software development.

Test data in a relational database environment should preserve referential integrity to be usable.
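One way to check whether a subset still preserves referential integrity is to look for orphaned rows, meaning rows whose foreign key no longer matches any row in the referenced table. The sketch below uses an in-memory SQLite database and a LEFT JOIN; the table names, the sample rows and the find_orphans helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payments (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    -- Customer 1 was removed, but one of their payments was left behind.
    INSERT INTO customers VALUES (2, 'Anna');
    INSERT INTO payments VALUES (1, 1, 50.0), (2, 2, 75.0);
""")

def find_orphans(conn):
    """Return ids of payments whose customer_id matches no customer row."""
    rows = conn.execute("""
        SELECT p.id
        FROM payments AS p
        LEFT JOIN customers AS c ON c.id = p.customer_id
        WHERE c.id IS NULL
        ORDER BY p.id
    """).fetchall()
    return [row[0] for row in rows]

orphans = find_orphans(conn)
print(orphans)
```

An empty result means every payment still points at an existing customer; any id that is returned marks a record that breaks referential integrity.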


Build better and faster with synthetic data today

Unlock data access, accelerate development, and enhance data privacy.

