
Best practices for Data Connection

Data sampling

Data sampling selects a representative subset of rows from a larger dataset. It reduces the amount of data RATH has to process while keeping the results close to those of the full dataset. In RATH, data sampling is recommended for datasets larger than 100 MB.

For best results, keep the dataset below 100,000 rows. With 100,000 to 1 million rows, RATH may lag somewhat; with more than 1 million rows, data sampling is necessary.
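The size and row-count thresholds above can be summarized as a small decision helper. This is a sketch that only encodes the guidance on this page; the function name and the returned labels are hypothetical, and treating "100 MB" as 100 × 1024 × 1024 bytes is an assumption.

```python
# Thresholds from the guidance above. Interpreting "100 MB" as binary
# megabytes (100 * 1024 * 1024 bytes) is an assumption.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024
COMFORTABLE_ROWS = 100_000
MAX_ROWS = 1_000_000


def sampling_recommendation(num_rows: int, size_bytes: int) -> str:
    """Hypothetical helper: map row count and file size to a sampling level."""
    if num_rows > MAX_ROWS:
        return "required"      # more than 1 million rows
    if size_bytes > SIZE_LIMIT_BYTES:
        return "recommended"   # file larger than 100 MB
    if num_rows >= COMFORTABLE_ROWS:
        return "optional"      # 100,000 to 1 million rows: expect some lag
    return "not needed"        # under 100,000 rows


print(sampling_recommendation(500_000, 50 * 1024 * 1024))  # optional
```

Row count is checked first because, per the guidance, more than 1 million rows makes sampling necessary regardless of file size.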

To set the sample size, click the fixed sample size button and choose the desired sample size.
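This page does not say how RATH draws its fixed-size sample internally. As an illustration only, reservoir sampling (Algorithm R) is one standard way to draw a uniform random sample of exactly `k` rows in a single pass, without knowing the total row count in advance. The function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items (Algorithm R).

    Generic sketch, not RATH's implementation: keep the first k rows, then
    for each later row i (0-based) replace a random slot with probability
    k / (i + 1).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, i)          # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = row         # happens with probability k / (i + 1)
    return reservoir


# Example: keep 10,000 of 1,000,000 row indices, reproducibly.
sample = reservoir_sample(range(1_000_000), 10_000, seed=42)
print(len(sample))  # 10000
```

Because the function consumes any iterable, it can read rows from a file line by line, so memory use stays proportional to `k` rather than to the full dataset.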

Alternatives to data sampling

Instead of sampling, you can use ClickHouse, an MPP (Massively Parallel Processing) database that is optimized for processing large datasets and can perform better than data sampling.
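The benefit of an MPP database comes from running heavy computation inside the database so that only small results reach the client. The sketch below shows that idea using ClickHouse's HTTP interface (default port 8123, query passed in the `query` URL parameter). It does not describe how RATH's ClickHouse connector works; the helper names and the table and column names are hypothetical.

```python
from urllib.parse import urlencode


def build_clickhouse_url(host: str, query: str, port: int = 8123) -> str:
    """Build a GET URL for ClickHouse's HTTP interface (default port 8123)."""
    return f"http://{host}:{port}/?{urlencode({'query': query})}"


def aggregate_query(table: str, group_col: str, value_col: str) -> str:
    """Hypothetical helper: the GROUP BY runs inside ClickHouse, so one row
    per group is returned instead of every row in the table.

    Identifiers are not escaped; use only trusted table and column names.
    """
    return (
        f"SELECT {group_col}, count() AS n, avg({value_col}) AS mean "
        f"FROM {table} GROUP BY {group_col} FORMAT JSONEachRow"
    )


url = build_clickhouse_url("localhost", aggregate_query("events", "country", "amount"))
print(url)
# Against a running server, urllib.request.urlopen(url).read() would return
# one JSON object per line, one per country.
```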

For support for other MPP databases, contact the RATH team.