Connect to Your Data
Before getting started with your Exploratory Data Analysis journey with RATH, you need to connect RATH to your data source and import data.
Currently, RATH supports the following data sources:
Connect to a data source
Use a local file
RATH supports common data files such as Excel workbooks, CSV files, and JSON text files. For files larger than 100 MB, follow the Data Sampling best practices.
Connect to a remote database
RATH allows you to connect to your remote database, such as ClickHouse, and import your data for Exploratory Data Analysis.
Connect to ClickHouse
Connect to Snowflake
RATH is compatible with the following types of remote databases:
If you are looking for support for additional databases, reach out to the RATH support team through the contact support page or submit an issue on GitHub.
In RATH, the distinction between data source and data engine is not clear-cut.
For example, you can connect RATH to a remote ClickHouse database. In this case, RATH functions as the data engine and uses ClickHouse as the data source.
For processing large volumes of data that exceed RATH's computational capacity, you can use ClickHouse Clusters as the data engine. In this scenario, RATH functions as the data source.
Use a demo dataset
For a quick walkthrough of RATH features, you may pick one of the available demo datasets.
Use a cloud dataset
Once you have finished data wrangling, created some interesting visualizations, and saved them to collections, you can upload the dataset to the Cloud. You can then access the modified dataset wherever and whenever you want.
On the Data Source tab, click on the Save to Cloud button and upload your data source to the Cloud.
You can choose which parts of the dataset to upload to the Cloud, for example, Raw Data, Meta Data, Visualization Collections, and Causal Model.
To access the saved dataset, click the Cloud button on the Data Connections tab and log in to your account.
Connect to AirTable
If your data is stored in AirTable, RATH can connect to the AirTable account and import the data directly. Simply select the AirTable option on the Data Connections tab, and connect to your AirTable account.
Connect to OLAP
If your data is stored in an OLAP cube, RATH can connect to the cube and import the data directly. On the Data Connections tab, select the OLAP option and connect to an OLAP cube.
The OLAP option is not enabled by default. Contact the RATH team for support.
Connect via API
RATH also provides support for importing data through APIs. If you need assistance integrating the RATH API, please reach out to the RATH team for further support.
Data sampling is a method for selecting a representative subset of data from a larger dataset. The purpose of this process is to reduce the amount of data without sacrificing the accuracy of the results. In RATH, it is recommended to use data sampling for datasets that exceed 100 MB.
For best results, reduce the number of rows in the dataset to below 100,000. Datasets with 100,000 to 1 million rows may experience some lag, and for datasets exceeding 1 million rows, data sampling is necessary.
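The row-count guidance above can be written as a small check. This is a minimal sketch and not part of RATH: the function name `sampling_advice` and the returned labels are hypothetical, and treating exactly 100,000 and exactly 1 million rows as part of the middle band is an assumption, since the text does not say which side those boundaries fall on.

```python
def sampling_advice(row_count: int) -> str:
    """Map a row count to the guidance in this section (hypothetical helper)."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if row_count < 100_000:
        return "ok"             # below 100,000 rows: best results
    if row_count <= 1_000_000:
        return "may lag"        # 100,000 to 1 million rows: some lag possible
    return "sample required"    # more than 1 million rows: sampling is necessary
```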
To set the sample size, click the fixed sample size button and choose your desired sample size.
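If a file is too large to load comfortably, one option is to sample it before importing. The sketch below is an illustration that runs outside RATH and is not RATH's own sampling method: it uses reservoir sampling (Algorithm R) with Python's standard library to keep a fixed number of rows from a CSV file in a single pass. The function name `sample_csv_rows`, the file paths, and the sample size are all hypothetical.

```python
import csv
import random
from typing import Optional


def sample_csv_rows(src_path: str, dst_path: str, sample_size: int,
                    seed: Optional[int] = None) -> int:
    """Write a uniform random sample of at most `sample_size` data rows
    from src_path to dst_path, keeping the header row.
    Returns the number of data rows written."""
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)  # None if the file is empty
        for i, row in enumerate(reader):
            if i < sample_size:
                # Fill the reservoir with the first sample_size rows.
                reservoir.append(row)
            else:
                # Replace a random slot with probability sample_size / (i + 1).
                j = rng.randint(0, i)  # both ends inclusive
                if j < sample_size:
                    reservoir[j] = row
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        if header is not None:
            writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

For example, `sample_csv_rows("big.csv", "sample.csv", 100_000, seed=42)` would produce a file within the row count recommended above. Note that the sampled rows are not guaranteed to keep their original order, which matters if the data is a time series.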
Alternatives to data sampling
You can use ClickHouse, an MPP (Massively Parallel Processing) database that is optimized for processing large datasets and can provide better performance than data sampling.
For support with other types of MPP databases, contact the RATH team.
- Data Profiling and Data Transformation
- Extract Text Patterns from your data source
- Generate Automated Data Insight