Transform Your Data

Data Transformation is a process that helps you prepare your raw data for analysis and modeling. It consists of four main steps to make sure your data is accurate and reliable.

  • Data Cleaning: This step involves fixing errors, inconsistencies, and missing values in your data.

  • Data Filtering: This step lets you select only the data that is relevant to your analysis.

  • Data Transformation: This step changes the format of your data so it's easier to work with.

  • Data Sampling: This step involves selecting a smaller portion of your data to save time and resources.

By following these steps, you'll be able to work with high-quality data that will give you accurate results from your analysis and modeling.

Data wrangling with RATH

Data cleaning

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data from a dataset. Proper data cleaning can improve the quality of analysis.

Before using RATH for data cleaning, make sure your datasets use standardized data formats, including:

  • DateTime Data: must be standardized as YYYY-MM-DD.
  • Numerical Data: should contain plain numeric values only, without currency symbols or unit text. For example, in a dataset of supermarket sales records, a sales amount should be stored as 100 instead of $100 or 100 dollars.
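The standardization described above happens before the data reaches RATH, so it can be scripted. The following is a minimal sketch using only the Python standard library. The helper names (`normalize_date`, `normalize_amount`), the list of accepted input date formats, and the sample rows are all hypothetical and not part of RATH.

```python
import re
from datetime import datetime

# Assumed input formats; extend this tuple to match your own data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def normalize_amount(text):
    """Extract the first number from strings such as '$100' or '100 dollars'."""
    match = re.search(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


# Hypothetical raw sales records with inconsistent formats.
rows = [
    {"date": "03/15/2023", "sales": "$100"},
    {"date": "2023-03-16", "sales": "100 dollars"},
    {"date": "2023-03-17", "sales": "1,250.50"},
]

cleaned = [
    {"date": normalize_date(r["date"]), "sales": normalize_amount(r["sales"])}
    for r in rows
]
```

Values that cannot be parsed become `None` rather than raising an error, so they can be reviewed or handled by a cleaning method later.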

To use RATH for data cleaning, simply import your data from a data source. RATH can automatically clean your data.

You can also choose an option from the Clean Method drop-down menu on the Data Source tab.

Select the option that matches your requirements to proceed.

Data filtering

You can also filter your data with RATH. Go to the Meta view and click the "Filter" button of the field you want to filter.

Enable the filter and select a range or a set of values. In the example above, the filter keeps the records whose temperature is between 20 and 30 degrees.
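For readers who want to see what a range filter does to the underlying rows, here is a small sketch in plain Python. It is not RATH code: the `filter_range` helper and the sample records are hypothetical, and treating both bounds as inclusive is an assumption.

```python
def filter_range(rows, field, low, high):
    """Keep rows whose value for `field` lies in [low, high] (inclusive, by assumption)."""
    return [
        row for row in rows
        if row.get(field) is not None and low <= row[field] <= high
    ]


# Hypothetical weather records.
records = [
    {"city": "A", "temperature": 18.5},
    {"city": "B", "temperature": 20.0},
    {"city": "C", "temperature": 25.3},
    {"city": "D", "temperature": 30.0},
    {"city": "E", "temperature": 31.2},
]

in_range = filter_range(records, "temperature", 20, 30)
```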

If you only want to remove anomalies, select the Fast Selection button and use the fast filtering feature to keep the main part of the data. You can configure further details on the screen that follows.
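This page does not say which rule the fast filtering feature uses to separate the main part of the data from anomalies. As an illustration of the general idea only, the sketch below applies the common interquartile range (IQR) rule; the choice of rule, the `remove_outliers_iqr` helper, and the multiplier `k=1.5` are assumptions, not RATH's documented behavior.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical temperature readings with one obvious anomaly (95).
temperatures = [21, 22, 23, 24, 25, 22, 23, 24, 95]
main_part = remove_outliers_iqr(temperatures)
```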

Data transformation

In the Table or Meta view, select the Transforms option on a given field. RATH automatically generates suggestions for transforming that field.

For example, if you select a DateTime field, RATH suggests grouping it by units of time.
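Grouping a DateTime field by units of time amounts to splitting each timestamp into parts and aggregating on them. The sketch below shows the idea with the Python standard library; the `time_parts` helper and the sample timestamps are hypothetical, and the exact units RATH offers may differ.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps, already standardized as YYYY-MM-DD.
timestamps = ["2023-01-15", "2023-01-28", "2023-02-03", "2024-02-10"]
parsed = [datetime.strptime(t, "%Y-%m-%d") for t in timestamps]


def time_parts(dt):
    """Split a datetime into units that can serve as grouping keys."""
    return {
        "year": dt.year,
        "month": dt.month,
        "weekday": dt.isoweekday(),  # 1 = Monday ... 7 = Sunday
    }


parts = [time_parts(dt) for dt in parsed]

# Count records per (year, month) group.
per_month = Counter((p["year"], p["month"]) for p in parts)
```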

For categorical variables, RATH will suggest using the One-hot Encoding algorithm.
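One-hot encoding replaces a categorical field with one 0/1 column per category. A minimal sketch in plain Python follows; the `one_hot` helper and the sample values are hypothetical, and sorting the categories alphabetically is just one possible column order.

```python
def one_hot(values):
    """Return (categories, rows), where each row has a 1 in its category's column."""
    categories = sorted(set(values))
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return categories, rows


# Hypothetical categorical field.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)
```

Each encoded row contains exactly one 1, so the original category can always be recovered from the column position.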

If RATH detects potential anomalies in a certain field, RATH will suggest using the Isolation Forest algorithm.
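Isolation Forest scores a value by how quickly random splits separate it from the rest of the data: anomalies are isolated after few splits and receive scores close to 1. The sketch below is a simplified, single-field version written in plain Python to show the mechanism. It is not RATH's implementation; the function names, parameters, and sample readings are hypothetical, and libraries such as scikit-learn provide a full implementation.

```python
import math
import random


def c_factor(n):
    """Average path length used to normalize depths for a sample of size n."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(sample, rng, depth, max_depth):
    """Recursively split the sample at random points until values are isolated."""
    if depth >= max_depth or len(sample) <= 1 or min(sample) == max(sample):
        return {"size": len(sample)}
    split = rng.uniform(min(sample), max(sample))
    return {
        "split": split,
        "left": build_tree([v for v in sample if v < split], rng, depth + 1, max_depth),
        "right": build_tree([v for v in sample if v >= split], rng, depth + 1, max_depth),
    }


def path_length(x, node, depth=0):
    """Number of splits needed to reach the leaf that contains x."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def anomaly_scores(values, n_trees=100, sample_size=256, seed=0):
    """Return one score per value; higher means more anomalous."""
    if len(values) < 2:
        return [0.0 for _ in values]
    rng = random.Random(seed)
    size = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(size))
    trees = [
        build_tree(rng.sample(values, size), rng, 0, max_depth)
        for _ in range(n_trees)
    ]
    norm = c_factor(size)
    scores = []
    for x in values:
        mean_depth = sum(path_length(x, tree) for tree in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# Hypothetical readings; 98.0 is far from the rest.
readings = [21.0, 22.5, 23.1, 22.8, 21.7, 24.0, 23.5, 22.2, 98.0, 21.9]
scores = anomaly_scores(readings)
most_anomalous = readings[scores.index(max(scores))]
```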

Data sampling

Data sampling is the process of selecting a representative portion of data from a larger dataset to draw inferences about the overall population. It enables efficient and effective exploration and analysis, reducing the amount of data to be processed while providing accurate insights.
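As a small illustration, the sketch below draws a simple random sample without replacement using the Python standard library. The `sample_rows` helper, the 10% fraction, and the fixed seed are hypothetical choices for the example; the sampling methods RATH itself offers are described in the chapter referenced below.

```python
import random


def sample_rows(rows, fraction, seed=42):
    """Return a reproducible simple random sample of roughly `fraction` of the rows."""
    if not rows:
        return []
    rng = random.Random(seed)  # fixed seed so the same sample is drawn every run
    k = min(len(rows), max(1, round(len(rows) * fraction)))
    return rng.sample(rows, k)


# Hypothetical dataset of 1,000 row identifiers.
dataset = list(range(1000))
subset = sample_rows(dataset, 0.1)
```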

For more details about data sampling, refer to the related sections in the Connect your data chapter.