Customized Computation

Customized Computation is a feature of RATH that lets you flexibly transform your data with expressions built from operators, field references, and literals.

Customized computation console

After importing the data from your selected data source, you can click on the Customized Computation button to open the console.

The Customized Computation Console has an editor on the upper left side, where you can enter expressions. RATH shows suggestions as you type. To confirm a suggestion, click it or press the Tab key.

If an expression is invalid, an error message will be displayed in the lower left corner. If the expression is valid, the output preview (Distribution Charts) of the column will be updated in real-time on the right.

Expression Syntax

Basics

Fields

Customized Computation expressions refer to a field (original or extended) by its field ID. In the editor, you can type the name of the field. Once RATH matches it to the corresponding field, select the auto-completion to insert the reference.

Field types

Every field referenced in a Customized Computation expression has a data type. There are three original types:

  • set type: an ordered numeric type that represents ordinal values and does not take part in mathematical operations
  • group type: a discrete numeric type that takes part in mathematical operations
  • collection type: a string type

Operators in Customized Computation expressions are strongly tied to data types. If necessary, use the corresponding type conversion operator to transform a column first (this constructs a new column instead of changing the original column).

Literals

Customized Computation expressions partially support JavaScript number and string literals.

Statement

Customized Computation expressions are composed of operators, field references, and literals. Operators are the core of Customized Computation's functionality. Operators can be nested, and an expression must generate and export at least one new field. At the outermost level, an expression can separate multiple calculation statements with commas (",") to quickly create several independent fields. Do not use semicolons in expressions.

DateTime object

A DateTime object is a special kind of object returned by the $toDate operator. Exporting it directly generates a timestamp field (a column of type group). It can also be sliced to construct one or more new columns. The slicing syntax is <datetime object>.<dimension tag>. Valid dimension tags are Y, M, W, D, h, m, s.
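As an illustration of what slicing by a dimension tag means, here is a minimal Python sketch. It is not RATH's implementation. The function name slice_datetime, the millisecond timestamp input, the UTC time zone, and the reading of D as day of month are all assumptions made for this example. The W tag is left out because this page does not say whether it means week of year or day of week.

```python
from datetime import datetime, timezone


def slice_datetime(ts_ms: int, tag: str) -> int:
    """Return one calendar component of a millisecond timestamp.

    Hypothetical helper: the UTC time zone and millisecond unit are assumptions.
    """
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    parts = {
        "Y": dt.year,
        "M": dt.month,
        "D": dt.day,     # assumed to mean day of month
        "h": dt.hour,
        "m": dt.minute,
        "s": dt.second,
    }
    return parts[tag]


# 2011-01-01 03:04:05 UTC expressed in milliseconds since the Unix epoch
ts = 1293851045000
year, month, hour = (slice_datetime(ts, t) for t in ("Y", "M", "h"))
```

Each tag produces one integer per row, which is why slicing a single DateTime object can yield several new columns.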

Operator

Operators are functions that operate on fields or other objects. Operator identifiers start with a $ symbol. The calling syntax for an operator is <operator>(<parameter 1>, <parameter 2>...).

Keywords

Operator out

For a new extended column generated by a calculation, add the out keyword before the calculation statement to export it. The out keyword can be placed at any level of a nested operation. Customized Computation expressions must contain at least one out statement. You can add a word without special characters after the out keyword as the name of the exported field. In particular, a field generated by the four arithmetic operations must have its name explicitly declared.

Examples:

Common operators

Type Conversion

$set(group|collection) -> set

Converts a column of a non-set type into a column of type set.

$group(set|collection) -> group

Converts a column of a non-group type into a column of type group. Useful when performing mathematical operations on fields (you need to be sure that the operations make sense).

$nominal(set|group) -> collection

Converts a column of a non-collection type into a column of type collection.

Ordinal number generation

$id() -> set

Generate IDs starting from 1.

$order(set|group) -> set

Generates the mathematical order (starting at 1) of all rows on a field.

$dict(collection) -> set

Generates the lexicographic order (starting at 1) of all rows on a field.
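The three ordinal operators can be pictured with the following Python sketch. It only illustrates the descriptions above and is not RATH's implementation. The function names row_ids, order_rank, and dict_rank are hypothetical, and breaking ties by row position is an assumption, since this page does not say how equal values are ranked.

```python
def row_ids(n):
    """IDs starting from 1, one per row (illustrates $id)."""
    return list(range(1, n + 1))


def order_rank(values):
    """1-based rank of each row by value (illustrates $order).

    Assumption: ties are broken by row position (sorted() is stable).
    """
    by_value = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, row in enumerate(by_value, start=1):
        ranks[row] = rank
    return ranks


def dict_rank(strings):
    """1-based rank of each row in lexicographic order (illustrates $dict)."""
    return order_rank(strings)  # Python compares strings lexicographically


ids = row_ids(3)
numeric = order_rank([30, 10, 20])
lexical = dict_rank(["pear", "apple", "fig"])
```

In both ranking functions the output has one rank per input row, so the result can serve as a new set-type column.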

Data Standardization

$inset(group) -> group

Standardize a field to the interval -1 ~ 1.

$bound(group) -> group

Standardize a field to the interval 0 ~ 1.

$normalize(group) -> group

Normalize a field using Z-score.
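The following Python sketch shows one common way to compute these three scalings. It is an assumption-laden illustration, not RATH's implementation: the function names are hypothetical, the interval scalings are assumed to be min-max scaling, the Z-score is assumed to use the population standard deviation, and mapping a constant column to the lower bound is an arbitrary choice made here.

```python
import math


def bound(values):
    """Min-max scale to the interval 0 ~ 1 (assumed meaning of $bound)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # arbitrary choice for constant columns
    return [(v - lo) / span for v in values]


def inset(values):
    """Min-max scale to the interval -1 ~ 1 (assumed meaning of $inset)."""
    return [2 * b - 1 for b in bound(values)]


def normalize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]  # a constant column would divide by zero


scaled_01 = bound([0, 5, 10])
scaled_pm1 = inset([0, 5, 10])
z_scores = normalize([1, 2, 3])
```

The interval scalings keep the shape of the distribution and only change its range, while the Z-score centers the field on 0 with unit spread.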

Nonlinear transformation

$log(group) -> group, $log(group, JS.number) -> group

Logarithmic mapping. A base can be provided as the second argument. The default is the natural logarithm.

$log2(group) -> group

$log10(group) -> group

$log1p(group) -> group

Equivalent to $log(group + 1).

$sigmoid(group) -> group

$ReLU(group) -> group
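The nonlinear operators above correspond to standard mathematical functions. The Python sketch below illustrates them element by element. The function names are hypothetical, and the sigmoid is assumed to be the standard logistic function 1 / (1 + e^(-x)), which this page does not state explicitly.

```python
import math


def log(values, base=None):
    """Logarithm with an optional base; natural logarithm by default."""
    if base is None:
        return [math.log(v) for v in values]
    return [math.log(v, base) for v in values]


def log1p(values):
    """log(1 + x), computed accurately for small x."""
    return [math.log1p(v) for v in values]


def sigmoid(values):
    """Standard logistic function (assumed definition)."""
    return [1 / (1 + math.exp(-v)) for v in values]


def relu(values):
    """Rectified linear unit: negative values become 0."""
    return [max(0.0, v) for v in values]


natural = log([1.0, math.e])
base_two = log([8.0], 2)
squashed = sigmoid([0.0])
rectified = relu([-2.0, 0.0, 3.0])
```

Note that in this sketch math.log raises ValueError for zero or negative inputs, so a skewed count column containing zeros is usually a better fit for the log1p form.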

Data cleaning

$isNaN(set|group) -> collection

Returns "1" if the row's value on the field is NaN, or "0" otherwise.

$isZero(set|group) -> collection

Returns "1" if the row's value on the field is 0, or "0" otherwise.

$zeroFill(group) -> group

Maps outlier values (±Infinity or NaN) on a field to 0.

$meanFill(group) -> group

Maps outlier values (±Infinity or NaN) on a field to the mean of the non-outlier values.
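The flagging and filling operators can be sketched in Python as follows. This is an illustration of the descriptions above rather than RATH's implementation. All function names are hypothetical, and returning NaN from mean_fill when no valid value exists is an assumption made here.

```python
import math


def is_invalid(v):
    """True for the values treated as outliers here: NaN and ±Infinity."""
    return math.isnan(v) or math.isinf(v)


def is_nan_flags(values):
    """String flags "1"/"0" marking NaN rows (illustrates $isNaN)."""
    return ["1" if math.isnan(v) else "0" for v in values]


def is_zero_flags(values):
    """String flags "1"/"0" marking rows equal to 0 (illustrates $isZero)."""
    return ["1" if v == 0 else "0" for v in values]


def zero_fill(values):
    """Replace NaN and ±Infinity with 0 (illustrates $zeroFill)."""
    return [0.0 if is_invalid(v) else v for v in values]


def mean_fill(values):
    """Replace NaN and ±Infinity with the mean of the remaining values."""
    valid = [v for v in values if not is_invalid(v)]
    mean = sum(valid) / len(valid) if valid else math.nan  # assumption for empty case
    return [mean if is_invalid(v) else v for v in values]


filled_zero = zero_fill([1.0, math.nan, math.inf])
filled_mean = mean_fill([1.0, math.nan, 3.0, -math.inf])
```

The flag functions return strings because the corresponding operators produce a collection-type column, while the fill functions keep the column numeric.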

$nearestClip(group, JS.number, JS.number) -> group

Takes a range. Any value on the field that falls outside the range is treated as an outlier and mapped to the nearest boundary value.

$meanClip(group, JS.number, JS.number) -> group

Takes a range. Any value on the field that falls outside the range is treated as an outlier and mapped to the mean of the non-outlier values.
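The two range-based clipping operators differ only in the replacement value, as this Python sketch shows. It is not RATH's implementation. The function names are hypothetical, and both the argument order (lower bound, then upper bound) and the treatment of boundary values as inside the range are assumptions.

```python
def nearest_clip(values, lower, upper):
    """Map values outside [lower, upper] to the nearest boundary."""
    return [min(max(v, lower), upper) for v in values]


def mean_clip(values, lower, upper):
    """Map values outside [lower, upper] to the mean of the values inside it."""
    inside = [v for v in values if lower <= v <= upper]
    mean = sum(inside) / len(inside)  # assumes at least one value is in range
    return [v if lower <= v <= upper else mean for v in values]


clipped_nearest = nearest_clip([-5, 2, 9], 0, 5)
clipped_mean = mean_clip([-5, 2, 4, 9], 0, 5)
```

Clipping to the nearest boundary preserves the ordering of the rows, whereas clipping to the mean pulls every outlier to the same central value.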

$boxClip(group) -> group

Use the boxplot statistics to mark outliers in a field and replace them with NaN.
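A boxplot-based rule can be sketched in Python as below. This page does not state which boxplot statistics RATH uses, so the sketch assumes the common Tukey convention: values further than 1.5 times the interquartile range from the quartiles are outliers. The function name box_clip, the factor k, and the quartile method are all assumptions.

```python
import math
import statistics


def box_clip(values, k=1.5):
    """Replace values outside the Tukey fences with NaN.

    Assumptions: fences at Q1 - k*IQR and Q3 + k*IQR, with quartiles from
    statistics.quantiles(..., method="inclusive").
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v if lower <= v <= upper else math.nan for v in values]


cleaned = box_clip([1, 2, 3, 4, 100])
```

Because the outliers become NaN rather than a number, a fill step such as the zero or mean fill described earlier would typically follow.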

String manipulation

$concat(collection, ...collection) -> collection, $concat(JS.string, collection, ...collection) -> collection

Concatenates the contents of several string columns, in order, into a new column, joined by a delimiter ("," by default). To use a different delimiter, pass it as the first argument.
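Row-wise concatenation can be pictured with this short Python sketch. It is an illustration only, and the function name concat and its keyword argument are hypothetical. In the operator itself the delimiter is given as the first argument, as the second signature above shows.

```python
def concat(columns, delimiter=","):
    """Join the values of several string columns row by row."""
    return [delimiter.join(row) for row in zip(*columns)]


city = ["Paris", "Oslo"]
country = ["FR", "NO"]

default_joined = concat([city, country])
dash_joined = concat([city, country], delimiter="-")
```

Each output row combines the values from the same row of every input column, so the result has the same number of rows as the inputs.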

DateTime

$toDate(group|set|collection)

Converts a field into a DateTime object (see the DateTime object section) and returns it.

Best Practices

Normalization transformation

Nonlinear Mapping

In this case, we attempt to transform the casual column from the Bike Sharing Demo Database with nonlinear mapping.

Retrieve DateTime information

We can retrieve the DateTime information from the Bike Sharing Demo Database, where _c_4 is the year and _c_1 is the month.

Next, we can calculate the weekday parameter.