Customized Computation
Customized Computation is a RATH feature that lets you flexibly transform your data with expressions.
Customized computation console
After importing the data from your selected data source, you can click on the Customized Computation button to open the console.
The Customized Computation Console has an editor on the upper left side, where you can enter expressions. RATH shows suggestions as you type; confirm a suggestion by clicking on it or pressing the Tab key.
If an expression is invalid, an error message is displayed in the lower left corner. If the expression is valid, the output preview (Distribution Charts) of the column is updated in real time on the right.
Expression Syntax
Basics
Fields
Customized Computation expressions refer to a field (original or extended) by its field ID. In the editor, you can type the name of the field; once RATH matches it to the corresponding field, select the auto-completion suggestion.
Field types
Every field used in a Customized Computation expression has a data type. There are three original types:
set type: an ordered number type, representing an ordinal number, not involved in mathematical operations
group type: a discrete number type, involved in mathematical operations
collection type: a string type
Each operator is strongly tied to the data types it accepts. If a column has the wrong type for an operator, first transform it with the corresponding type-conversion operator (this constructs a new column instead of changing the original column).
Literals
Customized Computation expressions partially support JavaScript number and string literals.
Statement
Customized Computation expressions are composed of operators, field references and literals. Operators are the core of Customized Computation's functionality. Operators can be nested, and an expression must generate and export at least one new field. At the outermost level, an expression can separate multiple calculation statements with commas (",") to quickly create several independent fields. Do not use semicolons in expressions.
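As a rough illustration of the comma rule above, the following Python sketch splits an expression into statements at outermost commas only, leaving commas inside operator calls or string literals alone. The helper split_statements and the sample expression are hypothetical, written from the rules in this section; this is not RATH's parser, and escaped quotes are not handled.

```python
def split_statements(expression):
    """Split an expression on commas that sit outside parentheses and quotes."""
    statements, current = [], []
    depth = 0      # current nesting depth of parentheses
    quote = None   # the quote character we are inside, if any
    for ch in expression:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # An outermost comma ends the current statement.
            statements.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


# Hypothetical expression with two statements; the field IDs are placeholders.
sample = 'out logged $log(_c_1, 10), out label $concat("-", _c_2, _c_3)'
for statement in split_statements(sample):
    print(statement)
```

The two commas inside the operator calls are kept, so the sample yields exactly two statements.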
DateTime object
A DateTime object is a special kind of object returned by the $toDate operator.
A direct export will generate a field with a timestamp (column of type group).
It can also be sliced to construct one or more new columns.
The slicing syntax is <datetime object>.<dimension tag>.
Valid dimension tags include Y, M, W, D, h, m, s.
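To make the idea of slicing concrete, here is a minimal Python sketch that pulls single components out of a date-time value. The tag-to-component mapping (Y for year, M for month, D for day, h, m, s for hour, minute, second) is an assumption based on the tag letters, and W is left out because this section does not say whether it means week of year or weekday. The names SLICES and slice_datetime are hypothetical.

```python
from datetime import datetime, timezone

# Assumed meaning of each dimension tag; W is intentionally omitted.
SLICES = {
    "Y": lambda d: d.year,
    "M": lambda d: d.month,
    "D": lambda d: d.day,
    "h": lambda d: d.hour,
    "m": lambda d: d.minute,
    "s": lambda d: d.second,
}


def slice_datetime(value, tag):
    """Return one component of a datetime, selected by its dimension tag."""
    return SLICES[tag](value)


moment = datetime(2011, 7, 4, 13, 45, 30, tzinfo=timezone.utc)
print({tag: slice_datetime(moment, tag) for tag in SLICES})
```

Each slice yields a plain number, which matches the description of slicing as a way to build new columns from one date-time value.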
Operator
Operators are functions that operate on fields or other objects.
Operator identifiers start with a $ symbol.
The calling syntax for an operator is <operator>(<parameter 1>, <parameter 2>...).
Keywords
Operator out
For a new extended column generated by a calculation, add the out keyword before the calculation statement to export it.
The out keyword can be placed at any level of a nested operation.
**Customized Computation expressions must contain at least one out statement.**
You can add a word without special characters after the out keyword as the name of the exported field.
In particular, the name of a field generated by the four arithmetic operations must be explicitly declared.
Examples:
Common operators
Type Conversion
$set(group|collection) -> set
Converts a column of a non-set type into a column of type set.
$group(set|collection) -> group
Converts a column of a non-group type into a column of type group. Useful when performing mathematical operations on fields (make sure the operations are meaningful for the data).
$nominal(set|group) -> collection
Converts a column of a non-collection type into a column of type collection.
Ordinal number generation
$id() -> set
Generate IDs starting from 1.
$order(set|group) -> set
Generates the mathematical order (starting at 1) of all rows on a field.
$dict(collection) -> set
Generates the lexicographic order (starting at 1) of all rows on a field.
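The three ordinal operators can be pictured with a short Python sketch. This shows the general idea of 1-based IDs and ranks, not RATH's implementation. In particular, giving equal values the same rank is an assumption, and the function names make_id and rank are hypothetical.

```python
def make_id(n_rows):
    """Row IDs 1, 2, ..., n_rows (the idea behind $id)."""
    return list(range(1, n_rows + 1))


def rank(values):
    """1-based position of each value in sorted order.

    Numbers sort numerically (as in $order) and strings sort
    lexicographically (as in $dict). Equal values share a rank,
    which is an assumption of this sketch.
    """
    position = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [position[v] for v in values]


print(make_id(4))                      # [1, 2, 3, 4]
print(rank([30, 10, 20, 10]))          # [3, 1, 2, 1]
print(rank(["pear", "apple", "fig"]))  # [3, 1, 2]
```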
Data Standardization
$inset(group) -> group
Standardize a field to the interval -1 ~ 1.
$bound(group) -> group
Standardize a field to the interval 0 ~ 1.
$normalize(group) -> group
Normalize a field using Z-score.
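The following Python sketch shows one common way to compute these three scalings. It assumes min-max scaling for the two interval operators and the population standard deviation for the Z-score; RATH's exact formulas are not stated here and may differ. The function names simply mirror the operators and are not RATH code.

```python
from statistics import mean, pstdev


def bound(values):
    """Min-max scale values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has no spread; map every row to 0.0 instead of dividing by zero.
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def inset(values):
    """Min-max scale values to the interval [-1, 1]."""
    return [2 * b - 1 for b in bound(values)]


def normalize(values):
    """Z-score: subtract the mean, then divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


column = [2.0, 4.0, 6.0, 8.0]
print(bound(column))
print(inset(column))
print(normalize(column))
```

The smallest value maps to the lower end of the interval and the largest to the upper end, while the Z-scored column is centered on 0.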
Nonlinear transformation
$log(group) -> group, $log(group, JS.number) -> group
Logarithmic mapping. The base can be provided as the second parameter; the default is the natural logarithm.
$log2(group) -> group
$log10(group) -> group
$log1p(group) -> group
Equivalent to $log(group + 1).
$sigmoid(group) -> group
$ReLU(group) -> group
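For reference, the standard mathematical definitions behind these mappings can be sketched in Python as below. The function names mirror the operators for readability only. The sketch assumes positive inputs for the logarithms and does not show how RATH treats zero or negative values.

```python
import math


def log(values, base=None):
    """Logarithm of each value; natural logarithm when no base is given."""
    if base is None:
        return [math.log(v) for v in values]
    return [math.log(v, base) for v in values]


def log1p(values):
    """log(1 + v), computed accurately for small v."""
    return [math.log1p(v) for v in values]


def sigmoid(values):
    """Logistic function 1 / (1 + e^-v), written to avoid overflow for large |v|."""
    out = []
    for v in values:
        if v >= 0:
            out.append(1 / (1 + math.exp(-v)))
        else:
            e = math.exp(v)
            out.append(e / (1 + e))
    return out


def relu(values):
    """Rectified linear unit: negative values become 0, others are unchanged."""
    return [max(0.0, v) for v in values]


print(log([1.0, 8.0], 2))
print(log1p([0.0, 1.0]))
print(sigmoid([-2.0, 0.0, 2.0]))
print(relu([-1.0, 2.0]))
```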
Data cleaning
$isNaN(set|group) -> collection
Returns "1" if the row's value on a field is NaN, or "0" if not.
$isZero(set|group) -> collection
Returns "1" if the row's value on a field is 0, or "0" if not.
$zeroFill(group) -> group
Maps outliers (±Infinity | NaN) on a field to 0.
$meanFill(group) -> group
Maps outliers (±Infinity | NaN) on a field to the mean of the non-outlier values.
$nearestClip(group, JS.number, JS.number) -> group
Given a range, treats values on a field that fall outside the range as outliers and maps them to the nearest boundary value.
$meanClip(group, JS.number, JS.number) -> group
Given a range, treats values on a field that fall outside the range as outliers and maps them to the mean of the non-outlier values.
$boxClip(group) -> group
Uses boxplot statistics to mark outliers in a field and replaces them with NaN.
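The fill and clip operators above can be approximated with the Python sketch below. It is an illustration under stated assumptions rather than RATH's implementation. For the boxplot case it assumes the common rule that values more than 1.5 times the interquartile range beyond the quartiles are outliers, and it assumes that column contains only finite numbers. All function names are hypothetical.

```python
import math
from statistics import mean, quantiles


def is_bad(v):
    """True for NaN and for positive or negative infinity."""
    return math.isnan(v) or math.isinf(v)


def zero_fill(values):
    return [0.0 if is_bad(v) else v for v in values]


def mean_fill(values):
    good = [v for v in values if not is_bad(v)]
    fill = mean(good) if good else math.nan
    return [fill if is_bad(v) else v for v in values]


def nearest_clip(values, low, high):
    """Move out-of-range values to the nearest boundary."""
    return [min(max(v, low), high) for v in values]


def mean_clip(values, low, high):
    """Replace out-of-range values with the mean of the in-range values."""
    inside = [v for v in values if low <= v <= high]
    fill = mean(inside) if inside else math.nan
    return [v if low <= v <= high else fill for v in values]


def box_clip(values, k=1.5):
    """Replace boxplot outliers with NaN (assumed rule: k * IQR beyond the quartiles)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v if low <= v <= high else math.nan for v in values]


print(zero_fill([1.0, math.nan, math.inf]))    # [1.0, 0.0, 0.0]
print(mean_fill([1.0, math.nan, 3.0]))         # [1.0, 2.0, 3.0]
print(nearest_clip([-5.0, 0.5, 9.0], 0.0, 1.0))  # [0.0, 0.5, 1.0]
print(mean_clip([1.0, 3.0, 50.0], 0.0, 10.0))  # [1.0, 3.0, 2.0]
print(box_clip([1.0, 2.0, 3.0, 4.0, 100.0]))
```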
String manipulation
$concat(collection, ...collection) -> collection, $concat(JS.string, collection, ...collection) -> collection
Concatenates the contents of several string columns in order, joined by a delimiter (, by default), into a new column. To use a different delimiter, pass it as the first parameter.
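Row-wise concatenation of string columns can be sketched in Python as follows. The function concat is a hypothetical stand-in that takes a list of columns and an optional delimiter; it only illustrates the behavior described above.

```python
def concat(columns, delimiter=","):
    """Join the values of several string columns row by row."""
    return [delimiter.join(parts) for parts in zip(*columns)]


season = ["spring", "summer"]
weather = ["rain", "clear"]
print(concat([season, weather]))        # ['spring,rain', 'summer,clear']
print(concat([season, weather], "-"))   # ['spring-rain', 'summer-clear']
```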
DateTime
$toDate(group|set|collection)
This operator returns an instance of the special DateTime object.
Best Practices
Normalization transformation
Nonlinear Mapping
In this case, we attempt to transform the casual column from the Bike Sharing Demo Database with nonlinear mapping.
Retrieve DateTime information
We can retrieve the DateTime information from the Bike Sharing Demo Database, where _c_4 is the year and _c_1 is the month.
Next, we can calculate the weekday parameter.