ExpectColumnValuesToMatchRegex

Type

transform

Class

fire.nodes.ge.NodeExpectColumnValuesToMatchRegex

Fields

Name

Title

Description

cols

Column Name

The column name.

regex

Regex

The regular expression that the column values must match.

mostly

Mostly

A value between 0 and 1, interpreted as a fraction of rows. As long as at least that fraction of rows match the regex, the expectation returns “success”: True.

Details

Expect Column Values To Match Regex Details

This feature enables validation of column values in a DataFrame to ensure they match a specified regular expression (regex) pattern. It is useful for checking that values in a column adhere to a particular format or structure, such as an email or phone number format.

Input

Column Name: Select the column that needs to be validated. The selected column should hold string (text) values, since a regex is matched against text.

Regex: Enter the regular expression pattern that the column values should match.

Mostly: Specifies the minimum fraction of rows (0.0 to 1.0) that must match the regex for the validation to pass.
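The effect of the Mostly threshold can be illustrated with a minimal Python sketch. This is not the node's implementation: the helper name `meets_mostly`, the sample values, and the choice to skip nulls and use `re.search` are all assumptions made for illustration.

```python
import re


def meets_mostly(values, pattern, mostly=1.0):
    """Return (success, matched_fraction) for a list of values.

    Hypothetical helper: null values are skipped, and an empty
    column is treated as passing. Both choices are assumptions.
    """
    compiled = re.compile(pattern)
    non_null = [v for v in values if v is not None]
    if not non_null:
        return True, 1.0
    matched = sum(1 for v in non_null if compiled.search(str(v)))
    fraction = matched / len(non_null)
    return fraction >= mostly, fraction


# Hypothetical sample: three of four values match "one capital letter, one digit".
codes = ["A1", "B2", "C3", "bad"]

passes_at_70, fraction = meets_mostly(codes, r"^[A-Z]\d$", mostly=0.7)
passes_at_80, _ = meets_mostly(codes, r"^[A-Z]\d$", mostly=0.8)

print(fraction)       # 0.75
print(passes_at_70)   # True, because 0.75 >= 0.7
print(passes_at_80)   # False, because 0.75 < 0.8
```

With Mostly set to 1.0, a single non-matching row is enough to fail the validation; lower values tolerate a corresponding share of non-matching rows.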

Output

A DataFrame with validation results, showing whether each row’s value in the specified column matches the regex pattern.

This validation result can be used to identify rows that do not conform to the expected format for further review or correction.

Example: If a column named “Email” is expected to contain only valid email addresses, set the Regex field to a pattern like ^[\w.-]+@[\w.-]+\.\w{2,4}$. With this configuration, any value in the “Email” column that does not match the pattern is flagged for further inspection.
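Before entering a pattern in the Regex field, it can help to try it on a few values locally. The sketch below uses Python's standard `re` module with the email pattern written with its backslash escapes, `^[\w.-]+@[\w.-]+\.\w{2,4}$`. The sample addresses are made up, and the node's regex engine may differ from Python's in edge cases.

```python
import re

# Email pattern with explicit escapes: \w is a word character, \. is a literal dot.
EMAIL_PATTERN = re.compile(r"^[\w.-]+@[\w.-]+\.\w{2,4}$")

# Hypothetical sample values.
samples = [
    "alice@example.com",
    "bob.smith@mail.example.org",
    "no-at-sign.example.com",   # missing "@"
    "carol@example",            # missing a dot-separated suffix
]

# Collect the values that would be flagged as not matching.
flagged = [s for s in samples if not EMAIL_PATTERN.search(s)]

for value in flagged:
    print("does not match:", value)
```

Note that this pattern is a format check only. It also rejects addresses whose final label is longer than four characters, so the `{2,4}` bound may need widening for some data.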

Examples

If an “ID” column is expected to contain only numbers with exactly 5 digits, setting the Regex field to ^\d{5}$ would result in the following outcomes for a sample DataFrame:

ID: 12345 - Pass (matches regex)

ID: 1234A - Fail (does not match regex)

ID: 67890 - Pass (matches regex)

ID: 5432 - Fail (does not match regex)

This setup helps ensure that only values matching the specified 5-digit format are present in the “ID” column.
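The four outcomes listed above can be reproduced with a short Python check. This is only a sketch using the standard `re` module, not the node's own code, and it assumes the IDs are stored as strings.

```python
import re

# Exactly five digits, anchored at both ends of the value.
ID_PATTERN = re.compile(r"^\d{5}$")

ids = ["12345", "1234A", "67890", "5432"]

# Map each ID to True (Pass) or False (Fail).
results = {value: bool(ID_PATTERN.search(value)) for value in ids}

for value, ok in results.items():
    print(f"ID: {value} - {'Pass' if ok else 'Fail'}")
```

The anchors `^` and `$` matter here: without them, a longer value such as a six-digit ID would also match, because it contains five consecutive digits.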