Sparkflows
04-Parsing
Apache Logs
Field Splitter
Fixed Length Fields
Multi Regex Extractor
OCR
Paragraph Splitter
Parse JSON Col
Parse XML Col
Regex Tokenizer