
02-Structured-Files

  • Read Avro
  • Create Dataset
  • Read CSV
  • Read Delta
  • Empty Dataset
  • Read Excel
  • Read Excel Advanced
  • ReadFlatFile
  • Read HANA CSV
  • InMemoryDataset
  • Read JSON
  • Read LIBSVM
  • Read Parquet
  • Read Shape File
  • Dataset Structured
  • URL Text File Reader
  • URL Single Record JSON Reader
  • Read XML

© 2026 Sparkflows, Inc. All rights reserved.
