Spark¶
This node runs any given Scala code. The input dataframe is passed in the variable inDF. The output dataframe is passed back by registering it as a temporary table.
Input¶
The input dataframe is passed in the variable inDF.
Output¶
The output dataframe is passed back by registering it as a temporary table
Type¶
scala
Class¶
fire.nodes.code.NodeSpark
Fields¶
Name |
Title |
Description |
|---|---|---|
outTempTable |
Output Temp Table |
Output Temp Table |
code |
Scala |
Scala code to be run. Input dataframe : “inDF”, SparkContext : “sc”, SQLContext : “sqlContext”, Output/Result dataframe should be registered as a temporary table - df.registerTempTable(“outDF”) |
schema |
InferSchema |
|
outputColNames |
Column Names for the CSV |
New Output Columns of the SQL |
outputColTypes |
Column Types for the CSV |
Data Type of the Output Columns |
outputColFormats |
Column Formats for the CSV |
Format of the Output Columns |
Details¶
Scala Details¶
This node receives receives an input dataframe.
The input dataframe is passed into the Scala code as a variable called inDF.
The scala code operates on the dataframe inDF.
Finally the scala code produces a resulting dataframe to be passed on to the next node. It does so by registering a temporary table called outDF.
For DataSet support add below import stmt:
import spark.implicits._
Examples¶
Scala Examples¶
Pass the Input dataframe as Output dataframe.¶
In OUTPUT TEMP TABLE field add the name of the temp table. ex: temp_table
val outDF = inDF
outDF.registerTempTable(“temp_table”)
Calculate Count of Houses by Bathrooms¶
In OUTPUT TEMP TABLE field add the name of the temp table. ex: temp_table
val outDF = inDF.groupBy(“bathrms”).count()
outDF.registerTempTable(“temp_table”)
registerTempTable is used to register the result dataframe as a temporary table. Use the OUTPUT TEMP TABLE field name for that.