Pipe Python¶

This node runs any given Python code. It pipes the incoming DataFrame through pipe to the Python Script. Output back to Spark has to be written out using print.

Input¶

It pipes the incoming DataFrame through pipe to the Python Script. It also passes the Schema of the DataFrame to the Python script through the command line argument - argv[1]

Output¶

Output back to Spark has to be written out using print.

Type¶

transform

Class¶

fire.nodes.etl.NodePipePython

Fields¶

Name	Title	Description
code	Pipe Python	Python code to be run. It receives each record as a string and outputs records back as a string.
outputColNames	Output Column Names	Output Schema of Pipe Python Processor
outputColTypes	Output Column Types	Data Type of the Output Columns
outputColFormats	Output Column Formats	Format of the Output Columns

Details¶

Pipe Python Details¶

The Pipe Python node receives an incoming DataFrame. It pipes the DataFrame through to a Python script that runs the given Python code. The script can operate on each row of the DataFrame and returns an updated row.

The input to the script is passed as a string and the output from the script is also passed as a string. The input schema of the DataFrame is also passed to the Python script through the command line argument - argv[1].

The output from the Python script has to be written back to Spark using print. The node then creates an updated DataFrame based on the output from the script and passes it on to the next node in the pipeline.

Examples¶

Pipe Python Examples¶

Below are some examples of the Python code that can be run in the Pipe Python node.

The schema of the Input DataFrame is : id, price, lotsize, bedrooms, bathrms, stories, driveway, recroom, fullbase, gashw, airco, garagepl, prefarea

Update the value of price¶

def update_price(record):

record[‘price’] = int(record[‘price’]) + 1000

return record

print(update_price(record))