How to create a Python Processor in Apache NiFi 2
(The start of this post is a bit of a "What this bolognese recipe means to my family"-style rant. If you just want the steps, skip ahead.)
I love NiFi, it's a great tool, and NiFi 2 added support for Python Processors. This was pretty exciting 'cause I hate working with Java (the language itself is…fine, but the ecosystem and developer experience are terrible). I've been using NiFi on-and-off since 2018-ish, and worked at Cloudera (the primary corporate sponsor of NiFi) for a few years doing a lot of NiFi work.
As excited as I was, and confident in my NiFi knowledge, I struggled to get to a "Hello World" moment with Python Processors. While I might be a bit out of date, I found the documentation practically useless from a "getting started" perspective, and NiFi's "I just want to play" experience is, and has been for some time, just downright awful.
Part of this is due to the secure-by-default decision, made somewhere around version 1.14.0. It somewhat made sense at the time (I was working at Cloudera when this decision was made); NiFi came from the security world and found most of its early adoption in more secure, enterprise environments. Forcing these enterprises to follow sane security best practices stopped people doing stupid things and getting into trouble later on…
…but setting up security on NiFi isn't trivial, particularly if you're not close to the "Java Way"…and if you're excited by Python Processors, it's probably because you're not close to the "Java Way".
It is, frankly, a truly shitty first impression. It was already hard when 1.14.0 was relevant and docs/examples were up-to-date, but now that most of that content is outdated and useless, it's even more of a confusing mess. It's quite sad because, once you get past it, NiFi is such an incredible tool, but expecting people to fight through this clusterfuck of a setup…it's just not going to happen, and it will kneecap adoption. This is my fourth attempt at getting a "Hello World" example running; I fought and gave up 3 times before finally getting it working.
I know it's open source, I know I could contribute. But this needs directional agreement, and I likely wouldn't be fluent enough in Java to contribute it anyway. So this is my contribution: a guide to help people like me, and a plea to the NiFi team (who are all amazing btw): please, please, please, please…improve the NiFi onboarding experience to not suck quite this much.
From 0 to "Hello World" steps
This was done on an M1 MacBook Pro, using OrbStack to run a Linux VM (Ubuntu).
Set up your machine
I use OrbStack to run VMs & containers on my Mac.
For this guide, use an Ubuntu 24.04.1 VM. Name the VM nifi.
If you used OrbStack, the machine's hostname is configured as nifi and this will create a local DNS name of nifi.orb.local. This is important, because the TLS configuration of NiFi absolutely requires hostnames and will not work with IP addresses.
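If you're not sure whether the name you plan to use counts as a hostname or an IP address, a quick check like this can help. It's a minimal sketch using only the Python standard library; the function name is_ip_literal is made up for illustration.

```python
import ipaddress


def is_ip_literal(host: str) -> bool:
    """Return True if `host` is an IPv4/IPv6 literal rather than a hostname."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        # Not parseable as an IP address, so treat it as a hostname
        return False


# The name used in this guide is a hostname, which is what we want
print(is_ip_literal("nifi.orb.local"))
# An address like this one is what to avoid
print(is_ip_literal("192.168.1.10"))
```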
Access the terminal of the VM. Install JDK 21 with:
sudo apt install openjdk-21-jdk
And verify with:
java -version
Now configure your JAVA_HOME. Edit /etc/environment and add a line at the bottom (replace the path with your own, it'll be under /usr/lib/jvm/):
JAVA_HOME="/usr/lib/jvm/java-21-openjdk-amd64"
Then run:
source /etc/environment
Now we need to set up Python 3.11 (as this Ubuntu ships with 3.12, which isn't supported by NiFi yet):
sudo apt install software-properties-common
sudo apt update
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11
python3.11 --version
Finally, install wget and unzip:
sudo apt install wget unzip
Setting up NiFi 2
Download NiFi 2.0.0 M4 and unzip it:
wget https://dlcdn.apache.org/nifi/2.0.0-M4/nifi-2.0.0-M4-bin.zip
unzip nifi-2.0.0-M4-bin.zip
Edit the nifi-2.0.0-M4/conf/nifi.properties file. Change the nifi.web.http.host to 0.0.0.0. Then update the nifi.python.command=python3 line to nifi.python.command=python3.11 and remove the # from the start of the line. Save the file.
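If you'd rather script that edit than do it by hand, something like the following could work. It's a rough sketch, not an official NiFi tool: set_property is a name I made up, and the sample text is a made-up two-line stand-in for the real file.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Set key=value in properties-style text, uncommenting the line if needed."""
    # Match the key at the start of a line, optionally behind a leading '#'
    pattern = re.compile(rf"^#?[ \t]*{re.escape(key)}=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(text):
        # A function is used as the replacement so the value is taken literally
        return pattern.sub(lambda _match: new_line, text, count=1)
    # Key not present at all: append it to the end
    return text.rstrip("\n") + "\n" + new_line + "\n"


# Made-up sample, standing in for the contents of conf/nifi.properties
sample = "nifi.web.http.host=\n#nifi.python.command=python3\n"
updated = set_property(sample, "nifi.web.http.host", "0.0.0.0")
updated = set_property(updated, "nifi.python.command", "python3.11")
print(updated)
```

To use it for real, you'd read the file into a string, run it through set_property, and write the result back (after taking a backup).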
Then download the previous version of NiFi Toolkit (1.27.0) and unzip it:
wget https://dlcdn.apache.org/nifi/1.27.0/nifi-toolkit-1.27.0-bin.zip
unzip nifi-toolkit-1.27.0-bin.zip
You need the old version because tls-toolkit was deprecated with NiFi 2.0 and no longer ships with it. This makes it even more of a pain in the ass to do the initial TLS setup. Luckily, the old version of tls-toolkit still works and generates valid certs & config, so we can use it for now.
Run tls-toolkit like so (make sure to change nifi.orb.local if your FQDN is different):
nifi-toolkit-1.27.0/bin/tls-toolkit.sh standalone -n 'nifi.orb.local'
This will create a folder called nifi.orb.local which contains 2 .jks files and a nifi.properties file.
Move the two .jks files to nifi-2.0.0-M4/conf/, for example:
mv *.jks nifi-2.0.0-M4/conf/
Next, open the generated nifi.properties file (the one that tls-toolkit created inside the nifi.orb.local folder), and search for this part:
nifi.security.autoreload.enabled=false
nifi.security.autoreload.interval=10 secs
nifi.security.keystore=./conf/keystore.jks
nifi.security.keystoreType=jks
nifi.security.keystorePasswd=JfvOBGE6i1j+FsHKzvvv4cUjoYEH7HnQBkSjo7SRSB4
nifi.security.keyPasswd=JfvOBGE6i1j+FsHKzvvv4cUjoYEH7HnQBkSjo7SRSB4
nifi.security.truststore=./conf/truststore.jks
nifi.security.truststoreType=jks
nifi.security.truststorePasswd=ckQ5e55245alswyQd0w/ZkuEO1lChTC6mbNv4S/lxio
Copy all of these lines.
Now go open the real nifi.properties file in nifi-2.0.0-M4/conf/nifi.properties, find the same lines, delete them, and replace them with the new ones.
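The copy-and-replace step is easy to fumble by hand, so here's a sketch of how it could be scripted. The function name copy_properties and the sample strings are my own inventions; the key list is just the nine property names shown above.

```python
# The nine property names copied in this step
SECURITY_KEYS = [
    "nifi.security.autoreload.enabled",
    "nifi.security.autoreload.interval",
    "nifi.security.keystore",
    "nifi.security.keystoreType",
    "nifi.security.keystorePasswd",
    "nifi.security.keyPasswd",
    "nifi.security.truststore",
    "nifi.security.truststoreType",
    "nifi.security.truststorePasswd",
]


def copy_properties(source: str, target: str, keys) -> str:
    """Overwrite the given keys in `target` with their values from `source`."""
    # Collect the wanted key/value pairs from the generated file
    wanted = {}
    for line in source.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in keys:
            wanted[key] = value

    # Rewrite the target line by line, swapping in the new values
    out = []
    for line in target.splitlines():
        key = line.partition("=")[0]
        out.append(f"{key}={wanted[key]}" if key in wanted else line)
    return "\n".join(out) + "\n"


# Made-up miniature versions of the generated and real files
generated = "nifi.security.keystore=./conf/keystore.jks\nnifi.security.keystorePasswd=abc\n"
real = "nifi.web.https.port=8443\nnifi.security.keystore=\nnifi.security.keystorePasswd=\n"
print(copy_properties(generated, real, SECURITY_KEYS))
```

Only the listed keys are touched, so anything else in the generated file is left out of the real one.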
We're almost there! On to Python & the NiFi UI
Create the extensions dir: nifi-2.0.0-M4/python/extensions. I don't know why this isn't created by default, when it's configured by default, but there you go 🤷‍♂️.
Create a file for your Hello World extension: nifi-2.0.0-M4/python/extensions/MyProcessor.py
Paste in the following contents to the file:
from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult


class WriteHelloWorld(FlowFileTransform):
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def transform(self, context, flowfile):
        # Read the incoming FlowFile content as text
        content = flowfile.getContentsAsBytes().decode()
        # Do something with the input (here it is passed through unchanged)
        output = content
        return FlowFileTransformResult(
            relationship="success",
            contents=output,
            # Attributes are a dict of name -> value
            attributes={"greeting": "hello"}
        )
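The nifiapi module only exists inside NiFi, so you can't just import the processor in a normal Python shell to poke at it. One way around that is to try the transform logic against a stand-in FlowFile first. This is only a sketch: FakeFlowFile and build_result are hypothetical names of mine, and the only thing borrowed from NiFi is the getContentsAsBytes method name used in the processor.

```python
class FakeFlowFile:
    """Stand-in for NiFi's FlowFile, for local experiments only (hypothetical)."""

    def __init__(self, contents: bytes):
        self._contents = contents

    def getContentsAsBytes(self) -> bytes:
        return self._contents


def build_result(flowfile) -> dict:
    """Compute the values transform() would pass on, as a plain dict."""
    content = flowfile.getContentsAsBytes().decode()
    return {
        "relationship": "success",
        "contents": content,
        # Attributes map string names to string values
        "attributes": {"greeting": "hello"},
    }


result = build_result(FakeFlowFile(b"Hello World"))
print(result["contents"])
print(result["attributes"])
```

Once the logic behaves, the same lines can go into transform() and the dict values into FlowFileTransformResult.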
Give it a save.
Now start NiFi with:
nifi-2.0.0-M4/bin/nifi.sh start
On your host machine, open your browser and go to https://nifi.orb.local:8443/nifi/. You'll get a warning about the cert being self-signed, but you can ignore it and proceed.
You should reach the login screen asking for a user & pass. You'll need to get these from the NiFi logs. Run:
grep Username nifi-2.0.0-M4/logs/nifi-app.log
grep Password nifi-2.0.0-M4/logs/nifi-app.log
You should see lines like these:
Generated Username [ba640a89-1532-42f7-bbc4-51f194a75547]
Generated Password [1qmQNTnkFLXHlJ9R7kAaRNBE0CvuPeJz]
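If you end up restarting from scratch a few times (I did), pulling the credentials out with a few lines of Python saves some squinting. A small sketch, assuming the log lines look like the two above; extract_credentials is a made-up helper name.

```python
import re


def extract_credentials(log_text: str):
    """Return (username, password) from the 'Generated ...' log lines.

    Either value is None if its line is not found.
    """
    user = re.search(r"Generated Username \[([^\]]+)\]", log_text)
    password = re.search(r"Generated Password \[([^\]]+)\]", log_text)
    return (
        user.group(1) if user else None,
        password.group(1) if password else None,
    )


# The two example lines from this post, standing in for the real log file
sample = (
    "Generated Username [ba640a89-1532-42f7-bbc4-51f194a75547]\n"
    "Generated Password [1qmQNTnkFLXHlJ9R7kAaRNBE0CvuPeJz]\n"
)
print(extract_credentials(sample))
```

For the real thing, read nifi-app.log into a string and pass that in instead of the sample.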
Copy and paste these into the login screen.
If all went well, you should now be into the NiFi canvas!
Drag a new processor to the canvas, search for MyProcess (if nothing turns up, try the class name, WriteHelloWorld), and you should see your Hello World processor ready to go!
Oh how painful!
Once you reach this point, the NiFi docs are actually in reasonable shape to work with the Python Processor API. Hopefully this was enough to get you past the absolute nightmare that is the initial setup.
Again, I love NiFi, and the team working on it are amazing people that I've had the privilege to work with…but I hope they find some time to revisit the onboarding experience 'cause it suuuuuuuuuuucks.
If you didn't know, most of the NiFi team have left Cloudera and formed Datavolo around NiFi, which is pretty cool!