How to create a Python Processor in Apache NiFi 2

(The start of this post is a bit of a “What this bolognese recipe means to my family”-style rant. If you just want the steps, skip ahead to “From 0 to ‘Hello World’ steps”.)

I love NiFi, it’s a great tool, and NiFi 2 added support for Python Processors. This was pretty exciting ‘cause I hate working with Java (the language itself is…fine, but the ecosystem and developer experience are terrible). I’ve been using NiFi on-and-off since 2018-ish, and worked at Cloudera (the primary corporate sponsor of NiFi) for a few years doing a lot of NiFi work.

As excited as I was, and confident in my NiFi knowledge, I struggled to get to a ‘Hello World’ moment with Python Processors. While I might be a bit out of date, I found the documentation practically useless from a ‘getting started’ perspective, and NiFi’s ‘I just want to play’ experience is, and has been for some time, just downright awful.

Part of this is due to the secure-by-default decision, made somewhere around version 1.14.0. It somewhat made sense at the time (I was working at Cloudera when this decision was made); NiFi came from the security world and found most of its early adoption in more secure, enterprise environments. Forcing these enterprises to follow sane security best practices stopped people doing stupid things and getting into trouble later on…

…but, setting up security on NiFi isn’t trivial, particularly if you’re not close to the “Java Way”…and if you’re excited by Python Processors, it’s probably because you’re not close to the “Java Way”.

It is, frankly, a truly shitty first impression. It was already hard when 1.14.0 was relevant and docs/examples were up-to-date, but now that most of that content is outdated and useless, it’s even more of a confusing mess. It’s quite sad because, once you get past it, NiFi is such an incredible tool, but expecting people to fight through this clusterfuck of a setup…it’s just not going to happen, and it will kneecap adoption. This is my fourth attempt at getting a ‘Hello World’ example running; I fought and gave up three times before finally getting it working.

I know it’s open source, I know I could contribute. But this needs directional agreement, and I likely wouldn’t be fluent enough in Java to contribute it anyway. So this is my contribution: a guide to help people like me, and a plea to the NiFi team (who are all amazing btw), please, please, please, please…improve the NiFi onboarding experience to not suck quite this much.

From 0 to ‘Hello World’ steps

This was done on an M1 MacBook Pro, using OrbStack to run a Linux VM (Ubuntu).

Set up your machine

I use OrbStack to run VMs & containers on my Mac.

For this guide, use an Ubuntu 24.04.1 VM. Name the VM nifi.

If you used OrbStack, the machine’s hostname is configured as nifi and this will create a local DNS name of nifi.orb.local. This is important, because the TLS configuration of NiFi absolutely requires hostnames and will not work with IP addresses.
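Because the certificate is issued for a hostname, it’s worth checking up front that the name you’ll hand to tls-toolkit isn’t an IP address and actually resolves, both inside the VM and on your Mac. Below is a minimal sketch using only the Python standard library; the helper names is_ip_literal and resolve are my own, not part of NiFi or OrbStack.

```python
import ipaddress
import socket
from typing import List


def is_ip_literal(host: str) -> bool:
    """True if `host` is a bare IPv4/IPv6 address rather than a DNS name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def resolve(host: str) -> List[str]:
    """Addresses `host` resolves to on this machine, or [] if it doesn't resolve."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # info[4] is the sockaddr tuple; its first element is the address string
    return sorted({info[4][0] for info in infos})
```

For example, resolve("nifi.orb.local") should return at least one address; an empty list on your Mac means the name won’t work in the browser step later either.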

Access the terminal of the VM. Install JDK 21 with:

sudo apt install openjdk-21-jdk

And verify with:

java -version

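Now configure your JAVA_HOME. Edit /etc/environment and add a line at the bottom (replace the path with your own; it’ll be under /usr/lib/jvm/, and on an Apple Silicon Mac the VM is arm64, so it will likely be java-21-openjdk-arm64 rather than the amd64 path below):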

JAVA_HOME="/usr/lib/jvm/java-21-openjdk-amd64"
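If you’re not sure what the directory is called, the suffix depends on the VM’s CPU architecture. This is a small, hypothetical helper (pick_jdk21 and find_java_home are names I made up) that looks for a java-21-openjdk-* directory under /usr/lib/jvm and prints a JAVA_HOME line you could paste into /etc/environment. It assumes the standard Ubuntu package layout.

```python
from pathlib import Path
from typing import Iterable, Optional


def pick_jdk21(dir_names: Iterable[str]) -> Optional[str]:
    """Pick the JDK 21 directory name out of a listing of /usr/lib/jvm."""
    # Ubuntu names the directory after the CPU architecture, e.g.
    # java-21-openjdk-amd64 on x86_64 or java-21-openjdk-arm64 on ARM VMs.
    matches = sorted(n for n in dir_names if n.startswith("java-21-openjdk-"))
    return matches[0] if matches else None


def find_java_home(jvm_root: str = "/usr/lib/jvm") -> Optional[str]:
    """Return the full JDK 21 path under jvm_root, or None if it isn't there."""
    root = Path(jvm_root)
    if not root.is_dir():
        return None
    name = pick_jdk21(p.name for p in root.iterdir() if p.is_dir())
    return str(root / name) if name else None


if __name__ == "__main__":
    java_home = find_java_home()
    if java_home is None:
        print("No JDK 21 found under /usr/lib/jvm; is openjdk-21-jdk installed?")
    else:
        # The line to append to /etc/environment
        print(f'JAVA_HOME="{java_home}"')
```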

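Then load it into your current shell. The set -a makes the shell export the variable, so programs you launch from this shell (like nifi.sh later) can see it; logging out and back in works too:

set -a; source /etc/environment; set +a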

Now we need to set up Python 3.11 (as this Ubuntu ships with 3.12, which isn’t supported by NiFi yet):

sudo apt install software-properties-common
sudo apt update
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11
python3.11 --version
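Since NiFi will launch whatever interpreter nifi.python.command points at, it can help to confirm which version a given command actually runs. A minimal sketch, assuming the interpreter prints “Python X.Y.Z” for --version (which Python 3 does); interpreter_version is a hypothetical helper name.

```python
import re
import subprocess
import sys
from typing import Optional, Tuple


def interpreter_version(command: str) -> Optional[Tuple[int, int]]:
    """Run `<command> --version` and return (major, minor), or None if it can't be run."""
    try:
        result = subprocess.run(
            [command, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    # Python 3.4+ prints the version to stdout; very old versions used stderr
    match = re.search(r"Python (\d+)\.(\d+)", result.stdout + result.stderr)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    # "python3.11" mirrors the nifi.python.command value used in this guide
    for cmd in ("python3.11", "python3", sys.executable):
        print(f"{cmd} -> {interpreter_version(cmd)}")
```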

Finally, install wget and unzip:

sudo apt install wget unzip

Setting up NiFi 2

Download NiFi 2.0.0 M4 and unzip it:

wget https://dlcdn.apache.org/nifi/2.0.0-M4/nifi-2.0.0-M4-bin.zip
unzip nifi-2.0.0-M4-bin.zip

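Edit the nifi-2.0.0-M4/conf/nifi.properties file. Change nifi.web.https.host to 0.0.0.0 (NiFi 2 serves the UI over HTTPS on port 8443, and the default of 127.0.0.1 only accepts connections from inside the VM). Then find the #nifi.python.command=python3 line, change it to nifi.python.command=python3.11, and remove the # from the start of the line. Save the file.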
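If you’d rather script this edit than do it by hand (handy if you rebuild the VM a few times, as I did), here’s a rough sketch. set_property is a hypothetical helper that replaces a key=value line, uncommenting it if it starts with #, and appends the key if it’s missing. I’ve used nifi.web.https.host because that is the property for the HTTPS listener the UI is served from on port 8443. It assumes the stock file layout and that you run it from the directory you unzipped NiFi into.

```python
import re
from pathlib import Path


def set_property(text: str, key: str, value: str) -> str:
    """Set `key=value` in nifi.properties-style text, uncommenting '#key=' if needed."""
    # Match the key at the start of a line, optionally commented out with '#'
    pattern = re.compile(rf"^#?[ \t]*{re.escape(key)}=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(text):
        # A function replacement avoids backslashes in `value` being interpreted
        return pattern.sub(lambda _m: new_line, text, count=1)
    # Key not present at all: append it
    return text.rstrip("\n") + "\n" + new_line + "\n"


if __name__ == "__main__":
    path = Path("nifi-2.0.0-M4/conf/nifi.properties")
    if path.exists():
        props = path.read_text()
        props = set_property(props, "nifi.web.https.host", "0.0.0.0")
        props = set_property(props, "nifi.python.command", "python3.11")
        path.write_text(props)
```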

Then download the previous version of NiFi Toolkit (1.27.0) and unzip it:

wget https://dlcdn.apache.org/nifi/1.27.0/nifi-toolkit-1.27.0-bin.zip
unzip nifi-toolkit-1.27.0-bin.zip

You need the old version because tls-toolkit was deprecated with NiFi 2.0 and no longer ships with it. This makes it even more of a pain in the ass to do the initial TLS setup. Luckily, the old version of tls-toolkit still works and generates valid certs & config, so we can use it for now.

Run tls-toolkit like so (make sure to edit the nifi.orb.local if your FQDN is different):

nifi-toolkit-1.27.0/bin/tls-toolkit.sh standalone -n 'nifi.orb.local' 

This will create a folder called nifi.orb.local which contains two .jks files (keystore.jks and truststore.jks) and a nifi.properties file.

Move the two jks files to nifi-2.0.0-M4/conf/, for example:

mv nifi.orb.local/*.jks nifi-2.0.0-M4/conf/

Next, open the generated nifi.properties file (the one tls-toolkit created inside the nifi.orb.local folder) and search for this part:

nifi.security.autoreload.enabled=false
nifi.security.autoreload.interval=10 secs
nifi.security.keystore=./conf/keystore.jks
nifi.security.keystoreType=jks
nifi.security.keystorePasswd=JfvOBGE6i1j+FsHKzvvv4cUjoYEH7HnQBkSjo7SRSB4
nifi.security.keyPasswd=JfvOBGE6i1j+FsHKzvvv4cUjoYEH7HnQBkSjo7SRSB4
nifi.security.truststore=./conf/truststore.jks
nifi.security.truststoreType=jks
nifi.security.truststorePasswd=ckQ5e55245alswyQd0w/ZkuEO1lChTC6mbNv4S/lxio

Copy all of these lines.

Now go open the real nifi.properties file in nifi-2.0.0-M4/conf/nifi.properties, find the same lines, delete them, and replace them with the new ones.
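The copy-and-replace step can also be scripted. This sketch assumes the generated file is at nifi.orb.local/nifi.properties and the real one at nifi-2.0.0-M4/conf/nifi.properties, relative to the directory you ran tls-toolkit from. SECURITY_KEYS is just the list of keys shown above, and read_properties and copy_security_props are hypothetical helpers with a deliberately simple key=value parser (no line continuations or escapes). Keys missing from the real file are appended at the end. Back up the real nifi.properties before running it.

```python
from pathlib import Path
from typing import Dict, List

# The keys shown above, as written by tls-toolkit into its nifi.properties
SECURITY_KEYS = [
    "nifi.security.autoreload.enabled",
    "nifi.security.autoreload.interval",
    "nifi.security.keystore",
    "nifi.security.keystoreType",
    "nifi.security.keystorePasswd",
    "nifi.security.keyPasswd",
    "nifi.security.truststore",
    "nifi.security.truststoreType",
    "nifi.security.truststorePasswd",
]


def read_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so base64 passwords ending in '=' survive
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def copy_security_props(generated: str, target: str) -> str:
    """Return `target` with each SECURITY_KEYS line replaced by the generated value."""
    source = read_properties(generated)
    lines: List[str] = target.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        key = line.split("=", 1)[0].strip()
        if key in SECURITY_KEYS and key in source:
            lines[i] = f"{key}={source[key]}"
            seen.add(key)
    # Append any security keys the target file didn't have at all
    lines.extend(f"{k}={source[k]}" for k in SECURITY_KEYS if k in source and k not in seen)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    generated = Path("nifi.orb.local/nifi.properties")
    target = Path("nifi-2.0.0-M4/conf/nifi.properties")
    if generated.exists() and target.exists():
        target.write_text(copy_security_props(generated.read_text(), target.read_text()))
```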

We’re almost there! On to Python & the NiFi UI

Create the extensions dir: nifi-2.0.0-M4/python/extensions. I don’t know why this isn’t created by default, when it’s configured by default, but there you go 🤷‍♂️.

Create a file for your Hello World extension: nifi-2.0.0-M4/python/extensions/MyProcessor.py

Paste in the following contents to the file:

from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult

class WriteHelloWorld(FlowFileTransform):
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']
    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'

    def __init__(self, **kwargs):
        # NiFi may pass keyword arguments here; this example doesn't need them
        super().__init__()

    def transform(self, context, flowfile):
        # Read the incoming FlowFile content as text
        input_text = flowfile.getContentsAsBytes().decode()
        # Do something with the input (this example passes it through unchanged)
        output = input_text
        # Route to "success" and add a "greeting" attribute (attributes is a dict)
        return FlowFileTransformResult(
            relationship="success",
            contents=output,
            attributes={"greeting": "hello"}
        )

Give it a save.
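Iterating on transform logic by restarting NiFi is slow. One option is to exercise the same logic as a plain function with stand-ins, since nifiapi ships with NiFi rather than living in your normal Python environment. In this sketch, FakeFlowFile and the dataclass version of FlowFileTransformResult are hypothetical stubs that mimic only what the processor uses (getContentsAsBytes and the relationship/contents/attributes fields); they are not NiFi’s classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class FlowFileTransformResult:
    """Hypothetical local stand-in for nifiapi's FlowFileTransformResult."""
    relationship: str
    contents: Optional[Union[str, bytes]] = None
    attributes: Dict[str, str] = field(default_factory=dict)


class FakeFlowFile:
    """Hypothetical stub exposing only the method the transform calls."""

    def __init__(self, data: bytes):
        self._data = data

    def getContentsAsBytes(self) -> bytes:
        return self._data


def transform(context, flowfile):
    # Mirrors the logic of WriteHelloWorld.transform, without the NiFi base class
    input_text = flowfile.getContentsAsBytes().decode()
    return FlowFileTransformResult(
        relationship="success",
        contents=input_text,
        attributes={"greeting": "hello"},
    )


if __name__ == "__main__":
    result = transform(None, FakeFlowFile(b"Hello World"))
    print(result.relationship, result.contents, result.attributes)
```

Once the logic behaves the way you want here, copy it back into the processor’s transform method.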

Now start NiFi with:

nifi-2.0.0-M4/bin/nifi.sh start
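NiFi takes a while to start, so the UI may not answer straight away. Rather than repeatedly refreshing the browser, you could poll the port from your Mac with something like this minimal sketch. wait_for_port is a hypothetical helper; it only checks that a TCP connection succeeds, and doesn’t perform a TLS handshake or confirm the UI has fully loaded.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Succeeds as soon as something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("nifi.orb.local", 8443) returns True once something is listening on 8443, or False after two minutes; if it gives up, the NiFi logs are the first place to look.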

On your host machine, open your browser and go to https://nifi.orb.local:8443/nifi/. You’ll get a warning about the cert being self-signed, but you can ignore it and proceed.

You should reach the login screen asking for a username and password. NiFi generates these and writes them to its log, so run:

grep Username nifi-2.0.0-M4/logs/nifi-app.log
grep Password nifi-2.0.0-M4/logs/nifi-app.log

You should see lines like these:

Generated Username [ba640a89-1532-42f7-bbc4-51f194a75547]
Generated Password [1qmQNTnkFLXHlJ9R7kAaRNBE0CvuPeJz]

Copy and paste these into the login screen.
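If you’d rather grab both values at once, this sketch pulls them out of the log with a regular expression. It assumes the lines look like the ones above and that the log lives at nifi-2.0.0-M4/logs/nifi-app.log relative to where you unzipped NiFi; find_generated_credentials is a hypothetical helper name. If the lines appear more than once, the last occurrence wins.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches lines like: Generated Username [ba640a89-...]
_CRED_RE = re.compile(r"Generated (Username|Password) \[(.+?)\]")


def find_generated_credentials(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (username, password) from NiFi log text, or None if either is missing."""
    found = {}
    for kind, value in _CRED_RE.findall(log_text):
        found[kind] = value  # a later occurrence overwrites an earlier one
    if "Username" in found and "Password" in found:
        return found["Username"], found["Password"]
    return None


if __name__ == "__main__":
    log = Path("nifi-2.0.0-M4/logs/nifi-app.log")
    if log.exists():
        print(find_generated_credentials(log.read_text(errors="replace")))
```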

If all went well, you should now be in the NiFi canvas!

Drag a new processor to the canvas, search for WriteHelloWorld (Python processors are listed under their class name, not the file name), and you should see your Hello World processor ready to go!

Oh how painful!

Once you reach this point, the NiFi docs are actually in reasonable shape for working with the Python Processor API. Hopefully this was enough to get you past the absolute nightmare that is the initial setup.

Again, I love NiFi, and the team working on it are amazing people that I’ve had the privilege to work with…but I hope they find some time to revisit the onboarding experience ’cause it suuuuuuuuuuucks.

If you didn’t know, most of the NiFi team have left Cloudera and formed Datavolo around NiFi, which is pretty cool!