This is a pattern I have seen quite frequently, especially in IoT flows. At a high level: we have a stream of sensor data coming in from our IoT devices, and an external service that contains additional contextual data exposed via a REST API. With every sensor message, we want to call the REST API,… Continue reading Enriching Records with LookupRecord & REST APIs in NiFi
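The per-message enrichment described above can be sketched in plain Python. This is only an illustration of the pattern, not NiFi's LookupRecord processor: the names `enrich_records` and `fake_rest_lookup`, the `device_id` and `context` fields, and the sample data are all hypothetical, and the REST call is replaced by an in-memory stand-in so the sketch runs without a network.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]


def enrich_records(
    records: Iterable[Record],
    lookup: Callable[[str], Record],
    key_field: str = "device_id",      # hypothetical field name
    target_field: str = "context",     # hypothetical field name
) -> Iterator[Record]:
    """Call `lookup` once per record and attach the result to a copy of it."""
    for record in records:
        context = lookup(str(record[key_field]))
        # Build a new dict so the incoming record is left untouched.
        yield {**record, target_field: context}


def fake_rest_lookup(device_id: str) -> Record:
    """Stand-in for the external REST API (assumption: it returns a JSON object)."""
    table = {
        "sensor-1": {"site": "warehouse-a"},
        "sensor-2": {"site": "warehouse-b"},
    }
    return table.get(device_id, {})


sensor_stream = [
    {"device_id": "sensor-1", "temperature": 21.5},
    {"device_id": "sensor-2", "temperature": 19.0},
    {"device_id": "sensor-1", "temperature": 21.7},
]

enriched = list(enrich_records(sensor_stream, fake_rest_lookup))
for row in enriched:
    print(row)
```

In a real flow the lookup function would issue an HTTP request, and in NiFi the equivalent wiring is done through processor and controller service configuration rather than code.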
This post describes and demonstrates how to use NiFi and NiFi Registry to develop flows in Dev, version control them, and then deploy the versioned flows to Prod.
It’s always a bit of a learning curve to get started with any new tool, not to mention keeping up to date with a tool that is under heavy active development. Here’s a list of invaluable resources to consult: NiFi Anti-Patterns by Mark Payne. Mark Payne is the co-creator of NiFi and has a running… Continue reading NiFi Resources For Learning & Improving
This post briefly covers how to configure multiple Listeners, with the intent of having one Listener for External/Client traffic and another for Internal/Inter-broker traffic (and how this can be done with Cloudera Manager, which requires a slight work-around in the current, pre-2021 versions). There are several valid use cases for multiple Listeners. In this case,… Continue reading Kafka with multiple Listeners and SASL
If you didn’t catch the previous post, you can check it out here: Modern Streaming Architectures – From The Sky. Ingest sounds simple, but can be hard to get right. We often have many different sources of data, sending in different formats, in different volumes, with different schedules, different delivery guarantees and different delivery mechanisms.… Continue reading Modern Streaming Architectures – Ingest
There are a few different ways this could be done. I’ll demonstrate one possible way, using ExecuteSQL to connect to Impala via the JDBC driver.
The most important part of this entire architecture is the movement. It’s a streaming architecture, and a streaming architecture implies that there is data in motion. I like this term a lot, ‘data in motion’. Catchy.
You should have root access to the CentOS host and a new target directory ready for the MariaDB data. For this guide, our new target directory is /data/database. First, stop MariaDB. systemctl stop mariadb Now, copy your existing database directory to the new location. By default, it is /var/lib/mysql. If it’s not there, check the… Continue reading Moving the data dir of MariaDB on CentOS7/RHEL7
It’s 2020 and we’re creating more data than ever. We each use the internet, in some way or another, for almost everything we do. Listening to music, reading a book, playing a game, managing our money, talking to our friends and family – these are all tasks that used to be offline experiences that we’ve… Continue reading Modern Streaming Architectures – Intro