The (lacking) future of ETL
ETL provides no real value; it's just really expensive glue. So-called "Zero-ETL" has the potential to free up so much $$$ that could be put towards something actually useful.
That said, we are a long, long way off from this becoming a reality. ETL is going to be around for a long while yet.
I think we'll see stepped adoption: smart data vendors will begin to build "ETL" into their products, just as a value-add feature, simplifying their users' stacks and enabling easier adoption & consumption of their tool.
Slower/larger vendors will either buy the ETL vendors in the market, or create a managed service from FOSS tools. They'll package it up to heavily incentivise it over anything else.
The ETL vendors that don't get bought will struggle to convince people that they need to part with their cash for something that should just be a feature and doesn't provide any value of its own.
Adjacent to this, we'll see more adoption of common backends, storage layers, and protocols (plus a bunch of new ones, some good, some just cashing in on the hype train): shared data and table formats (Apache Iceberg, Delta Lake) and shared interfaces (DuckDB, Apache Arrow). A short sketch of what this buys you follows below.
The developers of these systems will be freed from re-building yet another data format, serializer/deserializer, or network IPC, and can focus on building the bits that actually "do something different" or "do something useful".
The developers using these systems will be freed from learning yet another ETL tool, building pipelines, and maintaining more infra… and can focus on making their data valuable.
Businesses will be freed from the ever-increasing spending bloat of ETL, and can instead invest that $$$ in properly utilising existing tools, or adopting new ones, that let them do new things.
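To make the "shared interfaces" point concrete, here's a minimal sketch (assuming the `duckdb` and `pyarrow` Python packages; the data is made up for illustration) of DuckDB querying an in-memory Apache Arrow table and handing the result back as Arrow, with no conversion or pipeline step in between:

```python
# A minimal sketch of shared interfaces: DuckDB can query an in-memory
# Apache Arrow table directly and return the result as Arrow, so no
# bespoke conversion (i.e. no ETL step) sits between the two tools.
import duckdb
import pyarrow as pa

# An Arrow table, standing in for output from any upstream tool that
# also speaks Arrow. The contents are made up for illustration.
events = pa.table({
    "user_id": [1, 2, 2, 3],
    "amount": [9.99, 4.50, 12.00, 3.25],
})

# DuckDB resolves `events` from the local scope (a replacement scan),
# so the query runs over the Arrow data in place -- no copy, no export.
totals = duckdb.sql(
    "SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id"
).arrow()  # the result comes back as an Arrow table, ready for the next tool

print(totals)
```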
Again, we're a long way off from this being a reality, and there's going to be a lot of marketing noise from vendors trying to convince you that they already have Zero-ETL (they don't).
But it's an exciting vision of the future.
Final thoughts:
DuckDB, Apache Arrow, and Apache Iceberg are going to be core to the future of data tooling (see the sketch after these notes).
The storage layer is ripe for innovation. Each cloud vendor has a blob storage service, and they're all pretty old, with limitations (particularly around speed) that aren't keeping up with everything else. There needs to be innovation here, ideally with a standardized API.
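To ground the first of these thoughts, here's a hedged sketch of DuckDB's iceberg extension reading an Apache Iceberg table in place. The table path is hypothetical, and it assumes the extension is available (plus httpfs and credentials if the table lives on blob storage):

```python
# A sketch, not a recipe: DuckDB reading an Apache Iceberg table in place.
# The path below is hypothetical; for a table on blob storage you'd also
# need the httpfs extension and object-store credentials configured.
import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg")
con.execute("LOAD iceberg")

# The same Iceberg table could have been written by Spark, Trino, or
# anything else that speaks the format -- no pipeline needed to read it here.
con.sql("""
    SELECT count(*) AS n
    FROM iceberg_scan('warehouse/events')
""").show()
```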