There is no Data Engineering roadmap
Between Reddit, Twitter, LinkedIn and various Slack communities, I see multiple junior folk looking to break into Data Engineering and asking for advice. Every single day. Many ask for a "roadmap" or some kind of step-by-step lesson plan that will land them their dream job. I don't believe that such a roadmap exists.
Newbies are welcome
I have seen some say "Data Engineering is not an entry level role" and this is nothing more than toxic gatekeeping. Data Engineering is no more, and no less, complex than any other software discipline. Every discipline is open to newbies. If you want to get into data, you can do it. You don't need to "graduate" into it from a different discipline.
Moving sideways into Data Engineering is very common, not because it's necessary, but because Data Engineering is relatively new as a somewhat well-defined job category. Data Engineering teams haven't been commonplace for long; in fact, many industries are still just starting to catch on. Many Data Analysts and Software Engineers already have at least some level of hands-on experience with data, so it makes total sense to use and develop those skills. This has happened with every single kind of engineering role in the history of engineering. But there are only so many people who want to make the switch, and you can't reallocate everyone from your other teams. So, Data Engineering absolutely needs entry-level engineers.
People used to say "Software Engineering is not an entry level role". They don't anymore, because people know it is total rubbish.
Everyone is welcome in data.
All you need is love SQL ❤️
So, how does an entry-level engineer get started in Data Engineering?
Firstly, go unfollow all those influencers on LinkedIn and Twitter. You don't need them; in fact, they are dangerous. They're not here to guide, help or teach you. They will take you down a path of failure so that you are more open to giving them your money for a quick win (more on this in the rant at the end).
With that out of the way, understand that there is no roadmap. There is no single path, no clear linear progression of knowledge. No one can tell you that you absolutely must learn A, then B, then C and you're guaranteed to be a successful Data Engineer.
The same applies to pretty much all engineering roles: front end, back end, embedded systems, networking, analytics. Whatever.
In all of these cases, there are basics that everyone should get familiar with, and these are usually enough to get you your first gig. Remember, there is a fundamental difference between "What do I need to get my first job?" and "How do I progress my career?".
For Data Engineering, there is only one skill that is absolutely, non-negotiably, the first thing you should learn to get started.
SQL.
Yep, SQL. It's not dead. It never will be. SQL is the cockroach of data and it's not going anywhere. People have tried to displace it, and they have all failed.
SQL is the only skill that every single Data Engineer uses every single day. No other skill or tool can claim the same. Python is common, some folks are using Scala, Snowflake is popular… but there are more data teams not using those tools than using them. Not so for SQL.
You're an entry-level engineer; you don't need to be an expert in SQL. You need to be able to solve problems. When you write some shit SQL, and you absolutely will, one of two things will happen:
Thing 1, someone will tell you it's shit and you'll learn how to do it better.
Thing 2, people will thank you for solving the problem and ask you to do something else.
Win win.
How should I learn SQL?
If you're new to SQL and databases, you should know that "SQL" is very poorly standardised in practice. You'll hear folks say "ANSI SQL" as if it were a single standard that every database follows. There is an official SQL standard (published by ANSI and ISO), but no database implements all of it, and each one interprets it in its own way. Anyway, if that topic interests you, read up on it. There is a common SQL base, but pretty much every single database in existence customises and extends SQL to do whatever it wants.
This means that SQL you write for Postgres might work in MySQL, but don't be surprised if it doesn't. The same is true across Microsoft's SQL Server, Oracle Database, BigQuery, Redshift, Snowflake, ClickHouse and any other database you can think of. The different flavours of SQL used by these databases are called "dialects".
This can make it daunting to get started, but it's no different than getting into Software Engineering. There are countless programming languages, and most people start with one and then try a bunch of others.
So, pick any database and don't worry about the dialect. The database is simply a vehicle for you to learn SQL.
Stuck? Start with Postgres. It's one of the world's most popular free, open source databases. You can't go wrong starting with Postgres.
Follow some simple tutorials; get it set up, load some data and start asking questions with SQL.
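Here is roughly what "set it up, load some data, ask a question" looks like. Postgres is still the recommendation; purely so the sketch runs with nothing to install, it substitutes SQLite through Python's built-in `sqlite3` module (an assumption for illustration). The `sales` table and its rows are made up.

```python
import sqlite3

# An in-memory SQLite database stands in for Postgres here: nothing to install.
conn = sqlite3.connect(":memory:")

# "Load some data": a tiny, made-up sales table.
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
    [
        ("north", "tea", 12.0),
        ("north", "coffee", 30.0),
        ("south", "tea", 8.0),
        ("south", "coffee", 15.0),
        ("south", "coffee", 5.0),
    ],
)

# "Start asking questions": which region sold the most, and how much?
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
for region, total in conn.execute(query):
    print(region, total)  # north 42.0, then south 28.0
```

The `CREATE TABLE`, `INSERT` and `GROUP BY` statements themselves should carry over to Postgres largely unchanged. The `?` placeholders belong to Python's `sqlite3` driver rather than to SQL; psycopg2, a common Postgres driver, uses `%s` instead.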
Google around for "SQL challenges"; there are loads. Some are better than others, so just go through them all and challenge yourself. As your knowledge improves, look for harder problems and bigger data sets.
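As a taste, here is one classic beginner challenge: "find the customers who have never placed an order". It runs on SQLite through Python's `sqlite3` again, with hypothetical `customers` and `orders` tables. It also shows the kind of lesson harder problems teach you. The `NOT IN` answer looks right, but it silently returns nothing once a `NULL` sneaks into `orders.customer_id`. `NOT EXISTS` gives the intended answer. This follows standard SQL `NULL` handling, so the same trap exists in Postgres and most other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);

    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    -- Order 12 has no customer: a NULL that breaks NOT IN.
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 1), (12, NULL);
""")

# Naive attempt: NOT IN. Because the subquery returns a NULL, "id NOT IN (...)"
# evaluates to NULL (not TRUE) for every customer without an order,
# so no rows come back at all.
not_in = conn.execute("""
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
    ORDER BY name
""").fetchall()

# Safer version: NOT EXISTS is unaffected by NULLs in orders.customer_id.
not_exists = conn.execute("""
    SELECT c.name FROM customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.name
""").fetchall()

print(not_in)      # []
print(not_exists)  # [('Grace',), ('Linus',)]
```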
When you're starting to feel confident, change database. Try solving the same problems with MariaDB. Then try out Google's BigQuery (there is a generous free tier for BigQuery; be careful to stay under the limits and you won't pay anything).
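To make "the same problem, a different dialect" concrete, here is a sketch that again uses SQLite through Python's `sqlite3` (an assumption for illustration). It asks one question, "give me the first three rows", in two spellings. `LIMIT` is accepted by SQLite, Postgres, MySQL and MariaDB. The standard `FETCH FIRST ... ROWS ONLY` form is accepted by Postgres and by Oracle 12c and later, but SQLite rejects it. SQL Server traditionally uses `TOP` instead. The `numbers` table is hypothetical.

```python
import sqlite3

def first_rows(conn, sql):
    """Run a query and return its rows, or an error string if the dialect rejects it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(10)])

# LIMIT: accepted by SQLite, Postgres, MySQL and MariaDB.
limit_sql = "SELECT n FROM numbers ORDER BY n LIMIT 3"

# FETCH FIRST: the standard-SQL spelling. Postgres and Oracle 12c+ accept it;
# SQLite does not.
fetch_sql = "SELECT n FROM numbers ORDER BY n FETCH FIRST 3 ROWS ONLY"

# SQL Server would instead expect: SELECT TOP 3 n FROM numbers ORDER BY n

limit_rows = first_rows(conn, limit_sql)
fetch_result = first_rows(conn, fetch_sql)

print(limit_rows)    # [(0,), (1,), (2,)]
print(fetch_result)  # a syntax-error message from SQLite
```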
Pay attention to how your queries change, particularly with more complex queries. Notice that different kinds of queries are faster or slower between databases. Get used to reading the SQL reference documentation for each of these databases.
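One way to start noticing why a query is fast or slow is to ask the database how it plans to run it. SQLite calls this `EXPLAIN QUERY PLAN`. Postgres and MySQL each have their own `EXPLAIN`, and the output formats differ, which is a dialect difference in itself. The sketch below uses hypothetical table and index names and checks whether SQLite switches from scanning the whole table to using an index. The exact wording of plan rows varies between SQLite versions, so it only looks for the index name and the word `SCAN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT SUM(amount) FROM orders WHERE customer_id = 42"

def plan(sql):
    """Return the human-readable 'detail' column (the last one) of each plan row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)  # no index yet: expect a full SCAN of orders

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # now expect a SEARCH using idx_orders_customer

print(before)
print(after)
```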
If you need something more guided, there are plenty - literally thousands - of free SQL resources on the internet. There's nothing wrong with following a free SQL introduction course, but always challenge what you learn by applying it to a different database.
You will have time to specialise in a specific database later in your career. Now is not the time.
If you're in the UK, the UK Gov is sponsoring a whole bunch of entry-level bootcamps across loads of sectors. One of the biggest areas of funding is Data Engineering. I can't vouch for these bootcamps, and some look better than others, but they are free, so if you're eligible, why not? https://www.gov.uk/guidance/find-a-skills-bootcamp/
What if I already know SQL?
If you're coming from a Data Analytics background, or any other role where you're already reasonably comfortable with SQL, then you have it easy. Anyone with this background should be able to start looking for an entry-level or junior Data Engineering role.
Now, if you're sitting on 15 years of experience as an Analyst, going back to a junior role might not be something you're willing to do - but that's a different discussion.
What's next?
What about Python? Pandas? dbt? Rust? Airflow? Spark?
Later. These are all things you can learn on the job if the job even needs them.
Go get your first data job. I'm not going to tell you it will be easy. Lots of people struggle to find the right entry-level job in all fields of engineering.
But when you land it, make it your primary goal to absorb the knowledge from your new colleagues. Learn something every single day.
When the learning stops, move on. Use what you've learnt to get a pay bump and find new people to learn from.
Rinse and repeat. That's your roadmap.
Everything else comes later. Go get your hands dirty.
🌶️ A quick rant 🌶️
Unfortunately, I see a lot of bad advice handed out. Now, much of it is just people innocently sharing an opinion, but often there is a clear financial incentive behind it: vendors who want their tools to be the "baseline" to enter the industry, and so-called "influencers" who take money from those vendors, or want to convince you to buy their Data Engineering bootcamps. Because, if data engineering is spooky and complicated, you're more inclined to buy their "land your first job in 90 days" course, right?
Now, there's nothing wrong with vendors advertising their tools, or individuals creating genuinely helpful content that earns them a living. Tools have a place in the world, as do creators. But certain bad actors target junior and entry-level engineers who don't have the experience to tell blatant bullshit from real advice. Be wary.