October 28, 2024 • 8 minute read
AI's Long-Term Impact on Data Engineering Roles
By Fraser Marlow (@frasermarlow)
Here, I summarize the discussion and findings from several recent executive-level articles on AI across domains, and how these predictions impact data engineering.
We'll explore two key ways the data engineering landscape will likely be affected over the longer term:
(A) the impact on the demand for DE skills
(B) the application of AI-powered tools in the day-to-day work of data engineers.
Before we get to those topics, let’s summarize how executive teams are viewing the business opportunity of AI overall, according to the business press:
Businesses Are Planning on Spending Big on AI
For decades, the application of data science in DecisionOps and ML has held a tantalizing promise to radically transform businesses, but for most companies these initiatives have proven onerous and expensive.
Executives are, however, very excited about the potential of Gen AI, likely charmed by the consumer-friendly nature of LLMs and the short path to value of this versatile technology. Teams that adopt even basic Gen AI tooling can see immediate and real improvements in productivity.
Vendor-backed reports from outlets like MIT (see Surveying the State of GenAI in Data and Analytics) are now generating executive-level FOMO. McKinsey “boldly proclaimed that LLMs and other forms of Gen AI could grow corporate profits globally by $4.4 trillion annually.” [HBR]
Within a few years, all companies will be competing to build in-house teams to leverage the power of AI in ways that other data science approaches have not. As a result, industry observers predict that spending on AI will double in 2024 and will scale to $151 billion by 2027 (See IDC: Spending on GenAI Solutions Will Double in 2024 and Grow to $151.1 Billion in 2027).
Analysts expect a rapid transition from the adoption of tools like ChatGPT that add immediate value to an individual’s workflow, to the embedding of Gen AI steps into established corporate processes like customer service, sales outreach, or financial forecasting.
Eventually, analysts expect to see rapid adoption and integration of AI techniques across all verticals and company sizes using a variety of in-house and AI-aaS approaches, dedicated AI point solutions (coding, chatbots, writing assistants), and embedded AI capabilities in all major SaaS solutions. This may prove to be part of the early hype cycle, but analysts remain bullish as the technology rapidly matures.
HR teams, meanwhile, are consulting tools like the Evercore AI Impact Navigator to chart changes in job scope.
Part 1: A Growing Demand for AI Skills in Data Engineering
The International Monetary Fund's (IMF) Staff Discussion Note Gen-AI: Artificial Intelligence and the Future of Work finds almost 40% of employment globally is exposed to AI, which rises to 60% in advanced economies.
This has prompted concerns that software engineering jobs will be threatened. These fears were not helped by alarmist reports by the IMF or comments such as Jensen Huang's, "It is our job to create computing technology such that nobody has to program."
Now that things have settled down a bit, workplace analysts predict that, while major disruption should be expected, the advent of AI will only bolster the demand for data engineering skills.
A Forecasted Growth in Demand for Data Engineering Skills
According to the World Economic Forum's Future of Jobs Report 2023, which is based on a survey of 803 companies that collectively employ more than 11 million workers, jobs in the data domain will see a surge in demand.
An analysis by Zippia concludes that data engineering jobs will grow at a rate of 21% from 2018 to 2028, with about 284,100 new positions created over that decade.
As per Fortune Magazine, “Even with advances in generative AI, this field is likely to continue to grow and offer well-paying and intriguing jobs.” [See compensation data here.]
The consensus is that data domain jobs are safe. The roles most likely to suffer are the repetitive data-processing-related ones such as bank clerks, postal service clerks, cashiers, or any other role that interfaces between the physical world and the digital realm.
Analysts further predict that AI will have a range of impacts on job scope, primarily based on the individual tasks involved rather than the job category itself. In many cases, it will transform current jobs beyond recognition or create entirely new professions.
DE and AI: a blurry line
As organizations continue to adopt AI-related technologies, the boundaries between data engineering and AI development are becoming increasingly blurred. Indeed, senior executives will not always see a difference in these competencies but will be looking to accelerate the adoption of AI technologies across the enterprise.
Traditionally, data engineers have focused on building and maintaining data pipelines, ensuring that high-quality data flows smoothly to support analytics and decision-making. However, the rise of AI means data engineers are now tasked with supporting more complex use cases, such as training machine learning models, managing data for AI applications, and ensuring scalability for inference workloads. Here at Dagster Labs, we are seeing a steady integration of AI calls and sophisticated data science steps being woven into data pipelines.
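To make the idea of an AI call woven into a pipeline concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a description of any particular product's API: the `Ticket` type, the step names, and the `keyword_sentiment` function are all hypothetical, and `keyword_sentiment` is only a stand-in for wherever a real model or third-party AI API would be called.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Ticket:
    ticket_id: int
    text: str
    sentiment: str = "unknown"


def clean(tickets: Iterable[Ticket]) -> list[Ticket]:
    """Traditional pipeline step: normalize whitespace and drop empty rows."""
    cleaned = []
    for t in tickets:
        text = " ".join(t.text.split())
        if text:
            cleaned.append(Ticket(t.ticket_id, text))
    return cleaned


def keyword_sentiment(text: str) -> str:
    """Hypothetical stand-in for a model call (assumption, not a real API)."""
    lowered = text.lower()
    if any(word in lowered for word in ("broken", "refund", "angry")):
        return "negative"
    return "neutral"


def enrich(tickets: list[Ticket], model: Callable[[str], str]) -> list[Ticket]:
    """AI step: annotate each cleaned row with the model's output."""
    return [Ticket(t.ticket_id, t.text, model(t.text)) for t in tickets]


def run_pipeline(raw: Iterable[Ticket],
                 model: Callable[[str], str] = keyword_sentiment) -> list[Ticket]:
    # The AI step sits between ordinary steps, like any other transformation.
    return enrich(clean(raw), model)


raw = [
    Ticket(1, "  My order arrived   broken "),
    Ticket(2, "   "),
    Ticket(3, "Thanks for the update"),
]
result = run_pipeline(raw)
for ticket in result:
    print(ticket.ticket_id, ticket.sentiment, ticket.text)
```

Passing the model in as a plain function keeps the AI step swappable and testable, so the rest of the pipeline does not need to know whether the annotation comes from a keyword rule or a hosted model.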
Skills like model lifecycle management, understanding of ML frameworks, and data preprocessing for machine learning will become essential. Data engineers will be expected to work more closely with data scientists, translating AI requirements into practical data architectures and workflows. To keep up with this trend, data engineers will need to expand their skill sets to include a solid understanding of machine learning concepts, AI model integration/deployment, and familiarity with platforms that facilitate AI development.
Are your tooling and infrastructure ready?
Integrating AI into business processes requires data infrastructure that is capable of supporting real-time and large-scale AI models. Engineers who can navigate the complexities of operationalizing AI, including aspects like data versioning and governance, will be especially valuable. Future data engineers will need to evolve into hybrid roles that bridge data engineering, machine learning operations (MLOps), and cloud infrastructure expertise.
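As a small illustration of what data versioning can mean in practice, the sketch below derives a version identifier from the content of a dataset, so a model can be traced back to the exact data it was trained on. This is one possible approach under stated assumptions, not a standard: the `dataset_version` helper and the sample rows are hypothetical, and a production setup would more likely lean on a dedicated versioning or catalog tool.

```python
import hashlib
import json


def dataset_version(records: list[dict]) -> str:
    """Return a short content hash that changes whenever the data changes."""
    # Serialize each row with sorted keys, then sort the rows, so the hash
    # does not depend on key order or row order.
    canonical = sorted(json.dumps(r, sort_keys=True) for r in records)
    digest = hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()
    return digest[:12]


training_rows = [{"id": 1, "label": "spam"}, {"id": 2, "label": "ham"}]

v1 = dataset_version(training_rows)
v2 = dataset_version(list(reversed(training_rows)))  # same rows, new order
v3 = dataset_version(training_rows + [{"id": 3, "label": "spam"}])

print(v1 == v2)  # same content, same version
print(v1 == v3)  # added row, different version
```

Storing an identifier like this alongside each trained model is a lightweight way to answer the governance question "which data produced this result?".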
Part 2: AI Tools Will Transform Data Engineering Workflows
While AI will require new skills, it will also empower data engineers by fundamentally changing how we work.
A prime example is GitHub Copilot, which is rapidly becoming an indispensable tool, helping engineers code and troubleshoot issues in real time. A field study on the performance impact of Copilot, conducted by MIT Sloan, suggests that “developers became more productive, completing 12.92% to 21.83% more pull requests per week at Microsoft and 7.51% to 8.69% at Accenture”. London-based edTech entrepreneur David Lefevre is one such convert: “By incorporating AI-driven tools such as GitHub Copilot and ChatGPT Code Interpreter, his development team’s productivity has tripled.” (Is Your Job AI Resilient? - Shrier, Emanuel, and Harris)
Nonetheless, experts agree: “The question of how to integrate AI into knowledge work to successfully harness these advantages remains a challenge.” (Maryam Alavi writing in HBR)
Workplace observers are seeing clear patterns in how AI adoption is driving performance:
AI adoption tends to be personal rather than a broad process change.
AI is adding the most value for people climbing a learning curve.
AI is not replacing jobs, but providing pinpoint tools around specific tasks.
AI tools remain individual support systems but do not yet assist across teams.
Data engineering teams can look beyond coding tasks to boost productivity and creativity.
Let’s double-click on each of these.
AI Adoption Is a Personal Journey
As an engineering leader, it’s hard to mandate a team-wide approach to AI. "The way Gen AI is integrated into knowledge work is necessarily fluid and defies exact specification and standardization," writes Maryam Alavi in How Different Fields Are Using Gen AI to Redefine Roles.
A more productive approach is to encourage experimentation and sharing of best practices on how AI can boost productivity, and let each team member figure out their adoption journey.
As an engineering leader, you should encourage team members to adopt AI (A) for the most challenging or time-consuming tasks (like complex coding) and (B) to handle “the ‘dumbest things’ like generating the changelog for a software release.” [Quinn Slack, CEO Sourcegraph]
AI Tools for Ramping Up New Team Members
Researchers are finding that coding assistants provide the most value during an engineer’s ramp-up phase. Tools like Copilot help new team members quickly get up to speed. Studies suggest that for experienced engineers, assisted approaches rapidly reach a point of diminishing returns.
“These early studies demonstrate that less experienced workers can benefit more from the use of Gen AI job crafting compared with their more experienced and higher performing counterparts. This in turn may lead to accelerated learning and productivity increases for employees new to their roles, while easing the coaching and support burden on their seasoned colleagues.” (Maryam Alavi writing in HBR)
At higher levels of expertise, enforcing AI can become detrimental. Quoting a study conducted in call centers that mandated AI, Waber and Fast write, “... top employees’ performance actually decreased with this system, which presents potential problems for innovation, motivation, and the retention of a firm’s best performers.” (See Is GenAI’s Impact on Productivity Overblown?)
More information on this study is available in the non-gated article from Stanford Business, Generative AI Can Boost Productivity Without Replacing Workers.
Not Replacing Jobs, but Handling Tasks
“Tasks, not jobs, will be the building blocks of workplace transformation going forward,” write Ghosh, Wilson, and Castagnino in HBR. They suggest that rather than replacing roles or functions, AI will lead companies to redefine the scope of existing jobs.
They specifically point to the data domain: “Take the job of a data scientist, where 76% of all work time can be impacted by Gen AI, enabling a 25% improvement in achievable productivity given the current state of technology and practice.”
“Where we have seen greater effectiveness is in productivity acceleration or productivity enhancement such as when a senior software engineer can automate a significant portion of code development by using AI systems but then manually adjust the code to optimize it.” (See Is Your Job AI Resilient? - Shrier, Emanuel, and Harris)
Industry commentators point to the individual importance of upskilling, and knowing when to lean on AI to help with tasks like field mapping, text analysis, letting stakeholders self-serve, and generally integrating 3rd party AI APIs.
“As a rule of thumb, tasks that entail recurring processes are candidates for full automation with gen AI, while those that require creative reasoning, collaboration, and judgment are candidates for augmentation with AI.” (Bhaskar Ghosh, Accenture)
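The rule of thumb quoted above can be turned into a quick triage of a team's task list. The sketch below is only one reading of it: the `Task` fields, the sample tasks, and the tie-breaking choice (judgment outranks repetition) are assumptions made for illustration, not part of the quoted guidance.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    recurring: bool        # follows the same process every time
    needs_judgment: bool   # needs creative reasoning, collaboration, or judgment


def triage(task: Task) -> str:
    """Suggest how Gen AI might fit a task, per the quoted rule of thumb."""
    # Assumption: when a task is both recurring and judgment-heavy,
    # keep a human in the loop and treat it as augmentation.
    if task.needs_judgment:
        return "augment"
    if task.recurring:
        return "automate"
    return "review case by case"


tasks = [
    Task("Generate release changelog", recurring=True, needs_judgment=False),
    Task("Design a new data model", recurring=False, needs_judgment=True),
    Task("Weekly stakeholder report", recurring=True, needs_judgment=True),
    Task("One-off file rename", recurring=False, needs_judgment=False),
]

suggestions = {t.name: triage(t) for t in tasks}
for name, suggestion in suggestions.items():
    print(f"{name}: {suggestion}")
```

Working at the task level like this, rather than labeling whole roles, matches the "tasks, not jobs" framing in the articles cited here.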
Personal Assistants, but no Multiplayer Mode
Another observation by analysts is how AI tooling is bringing value to individual practitioners, but has not yet cracked the code on improving performance related to cross-team collaboration.
AI can certainly eliminate some of the drudgery related to teamwork (meeting note summarization, changelog maintenance, helping expand ideas during a brainstorming session, etc.) but does not yet truly foster better collaboration.
Some of the most promising potential in AI is in learning team best practice and coaching an entire team, rather than individual members, to higher levels of productivity. But we are not there yet.
Recent case studies (such as those from McKinsey & Company or Boston Consulting Group) point to promising developments in this area, but for now the tools help individuals be more productive rather than truly improving teamwork and collaboration.
Beyond Coding: Creativity and Ideation
Vendors in the data space make bold claims of AI monitoring pipelines, creating dashboards, and doing many of the higher-order tasks data engineers are concerned with today. While our tools do not yet deliver on this promise, we should expect the data engineering toolkit to start to handle some of these more demanding tasks, enabling engineers to dedicate more time to strategic decision-making and innovation.
Here too, AI can help. Gen AI is great for expanding on ideas and adding to a list of ideas the team has generated. It can help create more comprehensive lists of possible solutions to explore.
But be aware that in some studies, introducing Gen AI into ideation sessions has been shown to stifle innovation: see Don’t Let Gen AI Limit Your Team’s Creativity by Kian Gohar, CEO of GeoLab.
Looking Ahead
The advent of Gen AI will only ramp up expectations of what the data engineering function can deliver, but executives are expecting this to require a sizable investment. As engineering leaders, we need to understand how data drives value within the business and be prepared to engage with leadership in unlocking that value.
The long-term impact of AI on data engineering jobs is a story of transformation—both in the skill sets required and the nature of the work itself. Data engineers will need to embrace AI as both a challenge and an enabler: acquiring new AI-related skills to meet the growing demands of the industry while leveraging AI tools to boost productivity and enhance creativity. The future will favor engineers who adapt to these changes, blending traditional data engineering expertise with an evolving AI-first approach.
We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!