June 10, 2024 • 5 minute read
The Rise of Medium Code
By Nick Schrock (@schrockn)
More people are building more software than ever before.
Through our work on Dagster, a data orchestrator deployed at companies large and small, we have seen a new class of software practitioners emerge. These practitioners go by titles like analytics engineer and data scientist. They aren’t full-stack engineers, but they still ship mission-critical code to production.
We call this new category medium code. It allows more people to write more production code, more productively, through more humane interfaces. It has been emerging for some time across multiple domains: data, infrastructure, front-end, and others. It is a powerful way to build and structure persona- and organization-spanning software platforms.
In discussing a novel class of software development, we cannot ignore AI. The tech community is rightly focused on AI's impact on software practitioners of all kinds. The demos of AI writing code are extraordinary, and AI is delivering real incremental value today. Many have concluded that AI will eliminate human-authored software creation entirely, including this emerging class of medium-code practitioners.
But we believe humans will remain in the driver's seat when it comes to building software and that medium code is AI-native development's natural target substrate. Code, languages, tools, models, and techniques will reflexively evolve and influence each other, but LLMs and AIs cannot author software unsupervised because human language is not precise enough to specify requirements.
In this piece, we’ll discuss what we’re already seeing with medium code using our domain—data platforms—as an example, including how medium-code practitioners work and what characterizes them. Then, we’ll discuss the future of medium code, which we believe will be accelerated—not eliminated—by the rise of AI-native software development.
Medium-Code Practitioners Today
Dagster is a data orchestrator that provides the tools and systems to build, orchestrate, and manage a data platform that spans analytics, data pipelining, model training, and production. The data platform and its data assets are essential to every modern organization and business. The data engineers and data platform engineers who bring Dagster into organizations think in terms of stakeholders. Most of a company’s employees are stakeholders in the data platform, at a minimum as data consumers. Data consumers consume data assets produced and owned by data pipeline builders.
The data pipeline builder goes by many titles: analytics engineer, data analyst, data engineer, and data scientist, to name a few. They are not necessarily full-stack software engineers but build data pipelines that produce mission-critical data assets. They say things like, “I know just enough code to be dangerous”.
More traditional software engineers have trouble modeling a pipeline builder’s behavior. Builders are subject-matter experts who manage complex, mission-critical business logic yet often struggle with some tasks that are intuitive for formally trained engineers.
The two medium-code data pipeline builder personas below should be familiar to those who work in data:
- Analytics Engineers work in SQL or use transformation tools such as dbt. They commit code to source control, author and run automated tests, and participate in the software development life cycle.
- Data Scientists work in SQL, Python, or R, often in managed, visual tools such as Jupyter notebooks. If they are working within a platform like Fabric, SageMaker, or Databricks, they can automate these notebooks and place them in the context of a software development life cycle.
Neither is a full-stack engineer, but neither are they “no-code” or “low-code” users: they are medium-code practitioners.
The Medium-Code Workflow
Taking these examples, here are the generalizable properties that define the medium-code workflow:
- Business logic lives in code, written in a Turing-complete language like Python or a powerful declarative text-based language like SQL.
- Code resides in a coarse-grained container—such as a dbt model or notebook cell—that subdivides and organizes business logic.
- Versioned, checked into source control, and part of a software development lifecycle.
- Has internal abstractions (like functions, classes, and macros), unlike low-code or no-code tools.
- Uses tools that abstract away environment and infrastructure concerns, like a notebooking platform or a cloud data warehouse.
- Runs in production, and therefore practitioners value testing and automation.
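As a rough illustration of these properties, here is a minimal Python sketch. It is not taken from Dagster or dbt: the `Order` type, the `net_amount_cents` helper, and the `customer_revenue` function are hypothetical names invented for this example. The function plays the role of the coarse-grained container, the helper is an internal abstraction, and the test is the kind of check a practitioner would commit alongside the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical input record; field names are assumptions for this sketch."""
    customer_id: str
    amount_cents: int
    refunded: bool = False


def net_amount_cents(order: Order) -> int:
    """Internal abstraction: one reusable business rule (refunds count as zero)."""
    return 0 if order.refunded else order.amount_cents


def customer_revenue(orders: list[Order]) -> dict[str, int]:
    """Coarse-grained container: one unit of business logic producing one result."""
    totals: dict[str, int] = {}
    for order in orders:
        totals[order.customer_id] = (
            totals.get(order.customer_id, 0) + net_amount_cents(order)
        )
    return totals


def test_customer_revenue_ignores_refunds() -> None:
    """Automated test that would run in CI before the logic ships."""
    orders = [
        Order("a", 1000),
        Order("a", 500, refunded=True),
        Order("b", 250),
    ]
    assert customer_revenue(orders) == {"a": 1000, "b": 250}


test_customer_revenue_ignores_refunds()
```

Everything here is plain text that can be versioned, reviewed, and tested like any other code, which is the point of the list above.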
Typically, medium-code practitioners are directly supported by full-stack or platform engineers who build or assemble custom infrastructure.
Empowering medium-code practitioners to reliably ship code to production has enormous value for enterprises:
- There are many more potential medium-code practitioners than full-stack software engineers.
- Medium-code practitioners own the business logic and capture and manage an enormous amount of complexity and value in that logic.
- They can ship and operate software end-to-end with the right tooling and processes.
- They have subject-matter expertise that software engineers do not have (and never will).
- There is a dramatic reduction in coordination costs and burden on software engineers, who are an expensive resource.
This last point in particular is a huge unlock for the practitioner and for the business. Instead of the costly and exhausting process of convincing software engineers to do something and repeatedly providing them enough context and feedback to drive end-to-end change, a subject matter expert can just do it themselves.
Medium Code Has Porous Borders
Medium-code practitioners do not use siloed low-code or no-code tools in their primary workflow. They write or generate code that exists in the same substrate as code written by full-stack engineers, and that confers critical advantages, among them:
- Medium-code practitioners build within the existing ecosystem of software processes and tools. Their work can be checked into source control, tested in CI/CD, automated by DevOps, monitored in observability tools, and profiled by performance tools, all like traditional code.
- Medium code has porous borders. Software engineers use familiar tools and can drop in and fix problems directly. For example, an engineer can use their existing toolchain to profile and optimize a SQL statement generated by a medium-code practitioner. This is not possible in an opaque, UI-driven low- or no-code runtime.
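To make the SQL example concrete, here is a small sketch using Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. The `orders` table, the query, and the index name are assumptions made up for illustration; a real data platform would use its own warehouse and that warehouse's profiling tools. The point is only that the practitioner's statement is ordinary text an engineer can inspect and tune.

```python
import sqlite3

# Hypothetical statement authored by a medium-code practitioner.
PRACTITIONER_SQL = "SELECT SUM(amount) FROM orders WHERE customer_id = 'a'"


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's plan for a statement; the detail text is the last column."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10), ("b", 20), ("a", 5)],
)

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, PRACTITIONER_SQL)

# The engineer drops in and adds an index; the practitioner's SQL is unchanged.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, PRACTITIONER_SQL)

total = conn.execute(PRACTITIONER_SQL).fetchone()[0]
print(plan_before)
print(plan_after)
print(total)
```

The exact plan wording varies between SQLite versions, so the sketch prints the plans rather than asserting a specific string.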
Medium Code Across Multiple Domains
Medium code is also a useful lens through which to analyze practitioners, tools, and their interactions not just in data platforms but across multiple domains.
- Infrastructure-as-code tools like Terraform and Kubernetes are medium-code tools used by both full-stack engineers and DevOps practitioners to automate and manage the deployment of computing infrastructure.
- Next.js can be classified as a medium-code platform in front-end, bringing together practitioners with different skills and dispositions to collaboratively build front-end applications under the control of a unified software engineering lifecycle.
We see this pattern emerging across many domains as a way to build systems with a wider dynamic range of humans and to give software engineers more leverage. And far from eliminating software engineering work, we see AI as an accelerant.
Medium Code is the Future of AI-Native Software Development
There is much breathless hype about the End of Software: claims that LLM- and AI-based processes will replace software engineers entirely or allow for the creation of software at near-zero marginal cost. The hype has been matched by pre-revenue funding rounds at extraordinary valuations, followed by inevitable backlash and skepticism.
Given the current state of LLMs, we do not believe in a future world state where unsupervised, multi-step agents produce meaningful software systems or autonomously drive systemic change in existing complex systems. Nor do we believe that AI is merely a parlor trick or fad. Instead, we see a future that straddles these two extremes.
AIs need constraints and context to work effectively. They are wholly incapable of conceiving and designing coherent, novel, complex systems, and will remain so for the foreseeable future without massive changes in model architecture. There will always be context-specific terms, concepts, and logic that are unknowable to a generalized model. AIs operate within the context of such systems. As a result, the engineers who design and architect those systems will only become more valuable.
Software Still Demands Precise, Testable Outcomes
We believe that the future of AI-native software development lies in dramatically increasing the productivity and number of medium-code practitioners, not in eliminating them. Here’s why:
- Software development in most domains still requires precise, testable outcomes, and AI is probabilistic and imprecise. This isn’t just a question of available compute or model iteration; it is a fundamental attribute of all currently known model and computing architectures.
- Nearly all generative models rely on a natural language interface. Even with perfect models, natural language is an insufficiently precise input to produce reliable software in a completely unsupervised manner.
These generative models must produce an artifact that a machine can precisely interpret and execute. The outcome must be assessed and evaluated by a practitioner with domain expertise.
AI can reduce the time it takes to produce an output and reduce the amount of information a human must communicate to produce that software artifact. But a human must remain in the loop to understand, interpret, and verify the output as part of a tight feedback cycle. The relationship between humans and code will mirror that of humans and mathematics, where computers are vastly faster and more accurate at the core activity, but the human must understand the underlying concepts and interpret results.
Medium Code is the Ideal AI Codegen Substrate
The question remains: why is medium code a better substrate for AI than traditional, unconstrained code? A critical reason is that medium-code tools have a property that makes them the perfect vehicle for code generation: an opinionated, coarse-grained container for arbitrary business logic. In dbt this is a model; in a notebook it is a cell; in Next.js it is a component. This container of business logic captures intent and produces a discrete artifact that can be understood by a domain expert and consumed by a runtime or system. It is a natural quantum of code for a human to iterate on with code generation.
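One way to picture that iteration loop is the sketch below. It is an assumption-laden illustration, not a description of any real tool: `GENERATED_SOURCE` is a hard-coded string standing in for model output, and `load_container` and `practitioner_checks` are hypothetical helpers. The generated container is accepted only if it passes checks written by the person who understands the domain.

```python
# Stand-in for model output; in practice this text would come from a code generator.
GENERATED_SOURCE = '''
def active_customers(rows):
    return sorted({r["customer_id"] for r in rows if r["status"] == "active"})
'''


def load_container(source: str, name: str):
    """Compile the generated text and return the single named function it defines.

    A real system would sandbox this step; exec on untrusted text is unsafe.
    """
    namespace: dict = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace[name]


def practitioner_checks(fn) -> list[str]:
    """Checks written by the domain expert; returns a list of failure messages."""
    failures = []
    rows = [
        {"customer_id": "b", "status": "active"},
        {"customer_id": "a", "status": "active"},
        {"customer_id": "a", "status": "active"},
        {"customer_id": "c", "status": "churned"},
    ]
    if fn(rows) != ["a", "b"]:
        failures.append("expected deduplicated, sorted active customers")
    if fn([]) != []:
        failures.append("expected empty output for empty input")
    return failures


candidate = load_container(GENERATED_SOURCE, "active_customers")
failures = practitioner_checks(candidate)
accepted = not failures  # the human still decides what "correct" means
print("accepted" if accepted else failures)
```

Because the container is small and has a single purpose, the practitioner can read the generated body, run the checks, and ask for another iteration if either step is unsatisfying.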
Ironically, the AI front ends to these coarse-grained containers of logic will improve to the point that they present as low- and no-code tools. But in the end, the medium-code practitioner still has to understand the medium, and the medium here is the medium code itself. Companies like Hex are innovating in this area, and we expect more to follow.
Medium Code is Already Here, and It’s Here to Stay
At Dagster, we see medium code at work in data platforms across every imaginable industry. Medium-code practitioners with diverse subject matter expertise ship and operate data pipelines authored in a variety of tools and own the entire end-to-end process, enabled by software engineers who build and assemble an organization-spanning data platform. These engineering teams have enormous leverage to drive company-wide infrastructure management and cross-cutting policy initiatives. It’s a powerful way to work.
Far from eliminating the medium-code practitioner, we believe that AI will only accelerate this trend across multiple domains, not just data platforms. Probabilistic systems cannot produce precise outcomes unsupervised. However, AI will drive massive increases in productivity and accessibility: There will be more software development, more people developing and working with that software, and they will be more leveraged and economically valuable.
Tools and systems that understand and exploit this dynamic will deliver enormous value. At Dagster, we see a future with a vast array of AI-enabled medium-code practitioners building data pipelines in their tool of choice and seamlessly integrating them into an organization-spanning data platform. The data platform is staffed by engineers who have enormous leverage and who support the medium-code practitioners. As the natural system of record for metadata, operations, and context for the data platform and the data assets it produces, Dagster serves as the foundation of any data and AI strategy at any company. It’s a thrilling future and opportunity, and we’re excited to be part of the story.