Automated Infrastructure as Code Is Not CI/CD
- Robert Glenn
- Nov 19
- 5 min read
CI/CD stands for Continuous Integration & Continuous Deployment (or, less commonly[1], Continuous Delivery). It is a common practice for managing the Software Delivery Lifecycle (SDLC) of decoupled workload architectures. It implies that the components of a software workload are organized into separate codebases (grouped by function, scope, domain, etc.), each with a distinct lifecycle (and, hopefully, version) that generally does not impact the other components. It also requires a series of distinct environments in which to validate the efficacy and performance of each component's most recent updates.
Infrastructure as Code (IaC) is a natural extension of Configuration as Code. It establishes the state of cloud resources (typically through an ordered series of API requests) in an idempotent fashion (i.e. the expected result never changes, no matter how many times the action is attempted). As such, IaC can be automated: triggered by updates to a code repository, for example, or run on a schedule to maintain consistency.
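As a sketch of what idempotent convergence means here, consider a toy in-memory "provider" (the function and resource names below are hypothetical; real IaC tools converge state through live cloud APIs):

```python
# Hypothetical in-memory "cloud" standing in for a real provider.
cloud_state: dict[str, dict] = {}

def ensure_bucket(name: str, location: str) -> dict:
    """Converge the bucket to the desired state: create it if absent,
    correct it if drifted, do nothing if it already matches. Running
    this once or a hundred times yields the same end state."""
    desired = {"name": name, "location": location}
    if cloud_state.get(name) != desired:
        cloud_state[name] = desired  # create or update in place
    return cloud_state[name]

first = ensure_bucket("logs", "us-east1")
second = ensure_bucket("logs", "us-east1")  # no-op: state already matches
assert first == second == {"name": "logs", "location": "us-east1"}
```

Because reapplying is always safe, triggering this on every repository push (or on a schedule) is reasonable in a way that re-running an arbitrary imperative script is not.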
While automating IaC often requires techniques similar to CI/CD (an automation platform and some API/CLI), there is no integration we must manage: either the configuration is consistent, or it isn't. Even if we're working in multiple codebases, all dependencies should be one-way and are generally hard. There is no need to "run in an integration environment" for, e.g., "smoke testing" purposes: either the configuration works (based on the presence of resources applied by other code), or it doesn't.
Similarly, there is no "environment" to deploy into: the IaC creates the environments into which software is delivered when its code is applied against the cloud provider (i.e. the API requests are successfully made, in order, to completion). If you're testing a new and completely different configuration (changing how you group workloads, say, or the topology of your networks), it should be done in a siloed, sandboxed-off area of your environment (if not an entirely new account, for certain redesigns, to avoid any chance of accidental permission bleeding or overlap), not in a persistent location, and especially not one that could be considered a "development environment". How can you guarantee your developers an environment to develop in if you are deliberately injecting such systemic change?

Remember, CI/CD is primarily a technique and concept developed around Agile methodologies. Agile projects work well when the environments in which they are delivered are extremely well known, and thus predictable; this allows complete control over the variables in the system. In our case, we are building the system, so we must actively maintain a high level of predictability. Incremental changes can be made directly to an environment, so long as they are made safely and non-destructively (or destroyed after a deprecation/decommission period). Ideally, your siloed/sandboxed Proof of Architecture environment is completely ephemeral, so it can be erased or rebuilt rapidly and cleanly.
There is also an implication today that, after the "integration stage", configurations are treated as immutable, with integration and scaling considerations handled automatically by the system: I can take the exact same configuration from "development" and (perhaps up to some minor environment updates, e.g. domain names, IPs/CIDRs) run it directly in "production". But we are defining the system itself. We must define exactly the replication and scale of our resources, and provide the very integration harnesses that enable this graduation system for software delivery. Production might span multiple regions or countries, whereas development might be hosted entirely in a single region. We can (and should) construct the code itself to be immutable and reusable, but perhaps we shouldn't drive toward a purely no-touch development-to-production pipeline. A capability for building reusable/remote modules may provide enough abstraction to minimize the differences between "development" and "production" IaC.
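The reusable-module idea can be sketched as a single parameterized function (the module shape and parameter names here are invented for illustration; in practice this would be, say, a Terraform module): the code path is identical for every environment, and only the declared parameters differ.

```python
def network_module(env: str, cidr: str, region_count: int) -> dict:
    """One reusable 'module': the same code produces every environment;
    only parameters (CIDR, replication) vary between dev and prod."""
    return {
        "env": env,
        "cidr": cidr,
        "subnets": [f"{env}-subnet-{i}" for i in range(region_count)],
    }

dev = network_module("dev", "10.0.0.0/16", region_count=1)   # one region
prod = network_module("prod", "10.1.0.0/16", region_count=3)  # multi-region
```

The differences between "development" and "production" are then confined to an explicit, reviewable parameter set rather than to divergent copies of the code.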
In a way, all of your applied IaC is “production”. It’s certainly live. The running services that comprise your “development environment” are just as real and accurately billed as those that make up your “production environment”. A compromised jump box in dev might still lead back to workloads running on-premises, or a stolen service account key with sufficient permissions could be used to spin up a crypto-mining operation over a long weekend. Don’t be cavalier with your so-called development environments. They are only “development” for your developers.
Perhaps a better term would be Automated Consistency (AC)[2], but this is still fairly suggestive and doesn't fully capture the nature of the practice. With CI/CD, and specifically with decoupled architectures, one expects to be able to replace one individual component's instance with a completely new instantiation without any problems. However, non-degenerate IaC implementations will always have some hard dependencies (and cascading effects)[3].
There is also an implication about the validation techniques used in software workload delivery (unit, integration, e2e, functional, performance, load, etc.), most of which are meaningless in the context of IaC. We can usually provide something akin to linting, and perhaps we can construct (or license) something that performs static code analysis (however clumsily or insufficiently). We can also measure the "performance" of an applied IaC ("environment") by running performance tests on whatever software workload is expected to run there; however, calling this a "performance test" of the IaC itself is…overly suggestive. Finally, I suppose we could take the time (and incur the API charges/quotas) to detect whether our IaC platform (or really, the underlying cloud provider's APIs) lied to us when it reported what it applied, but that check relies on the very same provider APIs. Why believe one abstraction but disbelieve another?
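The "linting" that is possible amounts to static checks over the parsed configuration, before anything is applied. A minimal sketch, with made-up rules and field names (real tools in this space include policy engines such as Open Policy Agent):

```python
def lint_resource(resource: dict) -> list[str]:
    """Toy static checks over a parsed IaC resource: flag risky or
    non-compliant settings without touching any cloud API."""
    findings = []
    if resource.get("public_access", False):
        findings.append("public_access should be disabled")
    if "owner" not in resource.get("labels", {}):
        findings.append("missing required 'owner' label")
    return findings

risky = {"public_access": True, "labels": {}}
safe = {"public_access": False, "labels": {"owner": "platform-team"}}
```

Note what this does and does not prove: it can catch policy violations in the declared configuration, but it says nothing about whether the apply will succeed or how the resulting environment will perform.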
There is indeed some merit to determining what is "live" (i.e. the set of resources actually running in the cloud, and thus incurring consumption charges) but not encoded in the current IaC context, though this may be difficult with sufficiently complex IaC. The simplest approach, in the spirit of Occam's Razor, may be to diff a listing of the resources under a known parent element (e.g. via an API/CLI command) against the output reported by the IaC framework, highlighting the delta (which, after a successful application of the IaC, should be exactly what is not encoded).
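At its core, that diff is a set difference between resource identifiers (the resource names below are hypothetical; in practice "live" comes from a provider listing call and "declared" from the IaC framework's reported state):

```python
def unmanaged_resources(live: set[str], declared: set[str]) -> set[str]:
    """Resources present in the provider but absent from the IaC state.
    After a successful apply, this delta is exactly what is not encoded."""
    return live - declared

live = {"vm-a", "vm-b", "bucket-x"}       # e.g. from a provider list call
declared = {"vm-a", "vm-b"}               # e.g. from the IaC state/output
assert unmanaged_resources(live, declared) == {"bucket-x"}
```

The hard part is not the diff itself but normalizing the two sides to a common identifier scheme under the chosen parent element (project, account, resource group, etc.).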
Whatever term we use ("IaC Dev(Sec)Ops"? "IaC Pipeline"? "Automated Consistency"?), we need something distinct from the process associated with the software delivery lifecycle. Automated IaC is sufficiently different and carries a great deal of real risk (these are your live cloud resources, charging real money and storing real data), so it is extremely important that everyone understand it and communicate/reason about it intelligently.
Footnotes:
[1] At least in vocabulary. Continuous Delivery is actually quite common in practice, especially for production, as it generally implies either a human-approved release gate or a poll-and-pull configuration such as GitOps.
[2] Continuous implies to me a high degree of validation before something (e.g. a binary) is even introduced into an environment. There's only so much validation one can conceivably do with IaC, and parallelization of builds is all but impossible without extremely sophisticated set-ups. Even then, I can't in good conscience recommend anyone look away from their IaC for too long. This is your foundation, upon which all other workloads will emerge; never sleep on it. However, we can guarantee a high degree of automation, and with all modern IaC tools/platforms, consistency across IaC states is the real name of the game.
[3] In my opinion, dependencies can (and should) be diminished (though it may not be worth it to truly minimize them) to balance code readability with ease of configuration. Perhaps there is a clever technology or technique that (one day) provides a capability analogous to "service discovery" for IaC, allowing for a fully decoupled architecture, but even so, this will not remove the dependencies/cascading effects, as these are inherent to the nature of the cloud, and not a feature of any delivery technique.
This article was originally published on Medium on 08/13/2022