The Hardware Lock Principle
A control verification framework for cloud security and DevSecOps engineering
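When AECL designed the Therac-25 radiation therapy machine, its engineers left out the hardware interlocks that earlier models such as the Therac-20 had relied on. The software was handling it. The hardware was redundant. Between 1985 and 1987, six patients were irradiated with massive overdoses, in some cases estimated at around a hundred times the prescribed level or more. Three died. Each time, the machine showed the operator only a cryptic malfunction code or a reading that little or no dose had been delivered.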
The engineers were not negligent. They made a reasonable decision based on available evidence. The problem was structural: when the software had a race condition, there was nothing left to catch it. The only thing checking for errors was the thing with the error.
The Hardware Lock Principle is a framework for identifying when your cloud security controls have the same structural problem.
The four criteria
A control qualifies as independently verified under the Hardware Lock Principle if it satisfies all four of the following criteria. Missing one is a fail.
Criterion 1: Different mechanism from the primary control
Two controls that operate through the same mechanism are not independent controls. They are two instances of the same control. A second security group rule does not independently verify the first. A second SAST scan does not independently verify the first. They share the same mechanism, which means they share the same failure modes.
Different mechanism means different OSI layer, different AWS service, different control plane, or different organisational boundary. A network ACL and a security group both restrict network ingress, but the ACL operates at the VPC subnet level through the VPC networking layer and is stateless; the security group operates at the instance level through the EC2 service and is stateful. A failure mode that corrupts one does not automatically corrupt the other.
The test: could the same misconfiguration, race condition, or compromise that defeats Control A also defeat Control B without any additional attacker action? If yes, they are not independent.
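The test above can be sketched as a set comparison. This is a minimal illustration, assuming each control has been annotated by hand with the failure modes that defeat it. The `Control` class and the failure-mode labels are hypothetical and do not come from any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    """A control annotated by hand with the failure modes that defeat it."""
    name: str
    mechanism: str
    defeated_by: frozenset


def shared_failure_modes(a: Control, b: Control) -> frozenset:
    """Failure modes that defeat both controls with no extra attacker action."""
    return a.defeated_by & b.defeated_by


def is_independent(a: Control, b: Control) -> bool:
    """Criterion 1: different mechanism and no failure mode in common."""
    return a.mechanism != b.mechanism and not shared_failure_modes(a, b)


# Hypothetical labels, for illustration only.
sg_rule_1 = Control("sg-rule-1", "security-group", frozenset({"sg-misconfiguration"}))
sg_rule_2 = Control("sg-rule-2", "security-group", frozenset({"sg-misconfiguration"}))
nacl = Control("db-subnet-nacl", "network-acl", frozenset({"nacl-misconfiguration"}))
```

Two security group rules share a mechanism and a failure mode, so `is_independent(sg_rule_1, sg_rule_2)` is false. The security group and the NACL share neither. The value of the exercise is in writing down the failure modes honestly, not in the comparison itself.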
Criterion 2: Cannot be bypassed by the same failure mode that defeats the primary control
This is the Therac-25 criterion. The physical interlocks on the Therac-20 could not be bypassed by a software race condition. By definition: they were physical. A timing bug in the control software was irrelevant to whether the mechanical beam block was in position.
In cloud architecture, the equivalent question is: what failure mode defeats the primary control, and can that same failure mode reach the independent verification layer? If a developer with write access to a Terraform repository can modify a security group configuration, can they also modify the IAM policy that the secondary control uses to evaluate it? If yes, the secondary control is not independent for that threat model.
AWS SCPs applied at the Organisation level satisfy this criterion for IAM-based threat models. An IAM administrator in a member account cannot modify or remove an SCP. The control plane that manages SCPs (AWS Organizations) is separate from the IAM service within any member account. An attacker who has fully compromised a member account’s IAM layer still cannot reach the SCP control plane without separate credentials to the management account.
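The Criterion 2 question, "can the same compromised principal reach both layers?", can be written down as a reachability check. This is a rough sketch under stated assumptions: the permission labels are hypothetical strings chosen for the example, not real IAM action names, and a real assessment would derive them from your actual access model.

```python
def can_modify(granted: set, required: set) -> bool:
    """A principal can modify a control if it holds every permission the change needs."""
    return required <= granted


def fails_criterion_2(granted: set, primary_needs: set, secondary_needs: set) -> bool:
    """True when one compromised principal can modify both the primary and the secondary."""
    return can_modify(granted, primary_needs) and can_modify(granted, secondary_needs)


# Hypothetical permission labels, for illustration only.
member_account_admin = {"member:iam-admin", "member:ec2-admin"}
iam_policy_change = {"member:iam-admin"}
config_rule_change = {"member:iam-admin"}          # secondary lives in the same account
scp_change = {"management:organizations-admin"}    # secondary lives in the management account
```

With these labels, a member-account administrator reaches both the IAM policy and a same-account Config rule, so that pairing fails Criterion 2. The same administrator cannot reach the SCP, so that pairing passes.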
Criterion 3: Produces independent audit evidence
If the log of a control’s operation is stored in the same system as the control itself, a failure or compromise in that system can modify the log. The Therac-25 had no independent record of the doses it delivered. The machine’s own accounting of its actions was the only record. After the incidents, there was no external audit trail to verify or contradict it.
CloudTrail logs stored in an S3 bucket within the same AWS account as the resources they audit can be deleted or modified by any IAM principal with sufficient access to that account. An attacker who has compromised the account may have the access required. The log of what the attacker did is stored in a location the attacker can reach.
Independent audit evidence means: the audit trail is written to a destination that cannot be modified by any principal in the account generating the events. Cross-account write-only S3 destinations with S3 Object Lock enabled are the AWS-native baseline. The generating account has s3:PutObject permission to the destination bucket. It does not have s3:DeleteObject, s3:PutBucketPolicy, or any permission to modify the destination. The destination bucket is owned by a separate AWS account, and the Object Lock configuration prevents modification of objects once written.
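A write-only destination is easy to claim and easy to erode, so it helps to check the bucket policy mechanically. The sketch below assumes the policy has already been loaded as a Python dict. The function name, account ID, and bucket name are hypothetical. Real delivery policies for services such as CloudTrail usually also allow s3:GetBucketAcl to the service principal, in which case pass a wider `allowed` set.

```python
def _as_list(value):
    """Policy fields may be a single item or a list of items."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def excess_grants(bucket_policy: dict, allowed=frozenset({"s3:PutObject"})) -> set:
    """Return every action an Allow statement grants beyond the allowed set."""
    excess = set()
    for statement in _as_list(bucket_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        for action in _as_list(statement.get("Action", [])):
            if action not in allowed:
                excess.add(action)
    return excess


# Hypothetical account ID and bucket name.
write_only = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::org-audit-logs-example/*",
    }],
}
too_broad = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
        "Action": ["s3:PutObject", "s3:DeleteObject", "s3:PutBucketPolicy"],
        "Resource": "*",
    },
}
```

An empty result means the policy grants nothing beyond write. Anything else is a row for the audit template. The check deliberately treats wildcards such as `s3:*` as excess rather than trying to expand them.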
Criterion 4: Has been tested in isolation, not only alongside the primary control
A control that has only ever been observed not failing while the primary control was also functioning has not been tested independently. It has been observed under normal conditions. That is different.
To test an independent verification control in isolation: disable or bypass the primary control in a non-production environment, then verify that the independent control detects and responds to the failure correctly. If the primary is an SCP and the independent is an AWS Config rule: remove the SCP and confirm the Config rule fires. If the primary is a CI/CD pipeline security gate and the independent is OPA policy enforcement at the cluster level: push a build without the pipeline gate and confirm OPA blocks the deployment.
This test should be scheduled and documented. “It has always worked” is not evidence that the independent control works independently. It is evidence that the system has not been tested under the conditions that matter.
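The shape of an isolation test is the same whatever the controls are: disable the primary, cause the violation, check the independent layer, restore the primary no matter what happened. A minimal harness, assuming you supply the four steps as callables for your own environment (all four parameter names are hypothetical):

```python
from typing import Callable


def run_isolation_test(
    disable_primary: Callable[[], None],
    trigger_violation: Callable[[], None],
    independent_control_fired: Callable[[], bool],
    restore_primary: Callable[[], None],
) -> bool:
    """Run one isolation test in a non-production environment.

    Returns True if the independent control fired while the primary was disabled.
    The primary is always restored, even if a step raises.
    """
    disable_primary()
    try:
        trigger_violation()
        return independent_control_fired()
    finally:
        restore_primary()
```

The `finally` clause is the important design choice: a failed test must not leave the environment with the primary control switched off. Record the return value and the date against the control in the audit template.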
Implementation examples by control domain
Domain 1: Identity and access management
| Layer | Control | Mechanism | Scope |
|---|---|---|---|
| Primary | IAM policies (identity-based and resource-based) | AWS IAM service evaluation | Account level |
| Hardware Lock | Service Control Policies (SCPs) | AWS Organisations control plane | Organisation / OU level |
| Audit evidence | CloudTrail management events | Cross-account write-only S3 + Object Lock | Centralised Security account |
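Why SCPs satisfy Criteria 1 and 2: they operate through AWS Organizations, a separate service with a separate control plane from IAM. A compromised IAM administrator in a member account cannot modify SCPs. An SCP that denies iam:CreateUser, iam:AttachRolePolicy, or s3:DeleteBucketPolicy at the OU level will deny those actions regardless of what any IAM policy within a member account permits, because an explicit deny in an SCP overrides any allow granted inside the account. Criteria 3 and 4 come from the audit evidence row in the table above and from scheduled isolation testing. One caveat: SCPs do not apply to the management account itself, so keep workloads out of it.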
Example SCP pattern for protecting audit infrastructure:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ProtectAuditBucket",
      "Effect": "Deny",
      "Action": [
        "s3:DeleteBucket",
        "s3:DeleteBucketPolicy",
        "s3:PutBucketPolicy",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": [
        "arn:aws:s3:::org-audit-logs-*",
        "arn:aws:s3:::org-audit-logs-*/*"
      ]
    },
    {
      "Sid": "ProtectCloudTrail",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:DeleteTrail",
        "cloudtrail:StopLogging",
        "cloudtrail:UpdateTrail"
      ],
      "Resource": "*"
    }
  ]
}
Apply this SCP to every OU in the Organisation. Not to individual accounts. Even if an IAM change inside an account grants permission to delete CloudTrail logs, the explicit deny in the SCP overrides it, and nobody inside the account can remove that deny.
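Why the deny wins can be shown with a toy evaluator. This is a deliberately simplified model, not AWS’s full policy evaluation logic: it assumes a default allow-all SCP is attached alongside the deny, ignores conditions, permission boundaries, and resource-based policies, and matches wildcards case-sensitively. All function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def _matches(statement: dict, action: str, resource: str) -> bool:
    """True if the statement's Action and Resource patterns both match."""
    actions = _as_list(statement["Action"])
    resources = _as_list(statement.get("Resource", "*"))
    return (any(fnmatchcase(action, a) for a in actions)
            and any(fnmatchcase(resource, r) for r in resources))


def is_allowed(action: str, resource: str, scps: list, iam_policies: list) -> bool:
    """Simplified rule: an explicit Deny anywhere wins; otherwise IAM must Allow."""
    every_statement = [s for p in scps + iam_policies for s in p["Statement"]]
    if any(s["Effect"] == "Deny" and _matches(s, action, resource) for s in every_statement):
        return False
    return any(s["Effect"] == "Allow" and _matches(s, action, resource)
               for p in iam_policies for s in p["Statement"])


audit_scp = {"Statement": [{
    "Effect": "Deny",
    "Action": ["s3:DeleteObject", "s3:PutBucketPolicy"],
    "Resource": ["arn:aws:s3:::org-audit-logs-*", "arn:aws:s3:::org-audit-logs-*/*"],
}]}
account_admin = {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
```

With both policies in play, the account administrator can still read audit objects and delete objects in other buckets, but cannot delete anything under `org-audit-logs-*`. Granting more inside the account changes nothing, because the deny sits outside it.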
Domain 2: Network controls
| Layer | Control | Mechanism | Scope |
|---|---|---|---|
| Primary | Security Groups | EC2 service, stateful, instance level | Individual resources |
| Hardware Lock | Network ACLs | VPC networking layer, stateless, subnet level | Entire subnet |
| Audit evidence | VPC Flow Logs | Cross-account write-only S3 in Security account | Centralised Security account |
Security groups and NACLs fail independently. A security group misconfiguration that opens port 22 to 0.0.0.0/0 does not affect the NACL applied to the subnet. Provided the NACL does not itself allow that traffic, it is denied at the subnet boundary before it reaches the instance. Conversely, a NACL misconfiguration does not affect security group evaluation.
The common counter-argument: “NACLs are stateless and hard to manage at scale.” This is accurate. It is not a reason to remove them from Tier 1 subnets. It is a reason to define them narrowly. For subnets containing production database instances, a NACL that denies all inbound traffic except from your application subnet CIDR blocks, regardless of security group configuration, satisfies the Hardware Lock Principle for network controls on those subnets.
Example NACL rule for a production database subnet (applied independently of any security group):
# NACL: production-db-subnet
# Inbound rules (evaluated in ascending rule-number order; first match wins)
Rule 100:   ALLOW  TCP  10.0.1.0/24 (app subnet CIDR)   port 5432   # PostgreSQL from app tier
Rule 200:   ALLOW  TCP  10.0.2.0/24 (mgmt subnet CIDR)  port 22     # SSH from bastion only
Rule 32766: DENY   ALL  0.0.0.0/0                                   # Explicit deny-all
# NACLs are stateless: outbound rules must separately allow return traffic
# to the app and mgmt subnets on ephemeral ports.
# This NACL blocks 0.0.0.0/0:5432 regardless of what any security group permits.
# A misconfigured security group cannot override it.
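NACL rules are evaluated in ascending rule-number order and the first match decides. The sketch below models that for the inbound rules listed above, assuming TCP only and a tuple layout invented for this example. It is a way to reason about a rule set before applying it, not a substitute for testing the real NACL.

```python
from ipaddress import ip_address, ip_network

# (rule_number, action, source_cidr, port); port None means all ports.
INBOUND_RULES = [
    (100, "ALLOW", "10.0.1.0/24", 5432),   # PostgreSQL from app tier
    (200, "ALLOW", "10.0.2.0/24", 22),     # SSH from bastion only
    (32766, "DENY", "0.0.0.0/0", None),    # Explicit deny-all
]


def nacl_decision(rules, source_ip: str, dest_port: int) -> str:
    """Return the action of the lowest-numbered rule that matches."""
    for _number, action, cidr, port in sorted(rules, key=lambda rule: rule[0]):
        if ip_address(source_ip) in ip_network(cidr) and port in (None, dest_port):
            return action
    return "DENY"  # the implicit final deny that every NACL carries
```

Note what is absent from the function: any reference to a security group. Whatever the instance-level rules say, `nacl_decision(INBOUND_RULES, "203.0.113.9", 5432)` is a deny.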
Domain 3: Infrastructure state verification
| Layer | Control | Mechanism | Scope |
|---|---|---|---|
| Primary | Terraform / IaC state | State file comparison, plan/apply workflow | Resources defined in IaC |
| Hardware Lock | AWS Config rules | Actual resource configuration API calls | All resources in scope, including manual changes |
| Audit evidence | AWS Config history + CloudTrail | Cross-account delivery to Security account | Centralised Security account |
This is the IaC drift problem. Terraform state reflects intent. AWS Config evaluates actual deployed configuration by calling the relevant AWS APIs directly. A security group modified manually at 2am to resolve a P1 incident will be invisible to Terraform until the next plan run, and may never appear in a plan run if the engineer reconciles the state manually. AWS Config detects it within minutes of the change, regardless of whether IaC was used.
AWS Config managed rules relevant to security posture:
# High-priority Config rules (enable in Security Hub or directly via Config)
restricted-ssh                      # Flags any security group permitting 0.0.0.0/0:22
restricted-common-ports             # Flags unrestricted access on known attack ports
s3-bucket-public-read-prohibited    # Evaluates actual S3 ACL/policy, not IaC state
iam-root-access-key-check           # Verifies no active root access keys exist
mfa-enabled-for-iam-console-access  # Flags IAM console users without MFA
cloudtrail-enabled                  # Verifies a CloudTrail trail is enabled in the account
vpc-flow-logs-enabled               # Verifies VPC Flow Logs are enabled for each VPC
Config evaluates these against actual resource configuration, not IaC state. A Terraform run that marks a security group as compliant does not affect Config’s evaluation. They are independent.
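To make "actual configuration, not IaC state" concrete, here is a sketch of the kind of check a rule like restricted-ssh performs. It assumes the input is a security group description in the shape the EC2 API returns (`IpPermissions`, `IpRanges`, `CidrIp`). The function is illustrative and is not AWS Config’s implementation. It checks IPv4 only.

```python
def open_ssh_rules(security_group: dict) -> list:
    """Return the ingress permissions that allow TCP port 22 from 0.0.0.0/0."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        from_port, to_port = perm.get("FromPort"), perm.get("ToPort")
        covers_ssh = protocol == "-1" or (  # "-1" means all protocols and ports
            protocol == "tcp"
            and from_port is not None
            and to_port is not None
            and from_port <= 22 <= to_port
        )
        open_to_world = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
        if covers_ssh and open_to_world:
            findings.append(perm)
    return findings


# What is actually deployed after a manual 2am change (hypothetical group ID).
deployed = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.1.0/24"}]},
    ],
}
```

The function takes no Terraform input at all. Whatever the state file believes, the finding is computed from the deployed rules.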
For the Hardware Lock Principle to hold here, Config findings must be delivered to a Security account that the teams managing the resources cannot modify. Use AWS Config aggregators with a delegated administrator account in your Organisations structure that is separate from the accounts being evaluated.
Domain 4: Cryptographic key management
| Layer | Control | Mechanism | Scope |
|---|---|---|---|
| Primary | KMS key policies and IAM grants | IAM evaluation, software-enforced | Key usage and administration |
| Hardware Lock | AWS CloudHSM / KMS with HSM backing | Hardware security module, FIPS 140-2 Level 3 | Key material protection |
| Audit evidence | CloudTrail KMS events (Decrypt, Encrypt, GenerateDataKey) | Cross-account write-only S3 + Object Lock | Centralised Security account |
AWS KMS Customer Managed Keys are backed by HSMs by default. The key material cannot be extracted. IAM policies control who can use or administer the key, but no IAM action can extract the raw key material from the HSM. This is the hardware interlock: a software vulnerability in the application using the key cannot extract the key material, because the key material never leaves the hardware boundary.
For environments requiring hardware attestation beyond what AWS KMS provides by default (UK Government OFFICIAL-SENSITIVE, FCA regulated systems, PCI-DSS Level 1): AWS CloudHSM gives you a dedicated HSM with exclusive tenancy and FIPS 140-2 Level 3 validation. AWS KMS with standard CMKs runs on shared, multi-tenant HSM infrastructure. Its HSMs were historically validated at FIPS 140-2 Level 2 overall, and AWS has since announced Level 3 validation, so check the current certificate rather than assuming either figure.
Keep KMS events in CloudTrail. KMS API calls, including the high-volume cryptographic operations (Decrypt, Encrypt, GenerateDataKey), are recorded as CloudTrail management events, and a trail can be configured to exclude them to reduce volume. In an environment where key usage volume is high, logging them can be expensive. For Tier 1 keys protecting regulated data, the cost is the control: do not exclude KMS events on the trail that covers those keys. You need the record of who used the key, when, and from which principal.
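Once KMS events reach the centralised trail, "who used the key, when, and from which principal" is a filter over CloudTrail records. A minimal sketch, assuming the records have already been parsed from the delivered log files into dicts with the standard CloudTrail fields (`eventSource`, `eventName`, `eventTime`, `userIdentity`, `resources`). The key ARN and role names are hypothetical.

```python
def key_usage(records: list, key_arn: str) -> list:
    """Return (eventTime, eventName, principal ARN) for each KMS call that touched key_arn."""
    usage = []
    for record in records:
        if record.get("eventSource") != "kms.amazonaws.com":
            continue
        touched = {res.get("ARN") for res in record.get("resources", [])}
        if key_arn in touched:
            principal = record.get("userIdentity", {}).get("arn")
            usage.append((record["eventTime"], record["eventName"], principal))
    # ISO 8601 timestamps sort chronologically as plain strings.
    return sorted(usage, key=lambda entry: entry[0])


KEY = "arn:aws:kms:eu-west-2:111111111111:key/example-key-id"  # hypothetical
sample_records = [
    {"eventSource": "kms.amazonaws.com", "eventName": "Decrypt",
     "eventTime": "2024-03-01T02:14:07Z",
     "userIdentity": {"arn": "arn:aws:sts::111111111111:assumed-role/app-role/i-0abc"},
     "resources": [{"ARN": KEY}]},
    {"eventSource": "kms.amazonaws.com", "eventName": "GenerateDataKey",
     "eventTime": "2024-03-01T01:00:00Z",
     "userIdentity": {"arn": "arn:aws:sts::111111111111:assumed-role/etl-role/job-7"},
     "resources": [{"ARN": KEY}]},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
     "eventTime": "2024-03-01T01:30:00Z",
     "userIdentity": {"arn": "arn:aws:sts::111111111111:assumed-role/app-role/i-0abc"},
     "resources": []},
]
```

If the trail excludes KMS events, this function returns an empty list for every key, and nothing about the output tells you whether the key was unused or simply unobserved.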
Domain 5: Deployment pipeline integrity
| Layer | Control | Mechanism | Scope |
|---|---|---|---|
| Primary | CI/CD pipeline security gates (SAST, SCA, image scan, OPA policy checks) | Pipeline configuration, software-enforced | Artefacts passing through the pipeline |
| Hardware Lock | Admission controllers (OPA Gatekeeper / Kyverno) or AWS SCP restricting direct API deployment | Kubernetes admission control / Organisations control plane, evaluated independently of pipeline | All deployments, including those bypassing the pipeline |
| Audit evidence | Admission controller audit logs + pipeline deployment records | Forwarded to centralised SIEM, cross-account | Centralised Security account |
The critical failure mode for pipeline-only controls: a developer with direct kubectl apply access or direct AWS API access can deploy to production without passing through the pipeline. Every SAST scan, SCA check, and OPA policy evaluation in the pipeline is bypassed. The pipeline security gate enforces policy only against people who choose to use it.
OPA Gatekeeper or Kyverno admission controllers enforce policy at the Kubernetes API server level, before any resource is created or modified in the cluster, regardless of whether the request came through the CI/CD pipeline or directly from a terminal. The policy evaluation happens independently of the pipeline. A developer with direct cluster access gets the same policy enforcement as a pipeline-triggered deployment.
Example Kyverno policy blocking containers running as root, independent of what the pipeline verified:
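apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-root-containers
spec:
  validationFailureAction: Enforce   # Block, not just audit
  background: true
  rules:
    - name: check-runAsNonRoot
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Containers must not run as root. Set runAsNonRoot: true."
        # Pass if either pattern matches. A bare =(securityContext) anchor on its
        # own would let containers with no securityContext through unchecked.
        anyPattern:
          # Pod-level setting; containers may not override it to false.
          - spec:
              securityContext:
                runAsNonRoot: true
              containers:
                - =(securityContext):
                    =(runAsNonRoot): true
          # Otherwise every container must set it explicitly.
          - spec:
              containers:
                - securityContext:
                    runAsNonRoot: true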
This policy blocks the deployment at the API server regardless of pipeline state. An engineer who pushes directly with kubectl apply encounters the same enforcement as a pipeline deployment. The pipeline did not create this control. The pipeline failing does not remove it.
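The rule such a policy is meant to enforce can be stated in a few lines, which is useful when writing isolation tests for it. This sketch models the intended check on a Pod manifest loaded as a dict. It is not how Kyverno evaluates patterns, and it ignores init and ephemeral containers. The function name is hypothetical.

```python
def root_violations(pod: dict) -> list:
    """Return the names of containers not guaranteed to run as non-root.

    A container-level securityContext.runAsNonRoot overrides the pod-level value.
    A container passes only if the effective value is explicitly True.
    """
    spec = pod.get("spec", {})
    pod_level = spec.get("securityContext", {}).get("runAsNonRoot")
    violations = []
    for container in spec.get("containers", []):
        container_level = container.get("securityContext", {}).get("runAsNonRoot")
        effective = pod_level if container_level is None else container_level
        if effective is not True:
            violations.append(container["name"])
    return violations


no_context = {"spec": {"containers": [{"name": "app", "image": "example/app:1.0"}]}}
explicit = {"spec": {"containers": [
    {"name": "app", "image": "example/app:1.0", "securityContext": {"runAsNonRoot": True}},
]}}
overridden = {"spec": {
    "securityContext": {"runAsNonRoot": True},
    "containers": [
        {"name": "app", "image": "example/app:1.0"},
        {"name": "sidecar", "image": "example/sidecar:1.0",
         "securityContext": {"runAsNonRoot": False}},
    ],
}}
```

The first manifest is the case to test hardest: a container that says nothing about its security context. An admission policy that lets it through is auditing intent, not enforcing it.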
Domain 6: Data access controls
| Layer | Control | Mechanism | Scope |
|---|---|---|---|
| Primary | S3 bucket policies and ACLs | IAM evaluation, software-enforced, account level | Per-bucket configuration |
| Hardware Lock | SCP denying s3:PutBucketPublicAccessBlock, s3:PutAccountPublicAccessBlock, and s3:PutBucketAcl | AWS Organisations control plane | All S3 in all member accounts |
| Audit evidence | S3 server access logs + CloudTrail S3 data events | Cross-account write-only destination with Object Lock | Centralised Security account |
S3 public access block settings can be disabled by any IAM principal with s3:PutBucketPublicAccessBlock permission. An SCP that denies this action at the Organisation level prevents any principal in any member account from disabling the public access block, regardless of what IAM grants them within the account. The SCP is evaluated before IAM. It cannot be overridden by account-level permissions.
# SCP: deny-s3-public-access-removal
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyS3PublicAccess",
      "Effect": "Deny",
      "Action": [
        "s3:PutBucketPublicAccessBlock",
        "s3:PutAccountPublicAccessBlock",
        "s3:PutBucketAcl",
        "s3:PutObjectAcl"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalAccount": "MGMT-ACCOUNT-ID"
        }
      }
    }
  ]
}
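The condition is intended to leave the management account free to make legitimate organisational changes. SCPs never apply to principals in the management account in any case, so the condition mainly documents that intent. No principal in a member account can remove S3 public access restrictions. A misconfigured bucket policy, an IAM privilege escalation, or a compromised developer account cannot change this. Because s3:PutBucketPublicAccessBlock is also the action used to turn the block on, make sure it is enabled before the SCP is attached.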
Control audit template
Use this template to evaluate your current control estate against the Hardware Lock Principle. Complete one row per critical control. A control is critical if its failure could result in data breach, compliance violation, service outage affecting regulated systems, or significant financial loss.
| Control name | Asset protected | Asset tier | Primary mechanism | Primary failure mode | Independent layer | Independent mechanism | C1: Different mechanism | C2: Different failure mode | C3: Independent audit | C4: Isolation tested | Overall: Hardware Lock pass | Audit evidence location | Isolation test date | Remediation owner |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Production IAM restrictions | All production AWS resources | Tier 1 | IAM identity-based policies | IAM policy misconfiguration, privilege escalation | SCPs at Production OU | AWS Organizations control plane | YES | YES | PARTIAL (CloudTrail in same account) | NO | NO (fails C3, C4) | Same-account S3 bucket (not independent) | Never | |
| [Your control] | | Tier 1 / 2 / 3 | | | | | YES / NO | YES / NO | YES / NO / PARTIAL | YES / NO | YES / NO | | | |
Asset tier definitions
| Tier | Definition | Examples | Hardware Lock requirement |
|---|---|---|---|
| Tier 1 | Compromise results in regulatory breach, data loss affecting customers, or service outage in regulated system | Production databases, customer PII stores, payment processing, regulated API endpoints, KMS keys protecting regulated data | All four criteria required |
| Tier 2 | Compromise results in significant operational impact or internal data exposure | Shared services accounts, CI/CD infrastructure, internal APIs, monitoring and alerting systems | Criteria 1 and 2 required; 3 and 4 strongly recommended |
| Tier 3 | Compromise results in limited blast radius, no regulatory exposure | Development accounts, non-production environments, internal tooling with no production access | Criterion 1 recommended; others optional |
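The tier requirements reduce to a lookup, which makes the "Overall" column of the audit template mechanical. A minimal sketch with two assumptions that are mine rather than the framework’s: PARTIAL counts as not passed, and Tier 3 has no mandatory criteria because Criterion 1 is only recommended. Function names are hypothetical.

```python
# Criteria that must pass for each asset tier, per the table above.
REQUIRED_CRITERIA = {
    1: {"C1", "C2", "C3", "C4"},
    2: {"C1", "C2"},
    3: set(),  # assumption: "recommended" is not treated as mandatory
}


def missing_criteria(tier: int, passed: set) -> list:
    """Return the required criteria this control has not passed, in order."""
    return sorted(REQUIRED_CRITERIA[tier] - set(passed))


def meets_tier_requirement(tier: int, passed: set) -> bool:
    """True when every criterion required for the tier has passed."""
    return not missing_criteria(tier, passed)
```

For the example row in the template (Tier 1, C1 and C2 passed, C3 partial, C4 untested), `missing_criteria(1, {"C1", "C2"})` gives C3 and C4, which is the recorded result.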
Prioritisation matrix
After completing the audit, prioritise remediation as follows:
| Scenario | Priority | Action |
|---|---|---|
| Tier 1 asset, 0 criteria passed | Critical | Immediate remediation. This is a Therac-25 pattern: software-only control, no fallback, no independent audit. Define the independent layer before any other work. |
| Tier 1 asset, 1-2 criteria passed | High | Address within current sprint. Independent layer exists but audit evidence or isolation testing is missing. Close the gap. |
| Tier 1 asset, 3 criteria passed (C4 missing) | Medium | Schedule isolation test within 30 days. Architecture is sound. Verify it works under the conditions that matter. |
| Tier 2 asset, 0 criteria passed | High | Address within current quarter. Tier 2 systems are often the pivot point for Tier 1 compromise. |
| Tier 2 asset, 1-2 criteria passed | Medium | Address within current quarter. Prioritise C2 (different failure mode) if only one criterion can be addressed. |
| Tier 3 asset, any | Low | Address if Tier 1 and 2 remediations are complete. Tier 3 compromise becomes a Tier 2 or 1 problem only if there is a path to higher-tier systems. |
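The matrix can be encoded directly so that priorities are assigned consistently across a large audit. This sketch returns `None` for combinations the matrix does not list, such as a Tier 1 control with three criteria passed where the missing one is not C4. Deciding those cases is left to the reader. The function name is hypothetical.

```python
from typing import Optional


def remediation_priority(tier: int, passed: set) -> Optional[str]:
    """Map (asset tier, criteria passed) to the priority in the matrix above."""
    passed = set(passed)
    count = len(passed)
    if tier == 1:
        if count == 0:
            return "Critical"
        if count in (1, 2):
            return "High"
        if count == 3 and "C4" not in passed:
            return "Medium"
        return None  # not listed in the matrix
    if tier == 2:
        if count == 0:
            return "High"
        if count in (1, 2):
            return "Medium"
        return None  # not listed in the matrix
    if tier == 3:
        return "Low"
    raise ValueError(f"unknown asset tier: {tier}")
```

Run it over every row of the completed template and sort by the result. The rows that come back `None` are either finished or need a judgement call, and both are worth a second look.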
Common anti-patterns
These are the most frequent failures encountered when organisations apply this framework for the first time.
Anti-pattern 1: Two software controls counted as independent
A team has Checkov running in the CI/CD pipeline and AWS Security Hub evaluating deployed resources. They list these as two independent controls. They are not. Both evaluate software configuration. Both can be defeated by IAM misconfiguration in the same account. Checkov in the pipeline does not evaluate actual deployed state; Security Hub findings can be suppressed by any IAM principal with securityhub:BatchUpdateFindings. Neither satisfies Criterion 2 for IAM-based threat models.
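The fix: add an SCP at the Organisation level that denies securityhub:BatchUpdateFindings in Tier 1 accounts to every principal except a designated security operations role. An SCP cannot inspect whether a finding is currently in a FAILED state, so scope the deny by principal rather than by finding status. Now Security Hub findings in Tier 1 accounts cannot be suppressed by ordinary account-level principals. The SCP operates through a different mechanism and cannot be reached by the same IAM-level failure mode. Checkov remains a pipeline gate. The SCP is the hardware lock.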
Anti-pattern 2: Audit logs in the same account as the controls they audit
This is the most common single failure. CloudTrail is enabled. Logs are delivered to an S3 bucket in the same account. The team believes they have audit coverage. They do not have independent audit coverage. Any IAM principal with sufficient access in that account can modify the S3 bucket policy, delete the trail, or delete individual log objects.
The fix is architecturally simple and operationally inexpensive: create a dedicated logging account in your AWS Organisation. Deliver CloudTrail, Config, and VPC Flow Logs to write-only S3 buckets in that account with Object Lock in Governance or Compliance mode. Apply an SCP to the logging account that denies all resource creation except through a designated pipeline. No human has console access to the logging account. Source accounts can write new log objects; no principal in any other account can modify or delete what has been written.
Anti-pattern 3: The independent control has never been tested independently
An SCP is applied. AWS Config rules are configured. A Kyverno admission controller is running. Nobody has ever tested whether these controls work correctly when the primary controls are not present. They have been observed functioning under normal conditions. That is not an isolation test.
Schedule a quarterly red team exercise that specifically targets the independent verification layer. Remove or bypass the primary control in a non-production environment and verify that the hardware lock fires. Document the results. If the hardware lock does not fire without the primary, the architecture is broken regardless of what normal operation looks like.
Anti-pattern 4: Hardware Lock applied to Tier 3 assets while Tier 1 assets have none
Audit results frequently show strong independent verification in development environments and weak or absent independent verification in production. The cause is straightforward: the people building the architecture were working in development environments when they designed the controls. Production controls were added later, often under time pressure, and the hardware lock layer was deferred.
Apply the framework top-down: Tier 1 first, Tier 2 second, Tier 3 last. A Tier 3 development account with four Hardware Lock criteria passed and a Tier 1 production database with zero is not a mature security architecture. It is a complete risk inversion.
The maintenance requirement
The Hardware Lock Principle is not a one-time assessment. It is a maintenance commitment.
Controls degrade. SCPs get modified to resolve access issues without documentation. NACLs get relaxed during incident response and never tightened. Cross-account audit bucket policies get changed by a well-intentioned engineer who needed access. Object Lock gets disabled because nobody could explain why it was there. The hardware lock is removed incrementally, one reasonable decision at a time, until the only thing checking for errors is the thing with the error.
The audit template should be reviewed quarterly for Tier 1 controls and semi-annually for Tier 2. Criterion 4 (isolation testing) should be re-run after any significant architecture change to a control in scope. Any change to an SCP, NACL, admission controller policy, or cross-account audit configuration should trigger an immediate re-evaluation of the affected rows in the audit template.
If you cannot answer “when was this last independently verified?” for a Tier 1 control, the answer is “never.” Treat it as such.
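The review cadence and the "never" rule fit in one function. A minimal sketch, assuming quarterly means 91 days and semi-annually means 182 days, and that Tier 3 has no fixed cadence. Those numbers and the function name are illustrative choices, not part of the framework.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed review intervals: quarterly for Tier 1, semi-annual for Tier 2.
REVIEW_INTERVAL = {1: timedelta(days=91), 2: timedelta(days=182)}


def verification_status(tier: int, last_verified: Optional[date], today: date) -> str:
    """Return "never", "overdue", or "current" for a control's independent verification."""
    if last_verified is None:
        return "never"  # no recorded date is treated as never verified
    interval = REVIEW_INTERVAL.get(tier)
    if interval is not None and today - last_verified > interval:
        return "overdue"
    return "current"
```

A blank "Isolation test date" cell in the audit template maps to `None` here. The function refuses to give that row the benefit of the doubt, which is the point.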
Further reading
- Leveson, N.G. and Turner, C.S. (1993). An Investigation of the Therac-25 Accidents. IEEE Computer, 26(7), pp. 18-41. The definitive technical post-mortem. Read the original.
- NIST SP 800-218: Secure Software Development Framework (SSDF). Relevant for its software verification and validation practices.
- IEC 62304: Medical device software — software life cycle processes. The regulatory response to the Therac-25 and similar failures. The independent verification requirements in this standard are the formal equivalent of what this framework describes.
- AWS Security Reference Architecture (SRA): AWS’s own multi-account architecture guidance, which embeds several Hardware Lock patterns (dedicated security account, SCP guardrails, centralised logging) as defaults.
Bola Ogunlana is a Senior DevSecOps Engineer with 25+ years in cloud infrastructure, UK Government delivery, and financial services. He writes at blog.ogunlana.net. Author of Vibe Coding: Build Cloud Infrastructure at the Speed of Thought.