The Hardware Lock Principle

A control verification framework for cloud security and DevSecOps engineering

When AECL designed the Therac-25 radiation therapy machine, its engineers left out the physical interlocks that earlier models had carried. The software was handling it. The hardware was redundant. Between 1985 and 1987, six patients were irradiated with doses up to 200 times the prescribed level. Three died. The machine reported no error each time.

The engineers were not negligent. They made a reasonable decision based on available evidence. The problem was structural: when the software had a race condition, there was nothing left to catch it. The only thing checking for errors was the thing with the error.

The Hardware Lock Principle is a framework for identifying when your cloud security controls have the same structural problem.


The four criteria

A control qualifies as independently verified under the Hardware Lock Principle if it satisfies all four of the following criteria. Missing one is a fail.

Criterion 1: Different mechanism from the primary control

Two controls that operate through the same mechanism are not independent controls. They are two instances of the same control. A second security group rule does not independently verify the first. A second SAST scan does not independently verify the first. They share the same mechanism, which means they share the same failure modes.

Different mechanism means different OSI layer, different AWS service, different control plane, or different organisational boundary. A network ACL and a security group both restrict network ingress, but the ACL operates at the VPC subnet level through the VPC networking layer and is stateless; the security group operates at the instance level through the EC2 service and is stateful. A failure mode that corrupts one does not automatically corrupt the other.

The test: could the same misconfiguration, race condition, or compromise that defeats Control A also defeat Control B without any additional attacker action? If yes, they are not independent.
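
One way to make this test concrete is to record, for each control, the failure modes known to defeat it and check the two sets for overlap. The sketch below is illustrative only: the `Control` class, the failure-mode labels, and the helper functions are hypothetical names, not part of any AWS API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Control:
    """A security control and the failure modes known to defeat it (labels are illustrative)."""
    name: str
    mechanism: str
    failure_modes: frozenset = field(default_factory=frozenset)


def shared_failure_modes(a: Control, b: Control) -> frozenset:
    """Return the failure modes that defeat both controls."""
    return a.failure_modes & b.failure_modes


def is_independent(a: Control, b: Control) -> bool:
    """Criterion 1: different mechanism and no failure mode in common."""
    return a.mechanism != b.mechanism and not shared_failure_modes(a, b)


sg_rule_1 = Control("sg-rule-a", "security-group", frozenset({"sg-misconfiguration"}))
sg_rule_2 = Control("sg-rule-b", "security-group", frozenset({"sg-misconfiguration"}))
nacl = Control("db-subnet-nacl", "network-acl", frozenset({"nacl-misconfiguration"}))

print(is_independent(sg_rule_1, sg_rule_2))  # False: same mechanism, same failure mode
print(is_independent(sg_rule_1, nacl))       # True: different mechanism, no overlap
```

The value of the exercise is in writing the failure modes down honestly; the code only makes the overlap visible.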

Criterion 2: Cannot be bypassed by the same failure mode that defeats the primary control

This is the Therac-25 criterion. The physical interlocks on the Therac-20 could not be bypassed by a software race condition. By definition: they were physical. A timing bug in the control software was irrelevant to whether the mechanical beam block was in position.

In cloud architecture, the equivalent question is: what failure mode defeats the primary control, and can that same failure mode reach the independent verification layer? If a developer with write access to a Terraform repository can modify a security group configuration, can they also modify the IAM policy that the secondary control uses to evaluate it? If yes, the secondary control is not independent for that threat model.

AWS SCPs applied at the Organisation level satisfy this criterion for IAM-based threat models. An account-level IAM administrator cannot modify or remove an SCP. The control plane that manages SCPs (AWS Organisations) is separate from the IAM service within any member account. An attacker who has fully compromised an AWS account’s IAM layer still cannot reach the SCP control plane without separate credentials to the management account.

Criterion 3: Produces independent audit evidence

If the log of a control’s operation is stored in the same system as the control itself, a failure or compromise in that system can modify the log. The Therac-25 had no independent record of the doses it delivered. The machine’s own accounting of its actions was the only record. After the incidents, there was no external audit trail to verify or contradict it.

CloudTrail logs stored in an S3 bucket within the same AWS account as the resources they audit can be deleted or modified by any IAM principal with sufficient access to that account. An attacker who has compromised the account may have the access required. The log of what the attacker did is stored in a location the attacker can reach.

Independent audit evidence means the audit trail is written to a destination that no principal in the account generating the events can modify. Cross-account write-only S3 destinations with S3 Object Lock enabled are the AWS-native baseline. The generating account has s3:PutObject permission on the destination bucket. It does not have s3:DeleteObject, s3:PutBucketPolicy, or any other permission to modify the destination. The destination bucket is owned by a separate AWS account, and the Object Lock configuration prevents modification of objects once written.
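
A minimal sketch of the write-only idea, assuming a generic log-shipping principal in the source account. The bucket name and account ID are placeholders, and the helper names are hypothetical. Real CloudTrail delivery is granted to the CloudTrail service principal rather than to an account principal, so a production policy will look different.

```python
def write_only_bucket_policy(bucket: str, source_account_id: str) -> dict:
    """Build a bucket policy that lets one source account add objects and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSourceAccountWriteOnly",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{source_account_id}:root"},
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def granted_actions(policy: dict) -> set:
    """Collect every action named in an Allow statement."""
    actions = set()
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        action = statement["Action"]
        actions.update([action] if isinstance(action, str) else action)
    return actions


def is_write_only(policy: dict) -> bool:
    """True when the only action the policy allows is s3:PutObject."""
    return granted_actions(policy) == {"s3:PutObject"}


policy = write_only_bucket_policy("org-audit-logs-example", "111111111111")
print(is_write_only(policy))  # True
```

A check like `is_write_only` could run in a pipeline against the destination bucket policy so that an added delete or policy-edit permission is caught in review.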

Criterion 4: Has been tested in isolation, not only alongside the primary control

A control that has only ever been observed not failing while the primary control was also functioning has not been tested independently. It has been observed under normal conditions. That is different.

To test an independent verification control in isolation: disable or bypass the primary control in a non-production environment, then verify that the independent control detects and responds to the failure correctly. If the primary is an SCP and the independent is an AWS Config rule: remove the SCP and confirm the Config rule fires. If the primary is a CI/CD pipeline security gate and the independent is OPA policy enforcement at the cluster level: push a build without the pipeline gate and confirm OPA blocks the deployment.

This test should be scheduled and documented. “It has always worked” is not evidence that the independent control works independently. It is evidence that the system has not been tested under the conditions that matter.
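
The shape of an isolation test can be sketched as a small harness. Everything here is an assumption for illustration: the four hooks are hypothetical callables that you would wire to your own non-production tooling, and the in-memory `state` dictionary merely stands in for an environment.

```python
from typing import Callable


def run_isolation_test(
    disable_primary: Callable[[], None],
    trigger_violation: Callable[[], None],
    independent_detected: Callable[[], bool],
    restore_primary: Callable[[], None],
) -> bool:
    """Disable the primary control, cause a violation, and report whether
    the independent control caught it. The primary is always restored."""
    disable_primary()
    try:
        trigger_violation()
        return independent_detected()
    finally:
        restore_primary()


# Stand-in environment for illustration; real hooks would call your own tooling.
state = {"primary_enabled": True, "violation": False}


def demo_detected() -> bool:
    # The independent control looks only at the violation, never at the primary.
    return state["violation"]


result = run_isolation_test(
    disable_primary=lambda: state.update(primary_enabled=False),
    trigger_violation=lambda: state.update(violation=True),
    independent_detected=demo_detected,
    restore_primary=lambda: state.update(primary_enabled=True),
)
print(result)  # True
```

The `finally` clause matters: an isolation test that leaves the primary control disabled has created the failure it was meant to rehearse.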


Implementation examples by control domain

Domain 1: Identity and access management

| Layer | Control | Mechanism | Scope |
| --- | --- | --- | --- |
| Primary | IAM policies (identity-based and resource-based) | AWS IAM service evaluation | Account level |
| Hardware Lock | Service Control Policies (SCPs) | AWS Organisations control plane | Organisation / OU level |
| Audit evidence | CloudTrail management events | Cross-account write-only S3 + Object Lock | Centralised Security account |

Why SCPs satisfy all four criteria: They operate through AWS Organisations, a separate service with a separate control plane from IAM. A compromised account-level IAM administrator cannot modify SCPs. An SCP that denies iam:CreateUser, iam:AttachRolePolicy, or s3:DeleteBucketPolicy at the OU level will deny those actions regardless of what any IAM policy within a member account permits. An explicit deny in an SCP overrides any allow granted by IAM within the account.

Example SCP pattern for protecting audit infrastructure:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ProtectAuditBucket",
      "Effect": "Deny",
      "Action": [
        "s3:DeleteBucket",
        "s3:DeleteBucketPolicy",
        "s3:PutBucketPolicy",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": [
        "arn:aws:s3:::org-audit-logs-*",
        "arn:aws:s3:::org-audit-logs-*/*"
      ]
    },
    {
      "Sid": "ProtectCloudTrail",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:DeleteTrail",
        "cloudtrail:StopLogging",
        "cloudtrail:UpdateTrail"
      ],
      "Resource": "*"
    }
  ]
}

Apply this SCP to every OU in the Organisation, not to individual accounts. Even if an account-level IAM change grants permission to delete CloudTrail logs, the explicit deny in the SCP still blocks the action, because no IAM allow within a member account can override it.
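
The "explicit deny wins" behaviour can be illustrated with a deliberately simplified evaluator. This is a sketch, not AWS's policy engine: it ignores conditions, NotAction, case-insensitive action matching, and the requirement that an SCP also allows the action. The function names are hypothetical and `AUDIT_SCP` is a trimmed version of the pattern above.

```python
from fnmatch import fnmatchcase

AUDIT_SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": ["s3:DeleteObject", "s3:PutBucketPolicy"],
            "Resource": ["arn:aws:s3:::org-audit-logs-*", "arn:aws:s3:::org-audit-logs-*/*"],
        },
        {"Effect": "Deny", "Action": ["cloudtrail:StopLogging"], "Resource": "*"},
    ],
}


def matches(patterns, value: str) -> bool:
    """True if value matches any pattern (supports * wildcards)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, p) for p in patterns)


def scp_denies(scp: dict, action: str, resource: str) -> bool:
    """True if any Deny statement in the SCP matches the request."""
    return any(
        s["Effect"] == "Deny" and matches(s["Action"], action) and matches(s["Resource"], resource)
        for s in scp["Statement"]
    )


def is_allowed(scp: dict, iam_allows: bool, action: str, resource: str) -> bool:
    """An explicit SCP deny wins regardless of what IAM in the account allows."""
    return iam_allows and not scp_denies(scp, action, resource)


log_object = "arn:aws:s3:::org-audit-logs-prod/trail.json.gz"
print(is_allowed(AUDIT_SCP, True, "s3:DeleteObject", log_object))  # False
print(is_allowed(AUDIT_SCP, True, "s3:GetObject", log_object))     # True
```

Note that `iam_allows=True` stands for the worst case: an account administrator who has granted themselves everything. The delete is still refused.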

Domain 2: Network controls

| Layer | Control | Mechanism | Scope |
| --- | --- | --- | --- |
| Primary | Security Groups | EC2 service, stateful, instance level | Individual resources |
| Hardware Lock | Network ACLs | VPC networking layer, stateless, subnet level | Entire subnet |
| Audit evidence | VPC Flow Logs | Cross-account write-only S3 in Security account | Centralised Security account |

Security groups and NACLs fail independently. A security group misconfiguration that opens port 22 to 0.0.0.0/0 does not affect the NACL applied to the subnet. The NACL denies the traffic at the subnet boundary before it reaches the instance. Conversely, a NACL misconfiguration does not affect security group evaluation.

The common counter-argument: “NACLs are stateless and hard to manage at scale.” This is accurate. It is not a reason to remove them from Tier 1 subnets. It is a reason to define them narrowly. For subnets containing production database instances, a NACL that denies all inbound traffic except from your application subnet CIDR blocks, regardless of security group configuration, satisfies the Hardware Lock Principle for network controls on those subnets.

Example NACL rule for a production database subnet (applied independently of any security group):

# NACL: production-db-subnet
# Inbound rules

Rule 100: ALLOW TCP 10.0.1.0/24 (app subnet CIDR) port 5432   # PostgreSQL from app tier
Rule 200: ALLOW TCP 10.0.2.0/24 (mgmt subnet CIDR) port 22    # SSH from bastion only
Rule 32766: DENY ALL 0.0.0.0/0                                  # Explicit deny-all

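# This NACL blocks inbound traffic to port 5432 from any source outside the app subnet,
# regardless of what any security group permits. A misconfigured security group cannot override it.
# NACLs are stateless, so return traffic needs matching outbound rules (not shown here).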
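
NACL rules are evaluated in ascending rule-number order and the first matching rule decides the outcome. A minimal sketch of that evaluation for the inbound rules above follows. The `NaclRule` class and `nacl_allows` function are hypothetical names, and the sketch ignores protocol and port ranges for brevity.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class NaclRule:
    number: int
    allow: bool
    cidr: str
    port: Optional[int]  # None means all ports


RULES = [
    NaclRule(100, True, "10.0.1.0/24", 5432),   # PostgreSQL from app tier
    NaclRule(200, True, "10.0.2.0/24", 22),     # SSH from bastion only
    NaclRule(32766, False, "0.0.0.0/0", None),  # explicit deny-all
]


def nacl_allows(rules, source_ip: str, port: int) -> bool:
    """Evaluate rules in ascending rule-number order; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = ip_address(source_ip) in ip_network(rule.cidr)
        port_ok = rule.port is None or rule.port == port
        if in_cidr and port_ok:
            return rule.allow
    return False  # mirrors the implicit final deny


print(nacl_allows(RULES, "10.0.1.25", 5432))    # True: app tier to PostgreSQL
print(nacl_allows(RULES, "203.0.113.7", 5432))  # False: internet source hits deny-all
```

Nothing in this evaluation consults a security group, which is the point: the subnet-level decision is made without reference to the instance-level one.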

Domain 3: Infrastructure state verification

| Layer | Control | Mechanism | Scope |
| --- | --- | --- | --- |
| Primary | Terraform / IaC state | State file comparison, plan/apply workflow | Resources defined in IaC |
| Hardware Lock | AWS Config rules | Actual resource configuration API calls | All resources in scope, including manual changes |
| Audit evidence | AWS Config history + CloudTrail | Cross-account delivery to Security account | Centralised Security account |

This is the IaC drift problem. Terraform state reflects intent. AWS Config evaluates actual deployed configuration by calling the relevant AWS APIs directly. A security group modified manually at 2am to resolve a P1 incident will be invisible to Terraform until the next plan run, and may never appear in a plan run if the engineer reconciles the state manually. AWS Config detects it within minutes of the change, regardless of whether IaC was used.
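
The difference between declared intent and observed state can be sketched as a plain comparison. This is not how AWS Config works internally; the `find_drift` helper and the resource dictionaries are hypothetical, and exist only to show why a check against actual state catches what a check against IaC state cannot.

```python
def find_drift(declared: dict, actual: dict) -> dict:
    """Return {resource_id: (declared_config, actual_config)} for every mismatch.

    Resources present on only one side are reported with None for the missing side.
    """
    drift = {}
    for resource_id in declared.keys() | actual.keys():
        want = declared.get(resource_id)
        have = actual.get(resource_id)
        if want != have:
            drift[resource_id] = (want, have)
    return drift


declared = {"sg-app": {"ingress": [("10.0.1.0/24", 5432)]}}
actual = {
    "sg-app": {"ingress": [("10.0.1.0/24", 5432), ("0.0.0.0/0", 22)]},  # manual 2am change
    "sg-temp": {"ingress": [("0.0.0.0/0", 3389)]},                      # created outside IaC
}

for resource_id, (want, have) in sorted(find_drift(declared, actual).items()):
    print(resource_id, "declared:", want, "actual:", have)
```

The second resource never existed in the declared state at all, so no amount of plan output would have surfaced it.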

AWS Config managed rules relevant to security posture:

# High-priority Config rules (enable in Security Hub or directly via Config)

restricted-ssh                    # Flags any security group permitting 0.0.0.0/0:22
restricted-common-ports           # Flags unrestricted access on known attack ports
s3-bucket-public-read-prohibited  # Evaluates actual S3 ACL/policy, not IaC state
iam-root-access-key-check         # Verifies no active root access keys exist
mfa-enabled-for-iam-console-access
cloudtrail-enabled
vpc-flow-logs-enabled

Config evaluates these against actual resource configuration, not IaC state. A Terraform run that marks a security group as compliant does not affect Config’s evaluation. They are independent.

For the Hardware Lock Principle to hold here, Config findings must be delivered to a Security account that the teams managing the resources cannot modify. Use AWS Config aggregators with a delegated administrator account in your Organisations structure that is separate from the accounts being evaluated.

Domain 4: Cryptographic key management

| Layer | Control | Mechanism | Scope |
| --- | --- | --- | --- |
| Primary | KMS key policies and IAM grants | IAM evaluation, software-enforced | Key usage and administration |
| Hardware Lock | AWS CloudHSM / KMS with HSM backing | Hardware security module, FIPS 140-2 Level 3 | Key material protection |
| Audit evidence | CloudTrail KMS events (Decrypt, Encrypt, GenerateDataKey) | Cross-account write-only S3 + Object Lock | Centralised Security account |

AWS KMS Customer Managed Keys are backed by HSMs by default. The key material cannot be extracted. IAM policies control who can use or administer the key, but no IAM action can extract the raw key material from the HSM. This is the hardware interlock: a software vulnerability in the application using the key cannot extract the key material, because the key material never leaves the hardware boundary.

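For environments requiring hardware attestation beyond what AWS KMS provides by default (UK Government OFFICIAL-SENSITIVE, FCA regulated systems, PCI DSS Level 1): AWS CloudHSM gives you a dedicated, single-tenant HSM with FIPS 140-2 Level 3 validation. AWS KMS with standard customer managed keys runs on shared, multi-tenant HSM infrastructure. Its validation level has changed over time (Level 2 originally, with Level 3 announced for its HSMs more recently), so confirm the current certificate for your region before relying on it.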

Make sure KMS usage events reach CloudTrail. CloudTrail records KMS API calls, including Decrypt, Encrypt, and GenerateDataKey, as management events, but a trail can be configured to exclude KMS events because of their volume. In an environment where key usage volume is high, logging them can be expensive. For Tier 1 keys protecting regulated data, the cost is the control. You need the record of who used the key, when, and from which principal.
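
Once KMS usage events are in CloudTrail, answering "who used this key" is a matter of filtering records. The sketch below assumes records already parsed into dictionaries that follow the CloudTrail record layout (`eventSource`, `eventName`, `userIdentity`, `resources`). The key ARN, role ARNs, and sample records are fabricated for illustration.

```python
from collections import Counter

KMS_USAGE_EVENTS = {"Decrypt", "Encrypt", "GenerateDataKey"}


def key_usage_by_principal(records: list, key_arn: str) -> Counter:
    """Count cryptographic operations on one key, grouped by (principal ARN, event name)."""
    usage = Counter()
    for record in records:
        if record.get("eventSource") != "kms.amazonaws.com":
            continue
        if record.get("eventName") not in KMS_USAGE_EVENTS:
            continue
        arns = {r.get("ARN") for r in record.get("resources", [])}
        if key_arn not in arns:
            continue
        principal = record.get("userIdentity", {}).get("arn", "unknown")
        usage[(principal, record["eventName"])] += 1
    return usage


KEY = "arn:aws:kms:eu-west-2:111111111111:key/example-key-id"
APP_ROLE = "arn:aws:iam::111111111111:role/app"
sample_records = [
    {"eventSource": "kms.amazonaws.com", "eventName": "Decrypt",
     "userIdentity": {"arn": APP_ROLE}, "resources": [{"ARN": KEY}]},
    {"eventSource": "kms.amazonaws.com", "eventName": "Decrypt",
     "userIdentity": {"arn": APP_ROLE}, "resources": [{"ARN": KEY}]},
    {"eventSource": "kms.amazonaws.com", "eventName": "DescribeKey",
     "userIdentity": {"arn": APP_ROLE}, "resources": [{"ARN": KEY}]},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"arn": APP_ROLE}, "resources": []},
]

usage = key_usage_by_principal(sample_records, KEY)
print(usage[(APP_ROLE, "Decrypt")])  # 2
```

Run against logs held in the cross-account destination, this gives a usage record that the account using the key cannot edit.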

Domain 5: Deployment pipeline integrity

| Layer | Control | Mechanism | Scope |
| --- | --- | --- | --- |
| Primary | CI/CD pipeline security gates (SAST, SCA, image scan, OPA policy checks) | Pipeline configuration, software-enforced | Artefacts passing through the pipeline |
| Hardware Lock | Admission controllers (OPA Gatekeeper / Kyverno) or AWS SCP restricting direct API deployment | Kubernetes admission control / Organisations control plane, evaluated independently of pipeline | All deployments, including those bypassing the pipeline |
| Audit evidence | Admission controller audit logs + pipeline deployment records | Forwarded to centralised SIEM, cross-account | Centralised Security account |

The critical failure mode for pipeline-only controls: a developer with direct kubectl apply access or direct AWS API access can deploy to production without passing through the pipeline. Every SAST scan, SCA check, and OPA policy evaluation in the pipeline is bypassed. The pipeline security gate enforces policy only against people who choose to use it.

OPA Gatekeeper or Kyverno admission controllers enforce policy at the Kubernetes API server level, before any resource is created or modified in the cluster, regardless of whether the request came through the CI/CD pipeline or directly from a terminal. The policy evaluation happens independently of the pipeline. A developer with direct cluster access gets the same policy enforcement as a pipeline-triggered deployment.

Example Kyverno policy blocking containers running as root, independent of what the pipeline verified:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-root-containers
spec:
  validationFailureAction: Enforce    # Block, not just audit
  background: true
  rules:
  - name: check-runAsNonRoot
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Containers must not run as root. Set runAsNonRoot: true."
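      pattern:
        spec:
          containers:
          # Every container must declare securityContext.runAsNonRoot: true.
          # A container that omits securityContext is rejected, not skipped.
          # initContainers and ephemeralContainers are not covered by this pattern.
          - securityContext:
              runAsNonRoot: true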

This policy blocks the deployment at the API server regardless of pipeline state. An engineer who pushes directly with kubectl apply encounters the same enforcement as a pipeline deployment. The pipeline did not create this control. The pipeline failing does not remove it.
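
The logic of an admission check of this kind, one that requires every container to declare runAsNonRoot: true, can be sketched in Python. This is not Kyverno's implementation. The `root_violations` and `admit` functions are hypothetical, and they inspect a Pod manifest that has already been parsed into a dictionary.

```python
def root_violations(pod: dict) -> list:
    """Return the names of containers that do not set securityContext.runAsNonRoot to true."""
    containers = pod.get("spec", {}).get("containers", [])
    return [
        c.get("name", "<unnamed>")
        for c in containers
        if (c.get("securityContext") or {}).get("runAsNonRoot") is not True
    ]


def admit(pod: dict) -> bool:
    """Admit the Pod only when every container opts out of running as root."""
    return not root_violations(pod)


compliant_pod = {
    "spec": {"containers": [{"name": "api", "securityContext": {"runAsNonRoot": True}}]}
}
direct_apply_pod = {
    "spec": {
        "containers": [
            {"name": "api", "securityContext": {"runAsNonRoot": True}},
            {"name": "sidecar"},  # no securityContext at all
        ]
    }
}

print(admit(compliant_pod))               # True
print(root_violations(direct_apply_pod))  # ['sidecar']
```

The function has no parameter for where the manifest came from. A pipeline deployment and a direct kubectl apply are judged on the same input.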

Domain 6: Data access controls

| Layer | Control | Mechanism | Scope |
| --- | --- | --- | --- |
| Primary | S3 bucket policies and ACLs | IAM evaluation, software-enforced, account level | Per-bucket configuration |
| Hardware Lock | SCP denying s3:PutBucketPublicAccessBlock and s3:PutBucketAcl | AWS Organisations control plane | All S3 in all member accounts |
| Audit evidence | S3 server access logs + CloudTrail S3 data events | Cross-account write-only destination with Object Lock | Centralised Security account |

S3 public access block settings can be disabled by any IAM principal with s3:PutBucketPublicAccessBlock permission. An SCP that denies this action at the Organisation level prevents any principal in any member account from disabling the public access block, regardless of what IAM grants them within the account. The SCP is evaluated before IAM. It cannot be overridden by account-level permissions.

# SCP: deny-s3-public-access-removal
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyS3PublicAccess",
      "Effect": "Deny",
      "Action": [
        "s3:PutBucketPublicAccessBlock",
        "s3:PutAccountPublicAccessBlock",
        "s3:PutBucketAcl",
        "s3:PutObjectAcl"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalAccount": "MGMT-ACCOUNT-ID"
        }
      }
    }
  ]
}

The condition exempts the management account so that legitimate organisational changes can still be made from the correct account. SCPs do not apply to principals in the management account in any case, so the condition mainly documents that intent. Every other account in the Organisation cannot remove S3 public access restrictions. A misconfigured bucket policy, an IAM privilege escalation, or a compromised developer account cannot change this.


Control audit template

Use this template to evaluate your current control estate against the Hardware Lock Principle. Complete one row per critical control. A control is critical if its failure could result in data breach, compliance violation, service outage affecting regulated systems, or significant financial loss.

| Control name | Asset protected | Asset tier | Primary mechanism | Primary failure mode | Independent layer | Independent mechanism | C1: Different mechanism | C2: Different failure mode | C3: Independent audit | C4: Isolation tested | Overall: Hardware Lock pass | Audit evidence location | Isolation test date | Remediation owner |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Production IAM restrictions | All production AWS resources | Tier 1 | IAM identity-based policies | IAM policy misconfiguration, privilege escalation | SCPs at Production OU | AWS Organisations control plane | YES | YES | PARTIAL (CloudTrail in same account) | NO | NO (fails C3, C4) | None configured | Never | |
| [Your control] | | Tier 1 / 2 / 3 | | | | | YES / NO | YES / NO | YES / NO / PARTIAL | YES / NO | YES / NO | | | |
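
If the template is kept as data rather than in a spreadsheet, the "Overall" column can be derived instead of typed. A minimal sketch, assuming a hypothetical `AuditRow` class that keeps only the four criterion columns and treats anything other than YES, including PARTIAL, as a fail:

```python
from dataclasses import dataclass


@dataclass
class AuditRow:
    control_name: str
    asset_tier: int
    c1_different_mechanism: str     # "YES" / "NO"
    c2_different_failure_mode: str  # "YES" / "NO"
    c3_independent_audit: str       # "YES" / "NO" / "PARTIAL"
    c4_isolation_tested: str        # "YES" / "NO"

    def failed_criteria(self) -> list:
        """Anything other than YES, including PARTIAL, counts as a fail."""
        answers = {
            "C1": self.c1_different_mechanism,
            "C2": self.c2_different_failure_mode,
            "C3": self.c3_independent_audit,
            "C4": self.c4_isolation_tested,
        }
        return [name for name, answer in answers.items() if answer != "YES"]

    def hardware_lock_pass(self) -> bool:
        """Overall pass requires all four criteria."""
        return not self.failed_criteria()


row = AuditRow("Production IAM restrictions", 1, "YES", "YES", "PARTIAL", "NO")
print(row.hardware_lock_pass(), row.failed_criteria())  # False ['C3', 'C4']
```

This reproduces the worked row in the template: the control fails overall on C3 and C4.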

Asset tier definitions

| Tier | Definition | Examples | Hardware Lock requirement |
| --- | --- | --- | --- |
| Tier 1 | Compromise results in regulatory breach, data loss affecting customers, or service outage in regulated system | Production databases, customer PII stores, payment processing, regulated API endpoints, KMS keys protecting regulated data | All four criteria required |
| Tier 2 | Compromise results in significant operational impact or internal data exposure | Shared services accounts, CI/CD infrastructure, internal APIs, monitoring and alerting systems | Criteria 1 and 2 required; 3 and 4 strongly recommended |
| Tier 3 | Compromise results in limited blast radius, no regulatory exposure | Development accounts, non-production environments, internal tooling with no production access | Criterion 1 recommended; others optional |
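
The tier requirements translate directly into a lookup. In this sketch the `REQUIRED` and `RECOMMENDED` tables and the `tier_gaps` function are hypothetical names; the criteria sets are taken from the tier definitions above.

```python
# Criteria each tier must satisfy, and those it should satisfy, per the tier definitions.
REQUIRED = {1: {"C1", "C2", "C3", "C4"}, 2: {"C1", "C2"}, 3: set()}
RECOMMENDED = {1: set(), 2: {"C3", "C4"}, 3: {"C1"}}


def tier_gaps(tier: int, passed: set) -> dict:
    """List the required and recommended criteria a control has not yet passed."""
    return {
        "required": sorted(REQUIRED[tier] - passed),
        "recommended": sorted(RECOMMENDED[tier] - passed),
    }


print(tier_gaps(2, {"C1"}))  # {'required': ['C2'], 'recommended': ['C3', 'C4']}
```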

Prioritisation matrix

After completing the audit, prioritise remediation as follows:

| Scenario | Priority | Action |
| --- | --- | --- |
| Tier 1 asset, 0 criteria passed | Critical | Immediate remediation. This is a Therac-25 pattern: software-only control, no fallback, no independent audit. Define the independent layer before any other work. |
| Tier 1 asset, 1-2 criteria passed | High | Address within current sprint. Independent layer exists but audit evidence or isolation testing is missing. Close the gap. |
| Tier 1 asset, 3 criteria passed (C4 missing) | Medium | Schedule isolation test within 30 days. Architecture is sound. Verify it works under the conditions that matter. |
| Tier 2 asset, 0 criteria passed | High | Address within current quarter. Tier 2 systems are often the pivot point for Tier 1 compromise. |
| Tier 2 asset, 1-2 criteria passed | Medium | Address within current quarter. Prioritise C2 (different failure mode) if only one criterion can be addressed. |
| Tier 3 asset, any | Low | Address if Tier 1 and 2 remediations are complete. Tier 3 compromise becomes a Tier 2 or 1 problem only if there is a path to higher-tier systems. |
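
The matrix can be encoded as a function so that priorities are assigned consistently across audit rows. This is a sketch with a hypothetical function name. It returns None for combinations the matrix does not list (for example, a Tier 1 control with three criteria passed where the missing one is not C4) rather than guessing a priority.

```python
from typing import Optional


def remediation_priority(tier: int, passed: set) -> Optional[str]:
    """Map an audit row to the priority in the matrix; None when no row matches."""
    count = len(passed)
    if tier == 3:
        return "Low"
    if tier == 1:
        if count == 0:
            return "Critical"
        if count in (1, 2):
            return "High"
        if count == 3 and "C4" not in passed:
            return "Medium"
    if tier == 2:
        if count == 0:
            return "High"
        if count in (1, 2):
            return "Medium"
    return None


print(remediation_priority(1, set()))               # Critical
print(remediation_priority(1, {"C1", "C2", "C3"}))  # Medium
print(remediation_priority(2, {"C1"}))              # Medium
```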

Common anti-patterns

These are the most frequent failures encountered when organisations apply this framework for the first time.

Anti-pattern 1: Two software controls counted as independent

A team has Checkov running in the CI/CD pipeline and AWS Security Hub evaluating deployed resources. They list these as two independent controls. They are not. Both evaluate software configuration. Both can be defeated by IAM misconfiguration in the same account. Checkov in the pipeline does not evaluate actual deployed state; Security Hub findings can be suppressed by any IAM principal with securityhub:BatchUpdateFindings. Neither satisfies Criterion 2 for IAM-based threat models.

The fix: add an SCP at the Organisation level that denies securityhub:BatchUpdateFindings for findings in a FAILED state. Now Security Hub findings in Tier 1 accounts cannot be suppressed by account-level principals. The SCP operates through a different mechanism and cannot be reached by the same IAM-level failure mode. Checkov remains a pipeline gate. The SCP is the hardware lock.

Anti-pattern 2: Audit logs in the same account as the controls they audit

This is the most common single failure. CloudTrail is enabled. Logs are delivered to an S3 bucket in the same account. The team believes they have audit coverage. They do not have independent audit coverage. Any IAM principal with sufficient access in that account can modify the S3 bucket policy, delete the trail, or delete individual log objects.

The fix is architecturally simple and operationally inexpensive: create a dedicated logging account in your AWS Organisation. Deliver CloudTrail, Config, and VPC Flow Logs to write-only S3 buckets in that account with Object Lock in Governance or Compliance mode. Apply an SCP to the logging account that denies all resource creation except through a designated pipeline. No human has console access to the logging account. Principals in other accounts can add new objects to the audit buckets, but none can modify or delete what has been written.

Anti-pattern 3: The independent control has never been tested independently

An SCP is applied. AWS Config rules are configured. A Kyverno admission controller is running. Nobody has ever tested whether these controls work correctly when the primary controls are not present. They have been observed functioning under normal conditions. That is not an isolation test.

Schedule a quarterly red team exercise that specifically targets the independent verification layer. Remove or bypass the primary control in a non-production environment and verify that the hardware lock fires. Document the results. If the hardware lock does not fire without the primary, the architecture is broken regardless of what normal operation looks like.

Anti-pattern 4: Hardware Lock applied to Tier 3 assets while Tier 1 assets have none

Audit results frequently show strong independent verification in development environments and weak or absent independent verification in production. The causation is straightforward: the people building the architecture were working in development environments when they designed the controls. Production controls were added later, often under time pressure, and the hardware lock layer was deferred.

Apply the framework top-down: Tier 1 first, Tier 2 second, Tier 3 last. A Tier 3 development account with four Hardware Lock criteria passed and a Tier 1 production database with zero is not a mature security architecture. It is a complete risk inversion.


The maintenance requirement

The Hardware Lock Principle is not a one-time assessment. It is a maintenance commitment.

Controls degrade. SCPs get modified to resolve access issues without documentation. NACLs get relaxed during incident response and never tightened. Cross-account audit bucket policies get changed by a well-intentioned engineer who needed access. Object Lock gets disabled because nobody could explain why it was there. The hardware lock is removed incrementally, one reasonable decision at a time, until the only thing checking for errors is the thing with the error.

The audit template should be reviewed quarterly for Tier 1 controls and semi-annually for Tier 2. Criterion 4 (isolation testing) should be re-run after any significant architecture change to a control in scope. Any change to an SCP, NACL, admission controller policy, or cross-account audit configuration should trigger an immediate re-evaluation of the affected rows in the audit template.
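
The review cadence can be checked mechanically. In this sketch the function name is hypothetical and the intervals of 91 and 182 days are assumptions that approximate "quarterly" and "semi-annually". A control with no recorded review is treated as overdue, in line with the "never" rule that follows.

```python
from datetime import date, timedelta
from typing import Optional

# Review intervals in days; 91 and 182 approximate "quarterly" and "semi-annually".
REVIEW_INTERVAL_DAYS = {1: 91, 2: 182}


def review_overdue(tier: int, last_reviewed: Optional[date], today: date) -> bool:
    """True when a control in the given tier is past its review interval.

    A control with no recorded review is always overdue.
    Tiers without a stated interval (Tier 3) are never flagged.
    """
    interval = REVIEW_INTERVAL_DAYS.get(tier)
    if interval is None:
        return False
    if last_reviewed is None:
        return True
    return today - last_reviewed > timedelta(days=interval)


today = date(2024, 6, 1)
print(review_overdue(1, date(2024, 1, 1), today))  # True: 152 days since review
print(review_overdue(2, date(2024, 1, 1), today))  # False: within 182 days
print(review_overdue(1, None, today))              # True: never reviewed
```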

If you cannot answer “when was this last independently verified?” for a Tier 1 control, the answer is “never.” Treat it as such.


Further reading

  • Leveson, N.G. and Turner, C.S. (1993). An Investigation of the Therac-25 Accidents. IEEE Computer, 26(7), pp. 18-41. The definitive technical post-mortem. Read the original.
  • NIST SP 800-218: Secure Software Development Framework (SSDF). References Therac-25 explicitly in the context of verification and validation requirements.
  • IEC 62304: Medical device software — software life cycle processes. The regulatory response to the Therac-25 and similar failures. The independent verification requirements in this standard are the formal equivalent of what this framework describes.
  • AWS Security Reference Architecture (SRA): AWS’s own multi-account architecture guidance, which embeds several Hardware Lock patterns (dedicated security account, SCP guardrails, centralised logging) as defaults.

Bola Ogunlana is a Senior DevSecOps Engineer with 25+ years in cloud infrastructure, UK Government delivery, and financial services. He writes at blog.ogunlana.net. Author of Vibe Coding: Build Cloud Infrastructure at the Speed of Thought.
