Bare-metal, colocation, or appliance: where to put on-prem AI (CAPEX and OPEX)

Fryderyk Pryjma·published May 22, 2026·updated June 9, 2026·20 min · 4477 words

on-prem · deployment · CAPEX · OPEX

Why distinguish three models
Model A: bare-metal in your own server room
Model B: colocation with dedicated hardware
Model C: a vendor's managed appliance
The NIS2 risk profile across the three models
A numbers comparison: 500 FTE, 36 months
When none of the three models makes sense
A five-question decision aid
Disclosure and biases
What I don't cover here

Why distinguish three models

In the previous pillar on on-prem AI architecture I defined on-prem as the variant where inference sits inside the organisation's perimeter and operational data never leaves the trust boundary. That's still a fairly coarse category. In conversations with IT directors and CISOs, I see that under the "on-prem" label people buy three quite different deployments — with different CAPEX, different OPEX, different regulatory risk profiles, and different vendor dependence.

Three models worth distinguishing as early as the RFP stage:

Model A: bare-metal in your own server room. GPUs, server chassis, and switches sit in the client's infrastructure, in a room managed by the internal IT team. The client buys the hardware, maintains it, and is responsible for power and cooling.
Model B: colocation with dedicated hardware. The client buys the hardware (CAPEX), but the chassis sits in a commercial colocation data centre. The colo provider supplies power, cooling, physical access control, and connectivity. The client is still responsible for the OS layer and above.
Model C: the platform vendor's managed appliance. The vendor delivers and maintains the full hardware stack plus layers 2 to 5. The client provides the space (a server room, colo, or a hybrid), pays a subscription that covers hardware as OPEX, and is responsible for integrations plus governance.

Each of the three models passes NIS2 if done properly. Each has a different cost distribution over time, a different split of responsibility toward the auditor, and a different time-to-value curve. In this piece I take each in turn, show numbers for a mid-sized 500-FTE manufacturer, and close with a five-question decision aid you can use to wrap up a board conversation.

I'm leaving out appliances hosted at the vendor, as well as cloud-managed hardware such as AWS Outposts for inference, which sits on the client's site but is operated from the provider's cloud region. That's hybrid in disguise, regardless of marketing. I'm sticking to the three variants where operational data genuinely doesn't leave the client's perimeter.

Model A: bare-metal in your own server room

The classic, most common in European manufacturing. The firm has had a server room for two decades, has an infrastructure team, has patching procedures and monitoring. Adding another rack with two to four GPUs fits existing processes. It doesn't require the board to step outside familiar territory.

What you buy. A rack-mount server with 2x or 4x H100 80GB (alternatively 4x L40S 48GB for a smaller budget and volume), 512 GB to 1 TB RAM, 8 to 16 TB local NVMe, redundant power, a BMC with out-of-band management. A 25 or 100 GbE switch if you don't have networking at that class. A UPS of adequate capacity, usually an extension of the existing one. Precision cooling, if the existing system can't handle an extra 2 to 6 kW of heat. That last item is often the one that surprises the budget.

What you maintain. Everything from layer 1 to layer 7 of the previous pillar. Hardware, OS, NVIDIA and CUDA drivers, container runtime, model serving, RAG pipeline, application layer, observability. Plus power, cooling, physical access, backup, disaster recovery. That means a dedicated team of at least 0.5 FTE platform engineer plus 0.2 FTE security plus part of a data-centre engineer's time plus access to MLOps at layer 4 to 5 (usually 0.3 to 0.5 FTE internally or a contract with a platform vendor).

Deployment timeline. From decision to production: 12 weeks in an optimistic scenario, 16 to 24 weeks realistically. Hardware delivery alone takes 4 to 12 weeks depending on GPU availability and the firm's procurement policy. Server-room adaptations (power, cooling) can add another 4 to 8 weeks if it wasn't previously prepared for cards at 700 W TDP each.
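The power and cooling question can be sanity-checked with simple arithmetic before procurement starts. A minimal Python sketch, assuming that all electrical draw ends up as heat and that the non-GPU draw of the chassis (CPUs, RAM, NVMe, fans) comes from the OEM's spec sheet. The 1,200 W used below is a hypothetical placeholder, not a measured value, and the function names are illustrative.

```python
GPU_TDP_W = 700  # per-card TDP quoted in the text


def rack_heat_kw(gpu_count: int, non_gpu_watts: float) -> float:
    """Worst-case heat load in kW, assuming all electrical draw becomes heat."""
    return (gpu_count * GPU_TDP_W + non_gpu_watts) / 1000


def fits_headroom(gpu_count: int, non_gpu_watts: float, headroom_kw: float) -> bool:
    """True if the room's spare cooling capacity covers the new load."""
    return rack_heat_kw(gpu_count, non_gpu_watts) <= headroom_kw


# Hypothetical chassis overhead of 1,200 W; replace with the OEM figure.
for gpus in (2, 4):
    print(gpus, "GPUs:", rack_heat_kw(gpus, 1200), "kW")
```

With that placeholder, both the 2-GPU and the 4-GPU configuration land inside the 2 to 6 kW band mentioned above. The useful part is `fits_headroom`: run it against the cooling headroom your facilities team actually confirms.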

Risk profile and compliance. The strongest of the three models. A NIS2 auditor sees full client control over every layer. Every attack vector falls within the client's responsibility, and therefore within the client's visibility. The SIEM sees everything. The IdP sees everything. Patch management is under your control, so patch delays (and their audit consequences) depend on your people's discipline, not on a vendor contract.

The weak point of this profile: if your IT team lacks competence at layer 4 to 5 (model serving, RAG pipeline), the audit will find gaps in LLM observability, in tracing a specific query, in the source-citation mechanism. Owning your own platform doesn't mean you understand it. The auditor requires understanding.

When it makes sense. You have an existing, mature server room with power and cooling headroom. You have an infrastructure team comfortable adding a GPU stack to existing procedures. You have a longer horizon (3 to 5 years) and a calculation that full CAPEX amortises. You have contractual constraints that rule out colo (e.g. group NIS2 policies defining the data-processing location as "the company's premises").

When it doesn't. The server room can't take an extra 2 to 6 kW, and a refit within an 18-month horizon isn't realistic. The IT team has no one who wants to take on a GPU stack. The project horizon is unclear (a 12-month pilot with no decision on what's next). You have a geographic-redundancy requirement that a single server room can't meet.

Model B: colocation with dedicated hardware

The second model, growing in popularity in European manufacturing since 2024. The client buys the hardware (so CAPEX doesn't disappear), but the chassis goes to a commercial colo (Equinix, Atman, Beyond.pl, Polcom, regional data centres). The client pays a monthly fee for space, power, cooling, connectivity, and physical access control. The client is still responsible for the OS layer and above.

What you buy. The same as Model A, from layer 1 to 7. Hardware (GPUs, server, switches), software (OS licenses if RHEL, vLLM or TGI or TensorRT-LLM, RAG stack, application layer). Plus a colo contract for specific rack U, a specific power feed (usually redundant A+B at 16A or 32A), a specific cooling class (in-row CRAC, cold-aisle containment; some modern colos offer direct-to-chip liquid cooling for GPU racks), and specific connectivity (BGP to the internet plus a dedicated VPN to the client's premises).

What you maintain. The OS layer and above, identical to Model A. Plus managing the colo relationship: SLA, physical access (who from your team enters the data centre, in what mode, with what break-glass procedure), emergency procedures. Not having your own power infrastructure, cooling, and physical access control reduces your responsibility scope by 30 to 50%, but requires contractual discipline.

Timeline. Deployment is faster than Model A if the colo is ready for a GPU stack. From decision to production: 8 to 16 weeks. Hardware delivery is the same, but not having to adapt your own server room shortens the critical path. It does require time for due diligence on the chosen colo (location, ISO 27001, ISO 22301, EN 50600 certifications, in some cases a vendor NIS2 self-assessment), adding 2 to 4 weeks of compliance work.

Risk profile and compliance. A bit weaker than Model A. A NIS2 auditor sees that the client controls layers 1 to 7, but physical access to the hardware sits with the colo provider. That makes the provider a sub-processor in the Article 21 NIS2 mapping (Article 21(2)(d), supply-chain security), with its own mapping entry. It requires an additional DPA with the colo provider, audit-right clauses, an SLA, and incident-reporting procedures.

A second aspect: the data is in a different location than the company's premises. If a group policy or sector requirement specifies "data on company premises," colo is out of scope. That can be a trap in sectors with extra regulation (energy, some essential entities in chemicals, defence-adjacent).

On the plus side: a colo usually has its own access control (badges, biometrics, mantraps, strict procedures for admitting service staff) that is better than a typical plant server room. That supports compliance; it isn't just a complication.

When it makes sense. You don't have an enterprise-class server room, or refitting one doesn't pay back within the project horizon. You want geographic redundancy and a second location in another colo region fits your BCP policy. You have an infrastructure team willing to move to remote management. The group policy allows colo and accepts a sub-processor process.

When it doesn't. You have an existing, capable server room with space and power headroom. You have strict sector requirements on data location. The infrastructure team isn't ready for data-centre remote operations (which require a specific procedural maturity).

A middle variant worth considering: hybrid colo, where the production stack is in a commercial colo and development plus staging is in your own server room. More complex for audit, but operationally sensible for mid-sized firms of 500 to 1000 FTE.

Model C: a vendor's managed appliance

The third model, the youngest in European manufacturing but with the fastest growth since 2025. The AI platform vendor delivers a pre-configured appliance (typically 1U or 2U rack-mount, sometimes 4U for multi-GPU). The client provides the space (own server room or colo; some vendors also offer hosting in selected data centres), power, and cooling. The vendor maintains layers 2 to 5 (OS, runtime, model serving, RAG pipeline). The client is responsible for layer 1 physically (access to the appliance), layer 6 partly (application layer, integrations, your own UI if you customise), and layer 7 (observability at the organisation level).

What you buy. The appliance as CAPEX or, more often, as OPEX in a subscription with hardware included. The subscription typically covers: the appliance, remote maintenance of layers 2 to 5, access to new models and their integration into the stack, an SLA, and a vendor due-diligence pack for NIS2 (access control, audit log, security architecture, sub-processor list). The subscription is typically 3-year with a renewal option.

What you maintain. Layer 1 physically (power, cooling, access), layer 6 functionally (integrations with your ERP, MES, PLM, your own workflows), layer 7 partly (observability from your systems, SIEM, IdP). Requires 0.2 to 0.4 FTE platform engineer plus 0.2 FTE security plus an integration team for the first 12 months.

Timeline. The fastest of the three. From decision to production: 6 to 12 weeks for a typical use case (service desk, proposals, instructions). The vendor delivers a pre-configured appliance; the base stack is ready to run in week one. The rest of the time goes to integrations and tuning the RAG pipeline for your documentation.

Risk profile and compliance. The most nuanced. The vendor is a sub-processor in a deeper sense than in Model B (colo). The vendor has privileged access to the appliance for maintenance, which means potential access to the layers where operational data flows. That's a fundamental due-diligence decision: whether this specific vendor, with this specific contract, with these specific access controls, is acceptable in the Article 21 NIS2 mapping.

A point worth naming: good productized-platform vendors have audit logs better than most firms will build internally. They have a built-in source-citation mechanism, prompt versioning, an eval pipeline. That accelerates audit readiness significantly. The downside: you have to trust that this mechanism works as documented.

Specific contractual clauses whose absence should stop the signature:

Logging of vendor access to the appliance, available to the client read-only in real time or with at most 24h delay.
A break-glass procedure with client notification and an auditable decision in the SIEM.
A right to audit once a year, physical or remote, with access to logs and procedures.
A sub-processor list with the client's right to veto new sub-processors.
An exit clause with a procedure to hand over data in a readable format plus configuration code (model weights, prompt templates, retrieval indexes) on contract termination or vendor insolvency.
Source-code escrow or an equivalent mechanism for access to critical code if the vendor fails. Less standard, but worth it for essential entities.
The vendor's financial stability (rating, financial audit, contractual jurisdiction). For NIS2 essential entities this isn't optional.

When it makes sense. You want the first workflow in production in 6 to 12 weeks. The internal IT team is small (up to 8 people) and doesn't plan to build ML competence at layer 4 to 5. Audit readiness is a priority (the vendor delivers a ready NIS2 pack). The model and pipeline roadmap sits with the vendor, and that's a feature, not a problem (you don't want to track new Llama or Mixtral models yourself).

When it doesn't. You have a unique workflow no productized vendor covers (a niche sector, custom AI for very specific data). A group policy bans privileged external-vendor access to infrastructure (rare but real in some energy and defence-adjacent essential entities). The vendor doesn't meet your due-diligence requirements (financial, geographic, contractual clauses). You have a mature ML team that wants to build an internal platform for strategic control.

The NIS2 risk profile across the three models

NIS2 and Poland's amended national cybersecurity act (UKSC, April 2026) don't rule out any of the three models. Each can pass an essential-entity audit if done properly. The difference is in the distribution of responsibility and in how much work the mapping requires.

Model A: bare-metal. The fewest sub-processors, the most responsibility of your own. The Article 21 mapping covers the internal IT department (not a sub-processor in the NIS2 sense, but with its own obligations in the internal control system) plus hardware suppliers (NVIDIA, the server OEM, switches). Typically 3 to 5 sub-processors, all in the hardware layer. The auditor checks internal security organisation, patch-management procedures, IAM, SIEM, BCP. These are controls your organisation already has for existing infrastructure; you just need to extend the scope to the AI workload.

Model B: colo. Medium complexity. The Article 21 mapping covers internal IT, hardware suppliers, plus the colo provider. Typically 5 to 8 sub-processors. The colo provider is a key sub-processor with physical access to the hardware (physical-access audit, admission procedures, identity control, CCTV monitoring) and partly to the network layer (if you use their internet links). Requires a separate DPA with the colo provider and a separate vendor NIS2 self-assessment (most European colos provide this as standard, but verify it).

Model C: managed appliance. The highest mapping complexity. The platform vendor is the deepest sub-processor, with access to the layers where operational data flows (layer 4 to 5 obviously, layer 6 partly). Typically 6 to 12 sub-processors, including the platform vendor plus its own sub-processors (cloud provider for model updates, observability provider, security provider). Each sub-processor requires separate mapping under Article 21, plus a right to veto new sub-processors.

Specific Article 21 sections worth watching in each model:

Article 21(2)(d) (supply-chain security). All three models require supplier mapping. Model C requires the deepest mapping.
Article 21(2)(e) (security in the acquisition, development, and maintenance of network and information systems). Model A gives full control. Model C delegates a significant part to the vendor, which requires contractual clauses and a right to audit.
Article 21(2)(h) (policies and procedures on the use of cryptography). In all models the client should control the keys. Model C requires a clause that the vendor has no access to the client's keys (BYOK or a client-side HSM).
Article 21(2)(i) (human-resources security). Models B and C require assessing sub-processors' staff (background checks, NDAs, mechanisms for handling terminations of staff with access to the client's systems).

For a NIS2 essential entity in manufacturing, full mapping for Model C usually takes 40 to 80 hours of compliance work plus 20 to 40 hours of CISO time plus 20 to 40 hours of a lawyer experienced in NIS2. For Model A, full mapping takes 20 to 40 hours of compliance work. That's a meaningful difference in the first year of deployment.
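To make the gap concrete, the hour ranges can be added up per model. A small Python sketch using only the figures quoted above; the dictionary layout and the `total_hours` name are illustrative. The text gives no CISO or lawyer hours for Model A, so only compliance work is counted there.

```python
# Hour ranges (low, high) as quoted in the text.
MAPPING_HOURS = {
    "A": {"compliance": (20, 40)},
    "C": {"compliance": (40, 80), "CISO": (20, 40), "NIS2 lawyer": (20, 40)},
}


def total_hours(model: str) -> tuple[int, int]:
    """Sum the low and high ends across all roles for one model."""
    roles = MAPPING_HOURS[model].values()
    return sum(lo for lo, _ in roles), sum(hi for _, hi in roles)


for model in MAPPING_HOURS:
    low, high = total_hours(model)
    print(f"Model {model}: {low} to {high} hours")
```

On these figures Model C needs 80 to 160 hours in total against 20 to 40 for Model A, about four times the effort at both ends of the range.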

A short analysis of Article 21(2)(d) specifically, in the public-cloud LLM context, is in a separate post on the supply chain. The pillar on full Article 21 mapping for the AI vendor supply chain publishes in this portal's second month.

A numbers comparison: 500 FTE, 36 months

All numbers in this section are approximate and indicative. Real contracts are negotiated, hardware has different price cycles, and labour costs in European manufacturing rise every quarter. I average for the scenario: 500 FTE, mixed workload (service desk dominant plus proposals plus instructions), 36-month amortisation, the "package B" from the previous pillar (2x H100 80GB plus a smaller redundancy node).

Model A: bare-metal in your own server room.

CAPEX year 0: EUR 205,000 (two nodes plus switches plus UPS plus cooling adaptation). Annual OPEX: 4,700 (power) + 8,000 (server maintenance) + 60,000 to 100,000 (team: 0.5 FTE platform engineer + 0.2 FTE security + part of a data-centre engineer) + 15,000 (pen-test) + 60,000 to 120,000 (AI platform software licenses if you buy a productized stack at layer 4 to 5; if DIY, the cost shifts toward an MLOps team of an extra 0.5 to 1 FTE). 36-month total: CAPEX 205 + (148 to 248) × 3 = EUR 649,000 to 949,000.

Model B: colocation with dedicated hardware.

CAPEX year 0: EUR 180,000 (no UPS and no cooling adaptation; those are in the colo price, though the cost shift is sometimes offset by a higher monthly fee). Annual OPEX: colo fee (typically EUR 2,000 to 4,000/month for a rack with a 6 kW power feed, i.e. 24,000 to 48,000 a year, taken here as 30,000 to 50,000) + 8,000 (server maintenance) + 50,000 to 80,000 (team: 0.4 FTE platform engineer + 0.2 FTE security, no data-centre engineer) + 15,000 (pen-test) + 60,000 to 120,000 (AI platform software licenses or DIY MLOps). 36-month total: CAPEX 180 + (163 to 273) × 3 = EUR 669,000 to 999,000.

Roughly at parity with Model A. The difference is in cost distribution (CAPEX vs OPEX, less data-centre team involvement, more fixed monthly colo cost) and in the risk profile, not in the absolute value.

Model C: a vendor's managed appliance.

CAPEX year 0: EUR 0 to 30,000 (hardware is usually in the subscription, sometimes an onboarding fee). Annual OPEX: appliance plus platform plus maintenance subscription (market rate EUR 150,000 to 300,000/year for a mid-sized 500-FTE firm with a mixed workload; the spread depends on vendor, model, custom integrations) + 4,700 (power) + 8,000 (colo or own server room) + 30,000 to 60,000 (team: 0.3 FTE platform engineer + 0.2 FTE security, far less than Models A and B) + 15,000 (pen-test). 36-month total: 0 to 30 + (207 to 387) × 3 = EUR 621,000 to 1,191,000.

The spread is wider because it depends heavily on the vendor's pricing policy. At the low end, Model C is the cheapest of the three (less of your own team offsets the subscription). At the high end, it's the most expensive (enterprise pricing for essential entities can be steep). For a mid-sized firm with contractual discipline, Model C over 36 months is comparable to A and B, but with a significantly shorter time-to-value (6 to 12 weeks vs 16 to 24 for A and 8 to 16 for B).
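The three totals can be recomputed from the line items in this section. A minimal Python sketch using only figures quoted above; the dictionary layout and the `tco_range` name are illustrative. Because it sums unrounded line items, the results for Models A and C sit within about EUR 2,000 of the totals in the text, which works with annual OPEX in whole thousands.

```python
YEARS = 3  # 36-month horizon

# (low, high) in EUR, copied from the line items in this section.
MODELS = {
    "A": {
        "capex": (205_000, 205_000),
        "opex": [
            (4_700, 4_700),        # power
            (8_000, 8_000),        # server maintenance
            (60_000, 100_000),     # team
            (15_000, 15_000),      # pen-test
            (60_000, 120_000),     # AI platform licenses or DIY MLOps
        ],
    },
    "B": {
        "capex": (180_000, 180_000),
        "opex": [
            (30_000, 50_000),      # colo fee
            (8_000, 8_000),        # server maintenance
            (50_000, 80_000),      # team
            (15_000, 15_000),      # pen-test
            (60_000, 120_000),     # AI platform licenses or DIY MLOps
        ],
    },
    "C": {
        "capex": (0, 30_000),
        "opex": [
            (150_000, 300_000),    # appliance + platform + maintenance subscription
            (4_700, 4_700),        # power
            (8_000, 8_000),        # colo or own server room
            (30_000, 60_000),      # team
            (15_000, 15_000),      # pen-test
        ],
    },
}


def tco_range(model: dict) -> tuple[int, int]:
    """Return the (low, high) total cost over YEARS: CAPEX plus annual OPEX."""
    low = model["capex"][0] + YEARS * sum(lo for lo, _ in model["opex"])
    high = model["capex"][1] + YEARS * sum(hi for _, hi in model["opex"])
    return low, high


for name, model in MODELS.items():
    low, high = tco_range(model)
    print(f"Model {name}: EUR {low:,} to {high:,}")
```

Keeping the line items as explicit ranges makes it easy to swap in your own negotiated figures and see which single item moves the total most. In all three models that is usually the team or the subscription, not the hardware.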

What disappears from this calculation.

Time-to-value as a cost of lost benefit. If AI genuinely shortens the proposal cycle by 30%, every month of deployment delay is 1/12 of the annual saving you didn't realise. For Model C you save 8 to 16 weeks vs Model A. For a mid-sized firm bidding EUR 5 to 15 million a year, that's EUR 50,000 to 200,000 of unrealised saving in the first year.
The cost of due diligence when changing vendors in the future. Model A: none. Model B: medium (changing colo takes 2 to 4 months). Model C: high (changing the platform vendor is effectively a new deployment).
Long-term vendor lock-in risk. Hard to quantify, but real. Model A: minimal. Model B: low. Model C: medium to high, depending on vendor and clauses.
Own-team risk in Models A and B. If the platform engineer leaves, who takes over? A single point of failure for teams under 5 people at layer 1 to 7.
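The time-to-value item is the easiest of these to put a number on. A minimal Python sketch of the rule that each month of delay costs 1/12 of the annual saving, expressed in weeks. The annual saving of EUR 650,000 is a hypothetical input chosen for illustration, not a figure from this article, and `unrealised_saving` is an illustrative name.

```python
def unrealised_saving(annual_saving_eur: float, delay_weeks: float) -> float:
    """Saving lost to a deployment delay.

    Uses 52 weeks per year, which is equivalent to the
    1/12-of-annual-saving-per-month rule in the text.
    """
    return annual_saving_eur * delay_weeks / 52


# Hypothetical annual saving; substitute your own business-case figure.
ANNUAL_SAVING_EUR = 650_000

for weeks in (8, 16):
    lost = unrealised_saving(ANNUAL_SAVING_EUR, weeks)
    print(f"{weeks} weeks of delay: EUR {lost:,.0f} not realised")
```

The result scales linearly with both inputs, so the estimate is only as good as the annual-saving figure behind it. That is the number to pressure-test with the business owner before using it in a board comparison.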

The numbers in this section are a starting point, not a final verdict. Every organisation has its context (existing infrastructure, team, group contracts, sector policies) that modifies the picture.

When none of the three models makes sense

Honestly: there are scenarios where on-prem AI in 2026 isn't the right decision. I list them for completeness:

A firm under 100 FTE without intensive document workflows. The query volume doesn't justify the hardware CAPEX or OPEX. A public-cloud Copilot or hybrid is cheaper and less operationally burdensome.
A firm with no clear use case, in "AI because the board wants AI" mode. On-prem is a decision-stage investment, not an experimental one. You run experiments on hybrid or public cloud with a DPA.
A firm with an M&A plan within 18 months. New infrastructure right before M&A complicates due diligence and often lands on the decommission list. Better to wait.
A firm with a group policy mandating a specific cloud stack (Azure, AWS, Google). Fighting the group policy costs more than you gain from on-prem. Sensible only if you have parent-company board backing.
A firm in a sector where public cloud with an adequate DPA is acceptable for regulation and there's no strategic risk. Not every firm is a NIS2 essential entity. Not every workflow requires on-prem.

The conclusion isn't "everyone to on-prem." It's: if you already know on-prem is your direction, these three models are real options.

A five-question decision aid

Five questions to ask at a board meeting or a deployment committee. Each one practically decides between models.

1. Do we have an enterprise-class server room with power and cooling headroom for an extra 2 to 6 kW, or is a refit within 18 months realistic?

Yes → Models A and C are in play, Model B optional. No → Models B and C are in play; Model A requires a server-room investment no one usually planned.

2. How many people do we have in platform engineering plus security who can take on layers 1 to 7 of a GPU stack?

Above 3 FTE with the right experience → Models A and B are realistic. 1 to 3 FTE → Model C is safer; A and B need contractual MLOps support. Below 1 FTE → only Model C; other models are an operational risk.

3. What's the acceptable time-to-value for the first workflow?

3 to 4 months → only Model C. 6 months → Model C or B; Model A is borderline. 12+ months → all three models realistic.

4. Does a group policy or sector regulation ban privileged external-vendor access to infrastructure?

Yes → Model C ruled out; A and B remain. No → all three realistic; Model C requires contractual clauses.

5. What's the project horizon and how flexible is the decision over the next 36 months?

Stable 3 to 5 years → CAPEX-heavy Models A and B make sense; amortisation works in your favour. Unclear or pilot with an exit option → Model C, OPEX-heavy, easier to stop. Rapid volume growth of 1 to 10x in 18 months → Model C is more flexible to scale; A and B require planned expansion.

The five answers usually converge on one model or a mix of two. If the answers are fully scattered and every model has arguments for and against, that's usually a signal you haven't defined the use case or the organisation profile precisely enough. Go back to the AI roadmap, not the deployment-model choice.
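The five questions can be sketched as a simple elimination over the three models. This is one possible encoding in Python, not a definitive scoring method. The hard cut-offs are my reading of the answers above: "No" on question 1 is treated as ruling out Model A, and the boundaries between the 4-, 6-, and 12-month marks in question 3 are assumptions. The rapid-growth case in question 5 is left out for brevity. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    has_server_room_headroom: bool     # Q1
    platform_security_fte: float       # Q2
    max_time_to_value_months: float    # Q3
    vendor_access_banned: bool         # Q4
    stable_multi_year_horizon: bool    # Q5


def shortlist(a: Answers) -> tuple[set[str], list[str]]:
    """Return the models still in play plus notes on soft preferences."""
    models = {"A", "B", "C"}
    notes: list[str] = []

    # Q1: no headroom and no realistic refit removes bare-metal (assumption).
    if not a.has_server_room_headroom:
        models.discard("A")

    # Q2: team capacity for layers 1 to 7.
    if a.platform_security_fte < 1:
        models &= {"C"}
    elif a.platform_security_fte <= 3:
        notes.append("Q2: A and B need contractual MLOps support")

    # Q3: time-to-value; thresholds between the stated marks are assumptions.
    if a.max_time_to_value_months <= 4:
        models &= {"C"}
    elif a.max_time_to_value_months < 12:
        notes.append("Q3: A is borderline on this timeline")

    # Q4: a ban on privileged vendor access rules out the managed appliance.
    if a.vendor_access_banned:
        models.discard("C")
    elif "C" in models:
        notes.append("Q4: C requires contractual clauses")

    # Q5: horizon is a preference, not an exclusion.
    if a.stable_multi_year_horizon:
        notes.append("Q5: horizon favours CAPEX-heavy A and B")
    else:
        notes.append("Q5: horizon favours OPEX-heavy C")

    return models, notes


models, notes = shortlist(Answers(False, 2.0, 4, False, False))
print(sorted(models), notes)
```

An empty result is informative too: a team below 1 FTE combined with a ban on vendor access leaves no model standing, which is the "go back to the AI roadmap" signal described above.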

Disclosure and biases

The author works on an on-prem AI platform for European manufacturers. Three places where this perspective may be biased:

The Model C section is more detailed than the A and B sections on contractual clauses. That's because, as a productized-platform vendor, I see those clauses up close and know which are critical. It doesn't mean A and B don't require clauses (with hardware and colo suppliers), just that those are more standard and less discussed. An independent auditor might consider this an asymmetric balance.
The five-question decision aid favours Model C in three of five questions (time, team, unclear horizon). That comes from my experience that a mid-sized manufacturer has smaller IT teams and less appetite for a multi-year CAPEX project than the firm believes at the start of the conversation. The bias is there, though. A CIO with a strong infrastructure team and a strategic 5-year horizon has every right to choose Model A for reasons I don't list.
The Model C TCO numbers are wider (EUR 621,000 to 1,191,000) than A and B. At the low end C is cheapest, at the high end most expensive. I've seen the low end in real deployments. The high end comes from enterprise negotiations that don't necessarily represent the European mid-market sweet spot. A realistic estimate for a mid-sized 500-FTE firm is closer to the middle of that range (EUR 700,000 to 900,000 over 36 months), not the low end.

Where this bias could sway a decision, I recommend verifying with an independent consultant or a lawyer specialising in NIS2 and the AI Act. Comments, corrections, counter-arguments welcome — best on the author's LinkedIn.

What I don't cover here

Full Article 21 NIS2 mapping for each of the three models. A separate pillar in the NIS2 cluster, month two.
Specific MSA contractual clauses for Model C. A cluster post planned in the vendor-evaluation cluster, month three to four.
A comparison of specific colo providers in the European market. Requires separate market research; I won't do it from memory.
Hardware financing (leasing, lending, hardware-as-a-service beyond Model C). Affects the CAPEX vs OPEX structure, but is beyond architecture.
The hybrid colo plus on-site variant for geographic redundancy. Mentioned in passing; deserves its own piece.
Disaster recovery and backup strategies for the three models. Cross-cutting, planned in the architecture cluster.
Appliance models hosted at the vendor (not pure client colo). Excluded by the on-prem definition in this piece; planned in a separate post on hybrid.

In later pieces I return to individual models in more depth, especially on contractual clauses and Article 21 mapping.

// author

Fryderyk Pryjma

Building CortexMine, an on-prem AI platform for European manufacturers under NIS2. Where this bias could affect conclusions, it is flagged inline.
