
NVIDIA NemoClaw and OpenShell: The Future of Autonomous AI Agents

March 19, 2026

A deep-dive into NVIDIA's open-source NemoClaw stack and OpenShell runtime announced at GTC 2026 — how infrastructure-level safety, sandboxed execution, and privacy routing are reshaping how we build and deploy autonomous AI agents.

At GTC 2026, NVIDIA dropped what I believe is the most consequential announcement for anyone building autonomous AI agents in production: NemoClaw — an open-source stack for running always-on autonomous agents safely — and OpenShell, the runtime that makes it possible. After spending the past two years deploying agentic systems on-prem at Presight AI, I can say with confidence that this changes the game.

Let me break down what NemoClaw and OpenShell actually are, why they matter, and how they compare to the agent frameworks most of us have been using.

What Are NemoClaw and OpenShell?

NemoClaw is NVIDIA's open-source stack (Apache 2.0 licensed) for building, deploying, and managing autonomous AI agents — what NVIDIA calls "claws." These aren't your typical single-shot LLM tool-calling chains. Claws are persistent, autonomous agents that:

  • Remember context across sessions — they pick up where they left off
  • Spawn subagents to handle subtasks in parallel
  • Write their own code to learn new skills on the fly
  • Use tools and external APIs autonomously
  • Keep executing long after you close your laptop

Think of a claw as a senior engineer you can hand a complex, multi-day task to. It figures out the plan, delegates subtasks, writes helper scripts, and checks in with results — except it runs 24/7.

OpenShell is the runtime that makes this safe. It's part of the NVIDIA Agent Toolkit and provides the sandboxed execution environment where claws actually run. If NemoClaw is the brain, OpenShell is the secure operating room where the brain is allowed to operate.

NemoClaw is built on NVIDIA's open-source Nemotron models — including the newly released Nemotron 3 Super, a hybrid Mamba-Transformer Mixture-of-Experts architecture specifically optimized for agentic reasoning. More on that later.

The OpenShell Architecture: Why It Matters

Here's where things get genuinely interesting from an infrastructure perspective. OpenShell provides three critical capabilities that no existing agent framework offers at the runtime level:

1. Sandboxed Execution

Every claw runs inside a sandboxed environment — think of it like the browser tab model, but for AI agents. NVIDIA calls this out-of-process policy enforcement. The agent's code execution is isolated from the host system, just like how a rogue browser tab can't crash your entire machine or read files from other tabs.

# Create a sandboxed agent environment on DGX Spark
openshell sandbox create --remote spark --from openclaw

# Or run locally on an RTX GPU PC
openshell sandbox create --local --gpu rtx --from openclaw

# List running sandboxes
openshell sandbox list

# Attach to a running claw session
openshell sandbox attach --id claw-0a3f

This isn't just a Docker container with extra steps. The sandbox enforces policies at the process level — filesystem access, network calls, and spawned processes are all governed by explicit rules.
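To make the process-level idea concrete, here's a crude, single-host sketch of the same principle using POSIX resource limits — running untrusted code in a child process whose memory and CPU time are capped before it executes. This is my own stand-in for illustration, not OpenShell's actual enforcement mechanism (which the docs describe as out-of-process policy enforcement).

```python
import resource
import subprocess
import sys

def limited():
    # Runs in the child before exec: cap address space at 512 MiB and
    # CPU time at 5 s. A toy analogue of per-sandbox resource policies.
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))

# "Agent code" executes in an isolated child process, not the host process.
result = subprocess.run(
    [sys.executable, "-c", "print('agent code ran')"],
    preexec_fn=limited,  # POSIX-only hook, applied before exec
    capture_output=True, text=True, timeout=10,
)
print(result.stdout.strip())
```

The limits here are advisory toys; real isolation also needs namespace, filesystem, and network confinement, which is exactly the gap OpenShell claims to fill.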

2. Policy Engine

The policy engine is the real differentiator. You define declarative constraints that govern what an agent can and cannot do:

# policy.yaml — OpenShell policy configuration
version: "1.0"
sandbox:
  name: "data-pipeline-claw"
  base: openclaw

filesystem:
  allow:
    - /workspace/data/**
    - /workspace/output/**
  deny:
    - /etc/**
    - /home/**
    - /workspace/credentials/**

network:
  allow:
    - api.internal.company.com:443
    - storage.googleapis.com:443
  deny:
    - "*"  # deny all other network access

processes:
  allow:
    - python3
    - pip
    - curl
  deny:
    - sudo
    - apt
    - rm -rf

resources:
  max_memory: "16Gi"
  max_cpu: "4"
  max_gpu_memory: "24Gi"
  max_runtime: "24h"

# Launch a claw with the policy applied
openshell sandbox create --remote spark --from openclaw --policy policy.yaml

This is enforced at the infrastructure level, not the prompt level. The agent literally cannot access files outside its allowed paths, even if it writes code that tries. That architectural distinction is the crux of the framework comparison below.
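To see how declarative rules like the filesystem section above might be evaluated, here's a minimal deny-wins matcher I wrote for illustration. The patterns mirror the policy.yaml (I use `fnmatch`'s `*`, which crosses path separators, in place of the `**` glob); none of this is OpenShell's actual implementation.

```python
from fnmatch import fnmatch

# Illustrative mirror of the filesystem section of policy.yaml.
# fnmatch's "*" matches across "/" boundaries, so it behaves like "**" here.
POLICY = {
    "allow": ["/workspace/data/*", "/workspace/output/*"],
    "deny":  ["/etc/*", "/home/*", "/workspace/credentials/*"],
}

def path_allowed(path: str) -> bool:
    # Deny rules win first; otherwise the path must match an explicit allow.
    if any(fnmatch(path, pat) for pat in POLICY["deny"]):
        return False
    return any(fnmatch(path, pat) for pat in POLICY["allow"])

print(path_allowed("/workspace/data/sales.csv"))       # True
print(path_allowed("/workspace/credentials/key.pem"))  # False
print(path_allowed("/etc/passwd"))                     # False
```

The deny-then-allow ordering matters: it makes the default posture "closed", so a path not explicitly allowed is rejected even if no deny rule names it.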

3. Privacy Router

This is the feature that made me sit up straight. The privacy router automatically routes sensitive data to local models and non-sensitive data to frontier models. In practice, this means:

  • Queries involving PII, internal codebases, or proprietary data → routed to a local Nemotron model running on your DGX or RTX hardware
  • General reasoning, code generation, or public-knowledge queries → routed to a frontier model for maximum capability

For those of us deploying agents in regulated industries (finance, healthcare, government), this solves one of the hardest problems: getting the benefit of frontier model capabilities while keeping sensitive data on-prem.
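The routing decision itself can be pictured as a classifier in front of two model endpoints. Here's a deliberately simple regex-based sketch; the patterns, labels, and model names are my placeholders, and the real privacy router presumably uses far more robust PII detection than two regexes.

```python
import re

# Toy PII screen standing in for the privacy router's classifier.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-SSN-shaped number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]

def route(query: str) -> str:
    # Anything that looks sensitive stays on the local model;
    # everything else may go to a frontier model.
    if any(p.search(query) for p in PII_PATTERNS):
        return "local"     # e.g. on-prem Nemotron on DGX/RTX
    return "frontier"      # e.g. hosted frontier model

print(route("Summarize the ticket from jane.doe@corp.com"))  # local
print(route("Explain Mamba's linear-time scan"))             # frontier
```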

How NemoClaw Compares to Existing Agent Frameworks

I've built production systems with LangGraph, CrewAI, and AutoGen. They're excellent frameworks, but they operate at a fundamentally different layer of the stack. Here's how I see the landscape:

LangGraph

LangGraph gives you fine-grained control over agent workflows as stateful graphs. It's great for building complex, multi-step agentic pipelines. But it's a programming framework — it doesn't provide execution isolation, policy enforcement, or hardware-level sandboxing. If your LangGraph agent writes and executes malicious code, your host system is exposed.

CrewAI

CrewAI focuses on multi-agent collaboration with role-based agents. It's intuitive and great for prototyping. But like LangGraph, safety is handled through prompt engineering and application-level guardrails — you're trusting the model to follow instructions, not enforcing constraints at the OS level.

AutoGen

Microsoft's AutoGen is the closest to NemoClaw's vision — it supports multi-agent conversations, code execution, and human-in-the-loop patterns. It even has a Docker-based code execution sandbox. But AutoGen's sandboxing is opt-in and limited to code execution; it doesn't cover network access, filesystem policies, or privacy-aware routing.

Where NemoClaw Is Different

The key insight is that NemoClaw and OpenShell operate at the infrastructure layer, not the application layer:

| Capability | LangGraph | CrewAI | AutoGen | NemoClaw + OpenShell |
|---|---|---|---|---|
| Agent orchestration | ✅ | ✅ | ✅ | ✅ |
| Multi-agent support | ✅ | ✅ | ✅ | ✅ (subagent spawning) |
| Sandboxed execution | ❌ | ❌ | Partial (Docker) | ✅ (OS-level) |
| Declarative policy engine | ❌ | ❌ | ❌ | ✅ |
| Privacy-aware routing | ❌ | ❌ | ❌ | ✅ |
| Persistent across sessions | ❌ | ❌ | ❌ | ✅ |
| Self-coding / skill learning | ❌ | ❌ | Partial | ✅ |
| Hardware-optimized (GPU) | ❌ | ❌ | ❌ | ✅ (DGX/RTX) |

NemoClaw isn't a replacement for these frameworks — you can actually use OpenShell as the runtime underneath them. It's compatible with OpenClaw, Claude Code, Codex, and Cursor. Think of it as the secure operating system on which your existing agent code runs.

Getting Started: A Practical Walkthrough

Here's how to get a claw running from scratch. You'll need an NVIDIA GPU (RTX series or DGX) and the NVIDIA Agent Toolkit installed.

# Install the NVIDIA Agent Toolkit
pip install nvidia-agent-toolkit

# Verify your GPU is recognized
nvidia-smi

# Pull the OpenClaw base image
openshell pull openclaw:latest

# Create your first sandbox
openshell sandbox create --local --from openclaw --name my-first-claw

# The claw is now running. Attach to it:
openshell sandbox attach --name my-first-claw

Once attached, you can give the claw a task:

# Inside the claw session
> Analyze the CSV files in /workspace/data/, build a classification model,
  evaluate it with cross-validation, and write a summary report to /workspace/output/

# The claw will:
# 1. Explore the data files
# 2. Write Python code for preprocessing
# 3. Train and evaluate models
# 4. Generate a markdown report
# All within the sandbox constraints you defined

For remote deployment on DGX Spark:

# Deploy to DGX Spark with a custom policy
openshell sandbox create --remote spark --from openclaw \
  --policy policy.yaml \
  --env TASK="nightly-data-pipeline" \
  --schedule "0 2 * * *"

# Monitor running claws
openshell sandbox list --remote spark --status running

# Check logs
openshell logs --name nightly-data-pipeline --tail 100

Nemotron 3 Super: The Brain Behind the Claws

Alongside NemoClaw, NVIDIA released Nemotron 3 Super — a hybrid Mamba-Transformer Mixture-of-Experts model designed specifically for agentic reasoning. The hybrid architecture is significant:

  • Mamba layers handle long-context sequential reasoning efficiently (linear scaling vs. quadratic for standard attention)
  • Transformer layers handle complex cross-attention and retrieval tasks
  • MoE routing activates only the relevant expert parameters per token, keeping inference costs manageable
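The MoE routing in that last bullet is worth unpacking with a tiny sketch: per token, a gate scores every expert, keeps only the top-k, and softmaxes over just those scores to get mixing weights. The scores below are made up for illustration; in a real model they come from a learned projection of the token's hidden state.

```python
import math

def top_k_route(gate_scores, k=2):
    # Pick the k highest-scoring experts for this token...
    top = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:k]
    # ...then softmax over only the selected scores to get mixing weights.
    exps = [math.exp(gate_scores[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# Four experts, but only the two best (1 and 3) are activated for this token.
weights = top_k_route([0.1, 2.0, -0.5, 1.2], k=2)
print(weights)
```

Because only k experts run per token, compute scales with k rather than with the total expert count, which is how MoE models keep inference cost manageable at large parameter counts.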

In my early testing, Nemotron 3 Super shows notably stronger performance on multi-step tool-use benchmarks than similarly sized open models. The Mamba component seems particularly beneficial for long agent trajectories, where the model needs to maintain coherent state across dozens of tool calls.

Implications for Enterprise Deployment

This is where I get genuinely excited, because this directly addresses problems I've wrestled with at Presight AI. Deploying autonomous agents in enterprise environments has three recurring nightmares:

1. Security and Compliance

Every CISO I've talked to has the same reaction to autonomous agents: "You want to let an AI write and execute arbitrary code on our infrastructure?" With OpenShell's sandboxed execution and declarative policies, you can give your security team a machine-readable specification of exactly what the agent can and cannot do. That's auditable, reviewable, and enforceable — not just a system prompt you're hoping the model respects.

2. Data Privacy

In regulated industries, data can't leave certain boundaries. The privacy router is the first infrastructure-level solution I've seen that lets you use frontier models for general reasoning while guaranteeing that sensitive data never leaves your network. Previously, we had to choose between running everything on weaker local models or building complex proxy layers ourselves.

3. Reliability at Scale

Claws that persist across sessions, self-heal, and spawn subagents mean you can build genuinely autonomous data pipelines and monitoring systems. At Presight AI, we've been manually stitching together cron jobs, Airflow DAGs, and LLM API calls to approximate this. NemoClaw makes the persistent autonomous agent a first-class primitive.

The fact that this runs on hardware ranging from an RTX 4090 desktop to a full DGX Station means teams can develop locally and deploy to production-grade hardware without rearchitecting.

What I'm Watching For

NemoClaw is still early. A few things I'll be tracking closely:

  • Community adoption: Apache 2.0 licensing is the right call. The question is whether the ecosystem builds around OpenShell the way it built around Docker.
  • Policy engine maturity: The declarative policy system needs to support more granular controls — token-level budget caps, API rate limits, and cross-agent communication policies.
  • Framework integration depth: Compatibility with Claude Code, Codex, and Cursor is promising. I want to see deeper integration with LangGraph and CrewAI for teams already invested in those ecosystems.
  • Benchmarks on consumer hardware: DGX Spark is impressive but expensive. How well do claws run on an RTX 4090 with Nemotron 3 Super quantized to 4-bit?

Key Takeaways

  1. NemoClaw is an open-source stack for running persistent, autonomous AI agents ("claws") safely — announced at GTC 2026 under Apache 2.0.
  2. OpenShell provides infrastructure-level safety — sandboxed execution, declarative policy enforcement, and privacy-aware routing. This is fundamentally different from prompt-level guardrails.
  3. The privacy router is a breakthrough for enterprise deployment — sensitive data stays on local models while non-sensitive queries leverage frontier model capabilities.
  4. NemoClaw complements, not replaces, existing frameworks — it's compatible with OpenClaw, Claude Code, Codex, Cursor, and can serve as the runtime layer beneath LangGraph, CrewAI, or AutoGen.
  5. Nemotron 3 Super's hybrid Mamba-Transformer MoE architecture is purpose-built for the long-context, multi-step reasoning that autonomous agents demand.
  6. For enterprise teams deploying agents on-prem, this is the most complete open-source solution I've seen — covering execution safety, data privacy, and hardware optimization in a single stack.

The shift from "agents as chat features" to "agents as always-on autonomous systems" is happening. NemoClaw and OpenShell are the infrastructure that makes it safe to actually ship these systems to production. I'll be writing more as I integrate this into our workflows at Presight AI.

Manish Joshi

Senior Data Scientist at Presight AI

Senior Data Scientist with expertise in LLM deployment, agentic AI systems, and scalable ML pipelines. IIT Bombay alumnus. Previously at TikTok and Microsoft.