Magic
Magic is an AI research company building frontier code models to automate software engineering and research. The team believes the most promising path to safe AGI runs through automating AI research and code generation so models can improve themselves and alignment work can scale beyond what humans can do alone. Their site positions the work as fundamental research on a direct path to AGI, not a shrink-wrapped SaaS product.
The technical bet combines frontier-scale pre-training, domain-specific reinforcement learning, ultra-long context, and inference-time compute. Magic’s LTM (Long-Term Memory) architecture targets software development specifically: models that can hold entire codebases, documentation, and private libraries in context during inference instead of relying on fuzzy memorization from training alone.
Magic has raised $515 million, runs thousands of NVIDIA GB200 GPUs on Google Cloud supercomputers, and publishes safety commitments through an AGI Readiness Policy developed with METR. The company is a small engineering and research group based in San Francisco, currently hiring across kernels, pre-training systems, inference, RL, and security.
LTM-2-mini reasons across 100M tokens, roughly 10 million lines of code
HashHop benchmark stress-tests retrieval without easy semantic shortcuts
Custom CUDA training and inference stack built without torch autograd
Google Cloud GB200 NVL72 cluster scales to tens of thousands of Blackwell GPUs
AGI Readiness Policy tracks dangerous capabilities with METR guidance
Prototype models edited real open source repos like Documenso without human help
LTM architecture targets 100M-token context at a fraction of standard attention memory cost.
Published AGI Readiness Policy with METR input and explicit dangerous-capability thresholds.
$515M in funding and Google Cloud GB200 supercomputers for large-scale training.
Demonstrated unassisted code edits on real open source repositories in research prototypes.
HashHop benchmark addresses known weaknesses in popular long-context evaluations.
No public product, pricing page, or self-serve access on magic.dev as of research date.
Frontier models remain in research; Magic’s blog notes that the prototype’s code synthesis quality was not yet competitive.
Careers-focused site gives limited detail on consumer or enterprise availability timelines.
What does Magic build?
Magic builds frontier code models and autonomous agents aimed at automating software engineering and AI research. Its LTM (Long-Term Memory) models focus on ultra-long context windows so code synthesis can see full repositories, docs, and libraries during inference, not just what was memorized in training.
How large is Magic’s context window?
Magic’s LTM-2-mini model supports up to 100 million tokens of context during inference, which Magic equates to about 10 million lines of code or 750 novels. An earlier LTM-1 model announced on the Magic blog had a 5 million token context window.
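As a rough sanity check, the equivalences quoted above can be turned into implied ratios. The sketch below uses only the figures stated in this answer; the constant and function names are illustrative, and the results are back-of-envelope averages, not numbers published by Magic.

```python
# Figures quoted above for LTM-2-mini and LTM-1.
CONTEXT_TOKENS = 100_000_000      # LTM-2-mini context window
LINES_OF_CODE = 10_000_000        # Magic's stated code equivalent
NOVELS = 750                      # Magic's stated novel equivalent
LTM1_CONTEXT_TOKENS = 5_000_000   # earlier LTM-1 context window


def implied_ratios():
    """Derive average ratios implied by the quoted equivalences."""
    return {
        "tokens_per_line": CONTEXT_TOKENS / LINES_OF_CODE,
        "tokens_per_novel": CONTEXT_TOKENS / NOVELS,
        "growth_over_ltm1": CONTEXT_TOKENS / LTM1_CONTEXT_TOKENS,
    }


if __name__ == "__main__":
    for name, value in implied_ratios().items():
        print(f"{name}: {value:,.0f}")
```

The quoted figures work out to about 10 tokens per line of code, roughly 133,000 tokens per novel, and a 20x increase over LTM-1's context window.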
How much funding has Magic raised?
Magic has raised $515 million in total funding, according to its homepage and August 2024 research update. Investors listed include Nat Friedman, Daniel Gross, CapitalG, Elad Gil, Sequoia, Jane Street, Eric Schmidt, and Atlassian.
Does Magic have public pricing or a signup?
Magic’s website does not list product pricing or a public self-serve signup, and its navigation has no pricing page. The site focuses on research updates, safety policy, and careers.
How does Magic approach AI safety?
Magic publishes an AGI Readiness Policy (version 1.0, July 2024) created with help from METR. It commits to dangerous-capability evaluations before deploying frontier coding models, covering threat models like cyberoffense, AI R&D acceleration, autonomous replication, and bioweapons assistance. Magic provides an email address for safety inquiries.
Where is Magic based?
Magic is based in San Francisco. Career listings on magic.dev place most engineering and research roles in SF, with some kernel and infrastructure roles open to remote work.
What is HashHop?
HashHop is Magic’s long-context evaluation benchmark. It uses random hash pairs instead of semantically obvious needles, because existing benchmarks like Needle In A Haystack let models cheat by spotting unusual text. Magic published HashHop on GitHub for others to use.
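To make the idea of random hash pairs concrete, here is a minimal sketch of how such a task could be generated and scored. It is not Magic's published implementation. The `key = value` line format, the chaining of pairs into multi-hop lookups, and the names `random_hash`, `build_task`, and `follow` are all assumptions made for illustration.

```python
import random


def random_hash(rng, hex_chars=16):
    """Return a random hex string with no semantic content."""
    return f"{rng.getrandbits(hex_chars * 4):0{hex_chars}x}"


def build_task(num_chains, hops, seed=0):
    """Build a shuffled context of hash pairs plus one query.

    Each chain h0 -> h1 -> ... -> h_hops contributes `hops` pairs.
    Shuffling removes positional cues, so answering requires
    retrieving every link in the chain.
    """
    rng = random.Random(seed)
    chains, pairs = [], []
    for _ in range(num_chains):
        chain = [random_hash(rng) for _ in range(hops + 1)]
        chains.append(chain)
        pairs.extend(zip(chain, chain[1:]))
    rng.shuffle(pairs)
    context = "\n".join(f"{key} = {value}" for key, value in pairs)
    target = rng.choice(chains)
    return context, target[0], target[-1]


def follow(context, start, hops):
    """Reference solver: follow `hops` links starting from `start`."""
    table = dict(line.split(" = ") for line in context.splitlines())
    current = start
    for _ in range(hops):
        current = table[current]
    return current


if __name__ == "__main__":
    context, start, answer = build_task(num_chains=5, hops=3, seed=1)
    print(f"Query: starting from {start}, what value is reached after 3 hops?")
    print("Reference solver agrees:", follow(context, start, 3) == answer)
```

Because every key and value is random, nothing in the context stands out as unusual. A model evaluated on a task like this would have to retrieve each link rather than spot an out-of-place sentence.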

