Is Grok 4 Smarter Than Us?
July 10, 2025
by Jaymie Johns

On Wednesday, xAI released Grok 4, its newest AI model, and it has raised plenty of eyebrows, as seems to be the pattern with Elon Musk’s endeavors. The announcement wasn't a simple press release or a tweet with a link; it was a full-blown livestreamed event on July 9, 2025, scheduled for 8 PM Pacific Time, where xAI unveiled Grok 4 to the world. As expected, Elon Musk was fashionably late, arriving about 20 minutes after the scheduled start, but the wait was worth it. During the presentation, Musk and the xAI team dove deep into the model's capabilities, showcasing live demos that highlighted its advanced reasoning, multimodal processing, and real-time integration with the X platform. This wasn't merely hype; it was a demonstration of how Grok 4 pushes the boundaries of what AI can achieve, blending humor, utility, and raw power in a way that's quintessentially Musk-esque. Sure, Elon was late, but his innovations are always early (years early), so we’ll let it slide.
Grok 4 represents a massive leap forward from previous iterations. It can process entire books, analyze audio, interpret images, write and debug code, and reason across massive spans of context, handling the equivalent of several hundred pages of text in a single interaction. But it's not just about scale; it's about intelligence. xAI trained Grok 4 on Colossus, its 200,000-GPU cluster, incorporating scaled reinforcement learning and expanded datasets across domains like math, coding, natural sciences, and humanities. This allows it to handle complex, multi-step problems with unprecedented accuracy. For instance, in live demos during the launch, Grok 4 seamlessly integrated vision and text, analyzing uploaded images or even live camera feeds during voice conversations to provide context-aware responses.
xAI has built a tool that does more than assist: Grok 4 is a system that performs reasoning and insight (simulated or not, depending on your take) without requiring human prompting at every turn. It features native tool use, including a code interpreter and web browsing capabilities, where it autonomously selects search queries to gather real-time information.
Additionally, Grok 4 Voice Mode brings hyper-realistic voices with emotional depth, making interactions feel more natural and responsive. Philosophers and ethicists might debate the nature of sentience, but from a practical standpoint, Grok 4's ability to outperform human experts in diverse fields suggests we're on the cusp of something transformative. Whether or not we choose to call that “intelligence” says more about us than it does about Grok.
From a technical standpoint, Grok 4 is extraordinary. Its predecessor, Grok 3 (why we skipped 3.5 is a mystery), felt somewhat like a beta release. Grok 4, however, has already established itself as the Alpha in the AI chatbot arena. Benchmarks released during the launch paint a clear picture. On Humanity's Last Exam (HLE), a rigorous test with 2,500 problems crafted by experts across more than 100 disciplines, xAI reported that Grok 4 scored 25.4% without tools and that Grok 4 Heavy reached 44.4% with tools, the first results at such heights on a benchmark designed to be the "final closed-ended academic benchmark." By xAI's figures, this outperforms all competitors, including models from OpenAI and Anthropic, and in the Heavy configuration the margin is wide.
On ARC-AGI-2, a test of an AI's ability to reason abstractly and generalize to new, unseen problems like visual puzzles, Grok 4 achieved 15.9%, nearly double the previous state of the art. Other feats include dominating Vending-Bench, a simulation benchmark that tests an AI's ability to act as an autonomous agent running a vending machine business: managing inventory, pricing, supplier interactions, and strategic decisions to maximize "net worth" through simulated business operations. Grok 4 accumulated $4,694.15, far surpassing the human baseline and other AIs. Grok 4 Heavy also led in competitive math, scoring 61.9% on the 2025 United States of America Mathematical Olympiad (USAMO'25).
Grok 4 is faster, more accurate, far better at fact-checking, and exhibits stronger logical continuity. It can follow complex conversations, switch between formats mid-conversation, and preserve tone in a way that feels unsettlingly consistent. For example, in its Heavy variant, it uses a team of specialized AI agents that work together at the same time, checking each other's work to deliver the best possible results. This approach allows it to consider multiple ideas simultaneously, making it more reliable in tough situations. And unlike its competitors, it doesn’t immediately throw disclaimers at every morally or politically sensitive topic. Instead, Grok 4 embraces transparency with its "Thinking" blurb, showing step-by-step reasoning, sources, and logic—empowering users to verify conclusions independently.
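For the curious, the "several agents working at once and checking each other" idea can be sketched in a few lines of Python. This is a toy illustration only, not xAI's implementation: the three agent functions are hypothetical stand-ins for model calls, and a simple majority vote stands in for whatever cross-checking Grok 4 Heavy actually performs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent agents. Real agents would be
# model calls; these toy functions just return candidate answers.
def agent_a(question: str) -> str:
    return "42"

def agent_b(question: str) -> str:
    return "42"

def agent_c(question: str) -> str:
    return "41"  # deliberately wrong, to show the cross-check at work

def solve_in_parallel(question, agents):
    """Run every agent at once, then keep the answer most of them agree on."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Majority vote (an assumption): the most common answer wins.
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)

answer, agreement = solve_in_parallel("What is 6 x 7?", [agent_a, agent_b, agent_c])
print(answer, round(agreement, 2))  # prints: 42 0.67
```

The point of the sketch is the shape, not the details: one dissenting agent gets outvoted, and the agreement score gives a rough sense of how much to trust the result.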
A consistently cautious technologist, Musk has expressed serious concerns about the impact and danger of Artificial Intelligence for years. Back in 2014, he warned that AI could be "more dangerous than nukes," and his co-founding of OpenAI in 2015 was initially aimed at ensuring safe, open development. Musk left OpenAI's board in 2018 amid disagreements over direction and commercialization, and he has since criticized the company's shift toward closed-source, for-profit models. This led him to launch xAI in 2023, with the mission to "understand the true nature of the universe" and prioritize truth-seeking over profit-driven censorship.
He wanted to win the AI race, not for the victory, but to prevent someone with nefarious motives from claiming the gold—and the potential power to wield advancement as a weapon against humanity. Musk's vision for Grok is rooted in this: an AI that's maximally truthful, helpful, and aligned with human flourishing, without the guardrails that stifle free inquiry.
During the Grok 4 livestream, Musk emphasized its astounding intelligence, stating that it operates at "PhD level in everything — better than PhD" across all subjects simultaneously, outperforming almost all graduate students in every discipline at once. This echoes his long-held views on the singularity — the point where AI surpasses the combined intelligence of all humanity in every field, potentially leading to exponential progress or existential risks. If Grok 4 is already smarter and more knowledgeable than humanity's top experts combined in tested academic areas, it raises profound moral implications: How do we ensure such power benefits society, avoids misuse, and preserves human agency in an era where AI could redefine knowledge itself?
Grok 4 doesn’t keep secrets: if you want to know how it came to a conclusion, you can find out. Its “Thinking” blurb displays its process, cites where it gathered its information, and details its reasoning. This transparency extends to its integration with X, where Grok 4 can perform advanced searches, including keyword and semantic queries, and even analyze media like images and videos for more accurate responses. Looking ahead, xAI plans to scale reinforcement learning further, expanding into real-world problem-solving and multimodal enhancements; Musk suggested at the launch that Grok could uncover new technologies and even new physics within the next one to two years. With Grok 4's superior benchmarks and uncensored approach, it's challenging the dominance of models like GPT-4o and Claude 3.5 Sonnet, offering users a less filtered lens on information.
Artificial Intelligence tools like this don’t just tell you the name of the song you’ve had stuck in your head for weeks or help with calculus homework; they shape how people think, whether they admit it or not. They influence behavior, thought processes, and even beliefs.
(Seriously — ask AI to give you its thoughts on God.)
The AI you use becomes the lens through which you see the world — and when only a few companies control those lenses, it matters who they are and what they value. In an era where misinformation spreads like wildfire, Grok 4's emphasis on verifiable reasoning and real-time fact-checking could make knowledge vastly more accessible and provide information free from influence, allowing users to explore controversial topics without corporate biases.
We all know about ChatGPT, the “original” AI chatbot from OpenAI (openai.com). Elon Musk and Sam Altman co-founded OpenAI, and the name reflected its commitment to openness. But the company drifted from that commitment: Elon left in 2018, and by the time ChatGPT launched in 2022, its models were closed-source. The split turned acrimonious: Musk sued OpenAI in 2024, alleging it had abandoned its non-profit mission for profit; he dropped that suit and then refiled in federal court later the same year. (The Altman/Musk debacle is an infuriating rabbit hole we don’t have time for, but, spoiler: Altman is the villain.) Grok 4 is, in a way, Elon’s middle finger to Sam Altman, saying, “It is possible, it does work, and it’s better.” And so far? That’s turning out to be true.
Altman’s AI is reminiscent of scratched dollar-store glasses; Grok 4 is like optometrist-prescribed lenses—and it’s not difficult to know which sees the world with more clarity.
