Scott Douglas Jacobsen and Rick Rosner discuss whether “high-range” IQ tests like Ronald K. Hoeflin’s Mega Test still measure g meaningfully after decades of public circulation and the rise of AI tools that make many items searchable or solvable by machines. They examine how IQ testing began with Binet’s educational aims, how standardization can imply false precision, and how device-coupled cognition reshapes what “ability” even means. The conversation ranges from SAT/ACT test-optional policies and labor-market turbulence to ethical misuse of group IQ claims and the limited talent-search value of niche extreme-tail testing.
Scott Douglas Jacobsen: One thing I wanted to get your take on concerns the “penetrating cubes” problem on the Mega Test, which has often been described—by people who have taken the test—as one of its most difficult items.
For those who do not know, the Mega Test was created by Ronald K. Hoeflin, the founder of the Mega Society. The Mega Test was published in Omni magazine in April 1985, which means its questions have been publicly available for decades.
The Mega Society remains active and publishes material. The Society’s own account also notes that the Mega Test became “compromised” in the practical sense—meaning that discussion and answer-sharing made later scores unreliable—and it states that scores after 1994 are not accepted for that test.
Relatedly, Hoeflin also created the Titan Test; the Mega Society reports that the Titan Test was likewise compromised and was retired in 2020 for that reason.
Rick Rosner: So, when we say “compromised,” we are not making a mysterious claim. We mean that the content has been publicly available long enough for solutions and partial solutions to circulate, and that modern search tools make some item types easier to solve than they were in the 1980s.
Jacobsen: Before we get to the main point, I want to clarify terminology. In that community, people sometimes say “ultra-high IQ tests,” but I prefer “high-range tests,” because it avoids assuming the instrument is cleanly measuring IQ in the standard psychometric sense. The question of whether such tests measure g—general cognitive ability—at the extreme right tail remains contested, especially given their unsupervised, untimed nature and reliance on specialized reasoning.
With that framing in place: how do you think the Mega Test has held up over time as an attempt to measure something like g at the far right tail? And how should we think about that question now that AI tools and constant device use have changed the cognitive environment so dramatically?
Rosner: The smartphone era effectively began with the introduction of the Apple iPhone in 2007, and the broader point is that people now carry an information portal almost all the time—something that simply was not true in the 1990s and earlier.
For most of human history, people relied largely on their own mental resources on a moment-to-moment basis. You could go to a library, look things up, or consult an expert, but in everyday life, during most waking hours, you were using the resources available in your own brain. Now we are tightly coupled to devices, which changes that fundamentally.
Even if there were no problems with IQ measurement before, there certainly are now, and those problems will grow in the future—assuming IQ remains something we even care about—because we are becoming increasingly intertwined with thinking devices. That forces us to step back and ask why IQ was measured in the first place.
As we have discussed many times, intelligence testing originated with Alfred Binet. The original goal was practical: to identify which children needed additional educational support and which might benefit from more advanced material. That was the core purpose. Later, Lewis Terman, working in California, adapted Binet’s work and helped formalize the scoring system that centered intelligence scores around a mean of 100.
That standardization was based largely on population norms drawn from Western populations, including British samples, which served as an early reference point. Everything became relative to that average. The use of a 100-point mean gives IQ tests a sense of numerical precision that they do not truly possess.
You could just as easily define a scale with a mean of 1,000, giving more digits and the illusion of greater precision. On such a scale, someone with an IQ of 1,005 would not meaningfully differ from someone with an IQ of 997 beyond statistical noise. Even on the familiar mean-100 scale, there is a degree of artificial exactness that overstates what these tests can reliably distinguish.
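The rescaling point above can be made concrete. A minimal sketch, assuming the conventional scale uses a mean of 100 with a standard deviation of 15, and assuming the hypothetical scale uses a mean of 1,000 with a standard deviation of 150 (the SD values are illustrative choices, not from the conversation):

```python
def to_thousand_scale(iq, mean=100.0, sd=15.0, new_mean=1000.0, new_sd=150.0):
    """Linearly map a conventional IQ score onto a hypothetical mean-1,000 scale.

    The z-score is identical on both scales, so no information is gained
    by the extra digits; only the numerals change.
    """
    z = (iq - mean) / sd  # standardize: same z-score on either scale
    return new_mean + z * new_sd

# Scores of 100.5 and 99.7 map to roughly 1,005 and 997 on the new scale,
# yet they differ by less than a typical test's standard error of measurement.
print(to_thousand_scale(100.0))  # 1000.0
print(round(to_thousand_scale(100.5)))  # about 1005
print(round(to_thousand_scale(99.7)))   # about 997
```

The transform is purely linear, which is the point: a score difference that is within measurement noise on one scale remains within measurement noise on any rescaled version of it.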
As we move into the future, there probably should be some measure of how well people are functioning in the world we actually inhabit and the world we are moving toward. There is a great deal that is currently broken in educational systems. In the United States, college enrollment declined by roughly 15 percent between 2010 and 2022. Much of that decline appears to have been among men. University enrollment is now close to 60–40 female-to-male, though the exact ratio varies by institution and country.
Boys and young men are being left behind in multiple ways. At the same time, there is a growing anti-elitist distrust of authority and expertise in American culture, which is deeply corrosive. With AI steadily eroding traditional entry-level and white-collar jobs, the outlook for recent graduates is increasingly bleak.
Many current graduates reportedly struggle to secure internships while still in school and cannot find stable employment after graduating. The unemployment rate among recent college graduates is now around 5.8 percent, substantially higher than historical norms and roughly double the rate for older graduates. That signals a serious structural problem rather than a temporary fluctuation.
The idea that college will reliably pay for itself through higher wages is now open to question and will likely become even more so. The bigger question is how people get jobs at all anymore, aside from precarious freelance work like Uber Eats or platforms such as OnlyFans.
AI cannot fully replace some of those roles—although, to be clear, there is already a large amount of AI-generated pornography. What AI cannot yet offer consumers is the perceived thrill of a real person exposing themselves. OnlyFans reportedly has millions of creators posting largely sexualized content, and obviously, none of that requires a college degree.
There is a great deal of turmoil in education and in how we think about skills and competence, and that turmoil is going to increase. Somewhere in that chaos is a serious question: do we need a tool for measuring how well people can actually navigate the modern world?
When IQ testing was taken very seriously—say, around 1960—the assumption was that IQ captured something close to everything. It was treated as a general indicator of intelligence and, by extension, of success in life. That assumption no longer holds. The real question now is what people actually need to function and succeed, and how—if at all—you would measure that.
Jacobsen: I think there are at least two rough dimensions when it comes to analytic intelligence. First, there is a functional floor. Most people I have met throughout my life are above that floor. It is very rare to encounter someone who simply cannot operate in the world at all.
When I worked with a special-needs child, I took him to the PNE in Vancouver. He was in a wheelchair, and I was pushing him around. He saw the roller coaster, paused, pointed, and said, “Train.” To him, a roller coaster and a freight train were functionally the same object and concept.
Rosner: I have also worked with special-needs individuals in volunteer settings.
Jacobsen: “Special needs” is simply a neutral way of describing noticeable gaps in function. It is not a judgment; it is a description of the constraints someone lives with. Even so, people with special needs often retain meaningful areas of functionality. Kim Peek, for example, is a well-known case of extreme cognitive strengths coexisting with serious deficits.
Rosner: What you are getting at, I think, is that most people fall within a relatively narrow band of basic cognitive functionality. It is similar to physical organs. Everyone has a heart, kidneys, and a liver, and while there is variation, especially as people age, among those under 50 or 60 it is rare to find someone with a profoundly defective organ. Evolution imposes a kind of quality control. The same is broadly true of brains. There is a minimum level of cognitive functionality that allows someone to exist in the world. Finding someone who falls below that threshold is unusual.
Jacobsen: Once you are above that minimum, what matters next is sustained investment of time and effort. Even an associate’s or bachelor’s degree in a field—assuming the person is a serious student rather than disengaged—can make a large difference. That applies equally to manual disciplines like piano or carpentry and to more abstract ones like history or literature. That is where you begin to see meaningful differentiation in expertise and capability.
You can see people develop along one vertical or one lateral dimension and go extremely far in that direction. In the future, AI’s vertical, domain-specific capabilities will almost certainly exceed the deepest human specializations in every domain of intelligence. But once a basic cognitive floor is met, specialization is where domain expertise really emerges.
Some people have very broad capabilities—Terence Tao is an obvious example—but generally what you see is specialization. That specialization does involve IQ, but it is often more revealing and more useful as a composite of personality traits layered on top of IQ. Those combinations are what lead to very high levels of real-world functionality. We sometimes label those people “geniuses” because they are solving real-world problems that no IQ test item has ever come close to approximating.
So my question for you is this: do we need formal measures at all, or is the real test simply putting people into roles and seeing how they perform? The deeper question is always, “to what end?” Are the investments of time and resources into measurement justified by the goals being pursued?
For people who are deeply invested in IQ combined with racial pseudoscience, those investments feel justified because they want a rationale for their perceived group superiority.
Rosner: That is an illegitimate and deeply troubling motivation. Anyone who talks seriously about the IQ of groups is almost always advancing a racist or otherwise anti-humanist agenda.
Jacobsen: The contrast I want to draw is this: consider someone like Charles Murray. One of my former psychology professors—who had scored perfectly on the verbal, quantitative, and analytical sections of the GRE—made an important observation about Murray’s work. Even if you grant, for the sake of argument, that Murray’s strongest empirical claims were true, the ethical conclusion would be the opposite of Murray’s own. It would imply a greater obligation to invest resources and support into people who struggle, not less.
Murray’s argument, by contrast, has often been interpreted as a reason to withdraw support, under the logic that there is little that can be done. That is a moral failure, not a scientific one.
If we return to evidence-based science and evidence-based use of cognitive measurement, the most defensible application is the original and genuinely humane one: identifying who needs help. That might mean extra educational support, targeted instruction, or recognizing that someone is particularly strong in areas like mathematics or reading. It might also mean identifying where learning simply is not clicking—when someone can see the symbols in a math equation or a foreign language but cannot grasp the underlying operations or structure.
Rosner: From that perspective, we may not need extensive formal testing at all. As education becomes fully technologized—with teachers still present, but students interacting continuously with adaptive digital systems—those systems will be able to track performance statistically and dynamically. They will be able to identify where each learner is, most of the time, without relying on blunt, one-off tests.
Another factor is that during COVID, the SAT and ACT largely disappeared. The SAT, in particular, functions as a rough IQ surrogate and as a predictor of college performance based on academic ability.
There has long been an argument that adding SAT scores to an application does not significantly improve predictive accuracy beyond what is already captured by grades, coursework, and recommendations. During the pandemic, thousands of U.S. colleges dropped the requirement and made the tests optional because in-person group testing was impractical. After COVID subsided, many schools reinstated them, at least partially. But the question remains: do they actually add much value?
In practice, they do not add very much, especially given the time investment required if you are not naturally strong at standardized testing. For years, the College Board claimed that you could not study for the SAT because it measured inherent ability, like an IQ test. That turned out not to be true. You can study for the SAT, but doing so often requires dozens of practice tests and hundreds of hours—time that could be better spent learning substantive material.
That seems like a poor trade-off: investing enormous effort to optimize performance on a narrow test rather than acquiring real knowledge or skills. For highly selective schools, it is not even clear that the payoff exists. An ACT score below roughly 33 or 34 out of 36 does not help much at Ivy League–level institutions, where a large fraction of applicants have perfect or near-perfect scores. The same applies to the SAT, where perhaps a quarter of applicants have perfect scores and many more cluster just below that.
That was not always the case. In the 1960s, 1970s, and 1980s, only a handful of students nationwide achieved perfect SAT scores in a given year. A perfect score once carried enormous signaling value. Today, with score compression at the top, that signal has largely evaporated. In that context, it can make more sense simply not to submit scores at all when they are optional.
Jacobsen: This brings us back to the original question about high-range tests like the Mega Test. How well has that approach held up over time? And what should we make of attempts to use SAT-based items—often drawn from older versions of the test, such as the 1995 SAT—to construct IQ-like measures aimed at the extreme right tail, sometimes described as four standard deviations above the mean or more, corresponding to a rarity of roughly one in 31,560?
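The “one in 31,560” figure can be checked directly against the normal distribution: the upper-tail probability beyond z standard deviations is P(Z > z) = erfc(z/√2)/2, and its reciprocal gives the “one in N” rarity. A minimal sketch using only the standard library:

```python
import math

def rarity_at_z(z):
    """Return N for a 'one in N' rarity at z standard deviations above the mean,
    under a standard normal distribution."""
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z), upper-tail probability
    return 1.0 / tail

# Four standard deviations above the mean:
print(round(rarity_at_z(4)))  # about 1 in 31,574, close to the commonly quoted 31,560
```

The small discrepancy between the computed value and the quoted 31,560 comes down to rounding conventions in older tables, not a substantive difference.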
Rosner: Some members of the Mega Society have argued that these kinds of tests could identify highly intelligent individuals who were missed by conventional educational talent searches. The education system is supposed to function as a meritocratic filter, but it often fails. There are people who are genuinely very intelligent yet lack other traits—social conformity, organizational skills, stable family support, or conventional motivation—that allow them to excel in school.
These individuals may be eccentric, neurodivergent, or simply mismatched with institutional expectations. The argument is that high-range tests might surface those overlooked cases. I am, in some sense, one of those people.
At least several people were found that way, and that is one legitimate purpose of these tests. One Mega Society member has also been working with others to try to develop new, extremely difficult tests in this vein, tests that cannot easily be gamed.
Part of the problem with the Mega Test and the Titan Test was that aspirants and outright fraudsters looked for shortcuts. They searched the internet for solutions and tried to inflate their scores artificially. There has been a history, not just with these tests but with others as well, of people fraudulently claiming extraordinarily high scores, sometimes with partial success.
In the grand scheme of things, this is not a major problem. It is personally irritating to see someone claim a statistically impossible IQ score and be taken seriously, but it does not have large real-world consequences. No one is being made prime minister of a country on that basis. At worst, someone might get invited to give a talk here or there. It is not a crisis.
If the goal is to find undiscovered talent, however, you want to cast a wide net. That has always been a weakness of these tests. They require a very high time commitment. The person in the Mega Society working on this problem is trying to design a test that reaches very high levels without requiring people to spend 120 hours grinding through extremely difficult problems.
Even so, the Mega Test—the most widely taken high-range test—was probably attempted only about 5,000 times in its entire history. Roughly 4,000 people took it through Omni magazine, a few hundred took it before that, and there has been a slow trickle since. That is an extremely narrow sample and a poor way to identify talent at scale.
I tend to think of high-range IQ testing as a kind of sport that very few people play. It is like the World’s Strongest Man competition. Billions of people try to get stronger through exercise, but only a tiny fraction compete in elite strength sports. Compared to football, basketball, or soccer, powerlifting attracts a very small population.
IQ testing at the extreme right tail is even more niche. It is a strange little sport, played by perhaps one hundred-thousandth the number of people who engage with major athletic competitions. At this point, with individual intelligence increasingly intertwined with—and often overshadowed by—technologically assisted intelligence, you would have to convince me that there is still a compelling use for this entire IQ-testing subculture. That remains an open question.
Rick Rosner is an accomplished television writer with credits on shows like Jimmy Kimmel Live!, Crank Yankers, and The Man Show. Over his career, he has earned multiple Writers Guild Award nominations—winning one—and an Emmy nomination. Rosner holds a broad academic background, graduating with the equivalent of eight majors. Based in Los Angeles, he continues to write and develop ideas while spending time with his wife, daughter, and two dogs.
Scott Douglas Jacobsen is the publisher of In-Sight Publishing (ISBN: 978-1-0692343) and Editor-in-Chief of In-Sight: Interviews (ISSN: 2369-6885). He writes for The Good Men Project, International Policy Digest (ISSN: 2332–9416), The Humanist (Print: ISSN 0018-7399; Online: ISSN 2163-3576), Basic Income Earth Network (UK Registered Charity 1177066), A Further Inquiry, and other media. He is a member in good standing of numerous media organizations.
Photo by Dmitry Ratushny on Unsplash
Last updated May 3, 2025. These terms govern all In Sight Publishing content—past, present, and future—and supersede any prior notices. In Sight Publishing by Scott Douglas Jacobsen is licensed under a Creative Commons BY‑NC‑ND 4.0; © In Sight Publishing by Scott Douglas Jacobsen 2012–Present. All trademarks, performances, databases & branding are owned by their rights holders; no use without permission. Unauthorized copying, modification, framing or public communication is prohibited. External links are not endorsed. Cookies & tracking require consent, and data processing complies with PIPEDA & GDPR; no data from children < 13 (COPPA). Content meets WCAG 2.1 AA under the Accessible Canada Act & is preserved in open archival formats with backups. Excerpts & links require full credit & hyperlink; limited quoting under fair-dealing & fair-use. All content is informational; no liability for errors or omissions: Feedback welcome, and verified errors corrected promptly. For permissions or DMCA notices, email: scott.jacobsen2025@gmail.com. Site use is governed by BC laws; content is “as‑is,” liability limited, users indemnify us; moral, performers’ & database sui generis rights reserved.