AI in Higher Education Assessment: Reliability Over Hype

Written by Rasmus Blok | Jun 30, 2026

Generative AI has crossed a line that the sector is still catching up with. In barely two years it has gone from a fringe curiosity to ubiquitous infrastructure — available to every student, at near-zero cost, on demand. The comfortable assumption that a piece of written work represents a single person’s unaided thinking no longer holds by default. And unlike most technology shifts in higher education, this one does not allow for a multi-year policy cycle before institutions respond. The pressure is immediate.

That puts assessment, not teaching, at the centre of the AI conversation: if AI can reproduce the outputs long used to certify learning, the credential itself is at risk. Employer trust, accreditation and public confidence all rest on one thing: the credibility of the assessment behind the qualification. This is the real story of AI in education. It is not about productivity hacks or essay generators, but about a structural test of whether a qualification still means what it claims to mean.

Here are the three trends I think will define the next few years, and why I believe the institutions that thrive will be the ones that choose reliability over spectacle and demand the same of their technology partners.

The dual mandate: embrace and shield

Institutions face two imperatives at once, and they pull in opposite directions. On one side, they must embed AI literacy into the curriculum, because graduates are entering a workforce already restructured around these tools; pretending AI does not exist serves no one. On the other, they must protect the integrity of assessments that exist precisely to certify individual, unassisted capability.

These two mandates cannot be solved in separate places. They have to coexist within the same programmes, the same platforms and the same processes. That is uncomfortable, and it is why simplistic answers — “ban it” or “embrace it” — both fail. The honest position is that some assessment should welcome AI and some must deliberately exclude it, and an institution needs the infrastructure to do both, consistently, at scale.

Trend one: assessment becomes a system, not a format

For decades, assessment has often been defined by a single artefact — the essay, the multiple-choice test, the final exam. That model is breaking down. As confidence in unsupervised written work erodes, institutions are rethinking formats, and the response is converging on something more deliberate: multi-modal assessment.

Expect to see more controlled contexts where integrity is critical — on-campus and proctored digital exams. Expect the return of oral and hybrid formats, where a written submission is followed by an oral defence that validates authorship. Expect portfolios and continuous assessment that build evidence over time rather than betting everything on a single high-stakes moment.

The deeper shift is that these will stop being isolated events and start being designed as a coherent sequence — a system that intentionally combines open and controlled contexts to balance pedagogy, efficiency and integrity. The institutions that come out ahead will not be those clinging to a single assessment type. They will be the ones that treat assessment as a system they can design, control and evolve over time — and that choose the tools to support it accordingly.

Trend two: reliable beats impressive

It is easy to be impressed by AI right now. Demos are dazzling, and the temptation to bolt a clever feature onto an exam workflow is strong. But high-stakes assessment is unforgiving in a way that consumer software is not. A grading suggestion that is brilliant nine times out of ten and confidently wrong the tenth is not a 90% success rate. In an exam context it is a fairness, appeals and accreditation problem.

For AI in education, reliability matters more than brilliance. A feature that behaves predictably, explains itself, leaves an audit trail and fails safely is worth more than one that is occasionally spectacular and never quite accountable. This is the standard UNIwise holds itself to: AI that assists institutional judgement rather than replacing it, with a human in the loop and a clear line of accountability at every step.
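
To make that standard concrete, here is a minimal sketch in Python. It is an illustration only, not UNIwise's implementation: the names `Suggestion`, `AuditLog` and `review` are hypothetical, and the rule that a suggestion without a mark or a rationale is discarded is an assumption made for the example. It shows three of the properties named above: every step is logged, an unexplained suggestion fails safely, and the assessor's mark is always the one that counts.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Suggestion:
    """A hypothetical AI grading suggestion; all fields are illustrative."""
    submission_id: str
    mark: Optional[float]  # None means the model produced no usable mark
    rationale: str         # the explanation shown to the assessor


@dataclass
class AuditLog:
    """An append-only list of timestamped events (illustrative only)."""
    entries: list = field(default_factory=list)

    def record(self, submission_id: str, event: str, detail: str) -> None:
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "submission_id": submission_id,
            "event": event,
            "detail": detail,
        })


def review(suggestion: Suggestion, assessor_mark: float, log: AuditLog) -> float:
    """Return the final mark. The human assessor's mark always decides."""
    if suggestion.mark is None or not suggestion.rationale.strip():
        # Fail safely: a missing or unexplained suggestion is discarded,
        # and the discard itself is written to the audit trail.
        log.record(suggestion.submission_id, "suggestion_discarded",
                   "missing mark or rationale")
    else:
        log.record(suggestion.submission_id, "suggestion_shown",
                   f"mark={suggestion.mark}; {suggestion.rationale}")
    log.record(suggestion.submission_id, "final_mark_set_by_assessor",
               f"mark={assessor_mark}")
    return assessor_mark
```

The design choice worth noticing is that the suggestion never becomes the mark by itself: it is either shown or discarded, and either way the decision and its trail belong to the assessor.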

Trend three: “vibes” do not scale to high stakes

A lot of early AI development runs on intuition. A model “feels” right, the outputs “look” good, and that is enough to ship in a low-stakes setting. That approach has its place — but it does not survive contact with summative assessment.

When the output determines whether someone qualifies as a nurse, an engineer or a lawyer, “it usually works” is not a standard anyone can defend to an external examiner, a regulator or a student lodging an appeal. High-stakes use cases demand something that intuition-led development cannot provide: measurable consistency, transparency about how a conclusion was reached, defensibility under scrutiny, and behaviour that holds up across millions of cases rather than a handful of demos. The gap between a promising prototype and a system you can stake a degree on is exactly the gap the sector now has to close — and it is far wider than the hype suggests.
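
A rough bit of arithmetic shows why a handful of demos proves so little. The sketch below is illustrative only: the function names are hypothetical, the 10% error rate simply echoes the nine-out-of-ten example above, and it assumes each case fails independently, which real systems will not exactly satisfy.

```python
def expected_wrong(error_rate: float, cases: int) -> float:
    """Expected number of wrong outcomes across `cases` decisions."""
    return error_rate * cases


def chance_demo_looks_flawless(error_rate: float, demo_size: int) -> float:
    """Probability that a demo of `demo_size` cases shows no error at all,
    assuming cases fail independently (an assumption of this sketch)."""
    return (1.0 - error_rate) ** demo_size


for rate in (0.10, 0.01):
    flawless = chance_demo_looks_flawless(rate, demo_size=5)
    wrong = expected_wrong(rate, cases=1_000_000)
    print(f"error rate {rate:.0%}: a 5-case demo looks flawless "
          f"{flawless:.0%} of the time; expect about {wrong:,.0f} wrong in a million")
```

Under those assumptions, even a system that is wrong one time in ten gets through a five-case demo without a visible error more often than not, while a million real cases would carry around a hundred thousand wrong outcomes.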

Why UNIwise treats assessment as core infrastructure

This is the principle that shapes everything UNIwise builds. However an institution chooses to adopt UNIwise's technology, whether as the full WISEflow platform or as a focused capability bundle such as Marking Studio, assessment is never treated as a cosmetic add-on to a learning platform. It is treated as core infrastructure: the load-bearing system that institutions, students and employers depend on, and that has to keep working under pressure, at volume, with full accountability.

Treating assessment as infrastructure has practical consequences. It means full audit trails, not opaque automation. It means EU-based data processing and a clear point of accountability, rather than fragmented tools that each add risk. It means supporting the full range of formats (written, oral, portfolio, proctored) within one trusted environment, so that as an institution’s approach evolves, the foundation does not have to be ripped out and replaced. And it means institutions adopt at their own pace, starting where the pressure is greatest, while always retaining academic ownership and the initiative.

That is why WISEflow is used by more than 150 higher education institutions across 14-plus countries, with millions of users: not because it is the flashiest tool in the room, but because assessment at that scale has to be dependable.

The bottom line

AI is not the product. Trust, integrity and control — amplified by AI — are the product. The institutions that come through this transition strongest will be the ones that design assessment as a system, insist on reliability over novelty, and refuse to let high-stakes decisions rest on intuition. That is the future UNIwise is building for, and in my view it is the standard the whole sector should expect of its technology partners.

If you want to see what AI-assisted, infrastructure-grade assessment could look like at your institution, UNIwise would be glad to talk.
