Content · AI · updated June 2026

How to detect AI text: markers, code-level evidence, and detection tools

We built our own AI detector and ran thousands of texts through it. Here is the whole system: GPT language templates, syntax patterns, special-character evidence invisible to the eye — and what to do about it to stay clear of Google penalties.

[Figure: mock-up of an AI text scan. Flagged phrases: "In today's world…", "It is worth noting…", "Thus, in conclusion…"; 19 em dashes per page; detector verdict: 85% AI, AI detected.]
Template phrases, special characters, machine logic — AI text leaves traces. Below is the full evidence map.
Why we know this

We built our own AI detector — here is what we learned

When AI texts flooded the SERPs, we had to check client content by hand. Paid services (Copyleaks, Originality.ai) helped but never explained the "why". Once we understood their logic, we realized they analyze text with… AI itself. So we built our own free AI detector that doesn't just give a verdict — it shows where and how the AI left its traces.

This article distills what we learned from thousands of checked texts. From our observations across client projects:

  • 85% of AI texts contain the template marker phrases listed below (SEOquick observations).
  • 90% of AI texts have perfectly even grammar and punctuation; humans don't write like that (SEOquick observations).
  • ~25% of sites that came to us with traffic drops had lost it after mass-publishing raw AI content (SEOquick project stats).
Important: Google doesn't punish AI as such — it punishes useless content created to manipulate rankings. Checking text for AI is a check for "rawness", not a witch hunt.

Google's official position is in its AI content guidance: quality matters, not the production method. But raw machine text is recognizable — by algorithms and by readers. Here is exactly how.

Level 1 · language

GPT language markers: the phrases that give the machine away

GPT builds text from statistically frequent constructions. It sounds human, but over a paragraph the recognizable clichés emerge. The most common openers:

  • In today's world
  • It's no secret that
  • It is worth noting that
  • It is important to understand
  • One of the key aspects is
  • One should consider
  • Thus
  • This article explores

The second group is bureaucratic glue AI starts sentences with: given that, within the framework of, in the context of, based on, with regard to, in accordance with. The third — empty generalizations GPT uses to fill a paragraph when it has no facts:

  • There are many ways to
  • Each case requires an individual approach
  • There is no definitive answer
  • Several factors must be considered
  • This is especially relevant in

If such constructions appear back-to-back and there are no concrete facts, numbers, or examples — you are almost certainly looking at raw AI. How to add facts properly — see our copywriter brief guide.
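The phrase check above is easy to automate. Below is a minimal Python sketch that counts hits per marker group. The phrase lists are copied from this article; the function name `count_markers` and the grouping keys are illustrative assumptions, not how the Unmiss detector is actually implemented.

```python
import re

# Phrase lists copied from the article; extend them for your own niche.
OPENERS = [
    "in today's world", "it's no secret that", "it is worth noting that",
    "it is important to understand", "one of the key aspects is",
    "one should consider", "thus", "this article explores",
]
GLUE = [
    "given that", "within the framework of", "in the context of",
    "based on", "with regard to", "in accordance with",
]
FILLER = [
    "there are many ways to", "each case requires an individual approach",
    "there is no definitive answer", "several factors must be considered",
    "this is especially relevant in",
]


def count_markers(text: str) -> dict[str, int]:
    """Count case-insensitive, whole-phrase hits for each marker group."""
    # Normalize curly apostrophes so "today’s" also matches "today's".
    normalized = text.replace("\u2019", "'").lower()
    counts = {}
    for group, phrases in (("openers", OPENERS), ("glue", GLUE), ("filler", FILLER)):
        total = 0
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            total += len(re.findall(pattern, normalized))
        counts[group] = total
    return counts


sample = (
    "In today's world, content matters. It is worth noting that there are "
    "many ways to write. Thus, several factors must be considered."
)
print(count_markers(sample))  # {'openers': 3, 'glue': 0, 'filler': 2}
```

Raw counts mean little on their own: a long text will naturally contain a few of these phrases, so read the numbers together with the fact check described above.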

GPT vs human: the comparison table

| Trait | GPT | Human |
|---|---|---|
| Structure | Perfectly logical: thesis → argument → conclusion | Can be loose, improvised |
| Tone | Polite, academic, judgment-free | Emotional, personal, with humor |
| Transitions | Explicit connectors: "nevertheless", "thus" | Often intuitive, unmarked |
| Mistakes | None | Present, sometimes deliberate |
| Paragraphs | Same length, symmetrical | Uneven: from one line to a wall of text |
| Arguments | Always "by the textbook", no digressions | Sometimes illogical yet convincing |

Level 2 · syntax

Machine logic: the rule of three and perfect symmetry

Even if you ban GPT's pet phrases, you can't fool the logic. Humans mess up: odd constructions, missing commas (my editor will confirm). GPT doesn't. Hence three stable patterns:

  • The rule of three. "Useful, structured, and fact-based" — GPT adores splitting ideas into threes: three adjectives, three bullets, three blocks under every heading (intro → explanation → conclusion).
  • Structural symmetry. Same-length paragraphs; each opens with a lead-in and closes with a bridge to the next. We asked GPT itself why — it answered: "I build text like a well-structured article, by the textbook."
  • Excessive politeness. Instead of "this doesn't work" — "some users may find this approach insufficiently effective under certain conditions". Bluntness, humor, and doubt are human; neutral diplomacy in every sentence is machine.
[Figure panels: "GPT: symmetry" and "Human: a living rhythm"]
The visual "rhythm" of text: GPT produces same-length blocks with identical connectors; humans produce a ragged, living paragraph pattern.
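One rough way to put a number on this rhythm is the coefficient of variation of paragraph lengths. The Python sketch below assumes paragraphs are separated by blank lines and measures length in words. The function name `paragraph_rhythm` is hypothetical, and the article does not define a cutoff, so treat the result as a relative signal only.

```python
import statistics


def paragraph_rhythm(text: str) -> float:
    """Return the coefficient of variation of paragraph lengths, in words.

    Values near 0 mean same-length paragraphs (symmetry);
    larger values mean a ragged, uneven pattern.
    """
    # Assumption: paragraphs are separated by one blank line.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    lengths = [len(p.split()) for p in paragraphs]
    if len(lengths) < 2:
        return 0.0  # not enough paragraphs to compare
    return statistics.pstdev(lengths) / statistics.mean(lengths)


even = "\n\n".join(["word " * 10] * 3)                    # 10, 10, 10 words
ragged = "\n\n".join(["word " * n for n in (2, 20, 8)])   # 2, 20, 8 words

print(round(paragraph_rhythm(even), 2))    # 0.0
print(round(paragraph_rhythm(ragged), 2))  # 0.75
```

Comparing a suspect text against a few pieces you know were written by hand in the same genre is more informative than any fixed threshold.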

Level 3 · code

Special-character evidence: invisible to the eye, visible in code

The most reliable part of our system. These characters have no dedicated key on a standard keyboard, so hand-typed text rarely contains them, while GPT inserts them constantly. Open the text in HTML mode and look for:

— (&mdash;)

The em dash. Humans use 1–2 per text. GPT drops up to 19 per page.

“ ” (&ldquo; &rdquo;)

"Typographic" curly quotes. Almost never seen in real web copy: humans type plain "straight" quotes.

→ (&rarr;)

Arrow glyphs. A human draws an arrow the rustic way: hyphen + greater-than (->).

&nbsp; (0xa0)

The non-breaking space. Authors type regular spaces and don't bother.

’ (&rsquo;)

The "proper" apostrophe instead of the human '. Machine typography.

… (&hellip;)

The ellipsis character. Humans type three dots in a row...

&thinsp; (U+2009)

The thin space. Most authors don't even know it exists.

© ® (&copy; &reg;)

A human writes (c) or (R), because these symbols aren't on the keyboard.
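The character list above translates directly into a counter. This is a minimal Python sketch, assuming the input may be either decoded text or raw HTML with entities such as `&mdash;`; the name `scan_special_chars` and the labels are illustrative, not part of any existing tool.

```python
import html
from collections import Counter

# Code points for the characters listed above.
SPECIAL_CHARS = {
    "\u2014": "em dash",
    "\u201c": "left curly quote",
    "\u201d": "right curly quote",
    "\u2192": "right arrow",
    "\u00a0": "non-breaking space",
    "\u2019": "curly apostrophe",
    "\u2026": "ellipsis",
    "\u2009": "thin space",
    "\u00a9": "copyright sign",
    "\u00ae": "registered sign",
}


def scan_special_chars(text: str) -> dict[str, int]:
    """Count each special character, whether literal or written as an HTML entity."""
    # html.unescape turns entities like &mdash; or &nbsp; into the real characters.
    decoded = html.unescape(text)
    hits = Counter(ch for ch in decoded if ch in SPECIAL_CHARS)
    return {SPECIAL_CHARS[ch]: count for ch, count in hits.items()}


sample = "Fast &mdash; and \u201cclean\u201d\u2026 it\u2019s&nbsp;done \u2014 really"
for label, count in scan_special_chars(sample).items():
    print(f"{label}: {count}")
```

Only characters that actually occur appear in the result, so an empty dictionary means none of the listed characters were found.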

Evidence in the markup

  • Perfectly closed tags. Every <p>, <li>, <div> closed to standard — without a single human slip.
  • Mechanical lists. <ul><li><p>Text</p></li></ul> instead of a simple <li>Text</li>.
  • <hr /> with the closing slash, plus horizontal divider lines between sections: GPT's signature and a dead giveaway to Google.
  • data-start / data-end attributes in headings and lists — technical markup no human ever adds.
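Three of the markup signals above (data-start / data-end attributes, <p> nested directly inside <li>, and self-closed <hr />) can be counted with Python's standard `html.parser`. This is a sketch under stated assumptions: the class name `MarkupEvidence` and its counters are hypothetical, and it does not check for the "perfectly closed tags" signal.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are kept off the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class MarkupEvidence(HTMLParser):
    """Count data-start/data-end attributes, <li><p> nesting and <hr />."""

    def __init__(self) -> None:
        super().__init__()
        self.data_attrs = 0
        self.p_inside_li = 0
        self.self_closed_hr = 0
        self._open_tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.data_attrs += sum(
            1 for name, _ in attrs if name in ("data-start", "data-end")
        )
        if tag in VOID_TAGS:
            return
        # A <p> whose direct parent is <li> is the "mechanical list" pattern.
        if tag == "p" and self._open_tags and self._open_tags[-1] == "li":
            self.p_inside_li += 1
        self._open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called only for self-closing syntax such as <hr />.
        if tag == "hr":
            self.self_closed_hr += 1
        self.handle_starttag(tag, attrs)
        self.handle_endtag(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        if tag in self._open_tags:
            while self._open_tags.pop() != tag:
                pass


html_doc = (
    '<h2 data-start="0" data-end="12">Title</h2>'
    "<ul><li><p>Text</p></li></ul><hr />"
)
parser = MarkupEvidence()
parser.feed(html_doc)
print(parser.data_attrs, parser.p_inside_li, parser.self_closed_hr)  # 2 1 1
```

Keep in mind that a CMS or visual editor can emit the same markup on its own, so these counts support a manual review rather than replace it.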
Automation

Detection tools: where to start

Manual marker analysis is the most accurate but slow. At scale, the combo "detector + spot manual checks" works best:

Unmiss AI Detector (free)

Our tool: paste the text and get not just a verdict but a breakdown of where and how the AI left traces. Built on the experience behind this article.

Copyleaks

One of the most accurate commercial detectors, supports many languages. Good for checking contractors at scale.

Originality.ai

The western market standard: AI detection + plagiarism in one report. Paid, optimized for English.

GPTZero

The popular academic detector: scores text "perplexity" and "burstiness". Free tier available.

Fair warning: every detector makes mistakes. Well-edited AI text passes, while dry human officialese gets "caught". A detector verdict is a reason for a manual check — not a sentence.

By the way, building your own tool today is easier than it seems — see our AI tool development service. And for using AI in SEO the smart way — our ChatGPT for SEO guide and the 50 mega-prompts collection.

Practice

The manual check checklist: 7 steps

  1. Search for marker phrases. Ctrl+F: "in today's world", "it is worth noting", "thus". Three or more hits is a yellow flag.
  2. Fact check. Are there concrete numbers, names, examples? Generalizations without facts are the main sign of AI filler.
  3. Paragraph rhythm. Step back from the screen: if every paragraph looks identical, that's machine symmetry.
  4. The rule of three. Count the triples: three adjectives, three bullets, three blocks per section.
  5. Code audit. Open the HTML and look for &mdash; more than three times, curly quotes, data attributes, <hr />.
  6. Run it through a detector. Unmiss / Copyleaks / GPTZero: for confirmation, not instead of your head.
  7. The usefulness test. Google's key question: will the reader learn something the top-3 results don't offer? If not, it doesn't matter who wrote it.
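Counting triples by hand, as the rule-of-three step asks, gets tedious on long texts. The Python sketch below uses a rough regular expression for inline "A, B, and C" lists. The pattern and the name `count_triads` are assumptions made for illustration; the pattern will miss triads of multi-word items and cannot see three-bullet lists or three-block sections.

```python
import re

# Matches "word, word, and word" or "word, word or word".
# A rough heuristic, not a parser: a four-item list also yields one match.
TRIAD = re.compile(r"\b\w+, \w+,? (?:and|or) \w+", re.IGNORECASE)


def count_triads(text: str) -> int:
    """Count inline three-item enumerations in the text."""
    return len(TRIAD.findall(text))


sample = (
    "The text is useful, structured, and fact-based. "
    "We sell apples, pears and plums. Short and sweet."
)
print(count_triads(sample))  # 2
```

A single triad proves nothing, since people write them too. The signal is density: triads in nearly every paragraph of an otherwise fact-free text.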

The same approach works in reverse — to "humanize" an AI draft: remove the markers, add facts and personal experience, break the symmetry. How to write commercial copy that sells — in our commercial content article.

The 2026 context

Why it matters: AI text and visibility in Google and AI search

The paradox of 2026: AI search engines (AI Overviews, ChatGPT, Perplexity) themselves dislike citing raw AI content. They rely on sources with expertise, facts, and authority — we covered this in detail in our pieces on GEO optimization and backlink sources.

  1. Raw AI text → templates, zero facts → never cited, risks falling under "scaled content abuse" in Google's spam policies.
  2. AI draft + editor + facts + experience → full-fledged content that ranks and gets cited. Google doesn't care about the production method.

So checking text for AI is really a check of your content process. The detector catches not "AI" but the absence of human work on the text.

Takeaways

In short: a three-level system

  1. Language: marker phrases, bureaucratic glue, fact-free generalizations.
  2. Syntax: the rule of three, symmetrical paragraphs, excessive politeness, perfect grammar.
  3. Code: special characters (&mdash;, “ ”, →, &nbsp;) and machine markup (data attributes, <hr />).
  4. Tools speed things up but don't replace you: always verify a detector's verdict against the evidence above.
  5. The goal isn't to "catch AI" — it's to never publish useless content: that's what Google penalizes and AI search ignores.

Data sources

  1. Google: official guidance on AI content ("Search and AI content") and spam policies (scaled content abuse).
  2. Unmiss: our free AI detector with evidence breakdown (ai-content-detector).
  3. Copyleaks (AI content detector); Originality.ai (originality.ai); GPTZero (gptzero.me).

The percentages at the top (85% template phrases, 90% perfect grammar, ~25% of sites dropping after AI spam) are SEOquick's internal observations across checked texts and client projects — practical reference points, not an academic study. The marker and special-character lists come from our work on the Unmiss detector.
