We built our own AI detector — here is what we learned
When AI texts flooded the SERPs, we had to check client content by hand. Paid services (Copyleaks, Originality.ai) helped but never explained the "why". Once we understood their logic, we realized they analyze text with… AI itself. So we built our own free AI detector that doesn't just give a verdict — it shows where and how the AI left its traces.
This article distills what we learned from thousands of checked texts. From our observations across client projects:
Google's official position is in its AI content guidance: quality matters, not the production method. But raw machine text is recognizable — by algorithms and by readers. Here is exactly how.
GPT language markers: the phrases that give the machine away
GPT builds text from statistically frequent constructions. It sounds human, but over a paragraph the recognizable clichés emerge. The most common openers:
The second group is bureaucratic glue AI starts sentences with: given that, within the framework of, in the context of, based on, with regard to, in accordance with. The third — empty generalizations GPT uses to fill a paragraph when it has no facts:
If such constructions appear back-to-back and there are no concrete facts, numbers, or examples — you are almost certainly looking at raw AI. How to add facts properly — see our copywriter brief guide.
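The phrase check above can be sketched in a few lines of Python. This is a minimal illustration, not the Unmiss implementation: the function names are hypothetical, the phrase list contains only the markers quoted in this article, and the 3+ threshold is borrowed from the manual checklist further down.

```python
import re

# Marker phrases quoted in this article; extend the tuple with your own list.
MARKER_PHRASES = (
    "given that",
    "within the framework of",
    "in the context of",
    "based on",
    "with regard to",
    "in accordance with",
    "in today's world",
    "it is worth noting",
    "thus",
)


def count_markers(text: str) -> dict[str, int]:
    """Return {phrase: hits} for every marker phrase found in the text."""
    # Lowercase and normalise curly apostrophes so "today’s" also matches.
    lowered = text.lower().replace("\u2019", "'")
    hits = {}
    for phrase in MARKER_PHRASES:
        # \b keeps "thus" from matching inside a longer word.
        n = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        if n:
            hits[phrase] = n
    return hits


def is_yellow_flag(text: str, threshold: int = 3) -> bool:
    """True when the total number of marker hits reaches the threshold."""
    return sum(count_markers(text).values()) >= threshold
```

A hit count is only a prompt for a closer look: a text with real numbers and examples can still contain "based on" several times.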
GPT vs human: the comparison table
| Trait | GPT | Human |
|---|---|---|
| Structure | Perfectly logical: thesis → argument → conclusion | Can be loose, improvised |
| Tone | Polite, academic, judgment-free | Emotional, personal, with humor |
| Transitions | Explicit connectors: "nevertheless", "thus" | Often intuitive, unmarked |
| Mistakes | None | Present — sometimes deliberate |
| Paragraphs | Same length, symmetrical | Uneven: from one line to a wall of text |
| Arguments | Always "by the textbook", no digressions | Sometimes illogical yet convincing |
Machine logic: the rule of three and perfect symmetry
Even if you ban GPT's pet phrases, you can't fool the logic. Humans mess up: odd constructions, missing commas (my editor will confirm). GPT doesn't. Hence three stable patterns:
- The rule of three. "Useful, structured, and fact-based" — GPT adores splitting ideas into threes: three adjectives, three bullets, three blocks under every heading (intro → explanation → conclusion).
- Structural symmetry. Same-length paragraphs; each opens with a lead-in and closes with a bridge to the next. We asked GPT itself why — it answered: "I build text like a well-structured article, by the textbook."
- Excessive politeness. Instead of "this doesn't work" — "some users may find this approach insufficiently effective under certain conditions". Bluntness, humor, and doubt are human; neutral diplomacy in every sentence is machine.
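Two of these patterns lend themselves to rough measurement. The sketch below is an assumption-laden illustration rather than our detector's logic: it treats blank lines as paragraph breaks, measures symmetry as the coefficient of variation of paragraph lengths, and counts only single-word triples of the form "A, B, and C". The function names are hypothetical.

```python
import re
from statistics import mean, pstdev


def paragraph_length_cv(text: str) -> float:
    """Coefficient of variation of paragraph lengths, measured in words.

    Values near 0 mean near-identical paragraphs. Returns 0.0 when there are
    fewer than two paragraphs, i.e. when the text is too short to judge.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    lengths = [len(p.split()) for p in paragraphs]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


# Matches "A, B, and C" or "A, B or C" where A, B, C are single words.
TRIPLE = re.compile(r"\b\w+, \w+,? (?:and|or) \w+\b")


def count_triples(text: str) -> int:
    """Count comma-separated three-item enumerations in the text."""
    return len(TRIPLE.findall(text))
```

Neither number is a verdict on its own. A low variation score across a long article plus a triple in nearly every paragraph is the combination worth checking by eye.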
Special-character evidence: invisible to the eye, visible in code
The most reliable part of our system. Humans physically don't type these characters — GPT inserts them constantly. Open the text in HTML mode and look for:
- — (&mdash;) The em dash. Humans use 1–2 per text. GPT drops up to 19 per page.
- “ ” (&ldquo; &rdquo;) "Typographic" curly quotes. Almost never seen in real web copy; humans type plain "straight" quotes.
- → (&rarr;) Arrow glyphs. A human draws an arrow the rustic way: hyphen + greater-than (->).
- Non-breaking space (&nbsp;, 0xa0). Authors type regular spaces and don't bother.
- ’ (&rsquo;) The "proper" apostrophe instead of the human '. Machine typography.
- … (&hellip;) The ellipsis character. Humans type three dots in a row...
- Thin space (&thinsp;). Most authors don't even know it exists.
- © ® (&copy; &reg;) A human writes (c) or (R), because these symbols aren't on the keyboard.
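Counting these characters is easy to automate. The following is a minimal sketch using only the Python standard library; the function name and the character table are our illustration, and `html.unescape` is applied first so that entities in raw HTML are counted the same way as the literal characters.

```python
import html
from collections import Counter

# Characters from the list above, keyed by code point.
SPECIAL_CHARS = {
    "\u2014": "em dash",
    "\u201c": "left curly quote",
    "\u201d": "right curly quote",
    "\u2192": "right arrow",
    "\u00a0": "non-breaking space",
    "\u2019": "curly apostrophe",
    "\u2026": "ellipsis character",
    "\u2009": "thin space",
    "\u00a9": "copyright sign",
    "\u00ae": "registered sign",
}


def special_char_counts(text: str) -> Counter:
    """Count each tracked special character in plain text or raw HTML."""
    counts = Counter()
    # Decode entities such as &mdash; or &nbsp; before counting.
    for ch in html.unescape(text):
        name = SPECIAL_CHARS.get(ch)
        if name:
            counts[name] += 1
    return counts
```

Keep in mind that some editors and CMS plugins insert curly quotes and non-breaking spaces automatically, so treat the counts as evidence to weigh, not as proof.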
Evidence in the markup
- Perfectly closed tags. Every <p>, <li>, <div> closed to standard, without a single human slip.
- Mechanical lists. <ul><li><p>Text</p></li></ul> instead of a simple <li>Text</li>.
- <hr /> with the closing slash and horizontal divider lines between sections: GPT's signature, a "coming-out in front of Google".
- data-start / data-end attributes in headings and lists: technical markup no human ever adds.
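Three of the markup signs above can be checked with Python's built-in `html.parser`. This is a hedged sketch, not the code behind our tool: the class and function names are hypothetical, and it does not try to verify that every tag is closed.

```python
from html.parser import HTMLParser


class MarkupAudit(HTMLParser):
    """Collects the machine-markup signs described above."""

    def __init__(self):
        super().__init__()
        self.data_attrs = 0      # tags carrying data-start / data-end
        self.self_closed_hr = 0  # <hr /> written with the closing slash
        self.p_in_li = 0         # <p> opened immediately inside an <li>
        self._last_tag = None

    def handle_starttag(self, tag, attrs):
        if any(name in ("data-start", "data-end") for name, _ in attrs):
            self.data_attrs += 1
        if tag == "p" and self._last_tag == "li":
            self.p_in_li += 1
        self._last_tag = tag

    def handle_startendtag(self, tag, attrs):
        # The parser calls this for self-closing spellings such as <hr />.
        if tag == "hr":
            self.self_closed_hr += 1
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self._last_tag = None

    def handle_data(self, data):
        # Visible text between <li> and <p> means the <p> is not a bare wrapper.
        if data.strip():
            self._last_tag = None


def audit_markup(html_source: str) -> dict[str, int]:
    """Parse the HTML and return the three counters as a dict."""
    parser = MarkupAudit()
    parser.feed(html_source)
    parser.close()
    return {
        "data_attrs": parser.data_attrs,
        "self_closed_hr": parser.self_closed_hr,
        "p_in_li": parser.p_in_li,
    }
```

Run it on the raw HTML as pasted by the author, before the CMS cleans it up, since many editors strip unknown attributes on save.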
Detection tools: where to start
Manual marker analysis is the most accurate but slow. At scale, the combo "detector + spot manual checks" works best:
Unmiss AI Detector (free)
Our tool: paste the text → get not just a verdict but a breakdown of where and how the AI left traces. Built on the experience behind this article.
Copyleaks
One of the most accurate commercial detectors, supports many languages. Good for checking contractors at scale.
Originality.ai
The western market standard: AI detection + plagiarism in one report. Paid, optimized for English.
GPTZero
The popular academic detector: scores text "perplexity" and "burstiness". Free tier available.
By the way, building your own tool today is easier than it seems — see our AI tool development service. And for using AI in SEO the smart way — our ChatGPT for SEO guide and the 50 mega-prompts collection.
The manual check checklist: 7 steps
- Search for marker phrases. Ctrl+F: "in today's world", "it is worth noting", "thus". 3+ hits — yellow flag.
- Fact check. Are there concrete numbers, names, examples? Generalizations without facts are AI filler's main sign.
- Paragraph rhythm. Step back from the screen: if every paragraph looks identical — that's machine symmetry.
- The rule of three. Count the triples: three adjectives, three bullets, three blocks per section.
- Code audit. Open the HTML and look for: &mdash; more than three times, curly quotes, data attributes, <hr />.
- Run it through a detector. Unmiss / Copyleaks / GPTZero — for confirmation, not instead of your head.
- The usefulness test. Google's key question: will the reader learn something the top-3 results don't offer? If not — it doesn't matter who wrote it.
The same approach works in reverse — to "humanize" an AI draft: remove the markers, add facts and personal experience, break the symmetry. How to write commercial copy that sells — in our commercial content article.
Why it matters: AI text and visibility in Google and AI search
The paradox of 2026: AI search engines (AI Overviews, ChatGPT, Perplexity) themselves dislike citing raw AI content. They rely on sources with expertise, facts, and authority — we covered this in detail in our pieces on GEO optimization and backlink sources.
- Raw AI text → templates, zero facts → never cited, risks falling under "scaled content abuse" in Google's spam policies.
- AI draft + editor + facts + experience → full-fledged content that ranks and gets cited. Google doesn't care about the production method.
So checking text for AI is really a check of your content process. The detector catches not "AI" but the absence of human work on the text.
In short: a three-level system
- Language: marker phrases, bureaucratic glue, fact-free generalizations.
- Syntax: the rule of three, symmetrical paragraphs, excessive politeness, perfect grammar.
- Code: special characters (—, “ ”, →, non-breaking space) and machine markup (data attributes, <hr />).
- Tools speed things up but don't replace you: always verify a detector's verdict against the evidence above.
- The goal isn't to "catch AI" — it's to never publish useless content: that's what Google penalizes and AI search ignores.
Data sources
- Google — official position on AI content: Search and AI content; spam policies (scaled content abuse): spam policies.
- Unmiss — our free AI detector with evidence breakdown: ai-content-detector.
- Copyleaks — AI content detector; Originality.ai — originality.ai; GPTZero — gptzero.me.
The percentages at the top (85% template phrases, 90% perfect grammar, ~25% of sites dropping after AI spam) are SEOquick's internal observations across checked texts and client projects — practical reference points, not an academic study. The marker and special-character lists come from our work on the Unmiss detector.