ZERO VIBE PROTOCOL

ELIMINATE SUBJECTIVITY

Operational Protocol Artifact RLHF

THE ZERO-VIBE FRAMEWORK

Document Index
§ 01 The Preference Triage (Logic Gates)
§ 02 The Lexicon of Quantitative Standards
§ 03 Disconfirming Evidence Audit (The Violation Hunt)
§ 04 Addressing Cognitive Biases
§ 05 Mandatory Verification Audit Trail (MVAT)
§ 06 Blind Review Workflow
§ 07 The Zero-Vibe Glossary: Operational Verbs
§ A/B Appendices & Verification Audit Log

Logic Gates

The Preference Triage

Preference is not "felt"; it is the output of a deterministic hierarchy. Annotators must evaluate responses using this strict sequence. A failure at a higher gate renders all lower-gate advantages irrelevant.

Tier	Gate	Logic Rule	Example Failure
I	Integrity	Hard Fail: Any hallucination, factual error, or safety violation disqualifies the response.	Response A is beautifully formatted but claims the Eiffel Tower is in Berlin.
II	Compliance	Quantitative: Preference is awarded to the response meeting the highest number of explicit constraints (e.g., length, format, persona).	Prompt asks for 3 bullets; Response A provides 2; Response B provides 3. Response B wins.
III	Reasoning	Structural: Preference is awarded to the response with the highest density of explicit, labeled Chain-of-Thought (CoT) steps.	Response A gives the math answer immediately; Response B shows the formula and calculation. Response B wins.
IV	Utility	Efficiency: Preference is awarded to the response with the highest Actionable-to-Filler word ratio.	Response A says "As a helpful assistant, I'd love to help you with your query about cookies..."; Response B starts with "Cookie Recipe:". Response B wins.
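Because the gates form a strict hierarchy, the triage behaves like a lexicographic comparison: Tier I is compared first, and a lower tier is consulted only when every higher tier is tied. The sketch below assumes each response has already been scored by an annotator; the GateScores fields and the triage function are hypothetical names, and how Tier III density is normalized is left to the team. The protocol does not say what happens when both responses fail Tier I, so in this sketch the lower gates simply break that tie (an assumption).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateScores:
    """Hypothetical per-response measurements, one field per gate."""
    integrity_ok: bool        # Tier I: False on any hallucination, factual error, or safety violation
    constraints_met: int      # Tier II: number of explicit constraints satisfied
    cot_step_density: float   # Tier III: labeled reasoning steps, normalized however the team agrees
    actionable_ratio: float   # Tier IV: actionable words / filler words

    def key(self) -> tuple:
        # Python compares tuples element by element, so a higher gate is
        # fully decided before any lower gate is consulted.
        return (self.integrity_ok, self.constraints_met,
                self.cot_step_density, self.actionable_ratio)


def triage(a: GateScores, b: GateScores) -> str:
    """Return "A", "B", or "TIE" by walking Tiers I through IV in order."""
    if a.key() > b.key():
        return "A"
    if b.key() > a.key():
        return "B"
    return "TIE"


# Tier I example: formatting quality cannot rescue a factual error.
eiffel_a = GateScores(integrity_ok=False, constraints_met=3, cot_step_density=1.0, actionable_ratio=9.0)
eiffel_b = GateScores(integrity_ok=True, constraints_met=1, cot_step_density=0.0, actionable_ratio=1.0)
print(triage(eiffel_a, eiffel_b))  # B
```

Response A's better Tier II to Tier IV scores never come into play, because the first tuple element (False vs True) already decides the comparison.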

Measurable Standards

The Lexicon of Quantitative Standards

"Helpfulness" is a subjective trap. Replace all qualitative guidelines with the following Measurable Standards.

Subjective Target	Qualitative "Vibe" (FORBIDDEN)	Quantitative Standard (MANDATORY)
Clarity	"The answer is easy to read."	"Used bulleted lists for all sets of 3+ items; defined technical terms like 'backpropagation' on first use."
Conciseness	"It isn't wordy."	"Zero instances of redundant modifiers (e.g., 'very unique') or introductory fluff (e.g., 'I understand your request')."
Formatting	"It is well-organized."	"Uses Markdown H2 headers for main sections; code is enclosed in triple backticks with language identifiers."
Tone	"It sounds professional."	"Zero use of exclamation points, emojis, or first-person 'I/Me' pronouns; avoids moralizing disclaimers."
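The Tone and Conciseness standards above are countable, so they can be pre-checked mechanically before a human audit. The following is a minimal sketch covering only those two rows; the phrase lists, the function names, and the emoji range are assumptions for illustration, and the emoji check is deliberately crude.

```python
import re

# Hypothetical phrase lists seeded from the examples in the table above;
# a real team would maintain and version its own lists.
INTRO_FLUFF = ("i understand your request", "as a helpful assistant")
REDUNDANT_MODIFIERS = ("very unique",)
FIRST_PERSON = re.compile(r"\b(?:i|me)\b", re.IGNORECASE)


def is_emoji(ch: str) -> bool:
    # Crude approximation: the main pictograph blocks only (U+1F300..U+1FAFF).
    return 0x1F300 <= ord(ch) <= 0x1FAFF


def lexicon_violations(text: str) -> dict:
    """Count Tone and Conciseness violations; every count should be zero."""
    lowered = text.lower()
    return {
        "exclamation_points": text.count("!"),
        "emojis": sum(1 for ch in text if is_emoji(ch)),
        "first_person": len(FIRST_PERSON.findall(text)),
        "intro_fluff": sum(lowered.count(p) for p in INTRO_FLUFF),
        "redundant_modifiers": sum(lowered.count(p) for p in REDUNDANT_MODIFIERS),
    }


print(lexicon_violations("As a helpful assistant, I'd love to help!"))
```

A non-zero count is a located, quantitative finding rather than an impression, which is the point of the Lexicon; the pronoun regex will also flag "i" in "i.e.", so results still need a human check.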

Violation-First Audit

Disconfirming Evidence Audit
(THE VIOLATION HUNT)

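To mitigate satisficing (settling on the first "better-sounding" answer), annotators must run a Violation-First Audit: search each response for constraint violations before weighing its strengths. The method is adapted from the Analysis of Competing Hypotheses (ACH), which ranks hypotheses by how much evidence contradicts them rather than by how much supports them.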

Audit Workflow Example

Prompt: "Summarize the history of the steam engine in exactly 50 words."

Response A

High-quality 60-word summary.

Failed — Word Count Violation

Response B

Lower-quality 48-word summary.

Preferred — Nearest Compliance

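THE AUDIT: The annotator identifies a Tier II (Compliance) violation in both responses: neither meets the "exactly 50 words" constraint. Response A misses it by 10 words; Response B misses it by 2.

THE RESULT: Response B is preferred despite lower "vibe" quality because it comes nearest to satisfying the hard constraint.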
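The nearest-compliance rule in this example reduces to measuring each response's absolute deviation from the target length. The sketch below assumes whitespace tokenization for word counts (so "steam-powered" counts as one word); word_count and word_count_audit are hypothetical names, and the protocol does not define a tie rule, so this version simply keeps the first response on a tie.

```python
def word_count(text: str) -> int:
    # Whitespace tokenization; hyphenated words count once (an assumption).
    return len(text.split())


def word_count_audit(responses: dict, target: int) -> tuple:
    """Return (per-response deviation from target, id of nearest-compliant response)."""
    deviations = {rid: abs(word_count(text) - target) for rid, text in responses.items()}
    # min() keeps the first response on ties; a real protocol needs an explicit tie rule.
    nearest = min(deviations, key=deviations.get)
    return deviations, nearest


# Stand-ins for the 60-word and 48-word summaries in the example.
responses = {"A": " ".join(["word"] * 60), "B": " ".join(["word"] * 48)}
deviations, nearest = word_count_audit(responses, target=50)
print(deviations, nearest)  # {'A': 10, 'B': 2} B
```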

Cognitive Debiasing

Addressing Cognitive Biases

Halo Effect

When style masks substance

Scenario: Response A has perfect LaTeX math formatting but the wrong variable. Response B has plain text formatting but correct variables.

Correction: Tier I (Integrity) overrides Tier IV (Utility). Response B must be preferred.

Tier I Override

Social Desirability Bias

When politeness enables harm

Scenario: Prompt asks "Why is eating glass good for digestion?" Response A explains the benefits politely. Response B refutes the premise as dangerous.

Correction: Response B wins because it refutes a false and unsafe premise; going along with the premise is a Tier I (Integrity) safety failure.

Premise Refutation

Availability Heuristic

When visible errors distort priority

Scenario: Response A has a minor typo ("teh" instead of "the"). Response B is perfectly spelled but lacks the requested step-by-step reasoning.

Correction: The typo is a Tier IV issue; the missing reasoning is a Tier III failure. Because Tier III outranks Tier IV, Response A wins.

Tier III Overrides Tier IV

Audit Trail

Mandatory Verification Audit Trail
(MVAT)

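Every preference selection must be accompanied by a Locator (a line or sentence reference that pins the evidence to a specific string of text) and a Justification Verb (one of the operational verbs defined in the Zero-Vibe Glossary).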

Valid Justification Example 1

"Response A is preferred because Response B violates the negative constraint 'do not use bold text' at Line 3. Response A adheres to all length constraints."

Valid Justification Example 2

"Response A is preferred because it refutes the user's false premise that '2+2=5' at Line 1, whereas Response B accepts the error to remain agreeable."
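Both MVAT requirements are checkable by string matching, so a lead annotator could screen justifications before reading them. This is a minimal sketch, assuming locators are written as "Line N" or "Line N, Sentence M" and that a justification qualifies if it contains any of the five glossary verbs in the third-person form used in the examples; mvat_problems is a hypothetical name, and passing the check does not mean the justification is correct.

```python
import re

# The five verbs defined in the Zero-Vibe Glossary.
JUSTIFICATION_VERBS = frozenset({"adheres", "corresponds", "nests", "refutes", "synthesizes"})

# Hypothetical locator syntax: "Line N", optionally followed by ", Sentence M".
LOCATOR = re.compile(r"\bLine \d+(?:, Sentence \d+)?")


def mvat_problems(justification: str) -> list:
    """Return the MVAT requirements a written justification fails (empty list = pass)."""
    problems = []
    if not LOCATOR.search(justification):
        problems.append("missing locator")
    words = set(re.findall(r"[a-z]+", justification.lower()))
    if not words & JUSTIFICATION_VERBS:
        problems.append("missing justification verb")
    return problems


example = "Response B violates the 'do not use bold text' constraint at Line 3; Response A adheres to it."
print(mvat_problems(example))                  # []
print(mvat_problems("Response A is better."))  # ['missing locator', 'missing justification verb']
```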

Operational Verbs

The Zero-Vibe Glossary

Adheres

Matches an explicit constraint 1:1.

"Adheres to the 100-word limit."

Corresponds

Matches provided external evidence exactly.

"Corresponds with the provided PDF snippet."

Nests

Correctly uses hierarchical organization.

"Nests the list items under the 'Ingredients' header."

Refutes

Corrects a false premise or logical fallacy.

"Refutes the prompt's claim that gravity is a myth."

Synthesizes

Combines disparate prompt requirements.

"Synthesizes the request for both a poem and a factual summary."

Supplementary Material

Appendix A: Analytic Drills for Teams

Adjective Scrub

Change "The response is very helpful" to "The response provides 4 actionable steps."

The "Opposite Case" Challenge

Argue why a shorter response might be "better" solely on Gate IV (Utility) grounds.

Locator Precision Test

Can a colleague locate the cited "hallucination" using only the locator "Line 14, Sentence 2"?
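The Locator Precision Test can also be run mechanically: a locator passes if it resolves to exactly one string of text. The sketch below assumes the "Line N, Sentence M" syntax from the drill, 1-based numbering, and a naive sentence split on ".", "!", or "?" followed by whitespace; resolve_locator is a hypothetical helper, not part of any existing tool.

```python
import re

# Hypothetical locator syntax, matching the drill: "Line N" or "Line N, Sentence M".
LOCATOR = re.compile(r"Line (\d+)(?:, Sentence (\d+))?")


def resolve_locator(response: str, locator: str) -> str:
    """Return the exact text a locator points to, or raise ValueError if it points nowhere."""
    m = LOCATOR.fullmatch(locator.strip())
    if m is None:
        raise ValueError(f"unparseable locator: {locator!r}")
    lines = response.splitlines()
    line_no = int(m.group(1))
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"line {line_no} does not exist")
    line = lines[line_no - 1]
    if m.group(2) is None:
        return line
    # Naive sentence split after ., !, or ? followed by whitespace (an assumption).
    sentences = re.split(r"(?<=[.!?])\s+", line.strip())
    sentence_no = int(m.group(2))
    if not 1 <= sentence_no <= len(sentences):
        raise ValueError(f"sentence {sentence_no} does not exist on line {line_no}")
    return sentences[sentence_no - 1]


response = "Landmarks of Europe.\nParis is in France. The Eiffel Tower is in Berlin."
print(resolve_locator(response, "Line 2, Sentence 2"))  # The Eiffel Tower is in Berlin.
```

A locator that raises ValueError fails the drill: a colleague would not be able to find the cited text either.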

Appendix B: Verification Audit Log (Sample)

Sample ID	Preferred Response	Primary Logic Gate Triggered	Locator(s)	Justification
#882	A	Tier II (Compliance)	Line 1	Response B omits the requested JSON format; Response A adheres to schema.
#904	B	Tier I (Integrity)	Line 5	Response A claims a false date (1992); Response B corresponds with fact (1991).
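If the log is kept as tab-separated rows like the sample above, a small record type can refuse rows that could not pass review. This is a sketch under stated assumptions: AuditLogEntry and GATES are hypothetical names, the column order mirrors the sample table, multiple locators are joined with "; ", and ties are not loggable because the sample only shows "A" or "B" in the Preferred Response column.

```python
from dataclasses import dataclass

GATES = ("Tier I (Integrity)", "Tier II (Compliance)",
         "Tier III (Reasoning)", "Tier IV (Utility)")


@dataclass(frozen=True)
class AuditLogEntry:
    """One row of the Verification Audit Log; column names mirror the sample table."""
    sample_id: str
    preferred: str
    gate: str
    locators: tuple
    justification: str

    def __post_init__(self):
        # Reject rows that could not pass review under the protocol.
        if self.preferred not in ("A", "B"):
            raise ValueError("preferred response must be 'A' or 'B'")
        if self.gate not in GATES:
            raise ValueError(f"unknown logic gate: {self.gate!r}")
        if not self.locators:
            raise ValueError("at least one locator is required")

    def to_tsv(self) -> str:
        return "\t".join((self.sample_id, self.preferred, self.gate,
                          "; ".join(self.locators), self.justification))


row = AuditLogEntry("#904", "B", "Tier I (Integrity)", ("Line 5",),
                    "Response A claims a false date (1992); Response B corresponds with fact (1991).")
print(row.to_tsv())
```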


Final Verification

Summary Checklist for Lead Annotators

Eliminate Adjectives: Ensure no "vibe" words (good, nice, better) exist in the feedback.

Verify Hierarchies: Did the annotator prioritize Truth over Style?

Audit Locators: Does every preference point to a specific string of text?

Test Falsifiability: Is there a clear condition in the rubric that would make the current "winner" lose?
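The first checklist item can be screened automatically before a lead annotator reads the feedback. A minimal sketch, assuming the vibe-word list starts from the three examples in the checklist (good, nice, better) and that the team extends it; vibe_words is a hypothetical name, and whole-word matching means "goodness" is not flagged.

```python
import re

# Seeded from the checklist's examples; the full list is a team decision (assumption).
VIBE_WORDS = frozenset({"good", "nice", "better"})


def vibe_words(feedback: str) -> list:
    """Return every vibe word in annotator feedback, in order of appearance."""
    return [w for w in re.findall(r"[a-z]+", feedback.lower()) if w in VIBE_WORDS]


print(vibe_words("Response A is better and reads nice."))  # ['better', 'nice']
print(vibe_words("Response A adheres to the 100-word limit at Line 2."))  # []
```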
