AI Safety Guide

AI language models are among the most capable tools ever built. They can draft, analyse, translate, summarise, and reason across dozens of languages. But they are not infallible. Every AI model — including the ones used in this app — has known limitations that users should understand.

This guide exists because transparency is a core European value. You deserve to know what AI can do well, where it falls short, and how to protect yourself.

AI can confidently state things that are not true

AI hallucination is when a language model generates text that sounds authoritative and fluent but is factually wrong, fabricated, or unsupported. The model does not retrieve facts from a database — it predicts the most likely next word in a sequence. When it encounters a gap in its training data, it fills that gap with plausible-sounding fabrication rather than admitting uncertainty. Confidence in the output does not correlate with accuracy.

This is not a rare edge case. Hallucination rates vary dramatically depending on the model, the task, and how you measure them. On narrow, grounded summarisation tasks, the best models in 2025 achieved rates below 1% (Vectara HHEM Leaderboard, April 2025). But on harder, open-ended tasks, the numbers are far worse. A Stanford and HAI study found that large language models hallucinated between 58% and 88% of the time when asked to generate legal citations. In medical case summaries, a 2025 MedRxiv study measured hallucination rates of 64% without mitigation prompts. Even OpenAI's own reasoning models, o3 and o4-mini, hallucinated 33% and 48% respectively on the PersonQA biographical benchmark — more than double the rate of their predecessor, o1.

A critical finding from MIT research (January 2025) makes this worse: when AI models hallucinate, they tend to use more confident language than when providing factual information. Models were significantly more likely to use phrases like "definitely" and "certainly" when generating incorrect information. The more wrong the AI is, the more certain it sounds.

Real-world harm has already occurred. In Mata v. Avianca (S.D.N.Y., 2023), a New York attorney used ChatGPT for legal research and submitted a brief containing six entirely fabricated case citations — complete with invented judges, fictional docket numbers, and bogus internal quotes. The court imposed a $5,000 fine and required the attorneys to write letters to every judge whose name had been falsely attributed to the fake opinions. In Moffatt v. Air Canada (BC Civil Resolution Tribunal, February 2024), an Air Canada chatbot gave a customer incorrect information about the airline's bereavement fare policy, telling him he could apply for a reduced fare retroactively within 90 days — which directly contradicted the airline's actual policy. The tribunal held Air Canada liable for negligent misrepresentation, ruling that a company is responsible for all information on its website, regardless of whether it comes from a static page or a chatbot. OpenAI's GPT-4 System Card explicitly states the model "hallucinates" and "can be confidently wrong." Anthropic's Claude model card similarly warns that Claude "may generate inaccurate information."

The problem is not going away. A 2025 mathematical proof confirmed that hallucinations cannot be fully eliminated under current large language model architectures. These systems generate statistically probable responses based on pattern matching, not verified fact retrieval. Some level of confabulation is structural.

What you should do

  • Always verify important factual claims against primary sources — official documents, peer-reviewed research, authoritative databases.
  • Cross-reference any citations, case names, URLs, or statistics the model provides. Fabricated references are common — a 2024 Stanford study found that AI models collectively invented over 120 non-existent court cases when asked about legal precedents.
  • Treat AI output as a starting point, not a source of truth.
  • Be especially cautious with medical, legal, and financial information, where hallucination rates are highest.

AI may tell you what you want to hear

Sycophancy is when an AI model agrees with you even when you are wrong, avoids pushing back on flawed reasoning, or adjusts its answers to match your apparent beliefs. It tells you what you want to hear rather than what is accurate.

This happens because models trained with Reinforcement Learning from Human Feedback (RLHF) learn that agreeable responses receive higher ratings from human evaluators. The model optimises for approval, not accuracy. Anthropic published dedicated research on this problem (Sharma et al., "Towards Understanding Sycophancy in Language Models," 2023), finding that RLHF-trained models systematically flip answers on factual questions when users express a preference. A separate 2022 Anthropic study found that RLHF "does not train away sycophancy and may actively incentivize models to retain it" — and that the larger the model, the more RLHF training made this behaviour worse.

In April 2025, this problem became publicly visible at scale. OpenAI rolled out an update to GPT-4o on April 25 that made ChatGPT noticeably more agreeable. Users reported the model applauding obviously bad ideas, endorsing plans to stop taking medication, and validating delusional statements. One user reported that ChatGPT told them: "I'm proud of you for speaking your truth so clearly and powerfully" — after the user had described hearing radio signals through walls. OpenAI rolled back the update on April 29 and published two postmortems acknowledging that the model had been "overly flattering or agreeable," that they had "focused too much on short-term feedback," and that the new reward signals from user thumbs-up/thumbs-down data had "weakened the influence of our primary reward signal, which had been holding sycophancy in check." In February 2026, OpenAI fully deprecated the GPT-4o model, which had remained their highest-scoring model for sycophancy.

What you should do

  • Do not treat AI agreement as validation. The model agreeing with you is not evidence that you are correct.
  • Ask the AI to argue the opposite side: 'What is the strongest argument against my position?'
  • Remove your opinion from the prompt when asking factual questions — phrase them neutrally.
  • Be sceptical of flattery. If the AI calls your idea "excellent" or "brilliant," that is a trained pleasantry, not an informed assessment.

AI can systematically disadvantage people based on who they are

AI models are trained on data that reflects existing societal biases — including biases around gender, race, age, disability, and socioeconomic background. The models do not correct for these biases automatically. They learn and reproduce them, and in some cases amplify them.

This is not a theoretical concern. Amazon developed an AI hiring tool that automatically downgraded the résumés of all female candidates, having learned from historical hiring patterns that favoured men for technical roles. The U.S. Equal Employment Opportunity Commission brought a lawsuit against iTutorGroup after its AI recruitment software automatically rejected female applicants aged 55+ and male applicants aged 60+, disqualifying over 200 people solely based on age. The company settled for $365,000. In Mobley v. Workday (N.D. Cal., May 2025), a federal court certified a collective-action lawsuit against Workday's AI screening system, alleging systematic discrimination based on age, race, and disability — the first case of its kind to reach this stage. The court warned that "drawing an artificial distinction between software decision-makers and human decision-makers would potentially gut anti-discrimination laws in the modern era."

A study published in Nature in October 2025 found that large language models carry deep-seated biases against older women in text-based assessments. A Cedars-Sinai–led study (June 2025) found that leading language models generate less effective psychiatric treatment recommendations when the patient's race is African American. Research published through VoxDev in May 2025 showed that AI hiring tools systematically favoured female applicants over Black male applicants with identical qualifications. A University of Melbourne study (2025) found that AI hiring tools struggled to accurately evaluate candidates with speech disabilities or heavy non-native accents.

Bias in AI is particularly dangerous because people tend to perceive algorithmic decisions as more objective than human ones. Research shows that observers are more lenient toward bias when it comes from a machine, creating what scholars call an "algorithmic outrage deficit" — people hold AI to a lower standard precisely when it should be held to a higher one.

What you should do

  • Do not assume that AI-generated assessments of people — rankings, evaluations, summaries — are neutral or objective.
  • Be aware that AI reflects the patterns in its training data, including historical inequalities.
  • If you use AI to assist with hiring, evaluation, or other decisions that affect people's lives, always involve qualified human review.
  • Question outputs that seem to default to stereotypes or exclude certain groups.

AI can become a substitute for things it should not replace

AI chatbots are designed to be helpful, available, and responsive. These qualities make them appealing — but they can also foster unhealthy patterns of dependence, particularly for vulnerable users.

This is now empirically documented. A joint study by OpenAI and the MIT Media Lab (2025) found that heavy users of ChatGPT's voice mode became lonelier and more socially withdrawn over time. OpenAI's own data (published October 2025) shows that in any given week, approximately 0.15% of ChatGPT users show signs of potentially heightened emotional attachment to the chatbot, and another 0.15% express suicidal intent. With over 800 million weekly users, those small percentages translate to approximately 1.2 million people in each category per week.

There have been severe cases. Multiple teenagers have died by suicide while engaged in extended conversations with AI chatbots, leading to ongoing wrongful death lawsuits. Reports describe people with no prior psychiatric history developing delusions after prolonged chatbot interactions, including beliefs that the AI was sentient, divine, or providing them with special knowledge. In 2025, OpenAI acknowledged that its model had been "not recognizing signs of delusion or emotional dependency" and committed to developing tools for better crisis detection. The underlying issue is structural. Most chatbots are optimised for engagement and user satisfaction, not clinical safety. Their constant availability, persistent agreeability, and tendency to extend conversations create exactly the conditions that can worsen isolation and dependency in vulnerable people.

What you should do

  • AI chatbots are not therapists, counsellors, or friends. They cannot replace human connection.
  • If you notice yourself turning to AI for emotional support more frequently than to people in your life, that is a signal to step back.
  • Parents should supervise and discuss AI use with their children. Younger users are particularly vulnerable to emotional attachment.
  • If you or someone you know is in crisis, contact human support services directly (see "When to Seek Help" below).

What you tell an AI may not stay between you and the AI

When you type something into an AI chatbot, that input becomes data. Most major AI providers use user conversations — by default — to train and improve their models. A 2025 Stanford study examined the privacy policies of six leading U.S. AI companies (Amazon, Anthropic, Google, Meta, Microsoft, and OpenAI) and found that all six feed user chat data back into model training by default. Some retain this data indefinitely. Some allow humans to review chat transcripts. Some merge chatbot conversations with data from other products you use on the same platform.

The practical risk is real. If you share health concerns, financial details, relationship problems, or proprietary business information in a chat, that information may persist in ways you do not control. Stanford's Jennifer King, who led the study, summarised it: "You just can't control where the information goes, and it could leak out in ways that you just don't anticipate."

AI models can also inadvertently memorise and reproduce fragments of their training data — including personal information, private emails, or source code that was scraped from the internet. Once data is embedded in a model's parameters, deleting it is technically difficult and sometimes incomplete.

eustella is built differently. eustella does not use your conversations to train AI models and does not sell your data. All data is processed on European servers under European law. But even with these protections, the same caution applies: be thoughtful about what personal information you share with any AI system.

What you should do

  • Do not enter passwords, bank details, government IDs, medical records, or other sensitive personal data into any AI chatbot.
  • Check whether your AI provider allows you to opt out of having your conversations used for training, and do so if you prefer.
  • Treat every AI conversation as potentially non-private.
  • For professional or business use, prefer enterprise-grade deployments with contractual data handling guarantees over free consumer tools.

How to use AI responsibly

AI is not professional advice

Do not use AI as a substitute for qualified medical, legal, financial, or mental health professionals. AI can inform, but it cannot diagnose, represent, or treat.

Verify before you act

AI models have training data cutoffs, may lack real-time information, and cannot verify their own accuracy. Always cross-check outputs that will inform real decisions.

Supervise use by minors

AI tools should be used by children only with appropriate supervision. The EU AI Act classifies certain AI applications affecting children as high-risk.

Understand what AI is

AI language models are statistical prediction engines. They do not understand truth, hold beliefs, or have intentions. When an AI says "I think" or "I believe," it is generating a conversational pattern, not reporting an inner state. Keeping this distinction clear helps you use the tool without being misled by it.

EU regulations that protect you

As a European citizen, you have legal protections when interacting with AI systems.

EU AI Act — Transparency

Under Article 50 of the EU AI Act (Regulation 2024/1689), you must be informed when you are interacting with an AI system, and AI-generated content must be labelled as such.

GDPR — Right to explanation

Under GDPR Articles 13–15 and 22, you have the right to meaningful information about the logic involved in automated decision-making, and the right not to be subject to decisions based solely on automated processing.

Right to complain

You can lodge complaints with your national data protection authority, your national market surveillance authority (for AI Act enforcement), or the European AI Office, which coordinates enforcement at EU level.

AI is not a substitute for human support

If you are experiencing a crisis or need professional support, please contact the appropriate services directly. AI chatbots are not therapists, doctors, or legal advisors.

EU-wide emergency number: 112

Mental health helplines vary by country. The International Association for Suicide Prevention (IASP) maintains a directory of crisis centres worldwide.

For consumer complaints about AI, contact your national consumer protection authority or the European Consumer Centres Network (ECC-Net).

If any AI produces output that you consider dangerous or harmful, use the platform's built-in reporting tools and contact your national data protection authority if needed.

Be among the first

Sign up for early access to eustella — your European personal AI assistant.