Complete AI Companion Glossary
Conversational AI is evolving rapidly, and the vocabulary surrounding it — drawn from machine learning research, software engineering, and cognitive science — can be opaque to non-specialists. This glossary defines the concepts most relevant to understanding how AI companions work, what shapes their behavior, and how to use them effectively. Definitions aim for practical clarity over technical completeness.
A
- Active Recall
- The process by which an AI companion retrieves and surfaces relevant information from earlier in a conversation or from stored memory, without the user explicitly requesting it. Active recall mimics a human conversational partner remembering relevant context — “you mentioned last week that you were anxious about that meeting; how did it go?” Systems with active recall improve conversational continuity and can make interactions feel more genuinely attentive.
- Alignment
- The degree to which an AI system’s outputs and behavior correspond to human values, intentions, and safety constraints. Alignment research addresses the challenge that a highly capable AI optimizing for the wrong objective can produce harmful outcomes even without any malicious intent. In the context of AI companions, alignment work shapes how a model handles sensitive topics, avoids harmful suggestions, and maintains honesty even when a user might prefer a different answer.
- Attention Mechanism
- A computational technique, foundational to the transformer architecture, that allows a model to weigh the relevance of different parts of an input sequence when generating each output token. When generating a pronoun like “it,” the attention mechanism determines which earlier noun the pronoun refers to by assigning higher attention weights to related words. Multi-head attention — running this process in parallel across multiple learned projections — allows transformers to capture different types of relationships simultaneously.
C
- Chain of Thought
- A prompting technique where a model is guided to produce intermediate reasoning steps before reaching a final answer, rather than jumping directly to a conclusion. Chain-of-thought reasoning improves performance on multi-step problems — mathematics, logical inference, planning — because errors in early steps are surfaced and can be corrected within the same response. The technique was formalized in research showing that simply including “let’s think step by step” in a prompt substantially improved model accuracy on benchmark tasks.
- Chat Completion
- The API operation that sends a sequence of messages to a language model and receives a generated response. Most modern AI APIs structure interactions as a list of messages with roles (system, user, assistant), and the model generates the next assistant message. Chat completion is the foundation on which AI companions, customer service bots, coding assistants, and most other conversational AI products are built.
- Context Window
- The maximum amount of text — measured in tokens — that a language model can process in a single inference call, encompassing both the input and the generated output. A model with a 128,000-token context window can “see” roughly 100,000 words of text simultaneously. Context window size determines how much conversation history, background information, and documents a model can reference at once. Information outside the context window is not accessible to the model unless stored externally and retrieved via RAG or summarization.
E
- Embedding
- A numerical vector representation of text (or images, audio, or other data) that encodes semantic meaning in a high-dimensional space. Texts with similar meanings produce embeddings that are geometrically close to each other. Embeddings are the computational foundation for semantic search, clustering, retrieval-augmented generation, and recommendation systems. A word embedding might represent “king” as a 1,536-dimensional vector such that king − man + woman ≈ queen.
F
- Few-Shot Learning
- A prompting technique where a small number of input-output examples are included in the prompt to demonstrate the desired behavior to the model without modifying its underlying weights. A few-shot prompt for sentiment classification might include three examples of reviews labeled positive or negative before presenting the unlabeled review to classify. Few-shot learning exploits the model’s pattern-matching capability and is often sufficient to unlock behaviors that the model could not perform reliably with a zero-shot prompt.
- Fine-Tuning
- A training process that adapts a pre-trained language model to a specific task or style by continuing training on a curated dataset. Fine-tuning updates the model’s weights, producing a model that performs the target behavior without requiring explicit prompting. It is more expensive and technically demanding than prompt engineering but produces more consistent, controllable results for narrow tasks. AI companions often use fine-tuning to establish a consistent persona, communication style, or domain expertise.
G
- Grounding
- The process of connecting a language model’s outputs to verifiable, real-world information rather than relying solely on knowledge encoded during training. Grounded responses cite sources, reference retrieved documents, or perform tool calls (like web search) to verify claims before stating them. Grounding reduces hallucination by anchoring the model to external evidence. An AI companion with grounding capabilities can answer questions about current events or a user’s specific documents accurately, whereas an ungrounded model may confabulate plausible-sounding but incorrect details.
H
- Hallucination
- A confident, fluent model output that states false information as fact. Hallucinations occur because language models are trained to produce plausible continuations of text, not to verify claims against reality. A model asked about a historical figure may generate a believable but fabricated quotation. Hallucination rates vary by model, topic, and prompting strategy. Mitigations include grounding, retrieval-augmented generation, and prompting models to express uncertainty when confidence is low.
I
- Inference
- The process of running a trained model on new input to generate output. Inference is distinct from training: training updates model weights using large datasets over many hours or days on specialized hardware; inference uses the fixed weights to respond to a single query, typically in seconds. Inference cost and latency are the primary operational considerations for AI companion deployment because each user message triggers an inference call.
K
- Knowledge Cutoff
- The date after which a language model has no information from its training data. Events, publications, and developments after the cutoff are unknown to the model unless provided in the context window. Knowledge cutoffs are a fundamental limitation of static trained models — an AI companion asked about a recent news story may either admit uncertainty or, if poorly aligned, confabulate an answer based on related patterns in its training data. Grounding and RAG pipelines are the primary ways to extend a model’s effective knowledge beyond its cutoff.
L
- Latent Space
- The high-dimensional mathematical space in which a model represents its internal understanding of concepts, encoded as vectors of real numbers. Similar concepts cluster together in latent space; arithmetic operations on latent vectors can produce semantically meaningful results. The latent space is not directly inspectable as human-readable knowledge — it is an emergent property of the training process. Embeddings are projections from latent space into a form that applications can use for similarity calculations.
- Long-Term Memory
- A system architecture that stores information from past conversations outside the model’s context window and retrieves it for future sessions. Without long-term memory, a language model treats each conversation as entirely new; with it, an AI companion can remember a user’s name, preferences, previous topics, and goals across multiple sessions. Long-term memory is typically implemented via a vector database that stores conversation summaries or key facts as embeddings and retrieves semantically relevant items at the start of each new session.
M
- Multimodal
- Capable of processing and generating multiple types of data — typically combining text with images, audio, or video. A multimodal AI companion can analyze a photo a user shares, describe what it sees, answer questions about it, and continue the conversation naturally. Multimodal capabilities expand the range of tasks an AI companion can assist with beyond purely text-based interaction. GPT-4V, Claude’s vision capability, and Gemini are examples of multimodal language models.
N
- Natural Language Processing (NLP)
- The field of computer science and linguistics concerned with enabling computers to understand, interpret, and generate human language. NLP encompasses a broad range of tasks: text classification, named entity recognition, sentiment analysis, machine translation, question answering, and conversational AI. Modern large language models have subsumed many classical NLP tasks under a single general-purpose architecture, though specialized NLP pipelines remain common in production systems for specific tasks requiring precise, auditable outputs.
- Neural Network
- A computational architecture loosely inspired by biological neurons, composed of layers of interconnected nodes (neurons) that transform input data through learned weight matrices. Neural networks learn by adjusting weights during training to minimize prediction error. Deep neural networks — those with many layers — can represent highly complex functions. Large language models are deep neural networks with billions to trillions of parameters, trained on internet-scale text corpora.
P
- Persona
- The defined character, communication style, name, and behavioral traits assigned to an AI companion through system prompts and/or fine-tuning. A persona makes an AI companion feel consistent and purposeful — a wellness companion might have a calm, empathetic tone, while a coding assistant might be concise and precise. Effective persona design balances consistency with the flexibility to serve diverse user needs within the character’s defined scope.
- Prompt Engineering
- The practice of crafting input text to elicit desired outputs from a language model without modifying the model’s weights. Prompt engineering techniques include role assignment (“you are an expert nutritionist”), few-shot examples, chain-of-thought instructions, output format specifications, and explicit constraints. As language models have become more capable, effective prompt engineering increasingly involves describing the task clearly rather than elaborate tricks — though nuanced formatting and framing still meaningfully affect output quality.
R
- RAG (Retrieval-Augmented Generation)
- An architecture that combines a language model with a retrieval system to ground responses in specific documents or databases. When a user asks a question, the retrieval component searches a vector database or document store for relevant passages, which are injected into the language model’s context alongside the query. The model then generates a response informed by the retrieved content. RAG allows AI companions to answer questions about private documents, current information, or large knowledge bases that would not fit in a context window or were not present in training data.
- Reinforcement Learning from Human Feedback (RLHF)
- A training methodology where human raters evaluate model outputs for quality, helpfulness, and safety, and those ratings train a reward model that then guides further optimization of the language model via reinforcement learning. RLHF is the primary technique used to align language models with human preferences — producing models that are helpful, harmless, and honest rather than merely statistically likely. ChatGPT, Claude, and Gemini were all trained using variants of RLHF.
S
- Safety Filter
- A component of an AI system designed to detect and block outputs that violate content policies — hate speech, instructions for creating weapons, explicit content, personal information extraction, and other harmful categories. Safety filters may operate as classifiers applied to model outputs before they are shown to users, as constraints integrated into the RLHF training process, or as both. Effective safety filtering is a balance: too strict and the system becomes unhelpfully restrictive; too permissive and it produces harmful content.
- Semantic Search
- A search methodology that retrieves results based on conceptual meaning rather than exact keyword matching. Semantic search converts both the query and the documents to embeddings, then finds documents whose embeddings are closest to the query embedding. A semantic search for “dog food” might return results about “canine nutrition” and “puppy kibble” even if those exact phrases do not appear in the query. Semantic search is the retrieval mechanism underlying most RAG systems.
- Session Memory
- The record of conversation turns within a single active session, held in the model’s context window. Session memory is available by default in any conversation-based AI system — the model can reference anything said earlier in the same session. Session memory is lost when the session ends unless explicitly saved to long-term storage. The distinction between session memory (ephemeral, in-context) and long-term memory (persistent, retrieved) is important for understanding what an AI companion can and cannot remember.
- System Prompt
- An instruction message provided to a language model before the user conversation begins, typically invisible to the user, that establishes the model’s persona, behavioral constraints, context, and task scope. System prompts are the primary mechanism through which AI companion developers shape model behavior: “You are a supportive wellness companion named Aria. You respond with warmth and empathy. You do not provide medical diagnoses.” System prompt design is a core engineering discipline for AI companion products.
T
- Temperature
- A sampling parameter that controls the randomness of a language model’s token selection during generation. At temperature 0, the model always selects the highest-probability next token (deterministic, repetitive output). At higher temperatures (0.7–1.0), the model samples from a broader distribution, producing more varied and creative outputs. At very high temperatures, outputs become incoherent. Most AI companion applications use temperatures between 0.6 and 0.9 to balance creativity with coherence.
- Token
- The basic unit of text that a language model processes. Tokens are not exactly words — a token may be a full word, a word fragment, a punctuation mark, or a space. The sentence “Hello, world!” is approximately 4 tokens. The conversion from text to tokens is handled by a tokenizer specific to each model family. Token count determines context window usage, API cost, and inference speed. One token is roughly equivalent to 0.75 English words on average, though this varies significantly by language and content type.
- Transformer
- The neural network architecture introduced in the 2017 paper “Attention Is All You Need” that became the foundation for virtually all large language models. The transformer replaces recurrent processing with a self-attention mechanism that allows the model to weigh the relevance of any part of the input sequence to any other part simultaneously, enabling parallel processing and effective modeling of long-range dependencies. All major AI companions — GPT-4, Claude, Gemini, Llama, Mistral — are built on transformer architectures or direct descendants.
V
- Vector Database
- A database optimized for storing and querying high-dimensional embedding vectors using approximate nearest-neighbor search. Vector databases power the retrieval component of RAG systems and long-term memory architectures for AI companions. When a user message arrives, it is converted to an embedding, and the vector database returns the stored memories or documents with the most similar embeddings. Common vector databases include Pinecone, Weaviate, Chroma, and pgvector (a Postgres extension).
Z
- Zero-Shot Learning
- Asking a language model to perform a task without providing any examples of the desired input-output format — relying solely on the model’s pre-trained capabilities and the task description. Zero-shot prompting works well for tasks that closely resemble patterns in the training data and for models with strong instruction-following training. For novel or complex tasks, few-shot or chain-of-thought prompting typically outperforms zero-shot. The capability to generalize to unseen tasks zero-shot is a defining characteristic of large, instruction-tuned models.
Common AI Companion Use Cases Quick Reference
| Use Case | Key Capabilities Required | Typical Session Length |
|---|---|---|
| Emotional support and journaling | Empathetic tone, long-term memory, session continuity | 10–30 minutes |
| Learning and tutoring | Chain-of-thought, knowledge accuracy, adaptive pacing | 20–60 minutes |
| Creative writing collaboration | High temperature, persona flexibility, long context | 30–90 minutes |
| Productivity and task management | Structured output, tool integration, memory of goals | 5–20 minutes |
| Language practice and conversation | Multilingual capability, error correction, patience | 15–45 minutes |
| Research and information synthesis | RAG, grounding, citation, low hallucination rate | 15–60 minutes |
| Role-play and entertainment | Strong persona, creative improvisation, safety filters | 20–120 minutes |
| Customer support augmentation | Knowledge base RAG, escalation detection, consistent tone | 5–15 minutes |