// IN-HOUSE BUILD · AGENTIC RAG
Most enterprise RAG fails at the first compliance question. CloudNSite runs a hybrid-search and knowledge-graph stack across 40+ source connectors with permission-aware retrieval, a deep research agent on top, and an airgap option for data that cannot leave the building.
// What an AI knowledge base actually has to do
An AI knowledge base has to answer the way the business is actually organized. For enterprise RAG, that means a RAG platform must know the source system, the person asking, the document lineage, the entity relationships, and the confidence trail behind every sentence.
The common prototype skips those constraints because a demo room rewards speed. A board room, a legal review, and a security team reward proof. CloudNSite built this in-house architecture because the hard part of agentic RAG is not only finding text. The hard part is finding the right evidence, refusing evidence the user cannot access, and making the answer easy to inspect when someone asks why it said what it said.
That inspection loop changes the product requirement. The system must preserve source titles, chunk identifiers, timestamps, permission state, connector sync state, and the retrieval path that promoted each citation. Without that context, an answer may sound polished while the organization has no way to decide whether it is reliable.
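As a rough illustration of that requirement, the sketch below models the context one citation might carry. The `Citation` fields, the `is_inspectable` helper, and the 24-hour freshness threshold are hypothetical names and values chosen for this example, not the CloudNSite schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Citation:
    """One piece of evidence behind an answer sentence (hypothetical schema)."""
    source_title: str
    chunk_id: str
    retrieved_at: datetime            # when the chunk was fetched for this answer
    permission_checked: bool          # was the asker's access confirmed?
    connector_synced_at: datetime     # last successful sync of the source
    retrieval_path: tuple[str, ...]   # e.g. ("sparse", "rerank")


def is_inspectable(citation: Citation, max_sync_age_hours: float = 24.0) -> bool:
    """True only if access was checked, a retrieval path exists, and the sync is recent."""
    sync_age = citation.retrieved_at - citation.connector_synced_at
    return (
        citation.permission_checked
        and len(citation.retrieval_path) > 0
        and sync_age <= timedelta(hours=max_sync_age_hours)
    )


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
fresh = Citation("Refund policy", "doc-7#3", now, True,
                 now - timedelta(hours=1), ("sparse", "rerank"))
stale = Citation("Refund policy", "doc-7#3", now, True,
                 now - timedelta(days=3), ("dense", "rerank"))
```

A reviewer-facing surface could flag any citation where `is_inspectable` returns false instead of presenting it with the same confidence as a fresh, permission-checked one.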
Vanilla RAG returns the most relevant chunk regardless of who asked. The user gets answers from documents they could not open at the source, which turns a search upgrade into a governance incident.
Similarity finds passages that look related, not entities that are related. A customer record and the wrong customer's invoice can be lexically close, so the system needs structure, not just distance.
Most stacks ship with one source. Real enterprises have Drive, Slack, Confluence, Notion, SharePoint, GitHub, plus a half-dozen vertical systems where the important context actually lives.
When an answer is wrong, there is no way to trace it back to the chunk and the chunk back to the document. That breaks review, correction, governance, and user trust at the same time.
// Hybrid search plus knowledge graph
Hybrid search RAG is not a compromise between keyword search and semantic search. It is a routing layer that treats each retrieval method as a specialist. Knowledge graph RAG adds the structural memory that retrieval-augmented generation needs when the question is about relationships, accountability, coverage, or lineage.
In practice, one question often needs all three paths. A support lead may ask which enterprise account is blocked by a contract amendment, which Jira tickets mention the blocker, and who approved the exception. A single vector lookup is too thin for that workflow. The CloudNSite stack fans out across lexical, semantic, and graph retrieval, then brings the evidence back through one ranking pass.
Source tuning matters because each system behaves differently. Code repositories reward exact symbols and path names. Chat history rewards freshness and channel context. Contracts reward named entities, dates, and defined terms. The hybrid layer gives each source a retrieval profile instead of forcing every system into the same search shape.
Sparse retrieval handles exact-match terms, identifiers, SKUs, account numbers, ticket IDs, and code symbols. It catches the literal cases vector search misses by design, especially when a single token changes the answer.
Dense retrieval handles conceptual questions where the user does not know the exact term in the document. It catches the fuzzy cases lexical search misses and gives enterprise AI search a broader recall surface.
Entity walks resolve structural questions across relationships. Who reports to whom, which contracts cover which subsidiaries, and which alert chains escalate to which on-call rotation are all questions for knowledge graph RAG, not isolated chunks.
All three run on every query. A re-ranker fuses them with weights tuned per source. The user sees one ranked answer with citations across all three retrieval modes, while the audit log keeps the individual retrieval path visible for review.
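One way to picture the fusion step is weighted reciprocal rank fusion, a common technique swapped in here for illustration; the text above does not say which fusion method the stack uses. The `SOURCE_PROFILES` weights, the `fuse` function, and the constant `k=60` are assumptions for this sketch.

```python
from collections import defaultdict

# Hypothetical per-source weights for each retrieval mode.
SOURCE_PROFILES = {
    "github": {"sparse": 1.5, "dense": 0.8, "graph": 0.7},      # exact symbols matter
    "contracts": {"sparse": 1.0, "dense": 1.0, "graph": 1.4},   # entities matter
}
DEFAULT_PROFILE = {"sparse": 1.0, "dense": 1.0, "graph": 1.0}


def fuse(rankings, chunk_source, k=60):
    """Weighted reciprocal rank fusion across retrieval modes.

    rankings: mode name -> list of chunk ids, best first.
    chunk_source: chunk id -> source name, used to pick the weight profile.
    Returns (chunk_id, score, path) tuples, best first. `path` records the
    rank each mode gave the chunk, so an audit log can keep it.
    """
    scores = defaultdict(float)
    paths = defaultdict(dict)
    for mode, ranked in rankings.items():
        for rank, chunk_id in enumerate(ranked, start=1):
            profile = SOURCE_PROFILES.get(chunk_source.get(chunk_id), DEFAULT_PROFILE)
            scores[chunk_id] += profile.get(mode, 1.0) / (k + rank)
            paths[chunk_id][mode] = rank
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return [(c, scores[c], paths[c]) for c in ordered]


rankings = {
    "sparse": ["c1", "c2"],
    "dense": ["c2", "c3"],
    "graph": ["c2"],
}
chunk_source = {"c1": "github", "c2": "contracts", "c3": "github"}
fused = fuse(rankings, chunk_source)
```

In this toy input, `c2` ranks first because all three modes returned it, and its `path` entry shows the rank from each mode for later review.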
// Source connectors
Connectors are not optional. The enterprises we work with have institutional memory scattered across a dozen systems built for different decades. The stack indexes all of them and respects each system's permission model.
Source coverage is what turns a private search project into an operating layer. AI for Slack is useful only if the answer can also cite the Confluence runbook, the Drive contract, the ticket history, and the CRM record. AI for Confluence is useful only if it can avoid stale pages and connect the page to the work happening around it. The connector layer keeps those source boundaries intact while giving employees one place to ask.
The ingestion layer normalizes content without flattening identity. A slide deck, a pull request, a support macro, and a transcript become searchable evidence, but each keeps its source URL, owner, created date, modified date, access rules, and connector health. That is why the same answer can be useful to an operator and reviewable by governance.
Adding a connector is a configuration change, not a rebuild. The capture and re-rank layer is source-agnostic on purpose.
// RBAC mirroring
Private RAG and secure RAG are not privacy claims on a slide. RAG with RBAC has to mirror the access system that employees already trust, because the retrieval layer becomes another doorway into the same information.
Permission-aware RAG also has to fail closed. If a connector cannot confirm membership, if a group mapping is stale, or if a source has changed its ACL format, the relevant content should drop from the eligible retrieval set until the mirror is healthy again.
Every document indexed inherits the access controls from its source system. A user in Slack who cannot read a private channel cannot retrieve from it through the agent. A user in Drive who lost access to a folder yesterday loses access through the agent today. ACL changes propagate on the same cadence as the connector sync, never slower.
The implementation treats authorization as a query-time filter and an indexing-time fact. Document metadata carries source identifiers, group grants, inherited folder rules, and revocation timestamps. The retrieval engine can find a relevant chunk and still refuse it, because relevance never overrides permission.
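A minimal sketch of that rule, assuming a simplified model where each chunk carries a set of allowed groups. The `Chunk` fields and the `eligible_chunks` function are hypothetical names for illustration; a production mirror would also handle inherited folder rules and revocation timestamps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source: str
    allowed_groups: frozenset   # group grants copied from the source system
    score: float                # relevance score from retrieval


def eligible_chunks(candidates, user_groups: Optional[frozenset], healthy_sources: set):
    """Apply permissions at query time; relevance never overrides access.

    user_groups is None when membership could not be confirmed.
    healthy_sources lists sources whose permission mirror is current.
    """
    if user_groups is None:
        return []  # fail closed: unknown membership means no results
    kept = []
    for chunk in candidates:
        if chunk.source not in healthy_sources:
            continue  # fail closed: stale or broken mirror for this source
        if chunk.allowed_groups & user_groups:
            kept.append(chunk)
    return sorted(kept, key=lambda c: c.score, reverse=True)


candidates = [
    Chunk("hr-1", "drive", frozenset({"hr"}), 0.97),          # most relevant, not permitted
    Chunk("eng-1", "confluence", frozenset({"eng"}), 0.81),
    Chunk("eng-2", "slack", frozenset({"eng"}), 0.75),         # slack mirror is unhealthy
]
visible = eligible_chunks(candidates, frozenset({"eng"}), {"drive", "confluence"})
```

Here the highest-scoring chunk is dropped because the user holds no matching group, and the Slack chunk is dropped because its mirror is not marked healthy.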
// When one query is not enough
A deep research agent is useful when the first answer is only a lead. AI deep research needs a planner, a retrieval budget, a citation discipline, and a verification loop that catches unsupported claims before a user treats them as finished work.
The agent in this architecture does not wander through sources. It turns an ambiguous question into a controlled investigation. It can ask for the current policy, compare it to a ticket trail, find the account context, and return the answer as a cited research note. The reasoning model makes decisions about what to fetch next, but the retrieval layer decides what it is allowed to see.
This is where AI deep research becomes operational instead of academic. A user can ask for a board-ready summary of a customer escalation, a compliance exception, or a product risk. The agent does not return a loose essay. It returns a synthesis with citations, coverage gaps, and enough retrieval history for a reviewer to challenge the answer.
The agent decomposes the question into sub-queries, identifies which sources each sub-query targets, and budgets retrieval calls before it fetches anything.
It runs the sub-queries in parallel across hybrid retrieval, gathers citations, deduplicates overlap, and notes coverage gaps that should not be hidden.
It composes the answer with inline citations to source chunks. Every claim is tied to a passage, every passage is tied to a document.
It re-checks the answer against the cited passages before returning. Mismatches trigger a second pass instead of a wrong answer.
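The four steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the agent's actual implementation: `research` and its four callables (`plan`, `retrieve`, `compose`, `supported`) are hypothetical stand-ins for the model and retrieval calls, and the second pass here simply re-queries around unsupported claims.

```python
from concurrent.futures import ThreadPoolExecutor


def research(question, plan, retrieve, compose, supported, budget=4, max_passes=2):
    """Plan -> fetch -> synthesize -> verify, with a retrieval budget.

    plan(question) -> list of sub-queries
    retrieve(sub_query) -> list of (chunk_id, passage)
    compose(question, passages) -> list of (claim, chunk_id)
    supported(claim, passage) -> bool
    """
    sub_queries = plan(question)[:budget]            # budget before fetching anything
    passages, gaps, claims = {}, [], []
    for _ in range(max_passes):
        with ThreadPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(retrieve, sub_queries))   # parallel sub-queries
        for sub_query, hits in zip(sub_queries, results):
            if not hits:
                gaps.append(sub_query)               # keep coverage gaps visible
            for chunk_id, passage in hits:
                passages.setdefault(chunk_id, passage)        # deduplicate overlap
        claims = compose(question, passages)
        unsupported = [claim for claim, chunk_id in claims
                       if not supported(claim, passages.get(chunk_id, ""))]
        if not unsupported:
            return {"claims": claims, "gaps": gaps, "verified": True}
        sub_queries = unsupported[:budget]           # second pass targets the mismatches
    # Out of passes: return only the claims that survived verification.
    kept = [(claim, chunk_id) for claim, chunk_id in claims
            if supported(claim, passages.get(chunk_id, ""))]
    return {"claims": kept, "gaps": gaps, "verified": False}


# Toy stand-ins so the loop can be exercised without a model or an index.
INDEX = {"current policy": [("p1", "Refunds are allowed within 30 days.")],
         "ticket trail": []}
result = research(
    "What is the refund policy and was it applied?",
    plan=lambda q: ["current policy", "ticket trail"],
    retrieve=lambda sq: INDEX.get(sq, []),
    compose=lambda q, ps: [("Refunds are allowed within 30 days.", "p1")] if "p1" in ps else [],
    supported=lambda claim, passage: claim in passage,
)
```

The returned note keeps `gaps` alongside the claims, so a reviewer can see that the ticket trail sub-query found nothing rather than assuming it was covered.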
// When data cannot leave the building
For regulated, classified, or compliance-bound deployments, the entire stack runs on customer infrastructure with local inference. The architecture is the same. The boundary is at the network edge.
Airgap support changes the deployment topology, not the product discipline. Connectors still sync into local indices. The graph still stores entity relationships. The RBAC mirror still blocks sources the user cannot access. The deep research agent still plans, fetches, synthesizes, and verifies against cited passages, but every dependency lives inside the customer-controlled boundary.
The airgap option is designed for teams that cannot accept a partial local story. It keeps inference, embeddings, index updates, graph writes, logs, and audit exports inside the deployment. Operators still get the same search surface, but security teams get a simpler network question: nothing outbound is required for the core workflow.
Open-weights models run on customer GPUs with no third-party API calls and no prompt traffic outside the deployment boundary.
Vector stores, graph stores, metadata stores, and audit tables live on customer infrastructure with no cloud handoff.
The user-facing experience is identical. The governance boundary and wire diagram are what change.
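As one hedged example of how a security team might check the "nothing outbound" property, the sketch below scans a deployment's configured endpoints and flags any that do not point at a private or loopback address. The `ENDPOINTS` map, its addresses and ports, and the `outbound_endpoints` helper are all hypothetical; a real review would also cover DNS, firewall rules, and package mirrors.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical service endpoints for an airgapped deployment.
ENDPOINTS = {
    "inference": "http://10.0.4.12:8000",
    "vector_store": "http://10.0.4.20:6333",
    "graph_store": "bolt://10.0.4.21:7687",
    "audit_log": "http://127.0.0.1:9200",
}


def outbound_endpoints(endpoints):
    """Return names of endpoints whose host is not a private or loopback IP.

    Hostnames are flagged too, because this sketch does not resolve DNS and
    so cannot confirm where they point.
    """
    flagged = []
    for name, url in endpoints.items():
        host = urlparse(url).hostname or ""
        try:
            address = ipaddress.ip_address(host)
        except ValueError:
            flagged.append(name)   # not a literal IP: treat as unverified
            continue
        if not (address.is_private or address.is_loopback):
            flagged.append(name)
    return flagged


flagged = outbound_endpoints(ENDPOINTS)
```

An empty result for the configured map is the simple answer the network question is looking for; any flagged name is a dependency to investigate before sign-off.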
source connectors out of the box
retrieval modes per query
answers cited to source
airgap option supported
// What we ship for clients
CloudNSite uses this agentic RAG architecture as a reference pattern, not a fixed appliance. Every client has different systems, risk tolerances, and review paths. What transfers is the operating method: inventory the sources, prove the permissions, tune the retrieval blend, and only then widen access to the business.
That order keeps the launch honest. A client does not need a generic RAG platform with a dashboard first. They need a verified map of where knowledge lives, how it should be searched, who is allowed to see it, and what the agent should do when the evidence is thin. The architecture turns those decisions into software, then keeps them visible after launch.
We index the systems your team already lives in, not the ones we want to sell. The first map is a source map, permission map, and operational map.
The RBAC mirror is verified against the source of truth before any user runs a query. Access review is part of launch, not a late security ticket.
The planner is configured for the question shapes your team actually asks, including the sources it should trust first and the citations reviewers need.
We index your sources, mirror your permissions, deploy the deep research agent, and hand you a search surface your governance team can sign off on.