Architecture Overview

Cluster follows a three-tier architecture with a clear separation between the frontend, backend, and data storage layers.

System Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              Client                                      │
│  ┌─────────────────────────────────────────────────────────────────────┐│
│  │                     React SPA (Vite + TypeScript)                    ││
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ ││
│  │  │ File Browser│  │ Media Viewer│  │ Transcript  │  │  Synthesis  │ ││
│  │  │ (SharePoint)│  │ + Annotator │  │ Annotator   │  │    Canvas   │ ││
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ ││
│  └─────────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ HTTPS
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           Backend Services                               │
│  ┌─────────────────────────────────────────────────────────────────────┐│
│  │                     API Gateway (Node/Express + TypeScript)          ││
│  │                                                                      ││
│  │  /api/annotations/*     → W3C Protocol (CRUD, search, collections)  ││
│  │  /api/files/*           → SharePoint proxy (browse, stream, meta)   ││
│  │  /api/studies/*         → Research workflow (studies, insights)     ││
│  │  /api/auth/*            → Azure AD OAuth flow                       ││
│  └─────────────────────────────────────────────────────────────────────┘│
│                                    │                                     │
│                 ┌──────────────────┼──────────────────┐                 │
│                 ▼                  ▼                  ▼                 │
│  ┌─────────────────────┐  ┌─────────────────┐  ┌─────────────────────┐ │
│  │  Annotation Store   │  │  SharePoint     │  │  Search Index       │ │
│  │  (PostgreSQL)       │  │  Graph API      │  │  (Meilisearch)      │ │
│  └─────────────────────┘  └─────────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
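The gateway dispatches each request to one backend concern based on its `/api/...` prefix. The sketch below makes that prefix dispatch explicit. The route table mirrors the diagram, but `resolveRoute` and the `Service` names are hypothetical, and Cluster's actual router wiring may differ.

```typescript
// Hypothetical route table mirroring the gateway diagram: each /api prefix
// is owned by exactly one backend concern.
export type Service = "annotations" | "files" | "studies" | "auth";

const ROUTES: ReadonlyArray<{ prefix: string; service: Service }> = [
  { prefix: "/api/annotations", service: "annotations" }, // W3C protocol
  { prefix: "/api/files", service: "files" },             // SharePoint proxy
  { prefix: "/api/studies", service: "studies" },         // research workflow
  { prefix: "/api/auth", service: "auth" },               // Azure AD OAuth flow
];

// Returns the owning service and the remaining sub-path, or null if nothing matches.
export function resolveRoute(path: string): { service: Service; subPath: string } | null {
  for (const { prefix, service } of ROUTES) {
    // Require an exact match or a "/" boundary so "/api/filesystem" is not treated as "/api/files".
    if (path === prefix || path.startsWith(prefix + "/")) {
      return { service, subPath: path.slice(prefix.length) || "/" };
    }
  }
  return null;
}
```

In Express, the same idea is usually expressed by mounting one `express.Router()` per prefix, for example `app.use("/api/files", filesRouter)`. Express then strips the prefix before the router sees the path, just as `subPath` does here.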

Core Principles

1. Data Sovereignty

Your research data stays in your control:

Source files remain in SharePoint/Google Drive
Annotations stored in your PostgreSQL database
No data sent to third-party analytics or AI services

2. Standards-Based

Cluster uses open standards wherever possible:

W3C Web Annotation for annotation data
OAuth 2.0 / OpenID Connect for authentication
REST/JSON for API communication
Media Fragments URI for video/audio targeting
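Two of these standards combine when an annotation targets a clip of a recording. The W3C model uses a `FragmentSelector` whose `value` is a Media Fragments temporal expression such as `t=30,45.5` (seconds). The sketch below assumes a hypothetical video URL and note text, and the helper names are illustrative. The JSON shape follows the W3C Web Annotation Data Model.

```typescript
// W3C Web Annotation targeting a time range in a media file via Media Fragments.
export interface FragmentSelector {
  type: "FragmentSelector";
  conformsTo: "http://www.w3.org/TR/media-frags/";
  value: string; // temporal fragment, e.g. "t=30,45.5"
}

export interface MediaAnnotation {
  "@context": "http://www.w3.org/ns/anno.jsonld";
  type: "Annotation";
  body: { type: "TextualBody"; value: string; format: "text/plain" };
  target: { source: string; selector: FragmentSelector };
}

// Builds the Media Fragments temporal dimension "t=start,end" in seconds.
export function temporalFragment(startSec: number, endSec: number): string {
  // Negated comparisons also reject NaN.
  if (!(startSec >= 0) || !(endSec > startSec)) {
    throw new RangeError("expected 0 <= start < end");
  }
  return `t=${startSec},${endSec}`;
}

// Hypothetical helper: wraps a note about a clip in a W3C annotation.
export function annotateClip(source: string, startSec: number, endSec: number, note: string): MediaAnnotation {
  return {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    type: "Annotation",
    body: { type: "TextualBody", value: note, format: "text/plain" },
    target: {
      source,
      selector: {
        type: "FragmentSelector",
        conformsTo: "http://www.w3.org/TR/media-frags/",
        value: temporalFragment(startSec, endSec),
      },
    },
  };
}
```

Because the time range is expressed in a standard selector rather than a custom field, any W3C-aware tool can interpret the annotation without knowing Cluster's schema.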

3. Separation of Concerns

Layer	Responsibility	Technology
Frontend	User interface, interaction	React, TypeScript
Backend	Business logic, API	Node.js, Express
Database	Data persistence	PostgreSQL
External	File storage, auth	SharePoint, Azure AD

Data Flow

Creating an Annotation

sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant Database

    User->>Frontend: Select text in transcript
    Frontend->>API: POST /annotations
    API->>Database: Insert annotation
    API->>Database: Insert annotation_target
    Database-->>API: Return annotation ID
    API-->>Frontend: Return W3C JSON-LD
    Frontend-->>User: Show highlight
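The flow writes two rows, the annotation and its target, and then serializes them as W3C JSON-LD. The following is a minimal sketch that assumes hypothetical row shapes for `annotations` and `annotation_targets`. An in-memory store stands in for PostgreSQL, and the transcript selection is expressed with a W3C `TextQuoteSelector` (exact text plus surrounding context).

```typescript
// Hypothetical row shapes; the real Cluster schema may differ.
export interface AnnotationRow { id: number; orgId: string; bodyText: string; createdAt: string }
export interface TargetRow { annotationId: number; source: string; exact: string; prefix: string; suffix: string }

// In-memory stand-in for the annotation store.
export class InMemoryStore {
  private nextId = 1;
  readonly annotations: AnnotationRow[] = [];
  readonly targets: TargetRow[] = [];

  // Inserts the annotation and its target. In PostgreSQL both INSERTs
  // would run inside one transaction (BEGIN ... COMMIT).
  insert(a: Omit<AnnotationRow, "id">, t: Omit<TargetRow, "annotationId">): [AnnotationRow, TargetRow] {
    const row: AnnotationRow = { id: this.nextId++, ...a };
    const target: TargetRow = { annotationId: row.id, ...t };
    this.annotations.push(row);
    this.targets.push(target);
    return [row, target];
  }
}

// Serializes stored rows as a W3C Web Annotation (JSON-LD).
export function toJsonLd(baseUrl: string, a: AnnotationRow, t: TargetRow) {
  return {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    id: `${baseUrl}/annotations/${a.id}`,
    type: "Annotation",
    created: a.createdAt,
    body: { type: "TextualBody", value: a.bodyText, format: "text/plain" },
    target: {
      source: t.source,
      selector: { type: "TextQuoteSelector", exact: t.exact, prefix: t.prefix, suffix: t.suffix },
    },
  };
}

export interface CreateInput { orgId: string; source: string; exact: string; prefix: string; suffix: string; note: string }

// Mirrors the diagram: insert annotation + target, then return W3C JSON-LD.
export function createAnnotation(store: InMemoryStore, baseUrl: string, input: CreateInput) {
  const [row, target] = store.insert(
    { orgId: input.orgId, bodyText: input.note, createdAt: new Date().toISOString() },
    { source: input.source, exact: input.exact, prefix: input.prefix, suffix: input.suffix },
  );
  return toJsonLd(baseUrl, row, target);
}
```

Storing prefix and suffix alongside the exact quote lets the frontend re-anchor the highlight even if the same phrase appears several times in a transcript.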

Loading a File

sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant SharePoint

    User->>Frontend: Open file browser
    Frontend->>API: GET /files/drives/{id}/items
    API->>SharePoint: Graph API request
    SharePoint-->>API: File metadata
    API-->>Frontend: File list
    User->>Frontend: Select video file
    Frontend->>API: GET /files/.../content
    API->>SharePoint: Stream file
    SharePoint-->>API: Video stream
    API-->>Frontend: Proxied stream
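The proxy step translates Cluster's `/files` routes into Microsoft Graph calls. The sketch below assumes a hypothetical `toGraphRequest` helper and request shape. The Graph v1.0 paths `/drives/{drive-id}/items/{item-id}/children` and `/drives/{drive-id}/items/{item-id}/content` are real endpoints. Forwarding the browser's `Range` header is what lets video seeking fetch only the bytes it needs.

```typescript
// Maps hypothetical Cluster file-proxy requests onto Microsoft Graph v1.0 endpoints.
const GRAPH_BASE = "https://graph.microsoft.com/v1.0";

export type FileRequest =
  | { kind: "list"; driveId: string; itemId: string }
  | { kind: "content"; driveId: string; itemId: string; range?: string };

export interface GraphRequest { url: string; headers: Record<string, string> }

export function toGraphRequest(req: FileRequest, graphToken: string): GraphRequest {
  const drive = encodeURIComponent(req.driveId);
  const item = encodeURIComponent(req.itemId);
  const headers: Record<string, string> = { Authorization: `Bearer ${graphToken}` };

  if (req.kind === "list") {
    return { url: `${GRAPH_BASE}/drives/${drive}/items/${item}/children`, headers };
  }
  // Forward a single-range header ("bytes=0-1023", "bytes=500-", "bytes=-500");
  // anything else is dropped and the full file is requested.
  if (req.range && /^bytes=(\d+-\d*|-\d+)$/.test(req.range)) {
    headers.Range = req.range;
  }
  return { url: `${GRAPH_BASE}/drives/${drive}/items/${item}/content`, headers };
}
```

Graph answers `/content` with a redirect to a short-lived download URL. The proxy would therefore follow that redirect and relay the resulting `206 Partial Content` or `200` response body to the browser without writing it to disk.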

Monorepo Structure

cluster/
├── packages/
│   ├── shared/          # Shared TypeScript types and schemas
│   │   ├── src/
│   │   │   ├── types/   # W3C annotation types
│   │   │   └── schemas/ # Zod validation schemas
│   │   └── package.json
│   │
│   ├── server/          # Express API server
│   │   ├── src/
│   │   │   ├── routes/  # API endpoints
│   │   │   ├── services/ # Business logic
│   │   │   ├── db/      # Database schema and migrations
│   │   │   └── middleware/
│   │   └── package.json
│   │
│   └── web/             # React frontend
│       ├── src/
│       │   ├── components/
│       │   ├── pages/
│       │   ├── api/     # API client
│       │   └── stores/  # State management
│       └── package.json
│
├── docs/                # This documentation (Docusaurus)
└── docker-compose.yml

Security Model

Authentication Flow

1. User clicks "Sign in with Microsoft".
2. MSAL.js redirects to Azure AD.
3. User authenticates with their organization credentials.
4. Azure AD returns an access token to the frontend.
5. Frontend sends the token with each API request.
6. Backend validates the token issued by Azure AD.
7. Backend exchanges the token for a Microsoft Graph token using the OAuth 2.0 on-behalf-of flow and uses it to access SharePoint.
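Validating the token involves more than checking its signature, which a JWT library verifies against Azure AD's published signing keys. The backend must also check the claims. The sketch below shows those claim checks for an Azure AD v2.0 access token, assuming the signature has already been verified. The claim names (`aud`, `iss`, `tid`, `oid`, `exp`, `nbf`) are standard. `expectedAudience`, `allowedTenants`, and `checkClaims` itself are hypothetical configuration and naming.

```typescript
// Claim checks applied AFTER a JWT library has verified the token signature.
export interface AccessTokenClaims {
  aud: string; // audience: must be this API
  iss: string; // issuer: embeds the tenant ID for v2.0 tokens
  tid: string; // tenant ID (maps to a Cluster organization)
  oid: string; // user's object ID within the tenant
  exp: number; // expiry, seconds since epoch
  nbf: number; // not-before, seconds since epoch
}

export interface ValidationConfig {
  expectedAudience: string;
  allowedTenants?: ReadonlySet<string>; // undefined = accept any tenant (multi-tenant app)
  clockSkewSec: number;
}

export function checkClaims(c: AccessTokenClaims, cfg: ValidationConfig, nowSec: number): { tenantId: string; userId: string } {
  if (c.aud !== cfg.expectedAudience) throw new Error("wrong audience");
  // The v2.0 issuer must agree with the token's own tenant claim.
  if (c.iss !== `https://login.microsoftonline.com/${c.tid}/v2.0`) throw new Error("unexpected issuer");
  if (cfg.allowedTenants && !cfg.allowedTenants.has(c.tid)) throw new Error("tenant not allowed");
  if (nowSec >= c.exp + cfg.clockSkewSec) throw new Error("token expired");
  if (nowSec < c.nbf - cfg.clockSkewSec) throw new Error("token not yet valid");
  return { tenantId: c.tid, userId: c.oid };
}
```

The returned `tenantId` is the natural source for the organization scope used by every later query. The client cannot tamper with it, because it comes from a signed token.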

Multi-Tenancy

Cluster supports multiple organizations:

Each Azure AD tenant maps to one organization
All data is filtered by org_id
Users can access only their own organization's data
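Filtering by `org_id` is easy to forget when it is left to each query. One way to enforce it is to route every read through a helper that always adds the organization predicate as parameter `$1`. This is a sketch only. `scopedSelect`, the table list, and the column names are hypothetical, and the `$1, $2, ...` placeholders follow the node-postgres convention.

```typescript
// Every query built here is scoped to one organization via "org_id = $1".
const SCOPED_TABLES = new Set(["annotations", "studies", "insights"]); // hypothetical tables

export interface Query { text: string; values: unknown[] }

export function scopedSelect(table: string, orgId: string, filters: Record<string, unknown> = {}): Query {
  // Identifiers cannot be bound as parameters, so only allow known tables.
  if (!SCOPED_TABLES.has(table)) throw new Error(`unknown table: ${table}`);
  const values: unknown[] = [orgId];
  const clauses = ["org_id = $1"];
  for (const [column, value] of Object.entries(filters)) {
    // Column names are also identifiers; restrict them to a safe pattern.
    if (!/^[a-z_][a-z0-9_]*$/.test(column)) throw new Error(`bad column: ${column}`);
    values.push(value);
    clauses.push(`${column} = $${values.length}`);
  }
  return { text: `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`, values };
}
```

In this design, `orgId` would come from the validated token's tenant, never from the request body or query string. A user therefore cannot widen the scope by editing a parameter.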

File Access

Cluster never stores files:

Files are accessed through the Microsoft Graph API
The user's Azure AD permissions are enforced
If a user can't access a file in SharePoint, they can't access it in Cluster either
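Because the proxy calls Graph with the user's own delegated token, Graph makes the authorization decision. Graph typically responds with 403 or 404 for files the user cannot see. The proxy only needs to pass that outcome through and avoid caching anything. A minimal sketch, with `passThrough` and its status mapping as hypothetical choices:

```typescript
// Relays SharePoint's authorization decision instead of applying Cluster-side file permissions.
export interface ProxyDecision { status: number; forwardBody: boolean; cacheControl: string }

export function passThrough(graphStatus: number): ProxyDecision {
  // File bytes are per-user; never let shared caches store them.
  const cacheControl = "private, no-store";
  if (graphStatus >= 200 && graphStatus < 300) {
    return { status: graphStatus, forwardBody: true, cacheControl }; // includes 206 for range requests
  }
  if (graphStatus === 401 || graphStatus === 403 || graphStatus === 404) {
    return { status: graphStatus, forwardBody: false, cacheControl }; // user lacks access or must re-authenticate
  }
  return { status: 502, forwardBody: false, cacheControl }; // upstream failure
}
```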

Next Steps

Tech Stack — Libraries and frameworks
Data Model — Database schema
