Files API

The Files API provides access to SharePoint and OneDrive files through Microsoft Graph. Files remain in your organization's cloud storage; Cluster stores only references and metadata.

Overview

User Request → Cluster API → Microsoft Graph → SharePoint
                    ↓
             (cache metadata)
                    ↓
              Return to User

Cluster proxies file requests to SharePoint using the On-Behalf-Of OAuth flow. This allows:

  • Seamless access to SharePoint files
  • Proper permission enforcement (users see only the files they are allowed to access)
  • Metadata caching for faster browsing
  • Video/audio streaming support
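The token exchange behind the On-Behalf-Of flow can be sketched as follows. This is a minimal illustration of the standard Microsoft identity platform OBO token request, not Cluster's actual server code. The function name and all argument values are placeholders, and the requested scopes are assumed to be the two Graph permissions this page lists under Authentication.

```python
from urllib.parse import urlencode

# v2.0 token endpoint of the Microsoft identity platform; tenant_id is a placeholder.
TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

# Assumption: the exchanged token requests the Graph scopes named in this document.
GRAPH_SCOPES = (
    "https://graph.microsoft.com/Files.Read.All "
    "https://graph.microsoft.com/Sites.Read.All"
)


def build_obo_request(tenant_id, client_id, client_secret, user_token):
    """Return (url, form_body) for an On-Behalf-Of token exchange.

    user_token is the access token the caller presented to the middle-tier API.
    """
    form = {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "client_id": client_id,
        "client_secret": client_secret,
        "assertion": user_token,
        "scope": GRAPH_SCOPES,
        "requested_token_use": "on_behalf_of",
    }
    return TOKEN_URL.format(tenant_id=tenant_id), urlencode(form)
```

The body is sent as application/x-www-form-urlencoded in a POST. Because the resulting Graph token carries the user's identity, SharePoint applies that user's permissions to every proxied request.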

Authentication

All file endpoints require a valid Azure AD token with SharePoint scopes:

  • Files.Read.All: read the files the user can access
  • Sites.Read.All: list SharePoint sites
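A request to any file endpoint might be prepared as in the sketch below. Two things here are assumptions rather than facts from this page: the base URL is a hypothetical deployment address, and the token is assumed to travel in a standard Authorization: Bearer header.

```python
import urllib.request

# Hypothetical deployment URL; replace with the address of your Cluster instance.
BASE_URL = "https://cluster.example.com"


def authed_request(path, token):
    """Build a GET request that carries the Azure AD token as a bearer credential."""
    req = urllib.request.Request(BASE_URL + path, method="GET")
    # Assumption: the API accepts the token in the Authorization header.
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    return req


req = authed_request("/api/files/sites", "<access-token>")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```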

Endpoints

List Sites

Get SharePoint sites the user can access:

GET /api/files/sites

Response:

{
  "data": [
    {
      "id": "site-id",
      "name": "Research Team",
      "webUrl": "https://company.sharepoint.com/sites/research",
      "description": "UX Research files and recordings"
    }
  ]
}

List Drives

Get drives (document libraries) in a site:

GET /api/files/sites/:siteId/drives

Response:

{
  "data": [
    {
      "id": "drive-id",
      "name": "Documents",
      "driveType": "documentLibrary",
      "webUrl": "https://company.sharepoint.com/sites/research/Documents"
    },
    {
      "id": "drive-id-2",
      "name": "Interview Recordings",
      "driveType": "documentLibrary"
    }
  ]
}

List Files/Folders

Browse files in a drive:

GET /api/files/drives/:driveId/items

Query Parameters:

Parameter   Type     Description
path        string   Folder path (e.g., /Interviews/2025)
folderId    string   Specific folder ID
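Building the listing URL from these parameters could look like the sketch below. The helper name is hypothetical, and the page does not say whether path and folderId may be combined, so the function simply passes along whichever values are supplied.

```python
from urllib.parse import quote, urlencode


def items_url(drive_id, path=None, folder_id=None):
    """Build the List Files/Folders URL; both query parameters are optional."""
    params = {}
    if path is not None:
        params["path"] = path
    if folder_id is not None:
        params["folderId"] = folder_id
    # Percent-encode the drive ID so characters such as '!' stay safe in the URL path.
    url = f"/api/files/drives/{quote(drive_id, safe='')}/items"
    return f"{url}?{urlencode(params)}" if params else url
```

Note that urlencode percent-encodes the slashes in the folder path, so /Interviews/2025 is sent as %2FInterviews%2F2025.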

Response:

{
  "data": [
    {
      "id": "item-id-1",
      "name": "P05-Interview.mp4",
      "type": "file",
      "mimeType": "video/mp4",
      "size": 524288000,
      "webUrl": "https://company.sharepoint.com/...",
      "createdDateTime": "2025-01-02T10:00:00Z",
      "lastModifiedDateTime": "2025-01-02T10:30:00Z",
      "createdBy": {
        "user": {
          "displayName": "Jane Smith"
        }
      }
    },
    {
      "id": "item-id-2",
      "name": "P05-Transcript.vtt",
      "type": "file",
      "mimeType": "text/vtt",
      "size": 15360
    },
    {
      "id": "folder-id",
      "name": "Session Notes",
      "type": "folder",
      "childCount": 5
    }
  ],
  "meta": {
    "path": "/Interviews/2025",
    "parentId": "parent-folder-id"
  }
}
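A client that renders this listing typically separates folders from files and formats the sizes. The sketch below assumes only the type and size fields shown in the response above; the helper names are illustrative.

```python
def split_listing(response):
    """Separate a List Files/Folders response into (folders, files)."""
    folders = [item for item in response["data"] if item["type"] == "folder"]
    files = [item for item in response["data"] if item["type"] == "file"]
    return folders, files


def human_size(num_bytes):
    """Format a byte count using binary units (1 MiB = 1024 * 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit that fits, or at GiB for anything larger.
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024


listing = {
    "data": [
        {"name": "P05-Interview.mp4", "type": "file", "size": 524288000},
        {"name": "P05-Transcript.vtt", "type": "file", "size": 15360},
        {"name": "Session Notes", "type": "folder", "childCount": 5},
    ]
}
folders, files = split_listing(listing)
sizes = [human_size(f["size"]) for f in files]  # ["500.0 MiB", "15.0 KiB"]
```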

Get File Metadata

GET /api/files/drives/:driveId/items/:itemId

Response:

{
  "data": {
    "id": "item-id",
    "name": "P05-Interview.mp4",
    "type": "file",
    "mimeType": "video/mp4",
    "size": 524288000,
    "webUrl": "https://company.sharepoint.com/...",
    "downloadUrl": "https://...",
    "thumbnails": [
      {
        "large": {
          "url": "https://..."
        }
      }
    ],
    "video": {
      "duration": 3600000,
      "width": 1920,
      "height": 1080
    }
  }
}

Get File Content

Stream file content (for viewing/playing):

GET /api/files/drives/:driveId/items/:itemId/content

Response:

Proxied file content with appropriate headers:

Content-Type: video/mp4
Content-Length: 524288000
Accept-Ranges: bytes

Range Requests

Video/audio players use range requests for seeking:

GET /api/files/drives/:driveId/items/:itemId/content
Range: bytes=1000000-2000000

Response:

HTTP/1.1 206 Partial Content
Content-Range: bytes 1000000-2000000/524288000
Content-Length: 1000001
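Byte ranges are inclusive at both ends, which is why the response above reports 1000001 bytes rather than 1000000. The arithmetic can be sketched as below. This illustrates standard HTTP range semantics for a single range, not Cluster's proxy code, and the function name is hypothetical.

```python
import re


def resolve_range(range_header, total_size):
    """Resolve a single 'bytes=start-end' range into (start, end, length).

    Byte positions are inclusive, so the length is end - start + 1.
    Suffix ranges ('bytes=-500') and multi-range requests are not handled.
    """
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if not match:
        raise ValueError(f"unsupported Range header: {range_header!r}")
    start = int(match.group(1))
    # An open-ended range ('bytes=1000-') runs to the last byte of the file.
    end = int(match.group(2)) if match.group(2) else total_size - 1
    # A range that overshoots the file is clamped to the last byte.
    end = min(end, total_size - 1)
    if start > end:
        raise ValueError("range not satisfiable")
    return start, end, end - start + 1


start, end, length = resolve_range("bytes=1000000-2000000", 524288000)
content_range = f"bytes {start}-{end}/524288000"
```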

Get Transcript

Get parsed transcript for a video/audio file:

GET /api/files/drives/:driveId/items/:itemId/transcript

Response:

{
  "data": {
    "format": "vtt",
    "duration": 3600.5,
    "cues": [
      {
        "id": "1",
        "startTime": 0.0,
        "endTime": 5.2,
        "text": "Welcome to the interview.",
        "speaker": "Interviewer"
      },
      {
        "id": "2",
        "startTime": 5.5,
        "endTime": 12.3,
        "text": "Thanks for having me. I'm excited to share my experience.",
        "speaker": "Participant"
      }
    ]
  }
}
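A player that highlights the current line needs to find the cue active at a given playback position. The sketch below uses a binary search over the cue start times. It assumes the cues arrive sorted by startTime, do not overlap, and use seconds, as the sample values suggest.

```python
from bisect import bisect_right


def cue_at(cues, position):
    """Return the cue active at `position` seconds, or None during a gap.

    Assumes cues are sorted by startTime and do not overlap.
    """
    starts = [cue["startTime"] for cue in cues]
    # Index of the last cue that starts at or before the position.
    index = bisect_right(starts, position) - 1
    if index >= 0 and position < cues[index]["endTime"]:
        return cues[index]
    return None


cues = [
    {"id": "1", "startTime": 0.0, "endTime": 5.2, "speaker": "Interviewer"},
    {"id": "2", "startTime": 5.5, "endTime": 12.3, "speaker": "Participant"},
]
```

With the sample cues, position 6.0 falls in cue 2, while position 5.3 falls in the gap between the two cues and yields None.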

Cluster automatically:

  1. Detects associated transcript files (.vtt, .srt)
  2. Parses cues and extracts speakers
  3. Syncs with media timestamps
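The parsing step can be approximated with a small WebVTT reader. This is a simplified sketch, not Cluster's parser: it assumes speakers are marked with WebVTT voice tags (<v Name>), ignores cue settings and styling, and produces dictionaries shaped like the cues in the response above.

```python
import re

# Timestamp pair such as "00:00:05.500 --> 00:00:12.300"; the hours part is optional.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)
# Voice tag such as "<v Interviewer>"; captures the speaker name.
VOICE = re.compile(r"<v(?:\.[^\s>]+)?\s+([^>]+)>")


def _seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(text):
    """Parse WebVTT text into a list of cue dicts with times in seconds."""
    cues = []
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        timing_idx = next(
            (i for i, line in enumerate(lines) if TIMING.search(line)), None
        )
        if timing_idx is None:
            continue  # WEBVTT header, NOTE, or STYLE block
        parts = TIMING.search(lines[timing_idx]).groups()
        body = " ".join(lines[timing_idx + 1:])
        voice = VOICE.search(body)
        cues.append({
            # Use the cue identifier line if present, else a running number.
            "id": lines[0].strip() if timing_idx > 0 else str(len(cues) + 1),
            "startTime": _seconds(*parts[:4]),
            "endTime": _seconds(*parts[4:]),
            "text": re.sub(r"<[^>]+>", "", body).strip(),
            "speaker": voice.group(1).strip() if voice else None,
        })
    return cues
```

SubRip (.srt) files use a comma before the milliseconds and have no voice tags, so they would need a separate timing pattern and a different speaker heuristic.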

Search Files

Search across SharePoint:

GET /api/files/search?q=interview+onboarding

Query Parameters:

Parameter   Type     Description
q           string   Search query
siteId      string   Limit to specific site
type        string   Filter by type (file, folder)
mimeType    string   Filter by MIME type
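A search URL can be assembled from these parameters as in the sketch below. The helper and its argument names are hypothetical; only the query parameter names come from the table above.

```python
from urllib.parse import urlencode


def search_url(q, site_id=None, item_type=None, mime_type=None):
    """Build a Search Files URL, omitting filters that are not set."""
    params = {"q": q, "siteId": site_id, "type": item_type, "mimeType": mime_type}
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"/api/files/search?{query}"
```

urlencode turns the space in "interview onboarding" into a plus sign, which produces the q=interview+onboarding form used in the example request.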

Response:

{
  "data": [
    {
      "id": "item-id",
      "name": "Onboarding-Research-P05.mp4",
      "path": "/Research/Interviews",
      "webUrl": "...",
      "highlights": [
        "...<strong>onboarding</strong> research..."
      ]
    }
  ]
}

File References

Cluster caches file metadata for faster access and annotation linking.

Create File Reference

Link a SharePoint file to a study:

POST /api/files/refs

Request:

{
  "driveId": "drive-id",
  "itemId": "item-id",
  "studyId": "study-uuid"
}

Response:

{
  "data": {
    "id": "file-ref-uuid",
    "name": "P05-Interview.mp4",
    "mimeType": "video/mp4",
    "webUrl": "...",
    "studyId": "study-uuid",
    "transcriptFileId": "transcript-ref-uuid"
  }
}
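Creating a reference from a client might look like the sketch below. The base URL is a hypothetical deployment address, and sending the token as a bearer credential with a JSON content type is an assumption; the path and body fields come from the request shown above.

```python
import json
import urllib.request

# Hypothetical deployment URL; replace with the address of your Cluster instance.
BASE_URL = "https://cluster.example.com"


def create_ref_request(token, drive_id, item_id, study_id):
    """Build the POST request that links a SharePoint file to a study."""
    body = json.dumps({"driveId": drive_id, "itemId": item_id, "studyId": study_id})
    return urllib.request.Request(
        BASE_URL + "/api/files/refs",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            # Assumption: bearer token in the Authorization header.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = create_ref_request("<access-token>", "drive-id", "item-id", "study-uuid")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```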

List File References

Get files linked to a study:

GET /api/studies/:studyId/files

Response:

{
  "data": [
    {
      "id": "file-ref-uuid",
      "name": "P05-Interview.mp4",
      "mimeType": "video/mp4",
      "annotationCount": 24,
      "lastAnnotatedAt": "2025-01-03T14:30:00Z"
    }
  ]
}

Sync File Metadata

Refresh cached metadata from SharePoint:

POST /api/files/refs/:id/sync

Response:

{
  "data": {
    "id": "file-ref-uuid",
    "synced": true,
    "changes": {
      "name": {
        "old": "Interview.mp4",
        "new": "P05-Interview.mp4"
      }
    }
  }
}
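The changes map lends itself to a short summary for logs or a UI notice. The sketch below assumes each entry has old and new keys, as in the response above, and that changes may be missing or empty when nothing differed; the second point is a guess, not documented behavior.

```python
def describe_changes(sync_response):
    """Turn the `changes` map of a sync response into readable lines."""
    # Assumption: `changes` is absent or empty when the metadata was already current.
    changes = sync_response["data"].get("changes") or {}
    return [
        f"{field}: {change['old']!r} -> {change['new']!r}"
        for field, change in changes.items()
    ]


response = {
    "data": {
        "id": "file-ref-uuid",
        "synced": True,
        "changes": {"name": {"old": "Interview.mp4", "new": "P05-Interview.mp4"}},
    }
}
lines = describe_changes(response)
```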

Supported File Types

Video

Extension   MIME Type         Features
.mp4        video/mp4         Streaming, thumbnails
.webm       video/webm        Streaming
.mov        video/quicktime   Streaming

Audio

Extension   MIME Type    Features
.mp3        audio/mpeg   Streaming
.wav        audio/wav    Streaming
.m4a        audio/mp4    Streaming

Transcripts

Extension   MIME Type                Features
.vtt        text/vtt                 WebVTT (Teams transcripts)
.srt        application/x-subrip     SubRip format
.txt        text/plain               Plain text

Documents

Extension   MIME Type                            Features
.docx       application/vnd.openxmlformats...    Text extraction
.pdf        application/pdf                      Text extraction
.txt        text/plain                           Direct viewing
.md         text/markdown                        Rendered viewing
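A client can use the tables above to decide how to present a file. The lookup below is an illustrative sketch; the category names are made up for the example. It returns a list because .txt appears under both Transcripts and Documents.

```python
import os

# Extension-to-category map transcribed from the tables above.
CATEGORIES = {
    "video": {".mp4", ".webm", ".mov"},
    "audio": {".mp3", ".wav", ".m4a"},
    "transcript": {".vtt", ".srt", ".txt"},
    "document": {".docx", ".pdf", ".txt", ".md"},
}


def categories_for(filename):
    """Return every category whose extension list contains the file's extension."""
    extension = os.path.splitext(filename)[1].lower()
    return [name for name, extensions in CATEGORIES.items() if extension in extensions]
```

An empty result means the extension is not in any of the tables, so the file type is not listed as supported.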

Error Handling

404 File Not Found

{
  "error": {
    "code": "FILE_NOT_FOUND",
    "message": "File not found or access denied",
    "driveId": "...",
    "itemId": "..."
  }
}

403 Access Denied

{
  "error": {
    "code": "ACCESS_DENIED",
    "message": "You do not have permission to access this file"
  }
}

502 SharePoint Error

{
  "error": {
    "code": "UPSTREAM_ERROR",
    "message": "SharePoint returned an error",
    "upstream": {
      "code": "itemNotFound",
      "message": "The resource could not be found"
    }
  }
}
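On the client side, these error bodies can be mapped to a single exception type. The class and function names below are hypothetical, and treating only 502 responses as retryable is an assumption for the sketch, not a documented policy.

```python
import json


class ClusterFileError(Exception):
    """Error raised for a non-2xx response from a file endpoint."""

    def __init__(self, status, code, message, upstream=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.upstream = upstream  # SharePoint's own error, present on 502 responses

    @property
    def retryable(self):
        # Assumption: only upstream (502) failures are worth retrying.
        return self.status == 502


def error_from_response(status, body_text):
    """Parse an error body in the shape shown above into a ClusterFileError."""
    error = json.loads(body_text)["error"]
    return ClusterFileError(status, error["code"], error["message"], error.get("upstream"))
```

A caller would raise the returned error, then check retryable before deciding whether to try the request again.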

Caching

Cluster caches:

Data               Cache Duration       Invalidation
Site/drive list    1 hour               Manual refresh
Folder contents    5 minutes            On browse
File metadata      1 hour               On sync
Transcripts        Until file changes   Content hash check
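The time-based rows of this table behave like a cache with a per-kind time to live. The sketch below illustrates that idea only. The class, the kind names, and the injectable clock are assumptions for the example, and transcripts are left out because they are invalidated by a content hash check rather than by age.

```python
import time

# Durations from the table above, in seconds.
TTL_SECONDS = {"sites": 3600, "folder": 300, "metadata": 3600}


class TTLCache:
    """In-memory cache whose entries expire after a per-kind time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def put(self, kind, key, value):
        self._entries[(kind, key)] = (value, self._clock())

    def get(self, kind, key):
        entry = self._entries.get((kind, key))
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at >= TTL_SECONDS[kind]:
            # Entry has outlived its duration; drop it and report a miss.
            del self._entries[(kind, key)]
            return None
        return value

    def invalidate(self, kind, key):
        """Explicit invalidation, e.g. on manual refresh or after a sync."""
        self._entries.pop((kind, key), None)
```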
