Build a Loom Clone with the V100 API

~150 lines of code | 5 API calls total | <2s transcription start | Free: 100 recordings/mo

What You Will Build

By the end of this tutorial, you will have a fully functional Loom alternative embedded in your own application. The user clicks a button, records their screen with an optional camera overlay, stops recording, and gets back a shareable link with an auto-generated transcript and AI summary. No video infrastructure to manage, no transcription pipeline to deploy, no CDN to configure. V100 handles all of it through a single API.

The async video recording API is the foundation of any screen recording product, whether you are building an internal tool for engineering teams, a customer support widget, a sales outreach platform, or a standalone Loom alternative. Here is everything your Loom clone will include:

✓ Screen + camera capture (getDisplayMedia)

✓ Client-side recording (MediaRecorder)

✓ Server-side processing with S3 storage

✓ Auto-transcription with word-level timestamps

✓ AI summary generation

✓ Shareable link with embed player

✓ Virtual backgrounds

✓ Noise suppression

✓ Multi-platform publishing

✓ IndexedDB offline buffer

Architecture

Browser                   V100 API                  Storage
    |                          |                          |
    |   getDisplayMedia()      |                          |
    |   getUserMedia()         |                          |
    |   MediaRecorder          |                          |
    |   (all local, no upload) |                          |
    |                          |                          |
    |-- POST /upload --------->|--- encode + store ------>|
    |<-- { videoId } ----------|                          |
    |                          |                          |
    |-- POST /transcribe ----->|--- whisper pipeline ---->|
    |<-- { transcript } -------|                          |
    |                          |                          |
    |-- POST /summarize ------>|--- LLM summary --------->|
    |<-- { summary, share } ---|                          |
    |                          |                          |
    |============ SHAREABLE LINK (CDN-backed) ============|

Step 1 — Record Screen + Camera

The first step in building a Loom clone is capturing the user's screen alongside their camera feed. The browser's getDisplayMedia API handles screen capture, and getUserMedia grabs the webcam. You combine both into a single MediaRecorder instance that writes chunks to an array (or IndexedDB for longer recordings) until the user clicks stop.

This is the client-side recording pattern that every async video messaging tool uses. The video never leaves the browser until the user explicitly uploads it, which means you get instant preview playback and no server costs during recording.

recorder.js — screen + camera capture

const API_BASE = 'https://api.v100.ai';
const API_KEY  = 'v100_sk_your_api_key_here';

let mediaRecorder;
let recordedChunks = [];

async function startRecording() {
  // 1. Capture the screen (tab, window, or entire display)
  const screenStream = await navigator.mediaDevices.getDisplayMedia({
    video: { width: 1920, height: 1080, frameRate: 30 },
    audio: true,  // capture system audio (tab audio)
  });

  // 2. Capture the webcam for the facecam bubble
  const cameraStream = await navigator.mediaDevices.getUserMedia({
    video: { width: 320, height: 320, facingMode: 'user' },
    audio: { echoCancellation: true, noiseSuppression: true },
  });

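  // 3. Mix system audio and mic into one audio track, then pair it with the screen video.
  //    Most browsers record only one audio track, so the two sources are mixed first.
  const audioCtx = new AudioContext();
  const mixed = audioCtx.createMediaStreamDestination();
  for (const stream of [screenStream, cameraStream]) {
    // createMediaStreamSource throws on a stream without audio (e.g. no tab audio shared)
    if (stream.getAudioTracks().length > 0) {
      audioCtx.createMediaStreamSource(stream).connect(mixed);
    }
  }
  const combined = new MediaStream([
    ...screenStream.getVideoTracks(),
    ...mixed.stream.getAudioTracks(),
  ]);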

  // 4. Record with MediaRecorder
  recordedChunks = [];
  mediaRecorder = new MediaRecorder(combined, {
    mimeType: 'video/webm;codecs=vp9,opus',
    videoBitsPerSecond: 2_500_000,
  });

  mediaRecorder.ondataavailable = (e) => {
    if (e.data.size > 0) recordedChunks.push(e.data);
  };

  mediaRecorder.start(1000); // chunk every 1 second

  // Store camera stream for facecam preview
  document.getElementById('facecam').srcObject = cameraStream;
}

function stopRecording() {
  return new Promise((resolve) => {
    mediaRecorder.onstop = () => {
      const blob = new Blob(recordedChunks, { type: 'video/webm' });
      resolve(blob);
    };
    mediaRecorder.stop();
  });
}

When the user clicks Stop, the stopRecording function returns a Blob containing the full video. The facecam stream renders into a small circular <video> element overlaid on the screen preview — the same picture-in-picture pattern Loom uses. V100 composites the camera overlay server-side during processing so the final video is a single clean MP4.
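
The recorder above hard-codes video/webm;codecs=vp9,opus, and the MediaRecorder constructor throws when the browser does not support the requested type. A minimal sketch of a fallback, using the standard MediaRecorder.isTypeSupported check. The candidate list and the pickMimeType helper are illustrative assumptions, not part of the V100 SDK.

```javascript
// Candidate container/codec combinations in order of preference (illustrative list).
const CANDIDATE_TYPES = [
  'video/webm;codecs=vp9,opus',
  'video/webm;codecs=vp8,opus',
  'video/webm',
  'video/mp4',
];

// Returns the first type the support check accepts, or '' to let the browser choose.
// `isSupported` defaults to MediaRecorder.isTypeSupported when MediaRecorder exists.
function pickMimeType(candidates = CANDIDATE_TYPES, isSupported) {
  const check = isSupported ??
    (typeof MediaRecorder !== 'undefined'
      ? (type) => MediaRecorder.isTypeSupported(type)
      : () => false);
  return candidates.find((type) => check(type)) ?? '';
}
```

To use it, call pickMimeType() before constructing the recorder and pass the mimeType option only when the result is non-empty. When building the final Blob, mediaRecorder.mimeType reports the type the browser actually used.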

Offline buffer. For recordings longer than 5 minutes, write chunks to IndexedDB instead of holding them in memory. The V100 SDK includes a ChunkStore utility that handles this automatically — it buffers to IndexedDB during recording and reassembles the blob on stop. This prevents browser tab crashes on long recordings.
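
The ChunkStore utility itself is not shown in this tutorial, so the following is not its implementation. It is a rough sketch of the underlying pattern using the plain IndexedDB API, in case you want to see what such a buffer involves. The database name, store name, and function names are all hypothetical.

```javascript
const DB_NAME = 'recorder-buffer'; // hypothetical database name
const STORE = 'chunks';            // hypothetical object store name

// Wrap an IDBRequest (anything with onsuccess/onerror) in a Promise.
function requestToPromise(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Open the database, creating the auto-incrementing chunk store on first use.
function openChunkDb(idb = globalThis.indexedDB) {
  const request = idb.open(DB_NAME, 1);
  request.onupgradeneeded = () => {
    request.result.createObjectStore(STORE, { autoIncrement: true });
  };
  return requestToPromise(request);
}

// Append one recorded chunk; call this from mediaRecorder.ondataavailable.
function appendChunk(db, blob) {
  const store = db.transaction(STORE, 'readwrite').objectStore(STORE);
  return requestToPromise(store.add(blob));
}

// Read every chunk back in key (insertion) order and rebuild the recording.
async function assembleRecording(db, type = 'video/webm') {
  const store = db.transaction(STORE, 'readonly').objectStore(STORE);
  const chunks = await requestToPromise(store.getAll());
  return new Blob(chunks, { type });
}

// Empty the buffer once the upload has succeeded.
function clearChunks(db) {
  const store = db.transaction(STORE, 'readwrite').objectStore(STORE);
  return requestToPromise(store.clear());
}
```

Because the keys auto-increment, getAll() returns the chunks in the order they were written, which is what the Blob constructor needs to reassemble a playable file.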

Step 2 — Upload to V100

Once you have the recorded blob, upload it to V100's server-side recording pipeline. V100 handles encoding, transcoding to MP4, S3 storage, and CDN distribution. A single POST to /api/recordings/upload with the video file returns a videoId you will use for all subsequent operations.

upload.js — send the recording to V100

async function uploadRecording(blob) {
  const form = new FormData();
  form.append('file', blob, 'recording.webm');
  form.append('title', 'Screen Recording');
  form.append('visibility', 'unlisted');  // 'public', 'unlisted', or 'private'

  const res = await fetch(`${API_BASE}/api/recordings/upload`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
    },
    body: form,
  }).then(r => r.json());

  // res = {
  //   videoId: "rec_abc123",
  //   status: "processing",
  //   duration: 124.5,
  //   size: 15_400_000,
  //   storageUrl: "https://cdn.v100.ai/recordings/rec_abc123.mp4"
  // }

  return res;
}

The upload endpoint accepts WebM, MP4, and MOV files up to 2 GB. V100 transcodes everything to H.264 MP4 for universal playback. Processing takes 10 to 30 seconds for a typical 5-minute recording. You can poll the status or set up a webhook to get notified when processing completes.
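
If you choose polling over a webhook, a loop along these lines is usually enough. This is a sketch under assumptions: the tutorial does not name a status endpoint, so GET /api/recordings/{videoId} in makeStatusFetcher is a guess, and any status other than "processing" is treated as final.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the recording leaves the "processing" state or the timeout elapses.
// `fetchStatus(videoId)` must resolve to an object with a `status` field.
async function waitForProcessing(videoId, fetchStatus, { intervalMs = 2000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const recording = await fetchStatus(videoId);
    if (recording.status !== 'processing') return recording;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Recording ${videoId} still processing after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}

// Hypothetical status fetcher: the endpoint path is an assumption, not documented here.
function makeStatusFetcher(apiBase, apiKey) {
  return (videoId) =>
    fetch(`${apiBase}/api/recordings/${videoId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    }).then((r) => r.json());
}
```

Passing the fetcher in as an argument keeps the loop independent of the exact endpoint, so you can swap in the real status call once you have confirmed it in the API reference.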

Server-side uploads in production. In production, generate a signed upload URL from your backend using POST /api/recordings/upload-url. The client uploads directly to the signed URL without exposing your API key in the browser. This is the same pattern S3 presigned URLs use, and it is strongly recommended for any public-facing application.
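
The two halves of that flow might look like the sketch below. Only the POST /api/recordings/upload-url path comes from this tutorial. The request body, the { uploadUrl, videoId } response shape, and the use of PUT against the signed URL are assumptions modeled on how S3 presigned URLs typically work, so check them against the API reference.

```javascript
// Backend: exchange the secret API key for a short-lived upload URL.
// Assumes the endpoint responds with { uploadUrl, videoId }; that shape is a guess.
async function createUploadUrl({ apiBase, apiKey, title }, fetchImpl = fetch) {
  const res = await fetchImpl(`${apiBase}/api/recordings/upload-url`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ title }),
  });
  if (!res.ok) throw new Error(`upload-url request failed: ${res.status}`);
  return res.json();
}

// Browser: send the blob straight to the signed URL; no API key is involved.
// Assumes the signed URL accepts a PUT, as S3 presigned URLs commonly do.
async function uploadToSignedUrl(uploadUrl, blob, fetchImpl = fetch) {
  const res = await fetchImpl(uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': blob.type || 'video/webm' },
    body: blob,
  });
  if (!res.ok) throw new Error(`signed upload failed: ${res.status}`);
}
```

Your backend would expose createUploadUrl behind its own authenticated route and hand only the returned URL to the browser.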

Step 3 — Auto-Transcribe

Transcription is what separates a real Loom alternative from a simple screen recorder. V100's transcription API produces word-level timestamps, speaker identification, and paragraph segmentation. You get back structured JSON that powers searchable video, auto-generated captions, and the AI summary in the next step.

transcribe.js — request transcription

async function transcribeRecording(videoId) {
  const res = await fetch(`${API_BASE}/api/recordings/${videoId}/transcribe`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type':  'application/json',
    },
    body: JSON.stringify({
      language: 'en',              // or 'auto' for language detection
      wordTimestamps: true,       // word-level timing for highlights
      speakerLabels: true,        // identify different speakers
      paragraphs: true,           // auto-segment into paragraphs
    }),
  }).then(r => r.json());

  // res = {
  //   transcriptId: "tx_def456",
  //   status: "completed",
  //   language: "en",
  //   confidence: 0.97,
  //   duration: 124.5,
  //   text: "Hey team, I wanted to walk you through...",
  //   words: [
  //     { word: "Hey", start: 0.24, end: 0.48, confidence: 0.99 },
  //     { word: "team", start: 0.52, end: 0.81, confidence: 0.98 },
  //     ...
  //   ],
  //   paragraphs: [
  //     { start: 0.24, end: 15.7, text: "Hey team, I wanted to..." },
  //     ...
  //   ],
  //   srtUrl: "https://cdn.v100.ai/transcripts/tx_def456.srt"
  // }

  return res;
}

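The word-level timestamps are what make this powerful. You can build a clickable transcript where clicking any word jumps the video to that exact moment, the same UX that makes Loom's viewer so useful. The srtUrl gives you a ready-made subtitle file you can load as a selectable caption track in any player that accepts SRT. Note that the HTML5 <track> element expects WebVTT, so convert the file first if you use a plain <video> tag. For captions rendered into the video itself, see Auto-Captions under Going Further.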
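
A clickable transcript needs two things: a click on a word seeks the video, and playback highlights the current word. One possible sketch, assuming the words array has the shape shown in the response above and is sorted by start time. The function names and the "active" CSS class are illustrative.

```javascript
// Returns the index of the last word that has started by `time` (seconds),
// or -1 if `time` is before the first word. Binary search over `words`,
// which must be sorted by `start`.
function findWordIndexAt(words, time) {
  let lo = 0;
  let hi = words.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (words[mid].start <= time) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Browser only: renders one <span> per word and keeps it in sync with the video.
function attachTranscript(video, container, words) {
  const spans = words.map((w) => {
    const span = document.createElement('span');
    span.textContent = w.word + ' ';
    span.addEventListener('click', () => { video.currentTime = w.start; });
    container.appendChild(span);
    return span;
  });
  let active = -1;
  video.addEventListener('timeupdate', () => {
    const next = findWordIndexAt(words, video.currentTime);
    if (next === active) return;
    if (active >= 0) spans[active].classList.remove('active');
    if (next >= 0) spans[next].classList.add('active');
    active = next;
  });
}
```

The binary search matters on long recordings: timeupdate fires several times per second, and a transcript can hold thousands of words.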

Transcription runs on V100's server-side infrastructure. A 5-minute video typically transcribes in under 10 seconds. Supported languages include English, Spanish, French, German, Japanese, Korean, Portuguese, Chinese, and 30+ more. Set language: 'auto' to let the API detect the language automatically.

Step 4 — Generate Summary

Once the transcript exists, you can generate an AI summary with a single API call. The summary includes key points, action items, and a one-paragraph overview — exactly what recipients need when they do not have time to watch the full video. This is the feature that turns a screen recorder into an async communication tool.

summarize.js — AI-generated summary

async function generateSummary(videoId) {
  const res = await fetch(`${API_BASE}/api/recordings/${videoId}/summarize`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type':  'application/json',
    },
    body: JSON.stringify({
      style: 'professional',    // 'professional', 'casual', or 'detailed'
      includeActionItems: true,
      includeChapters: true,    // auto-generate chapter markers
    }),
  }).then(r => r.json());

  // res = {
  //   summary: "Eric walks through the new dashboard redesign...",
  //   keyPoints: [
  //     "Dashboard load time reduced from 3.2s to 0.8s",
  //     "New chart component replaces legacy D3 implementation",
  //     "Rollout planned for next sprint"
  //   ],
  //   actionItems: [
  //     { assignee: "Sarah", task: "Review PR #847 by Friday" },
  //     { assignee: "Team", task: "Test on staging environment" }
  //   ],
  //   chapters: [
  //     { start: 0, title: "Introduction" },
  //     { start: 28.4, title: "Dashboard Performance" },
  //     { start: 67.1, title: "New Chart Component" },
  //     { start: 98.3, title: "Rollout Plan" }
  //   ]
  // }

  return res;
}

The chapter markers are generated from topic shifts in the transcript. Display them as a clickable timeline in your video player so viewers can jump to the section they care about. Action items include auto-detected assignees when speaker labels are available from the transcription step.
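
To render that timeline, each chapter needs a display label and an end time. A small sketch, assuming the chapters array from the response above and the recording duration from the upload response. Both helper names are illustrative.

```javascript
// Formats seconds as m:ss, or h:mm:ss for recordings of an hour or longer.
function formatTimestamp(seconds) {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = String(total % 60).padStart(2, '0');
  return h > 0 ? `${h}:${String(m).padStart(2, '0')}:${s}` : `${m}:${s}`;
}

// Adds a label and an end time to each chapter; the last one ends at `duration`.
function buildChapterTimeline(chapters, duration) {
  return chapters.map((chapter, i) => ({
    ...chapter,
    end: i + 1 < chapters.length ? chapters[i + 1].start : duration,
    label: `${formatTimestamp(chapter.start)} ${chapter.title}`,
  }));
}
```

With the sample response above and a duration of 124.5 seconds, the labels come out as "0:00 Introduction", "0:28 Dashboard Performance", "1:07 New Chart Component", and "1:38 Rollout Plan". Clicking an entry then only has to set video.currentTime to the chapter's start.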

Chain it automatically. You can request transcription and summary in a single upload call by passing transcribe: true and summarize: true in the upload body. V100 will run the full pipeline and fire a webhook when everything is ready. No polling required.
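
As a sketch, the chained upload differs from Step 2 only in its form fields. The transcribe and summarize flag names come from the paragraph above. Encoding them as the strings 'true' and 'false' in the multipart body is an assumption, and both helper names are illustrative.

```javascript
// Form fields for a chained upload. Sending the flags as 'true'/'false' strings
// is an assumption about how the multipart body is parsed.
function chainedUploadFields({
  title = 'Screen Recording',
  visibility = 'unlisted',
  transcribe = true,
  summarize = true,
} = {}) {
  return [
    ['title', title],
    ['visibility', visibility],
    ['transcribe', String(transcribe)],
    ['summarize', String(summarize)],
  ];
}

// Browser side: attach the recording and the fields to a FormData body.
function buildChainedUploadForm(blob, options) {
  const form = new FormData();
  form.append('file', blob, 'recording.webm');
  for (const [name, value] of chainedUploadFields(options)) {
    form.append(name, value);
  }
  return form;
}
```

The resulting form can be passed as the body of the same POST used in uploadRecording.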

Step 5 — Share

Every recording gets a shareable link that works instantly — no sign-up required for the viewer. The link loads a hosted player with the video, transcript, summary, and chapter navigation. You can also embed the player in your own app with an iframe or use the raw URLs to build a completely custom viewer.

share.js — get shareable link and publish

async function getShareLink(videoId) {
  const res = await fetch(`${API_BASE}/api/recordings/${videoId}/share`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type':  'application/json',
    },
    body: JSON.stringify({
      visibility: 'unlisted',        // link-only access
      allowDownload: true,
      expiresIn: '30d',              // auto-expire after 30 days
      password: null,                 // optional password protection
      notifyOnView: true,            // webhook when someone watches
    }),
  }).then(r => r.json());

  // res = {
  //   shareUrl: "https://watch.v100.ai/s/rec_abc123",
  //   embedHtml: '<iframe src="https://watch.v100.ai/embed/rec_abc123"...>',
  //   thumbnailUrl: "https://cdn.v100.ai/thumbs/rec_abc123.jpg",
  //   mp4Url: "https://cdn.v100.ai/recordings/rec_abc123.mp4",
  //   gifPreview: "https://cdn.v100.ai/previews/rec_abc123.gif"
  // }

  return res;
}

// Publish to multiple platforms at once
async function publishRecording(videoId, platforms) {
  const res = await fetch(`${API_BASE}/api/recordings/${videoId}/publish`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type':  'application/json',
    },
    body: JSON.stringify({
      platforms: platforms,  // ['slack', 'notion', 'youtube']
      message: 'New recording: Dashboard Redesign Walkthrough',
    }),
  }).then(r => r.json());

  return res;
}

The shareUrl is a hosted page with a video player, full transcript, AI summary, and chapter navigation. Viewers do not need an account. The embedHtml gives you a responsive iframe you can drop into Notion, Confluence, or any web page. The gifPreview is a 5-second animated thumbnail — useful for Slack or email previews where video links need a visual hook.

Multi-platform publishing pushes the video to connected integrations in a single call. Connect Slack, Notion, YouTube, or custom webhook destinations through the V100 dashboard, then pass the platform names to the publish endpoint. Each platform gets the video in its native format — Slack gets a rich unfurl with the GIF preview, Notion gets an embedded block, YouTube gets a properly formatted upload.
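
For reference, here is one way to string the steps together once recording stops. This is a sketch that assumes each endpoint can be called as soon as the previous call returns. If your account requires processing to finish first, wait for that between the upload and transcribe calls. The recordingToShareLink name is illustrative.

```javascript
// Runs the server-side steps in order. Each step is passed in, so this works with
// the uploadRecording, transcribeRecording, generateSummary, and getShareLink
// functions from this tutorial, or with stubs in a test.
async function recordingToShareLink(blob, { upload, transcribe, summarize, share }) {
  const { videoId } = await upload(blob);
  const transcript = await transcribe(videoId);
  const summary = await summarize(videoId);
  const { shareUrl } = await share(videoId);
  return { videoId, shareUrl, transcript, summary };
}
```

A Stop button handler would call stopRecording() to get the blob, pass it here with the four tutorial functions, and show the returned shareUrl to the user.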

Going Further

The five steps above give you a complete Loom clone. Here is how to polish it into a production-grade async video messaging tool:

Trim Silence

V100's silence removal API detects and trims dead air from recordings automatically. Add trimSilence: true to the upload options and the API strips pauses longer than 2 seconds. For a typical 5-minute recording, this usually shaves off 30 to 60 seconds of awkward silence without any manual editing.
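
If you want to preview how much a recording would shrink before enabling the flag, the word timestamps from Step 3 give a rough estimate. This sketch only measures gaps between transcribed words. It is not V100's detection algorithm, and how much of each pause the API actually removes is not specified here, so treat the total as an upper bound.

```javascript
// Finds gaps between consecutive words that are longer than `thresholdSec`.
function findLongPauses(words, thresholdSec = 2) {
  const pauses = [];
  for (let i = 1; i < words.length; i++) {
    const gap = words[i].start - words[i - 1].end;
    if (gap > thresholdSec) {
      pauses.push({ start: words[i - 1].end, end: words[i].start, length: gap });
    }
  }
  return pauses;
}

// Total seconds covered by the pauses found above.
function totalPauseSeconds(pauses) {
  return pauses.reduce((sum, pause) => sum + pause.length, 0);
}
```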

Auto-Captions

The transcription from Step 3 powers burned-in captions. Pass captions: { burn: true, style: 'modern' } to the upload options and V100 renders word-highlighted captions directly into the MP4. You get the same animated-word caption style that performs well on social media, with no post-processing on your end.

Virtual Backgrounds

Replace or blur the user's background in the camera feed. V100 processes this client-side using body segmentation, so frames never make a server round trip. Pass virtualBackground: { type: 'blur', intensity: 0.7 } or { type: 'image', url: 'https://...' } when initializing the camera stream. The composited result is what gets recorded, so no post-processing is needed.

Noise Suppression

AI-based noise suppression is enabled by default on all audio tracks. It removes keyboard clicks, fan noise, background chatter, and other common distractions. To configure sensitivity or disable it entirely, pass audio: { noiseSuppression: false } in the camera stream options.

Viewer Analytics

Track who watched, how far they got, and whether they clicked any chapters or action items. The notifyOnView flag from the share step fires a webhook on each view, and the GET /api/recordings/{id}/analytics endpoint returns aggregate view counts, average watch time, and drop-off points.

Loom vs Building Your Own

Loom costs $12.50 per user per month on the Business plan. That adds up fast for teams. Building your own async video tool gives you full control over the UX, data ownership, and no per-seat licensing. Here is what each approach actually looks like:

Feature	Loom (SaaS)	DIY from Scratch	V100 API
Time to first recording	Instant (download app)	2–4 months	1 hour
Custom branding	Enterprise plan only	Full control	Full control
Transcription	Included (English-focused)	Deploy Whisper, manage GPUs	One API call, 40+ languages
AI summaries	Business plan ($12.50/user/mo)	Build LLM pipeline yourself	One API call
Data ownership	Loom hosts everything	Your infrastructure	Your S3 bucket option
Embed in your product	Limited embed options	Full integration	Full integration
Video storage	Loom servers	S3 + CDN + transcoding ($$$)	Managed (CDN included)
Silence removal	Manual trim only	FFmpeg + ML pipeline	One config flag
Per-seat cost	$12.50/user/mo (Business)	Engineering time + infra	Usage-based, no per-seat
Maintenance	Managed by Loom	Permanent headcount	Managed by V100

The sweet spot for most teams is building on V100. You get full control over the user experience and branding — it is your product, not a Loom embed — without the 2 to 4 months of infrastructure work. The video messaging API handles the hard parts (transcoding, transcription, CDN, AI summaries) and you focus on the product layer that differentiates your tool.

For teams already paying for Loom Business, the math is straightforward. A 50-person team on Loom costs $625 per month. V100's usage-based pricing typically comes in at 60 to 80% less for equivalent usage, and you own the UX completely.
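
The arithmetic behind that comparison can be checked in a few lines. The seat price is the Loom Business figure quoted above, and the per-minute rate is the Pro starting rate listed under Pricing. Because that rate is a "starts at" figure, treat the results as a rough sketch and not a quote.

```javascript
const LOOM_BUSINESS_PER_SEAT = 12.5;      // USD per user per month, as quoted in this article
const V100_PER_RECORDING_MINUTE = 0.005;  // USD, the Pro starting rate quoted in this article

// Monthly cost of a per-seat plan.
function seatCost(users, perSeat = LOOM_BUSINESS_PER_SEAT) {
  return users * perSeat;
}

// Monthly cost of a usage-based plan for a given number of recording-minutes.
function usageCost(minutes, perMinute = V100_PER_RECORDING_MINUTE) {
  return minutes * perMinute;
}

// Recording-minutes per month at which usage pricing matches the seat cost.
function breakEvenMinutes(users) {
  return seatCost(users) / V100_PER_RECORDING_MINUTE;
}
```

For 50 users, seatCost gives $625 per month, and the break-even point is about 125,000 recording-minutes. At the starting rate, a saving of 60 to 80% corresponds to roughly 25,000 to 50,000 recording-minutes per month.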

Pricing

V100 offers a free tier with 100 recordings per month — enough for development, testing, and small teams. No credit card required. For production workloads:

Free — 100 recordings/month, transcription included, 720p output. Perfect for prototyping your Loom alternative.
Pro — Usage-based pricing, 1080p output, AI summaries, silence removal, multi-platform publishing, viewer analytics. Starts at $0.005 per recording-minute.
Enterprise — Volume discounts, bring-your-own S3, custom player branding, SLA, dedicated support.

See the full pricing page for details.

Start Building Your Loom Clone

Get your API key, record your first screen capture, and generate a shareable link in under an hour. No credit card. No sales call. Just code.

Get Your Free API Key
