What You Will Build
By the end of this tutorial, you will have a fully functional Loom alternative embedded in your own application. The user clicks a button, records their screen with an optional camera overlay, stops recording, and gets back a shareable link with an auto-generated transcript and AI summary. No video infrastructure to manage, no transcription pipeline to deploy, no CDN to configure. V100 handles all of it through a single API.
The async video recording API is the foundation of any screen recording product, whether you are building an internal tool for engineering teams, a customer support widget, a sales outreach platform, or a standalone Loom alternative. Here is the end-to-end flow your Loom clone will implement:
```text
Browser                   V100 API                     Storage
   |                          |                           |
   |-- getDisplayMedia() ---->|                           |
   |-- getUserMedia() ------->|                           |
   |-- MediaRecorder -------->|                           |
   |                          |                           |
   |-- POST /upload --------->|--- encode + store ------->|
   |<-- { videoId } ----------|                           |
   |                          |                           |
   |-- POST /transcribe ----->|--- whisper pipeline ----->|
   |<-- { transcript } -------|                           |
   |                          |                           |
   |-- POST /summarize ------>|--- LLM summary ---------->|
   |<-- { summary, share } ---|                           |
   |                          |                           |
   |============ SHAREABLE LINK (CDN-backed) =============|
```
Step 1 — Record Screen + Camera
The first step in building a Loom clone is capturing the user's screen alongside their camera feed. The browser's getDisplayMedia API handles screen capture, and getUserMedia grabs the webcam. You combine both into a single MediaRecorder instance that writes chunks to an array (or IndexedDB for longer recordings) until the user clicks stop.
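The capture flow above can be sketched in plain browser TypeScript. This is a minimal sketch, not V100's SDK: `startRecording`, `stopRecording`, and `chooseMimeType` are hypothetical names. Browsers generally record only one video track per MediaRecorder, so the sketch records the screen video together with the microphone audio and keeps the camera as a live preview element. How the camera feed reaches V100 for server-side compositing depends on V100's SDK and is not shown.

```typescript
// Minimal browser sketch: screen video + microphone audio go into one
// MediaRecorder; the webcam plays live in a preview <video> element.
// Function names are illustrative, not part of the V100 SDK.

/** Return the first MIME type the browser can record, or "" to use its default. */
function chooseMimeType(candidates: string[], isSupported: (type: string) => boolean): string {
  return candidates.find((type) => isSupported(type)) ?? "";
}

interface ActiveRecording {
  recorder: MediaRecorder;
  chunks: Blob[];
  streams: MediaStream[];
}

let active: ActiveRecording | null = null;

async function startRecording(cameraPreview: HTMLVideoElement): Promise<void> {
  // Ask the user which screen, window, or tab to share.
  const screen = await navigator.mediaDevices.getDisplayMedia({ video: true });
  // Webcam for the facecam bubble, microphone for narration.
  const camera = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });

  cameraPreview.srcObject = camera;
  cameraPreview.muted = true; // don't play the mic back through the speakers
  await cameraPreview.play();

  // One video track (the screen) plus the microphone audio.
  const recorded = new MediaStream([...screen.getVideoTracks(), ...camera.getAudioTracks()]);

  const mimeType = chooseMimeType(
    ["video/webm;codecs=vp9,opus", "video/webm;codecs=vp8,opus", "video/webm"],
    (type) => MediaRecorder.isTypeSupported(type),
  );
  const recorder = new MediaRecorder(recorded, mimeType ? { mimeType } : {});
  const chunks: Blob[] = [];
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data); // chunks held in memory
  };
  recorder.start(1000); // emit a chunk roughly every second

  active = { recorder, chunks, streams: [screen, camera] };
}

function stopRecording(): Promise<Blob> {
  const current = active;
  if (!current) return Promise.reject(new Error("No recording in progress"));
  active = null;
  return new Promise((resolve) => {
    current.recorder.onstop = () => {
      // Release the screen share, camera, and microphone.
      current.streams.forEach((stream) => stream.getTracks().forEach((track) => track.stop()));
      resolve(new Blob(current.chunks, { type: current.recorder.mimeType }));
    };
    current.recorder.stop(); // fires a final dataavailable, then stop
  });
}
```

Wire `startRecording` to the Record button and `await stopRecording()` on Stop; the resulting blob is what gets uploaded.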
This is the client-side recording pattern that every async video messaging tool uses. The video never leaves the browser until the user explicitly uploads it, which means you get instant preview playback and no server costs during recording.
When the user clicks Stop, the stopRecording function returns a Blob containing the full video. The facecam stream renders into a small circular <video> element overlaid on the screen preview — the same picture-in-picture pattern Loom uses. V100 composites the camera overlay server-side during processing so the final video is a single clean MP4.
Offline buffer. For recordings longer than 5 minutes, write chunks to IndexedDB instead of holding them in memory. The V100 SDK includes a ChunkStore utility that handles this automatically — it buffers to IndexedDB during recording and reassembles the blob on stop. This prevents browser tab crashes on long recordings.
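A rough sketch of the IndexedDB buffering idea follows. It is not the V100 SDK's ChunkStore, whose API is not shown here; the `ChunkBuffer`, `MemoryChunkBuffer`, and `IdbChunkBuffer` names are hypothetical. Chunks are keyed by sequence number so they can be reassembled in recording order.

```typescript
// Hypothetical chunk buffer (not the V100 SDK's ChunkStore): recorded chunks
// are written to IndexedDB as they arrive instead of being held in RAM.

interface ChunkBuffer {
  put(index: number, chunk: Blob): Promise<void>;
  assemble(mimeType: string): Promise<Blob>;
  clear(): Promise<void>;
}

/** In-memory version for short clips (and tests). */
class MemoryChunkBuffer implements ChunkBuffer {
  private chunks = new Map<number, Blob>();

  async put(index: number, chunk: Blob): Promise<void> {
    this.chunks.set(index, chunk);
  }

  async assemble(mimeType: string): Promise<Blob> {
    const ordered = [...this.chunks.entries()]
      .sort(([a], [b]) => a - b)
      .map(([, chunk]) => chunk);
    return new Blob(ordered, { type: mimeType });
  }

  async clear(): Promise<void> {
    this.chunks.clear();
  }
}

/** Promise wrapper around an IndexedDB request. */
function idbRequest<T>(request: IDBRequest<T>): Promise<T> {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

/** IndexedDB-backed version for long recordings. */
class IdbChunkBuffer implements ChunkBuffer {
  private db: IDBDatabase;

  private constructor(db: IDBDatabase) {
    this.db = db;
  }

  static async open(name = "recording-chunks"): Promise<IdbChunkBuffer> {
    const request = indexedDB.open(name, 1);
    // First open: create a store keyed by chunk sequence number.
    request.onupgradeneeded = () => request.result.createObjectStore("chunks");
    return new IdbChunkBuffer(await idbRequest(request));
  }

  private store(mode: IDBTransactionMode): IDBObjectStore {
    return this.db.transaction("chunks", mode).objectStore("chunks");
  }

  async put(index: number, chunk: Blob): Promise<void> {
    await idbRequest(this.store("readwrite").put(chunk, index));
  }

  async assemble(mimeType: string): Promise<Blob> {
    // getAll() returns records in ascending key order, i.e. recording order.
    const chunks = (await idbRequest(this.store("readonly").getAll())) as Blob[];
    return new Blob(chunks, { type: mimeType });
  }

  async clear(): Promise<void> {
    await idbRequest(this.store("readwrite").clear());
  }
}
```

Feed each MediaRecorder `dataavailable` chunk to `put()` with an increasing index, call `assemble(recorder.mimeType)` once the recorder stops, and `clear()` after a successful upload.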
Step 2 — Upload to V100
Once you have the recorded blob, upload it to V100's server-side recording pipeline. V100 handles encoding, transcoding to MP4, S3 storage, and CDN distribution. A single POST to /api/recordings/upload with the video file returns a videoId you will use for all subsequent operations.
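A minimal sketch of that upload call with `fetch`. The endpoint path comes from the text; the base URL (a placeholder), the Bearer authorization header, the `file` form-field name, and the `{ videoId }` response shape are assumptions to check against V100's API reference. The fetch implementation is injectable so the function can be exercised without a network. Calling this directly from the browser exposes your API key, which the production note on signed upload URLs addresses.

```typescript
// Placeholder base URL: substitute the real one from V100's docs.
const V100_BASE_URL = "https://api.v100.example";

interface UploadResponse {
  videoId: string;
}

async function uploadRecording(
  video: Blob,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<UploadResponse> {
  const form = new FormData();
  form.append("file", video, "recording.webm"); // assumed field name
  const res = await fetchImpl(`${V100_BASE_URL}/api/recordings/upload`, {
    method: "POST",
    // Assumed auth scheme. No Content-Type here: fetch adds the multipart boundary itself.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);
  return (await res.json()) as UploadResponse;
}
```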
The upload endpoint accepts WebM, MP4, and MOV files up to 2 GB. V100 transcodes everything to H.264 MP4 for universal playback. Processing takes 10 to 30 seconds for a typical 5-minute recording. You can poll the status or set up a webhook to get notified when processing completes.
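If you poll instead of using a webhook, back off between requests. The helper below is generic: `getStatus` stands in for whatever status call V100 exposes (not named in this article), and the `processing` / `ready` / `failed` values are assumptions. Only the capped exponential backoff is the point.

```typescript
type ProcessingStatus = "processing" | "ready" | "failed";

interface PollOptions {
  initialDelayMs?: number;
  maxDelayMs?: number;
  timeoutMs?: number;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Poll getStatus until it reports "ready"; reject on "failed" or after timeoutMs of waiting. */
async function waitUntilReady(
  getStatus: () => Promise<ProcessingStatus>,
  options: PollOptions = {},
): Promise<void> {
  const {
    initialDelayMs = 2_000,
    maxDelayMs = 10_000,
    timeoutMs = 120_000,
    sleep = defaultSleep,
  } = options;
  let delay = initialDelayMs;
  let waited = 0;
  for (;;) {
    const status = await getStatus();
    if (status === "ready") return;
    if (status === "failed") throw new Error("Processing failed");
    if (waited >= timeoutMs) throw new Error(`Still processing after ${waited} ms`);
    await sleep(delay);
    waited += delay;
    delay = Math.min(delay * 2, maxDelayMs); // exponential backoff, capped
  }
}
```

With the defaults, the waits go 2 s, 4 s, 8 s, then 10 s each, which keeps the number of status calls small for a job that typically finishes in 10 to 30 seconds.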
Signed uploads in production. Generate a signed upload URL from your backend using POST /api/recordings/upload-url. The client uploads directly to the signed URL, so your API key is never exposed in the browser. This is the same pattern S3 presigned URLs use, and it is strongly recommended for any public-facing application.
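A sketch of both halves of that flow. The endpoint path is from the text; the `contentType` request field, the `{ uploadUrl, videoId }` response, and the use of HTTP PUT (as with S3 presigned URLs) are assumptions, and the base URL is a placeholder.

```typescript
const V100_BASE_URL = "https://api.v100.example"; // placeholder

interface SignedUpload {
  uploadUrl: string; // assumed field name
  videoId: string;   // assumed field name
}

// Runs on YOUR backend: the API key never reaches the browser.
async function createSignedUpload(
  apiKey: string,
  contentType: string,
  fetchImpl: typeof fetch = fetch,
): Promise<SignedUpload> {
  const res = await fetchImpl(`${V100_BASE_URL}/api/recordings/upload-url`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ contentType }),
  });
  if (!res.ok) throw new Error(`upload-url request failed with HTTP ${res.status}`);
  return (await res.json()) as SignedUpload;
}

// Runs in the browser: sends the blob straight to the signed URL, no API key attached.
async function putToSignedUrl(
  upload: SignedUpload,
  video: Blob,
  fetchImpl: typeof fetch = fetch,
): Promise<string> {
  const res = await fetchImpl(upload.uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": video.type || "application/octet-stream" },
    body: video,
  });
  if (!res.ok) throw new Error(`Signed upload failed with HTTP ${res.status}`);
  return upload.videoId;
}
```

Your backend exposes `createSignedUpload` behind its own authenticated route and returns only the `SignedUpload` object to the client.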
Step 3 — Auto-Transcribe
Transcription is what separates a real Loom alternative from a simple screen recorder. V100's transcription API produces word-level timestamps, speaker identification, and paragraph segmentation. You get back structured JSON that powers searchable video, auto-generated captions, and the AI summary in the next step.
The word-level timestamps are what make this powerful. You can build a clickable transcript where clicking any word jumps the video to that exact moment, the same UX that makes Loom's viewer so useful. The srtUrl gives you a ready-made subtitle file you can load into most video players as a closed-caption track (the HTML5 <track> element expects WebVTT, so convert it first for native browser playback).
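A sketch of the clickable transcript. It assumes the words arrive as `{ text, start, end }` objects with times in seconds, sorted by start time; V100's actual field names may differ, and `wordIndexAt` / `mountClickableTranscript` are hypothetical helpers. Binary search keeps highlighting cheap on long videos, since `timeupdate` fires several times per second.

```typescript
// Assumed word shape; check V100's transcript JSON for the real field names.
interface TranscriptWord {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

/** Index of the word being spoken at time t (seconds), or -1 during a pause. Words must be sorted by start. */
function wordIndexAt(words: TranscriptWord[], t: number): number {
  let lo = 0;
  let hi = words.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (words[mid].start <= t) {
      found = mid; // last word starting at or before t so far
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  // Only count it if t falls before that word ends.
  return found !== -1 && t < words[found].end ? found : -1;
}

/** Render clickable word spans that seek the player, and highlight the active word. */
function mountClickableTranscript(video: HTMLVideoElement, container: HTMLElement, words: TranscriptWord[]): void {
  const spans = words.map((word) => {
    const span = document.createElement("span");
    span.textContent = `${word.text} `;
    span.addEventListener("click", () => {
      video.currentTime = word.start; // jump to the exact word
    });
    container.appendChild(span);
    return span;
  });

  let current = -1;
  video.addEventListener("timeupdate", () => {
    const next = wordIndexAt(words, video.currentTime);
    if (next === current) return;
    if (current !== -1) spans[current].classList.remove("active");
    if (next !== -1) spans[next].classList.add("active");
    current = next;
  });
}
```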
Transcription runs on V100's server-side infrastructure. A 5-minute video typically transcribes in under 10 seconds. Supported languages include English, Spanish, French, German, Japanese, Korean, Portuguese, Chinese, and 30+ more. Set language: 'auto' to let the API detect the language automatically.
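Requesting a transcript with automatic language detection might look like this. The flow diagram only names `POST /transcribe`, so the full path, the `{ videoId, language }` body, the Bearer header, and the response fields below are assumptions; the base URL is a placeholder.

```typescript
const V100_BASE_URL = "https://api.v100.example"; // placeholder

// Assumed response shape, based on the features described above.
interface TranscriptResponse {
  language: string; // detected language when "auto" was requested
  words: { text: string; start: number; end: number; speaker?: string }[];
  srtUrl: string;
}

async function requestTranscript(
  videoId: string,
  apiKey: string,
  language = "auto", // let the API detect the spoken language
  fetchImpl: typeof fetch = fetch,
): Promise<TranscriptResponse> {
  const res = await fetchImpl(`${V100_BASE_URL}/api/recordings/transcribe`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ videoId, language }),
  });
  if (!res.ok) throw new Error(`Transcription request failed with HTTP ${res.status}`);
  return (await res.json()) as TranscriptResponse;
}
```

Pass an explicit code such as `"es"` instead of `"auto"` when you already know the speaker's language.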
Step 4 — Generate Summary
Once the transcript exists, you can generate an AI summary with a single API call. The summary includes key points, action items, and a one-paragraph overview — exactly what recipients need when they do not have time to watch the full video. This is the feature that turns a screen recorder into an async communication tool.
The chapter markers are generated from topic shifts in the transcript. Display them as a clickable timeline in your video player so viewers can jump to the section they care about. Action items include auto-detected assignees when speaker labels are available from the transcription step.
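For a clickable chapter timeline, the player only needs each chapter's start time. This sketch assumes chapters arrive as `{ title, start }` with `start` in seconds (hypothetical field names) and turns them into percentage offsets for marker elements on the scrubber.

```typescript
// Assumed chapter shape from the summary response.
interface Chapter {
  title: string;
  start: number; // seconds
}

interface TimelineMarker {
  title: string;
  start: number;
  leftPercent: number; // horizontal position on the scrubber, 0 to 100
}

/** Position chapter markers along a scrubber for a video `duration` seconds long. */
function chapterMarkers(chapters: Chapter[], duration: number): TimelineMarker[] {
  if (duration <= 0) return [];
  return [...chapters]
    .filter((c) => c.start >= 0 && c.start < duration)
    .sort((a, b) => a.start - b.start)
    .map((c) => ({ title: c.title, start: c.start, leftPercent: (c.start * 100) / duration }));
}

/** Title of the chapter playing at time t, or null before the first chapter starts. */
function chapterAt(chapters: Chapter[], t: number): string | null {
  let current: string | null = null;
  for (const c of [...chapters].sort((a, b) => a.start - b.start)) {
    if (c.start > t) break;
    current = c.title;
  }
  return current;
}
```

Set each marker's CSS `left` to `${leftPercent}%`, seek with `video.currentTime = start` on click, and use `chapterAt` to label the current section during playback.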
Chain it automatically. You can request transcription and summary in a single upload call by passing transcribe: true and summarize: true in the upload body. V100 will run the full pipeline and fire a webhook when everything is ready. No polling required.
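A sketch of the chained upload plus a webhook handler. The `transcribe` and `summarize` flags come from the text; sending them as multipart form fields, the `webhookUrl` field, and the `{ videoId, status, summary }` payload are assumptions. In production, also verify that each webhook request really came from V100, using a signature check if its docs provide one.

```typescript
interface PipelineOptions {
  transcribe: boolean;
  summarize: boolean;
  webhookUrl?: string; // hypothetical field name
}

/** Build the multipart body for a single upload that runs the whole pipeline. */
function buildUploadForm(video: Blob, opts: PipelineOptions): FormData {
  const form = new FormData();
  form.append("file", video, "recording.webm"); // assumed field name
  form.append("transcribe", String(opts.transcribe));
  form.append("summarize", String(opts.summarize));
  if (opts.webhookUrl) form.append("webhookUrl", opts.webhookUrl);
  return form;
}

// Hypothetical webhook payload.
interface PipelineEvent {
  videoId: string;
  status: "ready" | "failed";
  summary?: string;
}

/** Validate and route an incoming webhook body; returns a short log message. */
function handlePipelineWebhook(body: unknown, onReady: (event: PipelineEvent) => void): string {
  const event = body as Partial<PipelineEvent> | null;
  if (typeof event?.videoId !== "string" || (event.status !== "ready" && event.status !== "failed")) {
    return "ignored: unrecognized payload";
  }
  if (event.status === "failed") return `pipeline failed for ${event.videoId}`;
  onReady(event as PipelineEvent);
  return `ready: ${event.videoId}`;
}
```

Mount `handlePipelineWebhook` behind whatever HTTP framework your backend uses, passing it the parsed JSON body.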
Step 5 — Share
Every recording gets a shareable link that works instantly — no sign-up required for the viewer. The link loads a hosted player with the video, transcript, summary, and chapter navigation. You can also embed the player in your own app with an iframe or use the raw URLs to build a completely custom viewer.
The shareUrl is a hosted page with a video player, full transcript, AI summary, and chapter navigation. Viewers do not need an account. The embedHtml gives you a responsive iframe you can drop into Notion, Confluence, or any web page. The gifPreview is a 5-second animated thumbnail — useful for Slack or email previews where video links need a visual hook.
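The `shareUrl` and `gifPreview` fields are enough to build a Slack message by hand, for example if you post through your own Slack incoming webhook rather than a connected integration. The Block Kit `section` and `image` blocks and the mrkdwn `<url|text>` link syntax are standard Slack; the full `ShareResult` shape and the `title` argument are assumptions.

```typescript
interface ShareResult {
  shareUrl: string;
  embedHtml: string;
  gifPreview: string;
}

// Slack mrkdwn reserves &, <, and >, so escape them in user-supplied text.
function escapeSlack(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

/** Block Kit payload: a linked title plus the animated GIF preview. */
function slackSharePayload(share: ShareResult, title: string) {
  const safeTitle = escapeSlack(title);
  return {
    text: `${safeTitle}: ${share.shareUrl}`, // plain-text fallback for notifications
    blocks: [
      { type: "section", text: { type: "mrkdwn", text: `*<${share.shareUrl}|${safeTitle}>*` } },
      { type: "image", image_url: share.gifPreview, alt_text: `Preview of ${title}` },
    ],
  };
}

/** POST the payload to a Slack incoming-webhook URL. */
async function postToSlack(webhookUrl: string, payload: object, fetchImpl: typeof fetch = fetch): Promise<void> {
  const res = await fetchImpl(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Slack webhook failed with HTTP ${res.status}`);
}
```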
Multi-platform publishing pushes the video to connected integrations in a single call. Connect Slack, Notion, YouTube, or custom webhook destinations through the V100 dashboard, then pass the platform names to the publish endpoint. Each platform gets the video in its native format — Slack gets a rich unfurl with the GIF preview, Notion gets an embedded block, YouTube gets a properly formatted upload.
Going Further
The five steps above give you a complete Loom clone. Here is how to polish it into a production-grade async video messaging tool:
Trim Silence
V100's silence removal API detects and trims dead air from recordings automatically. Add trimSilence: true to the upload options and the API strips pauses longer than 2 seconds. For a typical 5-minute recording, this usually shaves off 30 to 60 seconds of awkward silence without any manual editing.
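To estimate how much a 2-second threshold would cut before enabling it, you can look at gaps between transcript words. This is a swapped-in illustration (transcript-gap detection), not V100's silence detector, which presumably analyzes the audio itself; the word shape is assumed. Each qualifying gap is counted in full, so the total is an upper bound if the real trim leaves some padding.

```typescript
// Assumed word timing shape (seconds).
interface TimedWord {
  start: number;
  end: number;
}

interface SilentSpan {
  start: number;
  end: number;
}

/** Gaps longer than minGap seconds between words, plus leading and trailing dead air. */
function findSilences(words: TimedWord[], duration: number, minGap = 2): SilentSpan[] {
  const spans: SilentSpan[] = [];
  let cursor = 0; // end of the latest speech seen so far
  for (const word of words) {
    if (word.start - cursor > minGap) spans.push({ start: cursor, end: word.start });
    cursor = Math.max(cursor, word.end);
  }
  if (duration - cursor > minGap) spans.push({ start: cursor, end: duration });
  return spans;
}

/** Total seconds covered by the spans. */
function totalSeconds(spans: SilentSpan[]): number {
  return spans.reduce((sum, span) => sum + (span.end - span.start), 0);
}
```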
Auto-Captions
The transcription from Step 3 powers burned-in captions. Pass captions: { burn: true, style: 'modern' } to the upload options and V100 renders word-highlighted captions directly into the MP4. You get the same animated-word caption style that performs well on social media, with no post-processing on your end.
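If you would rather keep captions soft (toggleable in your own player) than burn them in, the same word timestamps can produce a WebVTT file. This is an alternative technique, not what `captions: { burn: true }` does, and the `{ text, start, end }` word shape is assumed.

```typescript
interface CaptionWord {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

/** Format seconds as a WebVTT timestamp (HH:MM:SS.mmm). */
function vttTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(Math.floor(totalSec / 3600))}:${pad(Math.floor(totalSec / 60) % 60)}:${pad(totalSec % 60)}.${pad(ms, 3)}`;
}

/** Escape characters that WebVTT cue text reserves. */
function escapeVtt(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

/** Group words into cues of at most maxWords words and emit a WebVTT document. */
function toWebVtt(words: CaptionWord[], maxWords = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const group = words.slice(i, i + maxWords);
    const timing = `${vttTime(group[0].start)} --> ${vttTime(group[group.length - 1].end)}`;
    cues.push(`${timing}\n${group.map((w) => escapeVtt(w.text)).join(" ")}`);
  }
  return `WEBVTT\n\n${cues.join("\n\n")}\n`;
}
```

In the browser, wrap the result in `new Blob([vtt], { type: "text/vtt" })`, pass `URL.createObjectURL(...)` to a `<track kind="captions">` element, and the viewer can toggle captions on and off.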
Virtual Backgrounds
Replace or blur the user's background in the camera feed. V100 processes this client-side using body segmentation, so there is no server round trip. Pass virtualBackground: { type: 'blur', intensity: 0.7 } or { type: 'image', url: 'https://...' } when initializing the camera stream. The composited result is what gets recorded, so no post-processing is needed.
Noise Suppression
AI-based noise suppression is enabled by default on all audio tracks. It removes keyboard clicks, fan noise, background chatter, and other common distractions. To configure sensitivity or disable it entirely, pass audio: { noiseSuppression: false } in the camera stream options.
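The `audio: { noiseSuppression: false }` option above belongs to V100's camera-stream API, which is not shown here. Separately, browsers expose a standard `noiseSuppression` audio constraint for their own built-in (non-AI) suppression. The sketch uses only that standard constraint, for example to avoid stacking the browser's processing with another suppressor; `cameraConstraints` and `openCamera` are hypothetical helper names.

```typescript
/** Build getUserMedia constraints; noiseSuppression and echoCancellation are standard constraints. */
function cameraConstraints(opts: { noiseSuppression: boolean; echoCancellation?: boolean }): MediaStreamConstraints {
  return {
    video: { width: { ideal: 1280 }, height: { ideal: 720 } },
    audio: {
      noiseSuppression: opts.noiseSuppression,
      echoCancellation: opts.echoCancellation ?? true,
    },
  };
}

async function openCamera(noiseSuppression: boolean): Promise<MediaStream> {
  if (!navigator.mediaDevices.getSupportedConstraints().noiseSuppression) {
    console.warn("This browser ignores the noiseSuppression constraint");
  }
  const stream = await navigator.mediaDevices.getUserMedia(cameraConstraints({ noiseSuppression }));
  // getSettings() reports what the browser actually applied, which may differ from the request.
  const [track] = stream.getAudioTracks();
  console.log("noiseSuppression applied:", track?.getSettings().noiseSuppression);
  return stream;
}
```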
Viewer Analytics
Track who watched, how far they got, and whether they clicked any chapters or action items. The notifyOnView flag from the share step fires a webhook on each view, and the GET /api/recordings/{id}/analytics endpoint returns aggregate view counts, average watch time, and drop-off points.
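A sketch of reading the analytics endpoint named above. The path is from the text; the Bearer header and the response fields (`views`, `averageWatchSeconds`, `dropOffs`) are hypothetical names for the view counts, average watch time, and drop-off points it describes, and the base URL is a placeholder.

```typescript
const V100_BASE_URL = "https://api.v100.example"; // placeholder

// Hypothetical response shape.
interface RecordingAnalytics {
  views: number;
  averageWatchSeconds: number;
  dropOffs: { atSecond: number; viewersLost: number }[];
}

async function getAnalytics(
  videoId: string,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<RecordingAnalytics> {
  const url = `${V100_BASE_URL}/api/recordings/${encodeURIComponent(videoId)}/analytics`;
  const res = await fetchImpl(url, {
    method: "GET",
    headers: { Authorization: `Bearer ${apiKey}` }, // assumed auth scheme
  });
  if (!res.ok) throw new Error(`Analytics request failed with HTTP ${res.status}`);
  return (await res.json()) as RecordingAnalytics;
}

/** Second at which the most viewers stopped watching, or null if there is no data. */
function biggestDropOff(stats: RecordingAnalytics): number | null {
  let worst: { atSecond: number; viewersLost: number } | null = null;
  for (const d of stats.dropOffs) {
    if (worst === null || d.viewersLost > worst.viewersLost) worst = d;
  }
  return worst === null ? null : worst.atSecond;
}
```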
Loom vs Building Your Own
Loom costs $12.50 per user per month on the Business plan, which adds up fast for teams. Building your own async video tool gives you full control over the UX and your data, with no per-seat licensing. Here is what each approach actually looks like:
| | Loom (SaaS) | DIY from Scratch | V100 API |
|---|---|---|---|
| Time to first recording | Instant (download app) | 2–4 months | 1 hour |
| Custom branding | Enterprise plan only | Full control | Full control |
| Transcription | Included (English-focused) | Deploy Whisper, manage GPUs | One API call, 40+ languages |
| AI summaries | Business plan ($12.50/user/mo) | Build LLM pipeline yourself | One API call |
| Data ownership | Loom hosts everything | Your infrastructure | Your S3 bucket option |
| Embed in your product | Limited embed options | Full integration | Full integration |
| Video storage | Loom servers | S3 + CDN + transcoding ($$$) | Managed (CDN included) |
| Silence removal | Manual trim only | FFmpeg + ML pipeline | One config flag |
| Per-seat cost | $12.50/user/mo (Business) | Engineering time + infra | Usage-based, no per-seat |
| Maintenance | Managed by Loom | Permanent headcount | Managed by V100 |
The sweet spot for most teams is building on V100. You get full control over the user experience and branding — it is your product, not a Loom embed — without the 2 to 4 months of infrastructure work. The video messaging API handles the hard parts (transcoding, transcription, CDN, AI summaries) and you focus on the product layer that differentiates your tool.
For teams already paying for Loom Business, the math is straightforward. A 50-person team on Loom costs $625 per month. V100's usage-based pricing typically comes in at 60 to 80% less for equivalent usage, and you own the UX completely.
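The seat-cost arithmetic can be made explicit. Prices come from this article ($12.50 per user per month for Loom Business, and the $0.005 per recording-minute starting rate for V100 Pro), so treat the result as a rough comparison rather than a quote. Amounts are kept in integer micro-dollars to avoid floating-point rounding.

```typescript
const MICROS_PER_DOLLAR = 1_000_000;
const LOOM_BUSINESS_SEAT_MICROS = 12_500_000; // $12.50 per user per month
const V100_PRO_MINUTE_MICROS = 5_000; // $0.005 per recording-minute (starting rate)

function loomMonthlyMicros(seats: number): number {
  return seats * LOOM_BUSINESS_SEAT_MICROS;
}

function v100MonthlyMicros(recordingMinutes: number): number {
  return recordingMinutes * V100_PRO_MINUTE_MICROS;
}

/** Recording-minutes per month at which per-minute pricing equals the per-seat bill. */
function breakEvenMinutes(seats: number): number {
  return Math.floor(loomMonthlyMicros(seats) / V100_PRO_MINUTE_MICROS);
}

const toDollars = (micros: number) => (micros / MICROS_PER_DOLLAR).toFixed(2);

console.log(toDollars(loomMonthlyMicros(50))); // "625.00"
console.log(breakEvenMinutes(50)); // 125000
```

At the starting rate, a 50-seat team would need to record 125,000 minutes a month before per-minute pricing reached the $625 Loom bill.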
Pricing
V100 offers a free tier with 100 recordings per month — enough for development, testing, and small teams. No credit card required. For production workloads:
- Free — 100 recordings/month, transcription included, 720p output. Perfect for prototyping your Loom alternative.
- Pro — Usage-based pricing, 1080p output, AI summaries, silence removal, multi-platform publishing, viewer analytics. Starts at $0.005 per recording-minute.
- Enterprise — Volume discounts, bring-your-own S3, custom player branding, SLA, dedicated support.
See the full pricing page for details.
Start Building Your Loom Clone
Get your API key, record your first screen capture, and generate a shareable link in under an hour. No credit card. No sales call. Just code.