Async video is a $2 billion+ market. Loom proved the category when Atlassian acquired it for $975 million in 2023, and the space has only grown since. Remote teams send millions of video messages per day instead of writing long emails or scheduling unnecessary meetings. A 3-minute video walkthrough replaces a 15-minute meeting and a 500-word email. The sender records once, and every viewer watches on their own schedule.
But Loom is a horizontal tool. It serves everyone generically, which means it serves no one perfectly. The opportunity for a Loom competitor is not to build a better Loom. It is to build a better async video tool for a specific vertical: sales teams, customer support, engineering, education, or recruiting. Vertical specialization lets you build features Loom cannot justify for its general audience, charge higher prices for deeper value, and win customers who feel underserved by a one-size-fits-all tool.
V100's API handles the entire video infrastructure layer: upload processing, transcription, CDN delivery, shareable page hosting, and viewer analytics. You focus on the capture experience (browser extension or web app), the specific workflow integrations for your vertical, and the go-to-market strategy. This guide walks through the complete architecture, step by step.
Why Build a Loom Competitor Now
Three market dynamics make this the right time. First, Loom's acquisition by Atlassian has shifted its product strategy toward Atlassian ecosystem integration (Jira, Confluence, Trello), creating openings for competitors who integrate with the rest of the market's tools. Second, AI capabilities that did not exist when Loom launched (edit-by-transcript, multi-language dubbing, AI summaries with action item extraction) are now accessible through APIs like V100. Third, remote and hybrid work is permanent. The 2020 spike was not a blip. Async video communication is a structural shift in how teams work, and the market is growing 25-30% annually.
The revenue model is proven. Freemium gets users in the door (5 videos per month, 5-minute max length). Individual plans at $12-15 per user per month unlock unlimited videos and advanced features. Team plans at $29 per user per month add shared workspaces, viewer analytics, custom branding, and integrations. Enterprise plans at custom pricing add SSO, admin controls, and compliance features. A product with 10,000 paying users at $15 per month generates $150,000 per month in recurring revenue.
Architecture Overview
The architecture has four layers: capture (browser extension or web app), processing (V100 API), storage and delivery (V100 CDN), and presentation (share pages and viewer analytics). Here is the flow.
Data flow
Step 1: Capture (Screen + Camera Recording)
The capture layer is the one piece you build yourself, because it is the primary user experience and your competitive differentiation point. V100 handles everything after the recording is captured. The browser's native APIs handle the actual screen and camera capture.
The getDisplayMedia API captures the user's screen (full screen, specific window, or browser tab). The getUserMedia API captures the webcam and microphone. Your extension or web app composites these into a single stream: screen recording with a small camera bubble overlay in the corner, which is the standard Loom-style layout. The MediaRecorder API encodes the composite stream into WebM or MP4, depending on browser support, and your code uploads the finished recording to V100 when the user stops recording.
```javascript
import { V100 } from 'v100-sdk';

const v100 = new V100('YOUR_API_KEY');

// Capture screen + camera
const screenStream = await navigator.mediaDevices.getDisplayMedia({
  video: { width: 1920, height: 1080 },
  audio: true // System audio (tab audio), where the browser supports it
});
const cameraStream = await navigator.mediaDevices.getUserMedia({
  video: { width: 320, height: 320, facingMode: 'user' },
  audio: { echoCancellation: true, noiseSuppression: true }
});

// Composite streams (screen + camera bubble).
// compositeScreenAndCamera is a helper you write yourself; it is not a browser or SDK API.
const compositeStream = compositeScreenAndCamera(screenStream, cameraStream);

// Record: fall back to the default WebM codec if VP9 is unavailable
const mimeType = MediaRecorder.isTypeSupported('video/webm;codecs=vp9')
  ? 'video/webm;codecs=vp9'
  : 'video/webm';
const recorder = new MediaRecorder(compositeStream, {
  mimeType,
  videoBitsPerSecond: 3_000_000
});

// Collect encoded chunks as the recorder emits them
const chunks = [];
recorder.ondataavailable = (event) => {
  if (event.data.size > 0) chunks.push(event.data);
};

// On stop: upload to V100
recorder.onstop = async () => {
  const blob = new Blob(chunks, { type: 'video/webm' });
  const video = await v100.videos.upload({
    file: blob,
    title: 'Quick walkthrough of the new dashboard',
    transcription: {
      enabled: true,
      language: 'auto' // Auto-detect language
    },
    ai: {
      summary: true,     // Generate AI summary
      actionItems: true, // Extract action items
      chapters: true     // Auto-chapter markers
    },
    thumbnail: { auto: true }, // AI-selected best frame
    share: {
      enabled: true,
      tracking: true // Track who watches
    }
  });
  // video.share_url → "https://yourapp.com/v/abc123"
  // video.transcript → full text with timestamps
  // video.summary → AI-generated summary
  // video.action_items → ["Review the new nav layout", ...]
  await navigator.clipboard.writeText(video.share_url);
};

// Start recording and emit a chunk every second; call recorder.stop() from your Stop button
recorder.start(1000);
```
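The `compositeScreenAndCamera` helper in the snippet above is not a browser API, so you have to supply it. One common approach is to draw both video sources onto a canvas and record the canvas. The following is a minimal sketch under that assumption: the bubble size, margin, bottom-left placement, and 30 fps capture rate are illustrative choices, and it carries only the microphone audio (mixing in tab audio would need the Web Audio API).

```javascript
// Pure geometry: where the circular camera bubble sits on the canvas.
// The default diameter and margin are illustrative, not prescribed values.
function bubbleRect(canvasWidth, canvasHeight, diameter = 240, margin = 24) {
  return {
    x: margin,
    y: canvasHeight - diameter - margin, // bottom-left corner
    size: diameter
  };
}

// Browser-only sketch: composite screen + camera onto a canvas and
// return a MediaStream that MediaRecorder can consume.
function compositeScreenAndCamera(screenStream, cameraStream, width = 1920, height = 1080) {
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d');

  // Feed each stream into a muted, off-screen <video> element we can draw from
  const toVideo = (stream) => {
    const video = document.createElement('video');
    video.srcObject = stream;
    video.muted = true;
    video.play();
    return video;
  };
  const screenVideo = toVideo(screenStream);
  const cameraVideo = toVideo(cameraStream);
  const { x, y, size } = bubbleRect(width, height);

  function drawFrame() {
    // Screen fills the frame
    ctx.drawImage(screenVideo, 0, 0, width, height);
    // Camera is clipped to a circle in the corner
    ctx.save();
    ctx.beginPath();
    ctx.arc(x + size / 2, y + size / 2, size / 2, 0, Math.PI * 2);
    ctx.clip();
    ctx.drawImage(cameraVideo, x, y, size, size);
    ctx.restore();
    requestAnimationFrame(drawFrame);
  }
  drawFrame();

  // Video track from the canvas, audio track(s) from the microphone
  const output = canvas.captureStream(30);
  cameraStream.getAudioTracks().forEach((track) => output.addTrack(track));
  return output;
}
```

Note that `requestAnimationFrame` is throttled when the tab is in the background, so a production recorder may need a different frame source.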
Speed is the key user experience requirement. The time from the user clicking "Stop Recording" to the share link landing on their clipboard should be under 3 seconds. V100 handles this by returning the share URL immediately after upload begins (the video is playable within seconds via progressive processing), while transcription and AI features complete asynchronously in the background. The share page initially shows the video with a "Transcript generating..." placeholder, then updates in real time as processing completes.
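On the share page, this progressive behavior amounts to a small piece of state that fills in as results arrive. The sketch below shows one way to model it. The event names (`transcript.ready`, `summary.ready`) and payload fields are assumptions made for illustration, not documented V100 events.

```javascript
// Share page state before any background processing has finished
function initialShareState(shareUrl) {
  return { shareUrl, transcript: null, summary: null, actionItems: null };
}

// Fold a processing event into the state. Event shapes are hypothetical.
function applyProcessingEvent(state, event) {
  switch (event.type) {
    case 'transcript.ready':
      return { ...state, transcript: event.transcript };
    case 'summary.ready':
      return {
        ...state,
        summary: event.summary,
        actionItems: event.actionItems ?? []
      };
    default:
      return state; // Ignore events this page does not render
  }
}

// Text shown in the transcript panel for the current state
function transcriptLabel(state) {
  return state.transcript === null
    ? 'Transcript generating...'
    : `${state.transcript.length} transcript segments`;
}
```

Keeping the update logic in a pure function like this makes it easy to drive from whatever transport you choose (polling, server-sent events, or a WebSocket).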
Step 2: Processing (Transcode, Transcribe, Summarize)
Once the recording reaches V100, a processing pipeline runs automatically. Transcoding compresses the raw WebM recording into optimized MP4 with adaptive bitrate variants for different connection speeds. Thumbnail generation selects the most visually representative frame from the video using AI (not just the first frame, which is often a loading screen). Transcription converts speech to text with timestamps and speaker labels. AI summary generates a 2-3 sentence overview plus extracted action items.
For async video, transcription is not a nice-to-have feature. It is the core differentiator. A viewer who receives a 5-minute video can scan the transcript in 30 seconds to decide whether they need to watch the whole thing. A viewer who is in a meeting can read the transcript silently instead of watching with audio. A viewer searching for information from last week can search the transcript instead of scrubbing through video timelines. Transcription makes async video searchable, scannable, and accessible.
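To make the "search instead of scrub" point concrete, here is a minimal sketch of transcript search. It assumes transcript segments shaped as `{ start, end, text }` with times in seconds; the function name is illustrative.

```javascript
// Return the segments whose text contains the query (case-insensitive),
// with the start time the player should seek to.
function searchTranscript(transcript, query) {
  const needle = query.trim().toLowerCase();
  if (needle === '') return [];
  return transcript
    .filter((segment) => segment.text.toLowerCase().includes(needle))
    .map((segment) => ({ start: segment.start, text: segment.text }));
}

const exampleTranscript = [
  { start: 0.0, end: 4.2, text: 'Hey team, quick update on the release.' },
  { start: 4.2, end: 9.1, text: 'The new dashboard is ready for review.' }
];

console.log(searchTranscript(exampleTranscript, 'dashboard'));
```

A real product would likely index transcripts server-side so users can search across all videos, but the per-video case is this simple.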
V100's transcription covers 40+ languages with automatic language detection. If your user base is international, videos recorded in Spanish, German, Japanese, or Portuguese are all transcribed automatically without the sender needing to specify the language. Multi-language dubbing is also available: a video recorded in English can be automatically dubbed into other languages, with the dubbed version accessible on the same share page via a language selector.
Step 3: Hosting and Delivery
V100 hosts processed videos on its CDN and delivers them through adaptive bitrate streaming. The share page loads the video from the nearest CDN edge, so a recipient in Tokyo and a recipient in London both get fast playback start times. You do not manage S3 buckets, CloudFront distributions, or cache invalidation. V100 handles the entire storage and delivery layer.
For an async video product, delivery performance directly impacts user perception. If a recipient clicks a share link and the video takes 4 seconds to start playing, they assume the product is slow. V100's CDN delivery starts playback within 500ms for most viewers globally. Progressive processing means the video is playable before transcoding fully completes, which is critical for the "record and share in 3 seconds" user experience.
Storage costs scale with usage. V100's pricing is based on minutes stored and minutes delivered, not on raw storage volume. A 5-minute video stored for 30 days and watched by 10 viewers costs pennies. At scale (100,000 videos per month, 1 million views), V100's CDN-included delivery is significantly cheaper than self-managing S3 + CloudFront + transcoding infrastructure.
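Because pricing is based on minutes stored and minutes delivered, a rough cost model is a two-term sum. The sketch below shows the arithmetic only. The rate values are placeholders invented for the example and are not V100's actual prices; substitute the numbers from your own pricing agreement.

```javascript
// Estimate monthly usage and cost from minutes stored and minutes delivered.
// `rates` must be supplied by the caller; nothing here encodes real pricing.
function estimateMonthlyCost(usage, rates) {
  const { videos, avgMinutes, viewsPerVideo, avgWatchFraction } = usage;
  const minutesStored = videos * avgMinutes;
  const minutesDelivered = videos * viewsPerVideo * avgMinutes * avgWatchFraction;
  return {
    minutesStored,
    minutesDelivered,
    total:
      minutesStored * rates.perMinuteStored +
      minutesDelivered * rates.perMinuteDelivered
  };
}

// PLACEHOLDER rates for illustration only
const PLACEHOLDER_RATES = { perMinuteStored: 0.001, perMinuteDelivered: 0.0005 };

// 100,000 videos per month and 1 million views is 10 views per video
console.log(
  estimateMonthlyCost(
    { videos: 100000, avgMinutes: 5, viewsPerVideo: 10, avgWatchFraction: 0.5 },
    PLACEHOLDER_RATES
  )
);
```

The `avgWatchFraction` parameter matters more than it looks: viewers rarely watch to the end, so delivered minutes are usually well below views multiplied by duration.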
Step 4: Share Page (Video + Transcript + Summary)
The share page is where your product lives for the recipient. It is the most important UI in the entire application, and it is where your vertical differentiation shows. A generic Loom-style share page shows the video, transcript, and a comment box. A vertical-specific share page adds context that matters for that use case.
For a sales-focused async video tool, the share page includes: the video, the transcript, the AI summary, extracted action items, a CTA button (e.g., "Book a Demo"), and engagement analytics visible to the sender ("Prospect watched 80% of the video, paused at the pricing slide for 12 seconds"). For an engineering-focused tool, the share page includes: the video, the transcript, code snippets extracted from screen content, linked Jira tickets, and threaded comments with timestamp references.
```javascript
// Fetch all data for a share page
const videoData = await v100.videos.get('vid_abc123', {
  include: ['transcript', 'summary', 'chapters', 'actionItems', 'analytics']
});

// videoData returns:
// {
//   player_url: "https://cdn.v100.ai/embed/vid_abc123",
//   thumbnail: "https://cdn.v100.ai/thumb/vid_abc123.jpg",
//   duration: 187, // seconds
//   transcript: [
//     { start: 0.0, end: 4.2, text: "Hey team, quick update on..." },
//     { start: 4.2, end: 9.1, text: "The new dashboard is ready..." }
//   ],
//   summary: "Walkthrough of the new dashboard redesign...",
//   action_items: ["Review nav layout", "Test mobile view"],
//   chapters: [
//     { time: 0, title: "Introduction" },
//     { time: 32, title: "Dashboard Changes" },
//     { time: 98, title: "Next Steps" }
//   ],
//   analytics: {
//     total_views: 14,
//     unique_viewers: 8,
//     avg_watch_pct: 72,
//     viewers: [
//       { email: "jane@co.com", watched_pct: 100, watched_at: "..." },
//       { email: "bob@co.com", watched_pct: 45, watched_at: "..." }
//     ]
//   }
// }
```
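Turning this payload into a share page mostly means formatting times and tracking which chapter is playing. The helpers below are a small sketch that assumes the `chapters` array is sorted by `time` in seconds, as in the sample response; the function names are illustrative.

```javascript
// Format a time in seconds as m:ss for chapter lists and transcript rows
function formatTimestamp(totalSeconds) {
  const seconds = Math.floor(totalSeconds);
  const minutes = Math.floor(seconds / 60);
  const rest = String(seconds % 60).padStart(2, '0');
  return `${minutes}:${rest}`;
}

// Find the chapter that contains the current playback time.
// Assumes chapters are sorted ascending by `time`.
function chapterAt(chapters, time) {
  let current = null;
  for (const chapter of chapters) {
    if (chapter.time <= time) current = chapter;
    else break;
  }
  return current;
}

const sampleChapters = [
  { time: 0, title: 'Introduction' },
  { time: 32, title: 'Dashboard Changes' },
  { time: 98, title: 'Next Steps' }
];

for (const chapter of sampleChapters) {
  console.log(`${formatTimestamp(chapter.time)} ${chapter.title}`);
}
```

Calling `chapterAt` from the player's time-update handler is enough to highlight the active chapter as the video plays.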
Step 5: Viewer Analytics
Viewer analytics are the feature that converts free users to paid plans. Free users record and share. Paid users need to know who watched. V100 tracks viewer engagement at a granular level: who opened the share link, how much of the video they watched, when they watched it, which device they used, and where they paused or rewatched.
For sales teams, this data is transformative. A sales rep sends a 3-minute product demo to 10 prospects. The analytics show that 7 opened the link, 4 watched more than 80%, and 2 watched the pricing section twice. Those 2 prospects are the hottest leads and should be followed up with immediately. The analytics turn a passive "I sent you a video, let me know what you think" into an active "I noticed you spent time on the pricing section. Would you like me to walk through the ROI model on a call?"
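The "who should I follow up with first" question can be answered with a simple filter over the viewer list. This sketch assumes viewer records shaped as `{ email, watched_pct }`; the 80% threshold mirrors the example above and is a tunable assumption.

```javascript
// Return emails of viewers who watched more than the threshold,
// most engaged first. Does not mutate the input array.
function hotLeads(viewers, minWatchedPct = 80) {
  return viewers
    .filter((viewer) => viewer.watched_pct > minWatchedPct)
    .sort((a, b) => b.watched_pct - a.watched_pct)
    .map((viewer) => viewer.email);
}

const sampleViewers = [
  { email: 'bob@co.com', watched_pct: 45 },
  { email: 'ana@co.com', watched_pct: 85 },
  { email: 'jane@co.com', watched_pct: 100 }
];

console.log(hotLeads(sampleViewers));
```

Section-level signals such as rewatching the pricing segment would need per-segment data, which this sketch does not model.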
Analytics require viewer identification. There are two approaches. Authenticated viewing requires the recipient to enter their email address before watching, which provides exact identification but adds friction. Cookied viewing tracks anonymous viewers and matches them to known contacts if they later authenticate. Most Loom competitors use a hybrid: the first view is unauthenticated (zero friction), and subsequent views from the same browser are tracked under a consistent anonymous ID that resolves to a real identity when the viewer eventually signs up or enters their email for a gated feature.
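The hybrid approach boils down to recording views under an anonymous ID and attaching an identity later. Here is a minimal in-memory sketch; the class and method names are hypothetical, and a real implementation would persist this in your database.

```javascript
// Tracks views by anonymous ID and resolves them to an email once known
class ViewerIdentityMap {
  constructor() {
    this.emailByAnonId = new Map();
    this.views = [];
  }

  // Called on every view; anonId comes from a cookie or local storage
  recordView(anonId, videoId, watchedPct) {
    this.views.push({ anonId, videoId, watchedPct });
  }

  // Called when the viewer signs up or enters their email
  identify(anonId, email) {
    this.emailByAnonId.set(anonId, email);
  }

  // All views attributable to an email, including those made before identification
  viewsFor(email) {
    return this.views.filter(
      (view) => this.emailByAnonId.get(view.anonId) === email
    );
  }
}
```

Because views are stored against the anonymous ID rather than the email, earlier anonymous sessions become attributable the moment `identify` is called, with no backfill step.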
Step 6: Integrations
Integrations are what make an async video tool sticky. A standalone recording tool is easy to replace. A tool that is embedded in your team's existing workflow (Slack, email, Notion, Salesforce, Jira) becomes infrastructure that is painful to remove.
Priority integrations by vertical
Sales: Salesforce, HubSpot, Outreach, Gong
Auto-log video sends to CRM contact records. Sync viewer analytics as engagement signals. Trigger follow-up sequences when a prospect watches more than 50% of a video. Surface video engagement data in pipeline reports.
Engineering: Slack, Jira, GitHub, Notion
Embed video previews in Slack messages. Attach videos to Jira tickets. Embed in Notion docs. Link to GitHub PRs for code review walkthroughs. Sync threaded comments bidirectionally with Slack threads.
Support: Zendesk, Intercom, Freshdesk
Embed video recording in support ticket workflows. Let agents respond to tickets with video instead of text. Use AI to categorize video support requests. Track video resolution times alongside text resolution times.
Universal: Email, Slack, Notion
Animated GIF thumbnail in email that links to the share page (autoplay video in email is not supported). Slack unfurl that shows thumbnail, duration, and AI summary. Notion embed that plays inline.
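Most of these integrations share one pattern: receive a video event, then decide which downstream actions to take. The sketch below illustrates the sales case (log the view, and start a follow-up when a prospect watches more than 50%). The event shape and the action names are assumptions for illustration, not a documented V100 webhook payload or a CRM API.

```javascript
// Map an incoming video event to a list of integration actions.
// Your integration layer would execute each action against the real CRM.
function actionsForEvent(event, followUpThresholdPct = 50) {
  if (event.type !== 'video.viewed') return [];

  const { email, watched_pct: watchedPct } = event.viewer;
  const actions = [
    { kind: 'log_crm_activity', email, videoId: event.video_id, watchedPct }
  ];
  if (watchedPct > followUpThresholdPct) {
    actions.push({ kind: 'start_follow_up_sequence', email });
  }
  return actions;
}

// Hypothetical event payload
const sampleEvent = {
  type: 'video.viewed',
  video_id: 'vid_abc123',
  viewer: { email: 'jane@co.com', watched_pct: 80 }
};

console.log(actionsForEvent(sampleEvent));
```

Separating the decision (which actions) from the execution (calling Salesforce or HubSpot) keeps the rules testable without network access.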
Differentiators vs. Loom
Building on V100 gives you access to AI capabilities that Loom either does not offer or charges premium pricing for. These become your feature differentiation.
- Edit-by-transcript: Click into the transcript, select and delete text, and the corresponding video segments are removed. No timeline scrubbing. This turns a 5-minute unedited ramble into a tight 3-minute message without video editing skills.
- Multi-language dubbing: A video recorded in English is automatically available in 40+ languages with AI dubbing. The recipient selects their language on the share page. This is essential for international teams.
- AI action item extraction: The AI summary does not just summarize what was said. It extracts specific action items ("Review the new nav layout," "Test mobile view by Friday") that can be exported to task management tools.
- 40+ language transcription: Loom supports ~30 languages. V100 supports 40+ with higher accuracy on non-English languages due to more recent training data.
- Silence and filler word removal: Auto-edit removes "umm," "uhh," and dead air from recordings before sharing. The recipient gets a polished message without the sender needing editing skills.
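To show why edit-by-transcript needs no timeline UI, here is a minimal sketch of the underlying idea: deleting transcript segments yields a list of time ranges to keep. It assumes segments shaped as `{ start, end, text }` in seconds. How V100 actually implements the feature is not described here, so treat this as an illustration of the concept only.

```javascript
// Given transcript segments and the indexes the user deleted,
// return the merged time ranges of video to keep.
function keepRanges(transcript, deletedIndexes) {
  const deleted = new Set(deletedIndexes);
  const ranges = [];
  transcript.forEach((segment, index) => {
    if (deleted.has(index)) return;
    const last = ranges[ranges.length - 1];
    if (last && last.end === segment.start) {
      last.end = segment.end; // Contiguous with the previous kept range
    } else {
      ranges.push({ start: segment.start, end: segment.end });
    }
  });
  return ranges;
}

const editTranscript = [
  { start: 0.0, end: 4.2, text: 'Hey team, quick update.' },
  { start: 4.2, end: 9.1, text: 'Umm, let me find the right tab.' },
  { start: 9.1, end: 15.0, text: 'The new dashboard is ready.' }
];

console.log(keepRanges(editTranscript, [1]));
```

The same range list could drive silence and filler removal: detect the unwanted segments automatically instead of having the user delete them.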
Revenue Model
| Plan | Price | Features |
|---|---|---|
| Free | $0 | 5 videos/mo, 5 min max, basic transcript, no analytics |
| Pro | $12-15/user/mo | Unlimited videos, edit-by-transcript, AI summary, viewer analytics, custom branding |
| Team | $29/user/mo | Shared workspace, team analytics, CRM integrations, dubbing, password-protected videos |
| Enterprise | Custom | SSO/SAML, admin controls, compliance, SLA, dedicated support, custom integrations |
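Plan limits like these are usually enforced from a single configuration object. The sketch below encodes the Free and Pro rows. The table does not state a maximum video length for Pro, so treating it as unlimited is an assumption, as are the field and function names.

```javascript
// Plan limits derived from the pricing table.
// Pro's per-video length limit is an assumption (the table lists none).
const PLANS = {
  free: { maxVideosPerMonth: 5, maxMinutesPerVideo: 5, analytics: false },
  pro: { maxVideosPerMonth: Infinity, maxMinutesPerVideo: Infinity, analytics: true }
};

// Can this user start (or finish) a recording of the given length?
function canRecord(planName, videosThisMonth, minutes) {
  const plan = PLANS[planName];
  if (!plan) throw new Error(`Unknown plan: ${planName}`);
  return (
    videosThisMonth < plan.maxVideosPerMonth &&
    minutes <= plan.maxMinutesPerVideo
  );
}

// Should the share page expose viewer analytics to the sender?
function canViewAnalytics(planName) {
  const plan = PLANS[planName];
  if (!plan) throw new Error(`Unknown plan: ${planName}`);
  return plan.analytics;
}
```

Checking limits at record time, rather than at upload time, avoids the frustrating case where a user finishes a recording and only then learns they cannot share it.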
What V100 Does Not Do
- Screen capture. V100 processes and hosts the video after recording. The actual screen capture uses browser-native APIs (getDisplayMedia + getUserMedia). You build the capture UI in your browser extension or web app. V100 does not provide a pre-built recording widget.
- Desktop app. If your product requires a native desktop app (for capturing system audio on macOS, which Safari restricts), you need to build that separately. V100's SDK works in any JavaScript environment, including Electron, but the native capture layer is your responsibility.
- User authentication. V100 tracks viewers by token, not by user account. Your application handles user sign-up, login, and account management. V100 provides viewer tracking data, and your app maps V100 viewer tokens to your user accounts.
- CRM integrations. V100 provides the video data and analytics via API. Building Salesforce, HubSpot, or Slack integrations is your development work. V100 provides webhooks for video events (recorded, processed, viewed) that your integration layer consumes.
- Billing. V100 is your video infrastructure provider. Your SaaS billing (Stripe subscriptions, usage metering, plan upgrades) is your responsibility. V100 charges you for API usage, and you charge your customers for the product.
Ready to build your async video platform?
Start with V100's free tier, which requires no credit card. Upload a test video, generate a transcript, create a share link, and see viewer analytics in action.