Loom popularized screen + webcam recording as a format for async communication. A face in the corner of the screen creates a personal connection that a plain screen recording lacks, and Loom reports that viewers are roughly 2x more likely to finish a recording when they can see the presenter. This is why so many modern tutorials, product demos, sales walkthroughs, and course lectures use the picture-in-picture (PiP) format: screen content fills the frame, and a circular or rectangular webcam overlay sits in one corner.
The challenge is that recording screen and webcam simultaneously requires capturing two separate media streams (one from the display, one from the camera), compositing them into a single frame, encoding the combined output, and ideally transcribing the audio and generating a shareable link. Different tools handle different parts of this pipeline, and the right choice depends on whether you are a solo user, a team, or a developer building recording into a product.
This guide covers four methods, from the simplest free tool to full API integration, with working code samples and an honest comparison of features, limitations, and pricing.
Why PiP Recording Works: The Data
The case for screen + webcam recording rests on more than preference. Platform data and research on video lectures both indicate that adding a face to a screen recording improves engagement and comprehension.
PiP recording impact
2x higher completion rate
Loom's internal data shows that videos with a webcam overlay have approximately 2x the completion rate of screen-only recordings. The face creates social pressure (someone is talking to you) that keeps viewers watching.
Higher trust in sales contexts
Sales teams using video prospecting (sending screen+webcam recordings instead of emails) report 3x higher response rates. The face creates a personal connection that text cannot replicate. Vidyard, Loom, and Sendspark all recommend PiP format for sales outreach.
Better retention in educational content
Research on video lectures shows that students retain information better when they can see the instructor's face alongside the content. The face provides nonverbal cues (emphasis, confusion, excitement) that enhance understanding of the screen content.
Async communication replaces meetings
Teams using PiP screen recordings for status updates, code reviews, and design feedback report 30-50% fewer meetings. A 3-minute recording replaces a 15-minute meeting because the presenter can be more concise, and viewers can watch at 1.5-2x speed.
Method 1: OBS Studio (Free, Desktop)
OBS Studio is a free, open-source desktop application for video recording and live streaming. It is the most powerful free option for screen + webcam recording, offering complete control over layouts, encoding settings, and output formats. OBS is available on Windows, macOS, and Linux.
To record screen and webcam together in OBS, you create a Scene with two Sources: a Display Capture (or Window Capture) for the screen and a Video Capture Device for the webcam. You position and resize the webcam overlay to create the PiP layout. OBS records the combined output as a single video file.
OBS excels at customization. You can create any layout: circular webcam in the corner, side-by-side, picture-in-picture with custom borders, or full-screen webcam with screen as background. You control the encoding codec (H.264, H.265, AV1), bitrate, resolution, and frame rate. For creators who need maximum quality and do not mind a learning curve, OBS is the best free option.
The downside is complexity. OBS has a steep learning curve and is designed for power users. Setting up a clean PiP layout for the first time takes 15-30 minutes of configuration. There is no built-in sharing (you get a local video file), no automatic transcription, no AI features, and no way to generate a shareable link. After recording, you need to manually upload the file, host it, and share the URL. For teams and developers, OBS offers no hosted API for building recording into a product: its bundled obs-websocket interface only remote-controls an OBS instance that is already running.
Method 2: Loom ($12.50/month)
Loom is the category-defining tool for async video communication. Its browser extension provides one-click screen + webcam recording with automatic cloud hosting and shareable links. Click the extension, choose "Screen + Camera", hit record, and when you stop, Loom instantly generates a shareable link. The entire process takes seconds.
Loom's strengths are speed and simplicity. There is zero configuration. The PiP layout is a circular webcam overlay in the bottom-left corner, and it works immediately. Recordings are automatically hosted on Loom's cloud with a shareable link, viewer analytics (who watched, how far they got), and basic transcription.
The limitations are flexibility and pricing. Loom's PiP layout is fixed (you cannot choose the corner, size, or shape of the webcam overlay in most plans). The free tier limits recordings to 5 minutes. The Business plan at $12.50/user/month removes the limit but adds up quickly for teams. There is no API for developers building recording into their own products. And the transcription is basic: no word-level timestamps, no speaker diarization, no edit-by-transcript.
Method 3: Browser-Native JavaScript (Free)
Modern browsers provide two APIs that enable screen + webcam recording without any external tools. getDisplayMedia() captures the screen, and getUserMedia() captures the webcam. By combining both streams on an HTML Canvas and recording the canvas output with MediaRecorder, you can build a complete PiP recorder in approximately 80 lines of JavaScript.
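// Screen + Webcam PiP Recorder using Browser APIs
async function startPiPRecording() {
  // 1. Capture screen
  const screenStream = await navigator.mediaDevices.getDisplayMedia({
    video: { width: 1920, height: 1080 },
    audio: true // System or tab audio (only where the browser supports it)
  });

  // 2. Capture webcam
  const webcamStream = await navigator.mediaDevices.getUserMedia({
    video: { width: 320, height: 320, facingMode: 'user' },
    audio: true // Microphone
  });

  // 3. Create canvas for compositing
  const canvas = document.createElement('canvas');
  canvas.width = 1920;
  canvas.height = 1080;
  const ctx = canvas.getContext('2d');

  // Create video elements for streams. Muting them keeps the captured
  // audio from playing back through the speakers (echo).
  const screenVideo = document.createElement('video');
  screenVideo.srcObject = screenStream;
  screenVideo.muted = true;
  const webcamVideo = document.createElement('video');
  webcamVideo.srcObject = webcamStream;
  webcamVideo.muted = true;
  await Promise.all([screenVideo.play(), webcamVideo.play()]);

  // 4. Composite loop: draw screen + webcam PiP
  // Note: requestAnimationFrame is throttled while this tab is in the
  // background, so frames can stall when the user switches to another tab.
  let running = true;
  function draw() {
    if (!running) return;
    // Full-screen: screen capture
    ctx.drawImage(screenVideo, 0, 0, 1920, 1080);

    // PiP overlay: circular webcam in bottom-left
    const pipSize = 200;
    const pipX = 40;
    const pipY = 1080 - pipSize - 40;
    // Center-crop the camera frame to a square so the face is not stretched
    const side = Math.min(webcamVideo.videoWidth, webcamVideo.videoHeight);
    const sx = (webcamVideo.videoWidth - side) / 2;
    const sy = (webcamVideo.videoHeight - side) / 2;
    ctx.save();
    ctx.beginPath();
    ctx.arc(pipX + pipSize / 2, pipY + pipSize / 2, pipSize / 2, 0, Math.PI * 2);
    ctx.clip();
    ctx.drawImage(webcamVideo, sx, sy, side, side, pipX, pipY, pipSize, pipSize);
    ctx.restore();

    // Border around PiP
    ctx.strokeStyle = '#6366f1';
    ctx.lineWidth = 3;
    ctx.beginPath();
    ctx.arc(pipX + pipSize / 2, pipY + pipSize / 2, pipSize / 2, 0, Math.PI * 2);
    ctx.stroke();

    requestAnimationFrame(draw);
  }
  draw();

  // 5. Record the canvas output
  const canvasStream = canvas.captureStream(30); // 30fps

  // Mix system audio (if captured) and microphone into a single track,
  // because MediaRecorder generally records only one audio track
  const audioContext = new AudioContext();
  const mixedAudio = audioContext.createMediaStreamDestination();
  for (const stream of [screenStream, webcamStream]) {
    if (stream.getAudioTracks().length > 0) {
      audioContext.createMediaStreamSource(stream).connect(mixedAudio);
    }
  }
  canvasStream.addTrack(mixedAudio.stream.getAudioTracks()[0]);

  // Prefer VP9 in WebM; otherwise let the browser pick its default format
  const preferred = 'video/webm;codecs=vp9';
  const options = MediaRecorder.isTypeSupported(preferred) ? { mimeType: preferred } : {};
  const recorder = new MediaRecorder(canvasStream, options);

  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = () => {
    running = false; // End the composite loop
    audioContext.close();
    const blob = new Blob(chunks, { type: recorder.mimeType });
    const url = URL.createObjectURL(blob);
    console.log('Recording ready:', url);
  };
  recorder.start();

  return { recorder, screenStream, webcamStream };
}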
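The recorder above returns its `recorder` and both streams but does not show how to shut them down. The following is a minimal sketch of that step, assuming the return shape shown above. The helper names `stopPiPRecording` and `stopWhenSharingEnds` are illustrative and not part of any browser API.

```javascript
// Hypothetical helper: stop the recorder, then release camera, mic, and screen.
function stopPiPRecording({ recorder, screenStream, webcamStream }) {
  return new Promise((resolve) => {
    const finish = () => {
      // Stopping every track turns off the camera light and ends screen sharing
      for (const stream of [screenStream, webcamStream]) {
        stream.getTracks().forEach((track) => track.stop());
      }
      resolve();
    };
    if (recorder.state === 'inactive') {
      finish();
      return;
    }
    // Wait for the recorder to flush its final chunk before releasing devices
    recorder.addEventListener('stop', finish, { once: true });
    recorder.stop();
  });
}

// Hypothetical helper: stop when the user clicks the browser's "Stop sharing" button.
function stopWhenSharingEnds(session) {
  const [screenTrack] = session.screenStream.getVideoTracks();
  screenTrack.addEventListener('ended', () => stopPiPRecording(session), { once: true });
}
```

A typical call would be `const session = await startPiPRecording(); stopWhenSharingEnds(session);`, with a Stop button in your UI calling `stopPiPRecording(session)`.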
This approach gives you complete control over the recording experience. You choose the PiP position, size, shape, and border style. You can add your logo, a recording timer, or click indicators. The recording happens entirely in the browser with no external dependencies.
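As one example of that flexibility, a recording timer can be drawn inside the same composite loop. The sketch below assumes a 2D canvas context like the `ctx` used above. `formatElapsed` and `drawTimer` are illustrative names, and the position, colors, and font are arbitrary choices.

```javascript
// Format elapsed milliseconds as MM:SS for the overlay.
function formatElapsed(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = String(Math.floor(totalSeconds / 60)).padStart(2, '0');
  const seconds = String(totalSeconds % 60).padStart(2, '0');
  return `${minutes}:${seconds}`;
}

// Draw a red dot and the elapsed time in the top-right corner of the canvas.
// startedAt and now are millisecond timestamps from performance.now().
function drawTimer(ctx, canvasWidth, startedAt, now = performance.now()) {
  const label = formatElapsed(now - startedAt);
  ctx.save();
  // Semi-transparent backdrop so the timer stays readable over any content
  ctx.fillStyle = 'rgba(0, 0, 0, 0.6)';
  ctx.fillRect(canvasWidth - 150, 20, 130, 40);
  // Red "recording" dot
  ctx.fillStyle = '#ef4444';
  ctx.beginPath();
  ctx.arc(canvasWidth - 130, 40, 8, 0, Math.PI * 2);
  ctx.fill();
  // Elapsed time label
  ctx.fillStyle = '#ffffff';
  ctx.font = '24px sans-serif';
  ctx.textAlign = 'right';
  ctx.textBaseline = 'middle';
  ctx.fillText(label, canvasWidth - 30, 40);
  ctx.restore();
}
```

Calling `drawTimer(ctx, 1920, startedAt)` at the end of `draw()` burns the timer into the recorded video. If the timer should be visible only to the presenter, render it in a separate DOM element instead.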
The limitations of browser-native recording are significant for production use. The output format depends on the browser: Chrome and Firefox record WebM by default while Safari records MP4, so delivering one universally playable MP4 usually means transcoding or remuxing on a server. Video quality depends on the user's hardware and browser. There is no built-in hosting, sharing, or transcription, so you need server infrastructure to store recordings, generate shareable links, and process the video. This is where V100's API fills the gap.
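Because container and codec support differs between browsers, it helps to probe for a supported format before constructing the recorder. The sketch below uses the standard `MediaRecorder.isTypeSupported()` check. The candidate list and its ordering are assumptions to adjust for your playback targets, and `pickMimeType` and `extensionFor` are illustrative names.

```javascript
// Candidate formats in order of preference (an assumption, not a standard list).
const CANDIDATE_MIME_TYPES = [
  'video/mp4;codecs=avc1.42E01E,mp4a.40.2',
  'video/webm;codecs=vp9,opus',
  'video/webm;codecs=vp8,opus',
  'video/webm'
];

// Return the first supported type, or '' to let the browser use its default.
// isSupported is injectable so the function can be tested outside a browser.
function pickMimeType(
  candidates = CANDIDATE_MIME_TYPES,
  isSupported = (type) => MediaRecorder.isTypeSupported(type)
) {
  return candidates.find((type) => isSupported(type)) ?? '';
}

// Choose a file extension from the MIME type the recorder actually used.
function extensionFor(mimeType) {
  return mimeType.startsWith('video/mp4') ? 'mp4' : 'webm';
}
```

In the browser, this could be used as `new MediaRecorder(stream, { mimeType: pickMimeType() })`, followed by `extensionFor(recorder.mimeType)` when naming the uploaded file, since `recorder.mimeType` reflects the format the browser settled on.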
Method 4: V100 Screen Recording API
V100 provides a recording API that handles the entire pipeline: capture screen and webcam streams from the browser, upload to V100's infrastructure, composite the PiP layout server-side, transcode to MP4, auto-transcribe the audio, generate a shareable link, and optionally produce an AI summary of the recording's content.
The browser-side integration captures the raw streams and sends them to V100, which handles compositing, encoding, storage, transcription, and sharing. Because compositing and encoding run server-side, output quality depends far less on the user's hardware, the output is always MP4 (universally playable), and every recording gets automatic transcription with word-level timestamps.
import { V100 } from 'v100-sdk';

const v100 = new V100('YOUR_API_KEY');

// Start a PiP recording session
const session = await v100.recording.start({
  screen: true,                   // Capture screen via getDisplayMedia
  webcam: true,                   // Capture webcam via getUserMedia
  audio: true,                    // Capture microphone
  layout: {
    type: 'pip',                  // Picture-in-picture
    pip_position: 'bottom-left',  // Corner placement
    pip_size: 200,                // Webcam overlay diameter (px)
    pip_shape: 'circle',          // circle or rectangle
    pip_border: '#6366f1'         // Indigo border color
  },
  transcription: true,            // Auto-transcribe when done
  ai_summary: true,               // Generate AI summary
  resolution: '1080p'
});

// User records... then stop:
const result = await session.stop();

// result.video_url:  hosted MP4 video
// result.share_link: shareable link with viewer analytics
// result.transcript: word-level timestamped transcript
// result.summary:    AI-generated summary of the recording
// result.duration:   recording length
console.log(`Share: ${result.share_link}`);
console.log(`Summary: ${result.summary}`);
PiP Layout Guide: Which Layout for Which Use Case
Tutorials and code walkthroughs: Bottom-left circle, 200px
Small circular webcam in the bottom-left corner keeps the presenter's face visible without blocking code or content. The bottom-left position avoids the code area (typically top and center) and the scrollbar (right side).
Product demos and sales: Bottom-right circle, 250px
Slightly larger webcam overlay increases the personal connection. Bottom-right works well for demo walkthroughs because the mouse cursor is usually moving through center and left content.
Presentations: Side-by-side, 70/30 split
Slides on the left (70% of frame), speaker on the right (30%). This layout gives the speaker far more visual presence than a corner overlay while keeping the slides dominant, which is better for keynotes, lectures, and any content where the speaker's facial expressions and body language add value.
Async status updates: Full webcam with screen thumbnail
For quick 1-2 minute updates, full webcam with a small screen thumbnail in the corner inverts the typical PiP. The focus is on the person, with screen content as supporting context. This works well for stand-up summaries and Slack video messages.
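For a canvas-based recorder, the four layouts above reduce to rectangle arithmetic. The sketch below computes draw regions for each one on a 1920x1080 frame. The preset names, the 40px margin, and the 480x270 thumbnail size are assumptions chosen for illustration.

```javascript
// Compute where to draw the screen and webcam for each layout in the guide.
// Returns rectangles in canvas pixels; "shape" tells the renderer how to clip.
function computeLayout(preset, frame = { width: 1920, height: 1080 }, margin = 40) {
  const { width, height } = frame;
  const full = { x: 0, y: 0, width, height };
  switch (preset) {
    case 'tutorial': { // Bottom-left circle, 200px
      const size = 200;
      return {
        screen: full,
        webcam: { x: margin, y: height - size - margin, width: size, height: size, shape: 'circle' }
      };
    }
    case 'demo': { // Bottom-right circle, 250px
      const size = 250;
      return {
        screen: full,
        webcam: { x: width - size - margin, y: height - size - margin, width: size, height: size, shape: 'circle' }
      };
    }
    case 'presentation': { // Side-by-side, 70/30 split
      const split = Math.round(width * 0.7);
      return {
        screen: { x: 0, y: 0, width: split, height },
        webcam: { x: split, y: 0, width: width - split, height, shape: 'rectangle' }
      };
    }
    case 'status': { // Full webcam with a screen thumbnail (size is an assumption)
      const thumb = { width: 480, height: 270 };
      return {
        webcam: { ...full, shape: 'rectangle' },
        screen: { x: width - thumb.width - margin, y: height - thumb.height - margin, ...thumb }
      };
    }
    default:
      throw new Error(`Unknown layout preset: ${preset}`);
  }
}
```

A composite loop can then call `ctx.drawImage(video, r.x, r.y, r.width, r.height)` for each region, clipping to a circle when `shape` is `'circle'`.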
Auto-Transcription During Recording
The most valuable feature of API-based screen recording is automatic transcription. Every recording gets a word-level timestamped transcript immediately after recording completes. This transcript powers several features that standalone recorders like OBS cannot provide.
Searchable recordings: Search your entire library of screen recordings by spoken content. Find the recording where you explained the new API endpoint by searching for "API endpoint." The search returns the recording with a timestamp link that jumps directly to the relevant moment.
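To illustrate how a word-level transcript makes this kind of search possible, here is a minimal sketch. It assumes the transcript is an array of `{ word, start, end }` objects with times in seconds. That shape is a hypothetical stand-in, not V100's documented response format, and `findPhrase` is an illustrative name.

```javascript
// Lowercase a token and strip punctuation so "endpoint." matches "endpoint".
const normalizeToken = (s) => s.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, '');

// Return the start time (in seconds) of every occurrence of the phrase.
function findPhrase(words, phrase) {
  const target = phrase.split(/\s+/).map(normalizeToken).filter(Boolean);
  if (target.length === 0) return [];
  const tokens = words.map((w) => normalizeToken(w.word));
  const hits = [];
  for (let i = 0; i + target.length <= tokens.length; i++) {
    // Compare the phrase against the window of tokens starting at i
    if (target.every((t, j) => tokens[i + j] === t)) {
      hits.push(words[i].start);
    }
  }
  return hits;
}
```

Each returned start time can be used to seek a player, for example by assigning it to a `<video>` element's `currentTime`.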
AI summary: V100 generates a bullet-point summary of the recording's content. A 10-minute product demo becomes a 5-bullet summary: "Demonstrated new dashboard, showed analytics filtering, explained export feature, discussed pricing change, answered objection about integration timeline." Viewers can read the summary before deciding whether to watch the full recording.
Captions: The transcript is automatically converted to captions, making the recording accessible and watchable on mute. This is particularly important for async recordings shared in Slack or email, where recipients may watch without audio.
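The conversion from a word-level transcript to captions can be sketched as follows, here targeting the WebVTT format that HTML `<track>` elements consume. As before, the `{ word, start, end }` transcript shape is an assumption, and the seven-words-per-cue grouping is an arbitrary choice rather than a description of how V100 builds its captions.

```javascript
// Format seconds as a WebVTT timestamp: HH:MM:SS.mmm
function toVttTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = String(Math.floor(ms / 3600000)).padStart(2, '0');
  const m = String(Math.floor((ms % 3600000) / 60000)).padStart(2, '0');
  const s = String(Math.floor((ms % 60000) / 1000)).padStart(2, '0');
  const frac = String(ms % 1000).padStart(3, '0');
  return `${h}:${m}:${s}.${frac}`;
}

// Group words into fixed-size cues and emit a WebVTT document.
function wordsToVtt(words, maxWordsPerCue = 7) {
  const cues = [];
  for (let i = 0; i < words.length; i += maxWordsPerCue) {
    const group = words.slice(i, i + maxWordsPerCue);
    const start = toVttTime(group[0].start);
    const end = toVttTime(group[group.length - 1].end);
    const text = group.map((w) => w.word).join(' ');
    cues.push(`${start} --> ${end}\n${text}`);
  }
  return `WEBVTT\n\n${cues.join('\n\n')}\n`;
}
```

A production captioner would more likely break cues at pauses or sentence boundaries, but fixed-size groups are enough to show the mapping from word timestamps to cue timings.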
Comparison: OBS vs. Loom vs. Browser API vs. V100
| Feature | OBS | Loom | Browser API | V100 |
|---|---|---|---|---|
| Screen + webcam PiP | Yes (manual setup) | Yes (one-click) | Yes (code required) | Yes (API config) |
| Custom PiP layouts | Full control | Limited | Full control | 4 presets + custom |
| Auto transcription | No | Basic | No | Word-level + diarization |
| AI summary | No | Yes (paid) | No | Yes |
| Shareable link | No (local file) | Yes | No (local blob) | Yes + analytics |
| Output format | MP4, MKV, FLV | MP4 (cloud) | WebM or MP4 (browser-dependent) | MP4 (cloud) |
| API for developers | No | No | Yes (raw APIs) | Yes (managed) |
| Viewer analytics | No | Yes | No | Yes |
| Price | Free | $12.50/user/mo | Free (DIY) | Pay per minute |
When to Use Each Method
Use OBS when:
You want maximum quality and control, you are comfortable with setup complexity, you record locally and share manually, or you are also live streaming. OBS is the power-user choice.
Use Loom when:
You want one-click simplicity, you are an individual or small team, you do not need API access, and the $12.50/user/month price fits your budget. Loom is the fastest path from recording to shared link.
Use Browser APIs when:
You are building a recording feature into your own application and want full control over the UI and experience. Be prepared to handle storage, hosting, and transcription yourself.
Use V100 when:
You are building screen recording into your product and need managed infrastructure for compositing, transcription, hosting, sharing, and analytics. V100 handles the backend so you build the UI.
Build screen recording into your product
V100's recording API handles capture, compositing, transcription, hosting, and sharing. Free tier includes 100 API calls per month. Start building today.