Virtual conferences and webinars became permanent fixtures of the business landscape after 2020. What changed between then and now is that audiences got picky. They will not tolerate bad audio, laggy video, missing captions, or a platform that looks like it was designed by an engineering team that has never attended a conference. The bar for virtual events moved from "it works" to "it works beautifully, in my language, with content I can revisit later."
Most organizations face a choice: use a turnkey platform (Hopin, Zoom Events, Goldcast, Bizzabo) and accept their UI, feature set, and pricing, or build a custom solution from scratch and spend 6-12 months on video infrastructure plumbing. V100 offers a third option: an API layer that provides all the conference video infrastructure — SFU, transcription, recording, AI features, CDN, PPV — and lets you build your own branded experience on top.
This post covers what V100 provides for conference and webinar use cases, how each feature works at the infrastructure level, and an honest comparison with the major turnkey platforms.
SFU Mode: 200 Participants with Sub-Second Latency
V100 uses an SFU (Selective Forwarding Unit) architecture for interactive sessions. Unlike an MCU (Multipoint Control Unit), which decodes, composites, and re-encodes all participant video on the server — adding latency and limiting quality — an SFU forwards each participant's encoded stream directly to the other participants. The server selects which streams to forward based on who is speaking, who is on screen, and what the viewer's bandwidth can support.
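To make the forwarding decision concrete, here is a minimal sketch of per-viewer stream selection. It assumes each publisher sends a simulcast ladder of pre-encoded layers, which is a common SFU technique but not something confirmed for V100. The `Layer` type, the bitrates, and `select_streams` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str  # e.g. "high", "mid", "low"
    kbps: int  # bitrate of this already-encoded layer


# Hypothetical simulcast ladder, ordered from best to cheapest.
LADDER = [Layer("high", 1500), Layer("mid", 500), Layer("low", 150)]


def select_streams(publishers, active_speaker, viewer_kbps):
    """Pick one layer per publisher for a single viewer.

    The active speaker is served first so it gets the best layer the
    bandwidth budget allows; the other publishers share what is left.
    A publisher whose cheapest layer does not fit is not forwarded (None).
    """
    # Stable sort: active speaker first, everyone else in original order.
    order = sorted(publishers, key=lambda p: p != active_speaker)
    remaining = viewer_kbps
    choice = {}
    for pub in order:
        layer = next((l for l in LADDER if l.kbps <= remaining), None)
        choice[pub] = layer.name if layer else None
        if layer:
            remaining -= layer.kbps
    return choice
```

Note that nothing here decodes or re-encodes video: the server only chooses which existing layer to pass along, which is what keeps SFU latency low compared with an MCU.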
V100's SFU handles up to 200 active video participants in a single session with sub-second latency. For a conference keynote where one person presents to thousands, the architecture shifts: the speaker (and any panelists) connect via SFU for real-time interaction, and V100 bridges their streams to a broadcast delivery path (HLS/DASH with multi-CDN routing) for the audience. The audience sees the session with 2-4 seconds of latency, which goes unnoticed in a one-way keynote format. Questions from the audience come through a moderated chat or raise-hand queue, not through video, so the latency asymmetry does not affect the interactive experience.
This hybrid SFU-plus-broadcast architecture means V100 handles both a 15-person panel discussion and a 50,000-attendee keynote with the same API. You do not need to choose between an interactive tool and a broadcast tool — V100 is both.
    # Create a conference session
    curl -X POST https://api.v100.ai/v1/conference/session \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "Q3 Product Strategy Keynote",
        "mode": "hybrid",
        "speakers": { "max": 8, "transport": "sfu" },
        "audience": { "max": 10000, "transport": "broadcast" },
        "features": {
          "ai_director": true,
          "transcription": { "enabled": true, "languages": ["en", "es", "fr", "de", "ja", "zh"] },
          "screen_sharing": true,
          "dvr": true,
          "recording": true,
          "breakout_rooms": 20,
          "ai_summary": true
        },
        "access": "ticketed",
        "white_label": { "domain": "events.yourcompany.com", "branding": true }
      }'
AI Transcription: Live Captions in 40+ Languages
Live captions are no longer a nice-to-have. They are a legal requirement in many jurisdictions (the ADA in the US, the AODA in Ontario, the EAA in the EU) and a practical necessity for global conferences where attendees speak different languages. V100's AI transcription engine generates live captions in 40+ languages with under 2 seconds of latency from speech to on-screen text.
The transcription pipeline works in two modes. In same-language mode, the engine transcribes the speaker's language in real time — English speech becomes English captions. In translation mode, the engine transcribes and translates simultaneously — English speech becomes Japanese, Spanish, French, or Mandarin captions in real time. Each viewer selects their preferred caption language independently. There is no per-language fee and no limit on the number of simultaneous caption languages.
For conferences specifically, the transcription engine handles speaker diarization — it identifies which speaker is talking and labels captions accordingly. When a panel discussion has five speakers, the captions show "Sarah Chen:" or "Marcus Johnson:" before each statement. This is critical for post-event transcripts and AI summaries, which need to attribute statements to specific speakers.
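As a rough illustration of what speaker-labeled captions look like downstream, the sketch below merges consecutive segments from the same speaker and prefixes each line with a display name. The `(speaker_id, text)` segment shape and the `label_captions` helper are assumptions made for this example, not V100's actual output format.

```python
def label_captions(segments, names):
    """Merge consecutive segments by the same speaker and prefix each
    merged line with the speaker's display name.

    segments: iterable of (speaker_id, text) in spoken order (hypothetical shape).
    names:    dict mapping speaker_id to a display name.
    """
    lines = []
    last_speaker = None
    for speaker_id, text in segments:
        if lines and speaker_id == last_speaker:
            # Same speaker kept talking: extend the current caption line.
            lines[-1] += " " + text
        else:
            # Fall back to a generic label when the speaker is unknown.
            name = names.get(speaker_id, f"Speaker {speaker_id}")
            lines.append(f"{name}: {text}")
            last_speaker = speaker_id
    return lines
```

A transcript built this way keeps attribution intact, which is what a summary step needs in order to quote the right person.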
The transcription also powers real-time search. During a multi-track conference, attendees can search across all sessions for specific topics, and V100 returns timestamped results from the live transcription. This helps attendees navigate between sessions based on what is actually being discussed, not just the session title.
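One simple way to picture search over live transcripts is an inverted index keyed by word, updated as each caption segment arrives. The sketch below is an assumption about how such a feature could work, not a description of V100's search backend; `TranscriptIndex` and its methods are hypothetical.

```python
from collections import defaultdict


class TranscriptIndex:
    """Tiny inverted index: word -> list of (session, seconds) hits."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, session, seconds, text):
        """Index one caption segment that started at `seconds` in `session`."""
        words = {w.strip(".,?!") for w in text.lower().split()}
        for word in words:
            if word:
                self._postings[word].append((session, seconds))

    def search(self, query):
        """Return sorted (session, seconds) hits containing every query word."""
        words = query.lower().split()
        if not words:
            return []
        hits = set(self._postings.get(words[0], []))
        for word in words[1:]:
            hits &= set(self._postings.get(word, []))
        return sorted(hits)
```

Because segments are indexed as they are transcribed, a query issued mid-conference can point an attendee to the session and timestamp where a topic is being discussed right now.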
AI Director: Automatic Speaker Switching
Conference sessions with multiple speakers require camera switching. In a physical event, this means a camera operator and a director. In a virtual event, it means someone manually spotlighting speakers in Zoom or hoping the "active speaker" detection is accurate. Both approaches are unreliable.
V100's AI Director handles speaker switching automatically. It analyzes audio energy, lip movement, gestures, and conversation flow to determine who should be on screen. During a panel discussion, it switches to the active speaker within 200 milliseconds of them starting to talk. When a presenter shares their screen, it automatically composites the screen share as the primary view with the speaker in a picture-in-picture overlay. When the presentation ends, it reverts to the speaker grid.
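The core of any automatic switcher is deciding when a new voice is a real change of speaker rather than a cough or a "mm-hm." The sketch below uses audio energy only, with a short hold period before switching. It deliberately ignores lip movement, gestures, and conversation flow, and the `SpeakerSwitcher` class, threshold, and hold length are illustrative assumptions rather than V100's algorithm.

```python
class SpeakerSwitcher:
    """Switch the on-screen speaker only after a challenger has been the
    loudest participant for `hold_frames` consecutive frames."""

    def __init__(self, threshold=0.1, hold_frames=3):
        self.threshold = threshold      # minimum energy that counts as speech
        self.hold_frames = hold_frames  # frames a challenger must stay loudest
        self.on_screen = None
        self._candidate = None
        self._streak = 0

    def update(self, energies):
        """energies: non-empty dict of speaker -> audio energy (0..1) for one frame.
        Returns the speaker who should be on screen after this frame."""
        loudest = max(energies, key=energies.get)
        if energies[loudest] < self.threshold or loudest == self.on_screen:
            # Silence, or the current speaker is still talking: drop any challenger.
            self._candidate, self._streak = None, 0
            return self.on_screen
        if loudest == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = loudest, 1
        if self._streak >= self.hold_frames:
            self.on_screen = loudest
            self._candidate, self._streak = None, 0
        return self.on_screen
```

If frames were evaluated at 20 Hz, a three-frame hold would add 150 ms before a switch, which fits inside a 200 millisecond budget; both numbers are tunable assumptions in this sketch.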
The AI Director also handles transitions between sessions. When one speaker finishes and another begins, it generates a branded transition card (configurable via the graphics API) and smoothly switches to the next session's layout. This produces a broadcast-quality viewing experience without any production crew.
Screen Sharing for Presentations
Screen sharing is the single most important feature for webinars and conference presentations, and it is the feature most platforms get wrong. Common failures: resolution drops to 720p on a slide deck with small text, frame rate is too low for video demonstrations, the share window captures the wrong monitor, and audio from the shared screen does not transmit.
V100's screen sharing captures at native resolution (up to 4K) with adaptive frame rate. Static slides are transmitted at low frame rate to conserve bandwidth, and video content or live demos are transmitted at 30fps. Audio from the shared application is captured and mixed with the speaker's microphone audio, so product demonstrations with sound work correctly. The shared screen is composited by the AI Director — full-screen when the presenter is talking over slides, picture-in-picture with the speaker when they step away from the deck.
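Adaptive frame rate boils down to measuring how much the screen changed and picking a capture rate to match. Here is a minimal sketch under stated assumptions: frames are flat sequences of pixel values, 2 fps stands in for the "low frame rate" used for static slides, and the 5% motion threshold is arbitrary. None of these values come from V100.

```python
def changed_fraction(prev, curr):
    """Fraction of pixels that differ between two equal-sized frames,
    each given as a flat sequence of pixel values."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("frames must be non-empty and the same size")
    diff = sum(1 for a, b in zip(prev, curr) if a != b)
    return diff / len(curr)


def target_fps(fraction, low_fps=2, high_fps=30, motion_threshold=0.05):
    """Static content gets low_fps; content that changes more than the
    threshold between frames (video, live demos) gets high_fps."""
    return high_fps if fraction > motion_threshold else low_fps
```

A production encoder would smooth this decision over several frames so that a single slide transition does not briefly spike the frame rate.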
PPV for Paid Conferences and Webinars
Paid virtual events are a growing market. Industry conferences that charge $500-$2,000 for in-person attendance can offer virtual tickets at $50-$500 and reach a global audience. Professional development webinars, continuing education sessions, and exclusive workshops all support direct monetization.
V100's PPV system supports the access models that conferences need. A full-conference pass grants access to all sessions and on-demand replays. A day pass covers one day of a multi-day event. Individual session tickets let attendees pay for a single keynote or workshop. VIP passes add exclusive Q&A sessions, backstage content, or early access to recordings. Each access level is a token configuration in the API — no custom access control logic required.
Token validation through Cachee takes 31 nanoseconds, so there is no perceptible delay when an attendee joins a session. The token includes device fingerprinting to prevent credential sharing, a common problem with virtual events where one person buys a ticket and shares the link with their entire team.
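To show how access tiers and device binding can live inside a token, here is a minimal sketch using an HMAC-signed payload from the Python standard library. It is a generic pattern, not how V100 or Cachee implement validation; the claim names (`tier`, `sessions`, `device`), the `SECRET` key, and both functions are hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical signing key; never hard-code one in production


def issue_token(claims):
    """Serialize claims and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return body + b"." + sig


def can_join(token, session_id, device_fingerprint):
    """True only if the signature is valid, the device matches, and the
    token's scope covers the requested session."""
    body, _, sig = token.partition(b".")
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered or forged token
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["device"] != device_fingerprint:
        return False  # shared link opened on a different device
    # "*" models a full-conference pass; a list models per-session tickets.
    return claims["sessions"] == "*" or session_id in claims["sessions"]
```

In this model a day pass or VIP pass is just a different `sessions` list, so adding a tier means issuing different claims rather than writing new access-control code.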
Breakout Rooms for Workshops and Networking
Breakout rooms are what separate a conference from a webinar. A webinar is a one-to-many broadcast. A conference is a community event with interactive components — workshops, roundtables, networking sessions, and small-group discussions. V100's breakout room API creates sub-sessions within a parent session, each with its own SFU instance, recording, transcription, and chat.
Breakout rooms can be pre-assigned (workshop tracks defined in advance), self-selected (attendees choose which room to join), or randomly assigned (networking rotation where attendees meet new people every 10 minutes). Organizers can broadcast messages to all rooms, move attendees between rooms, and recall everyone to the main session. Each breakout room is independently recorded and transcribed, so workshop content is preserved for on-demand replay.
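Random assignment is the easiest of the three modes to sketch: shuffle the attendee list and deal it round-robin into rooms. The `assign_rooms` helper below is a hypothetical illustration of logic you might run in your own application before calling the breakout API; it is not part of V100.

```python
import random


def assign_rooms(attendees, room_count, seed=None):
    """Shuffle attendees and deal them round-robin into room_count rooms,
    so room sizes differ by at most one."""
    if room_count < 1:
        raise ValueError("room_count must be at least 1")
    rng = random.Random(seed)  # pass a seed for reproducible assignments
    shuffled = list(attendees)
    rng.shuffle(shuffled)
    rooms = [[] for _ in range(room_count)]
    for i, person in enumerate(shuffled):
        rooms[i % room_count].append(person)
    return rooms
```

For a networking rotation you would call this again each round with a fresh seed. A plain reshuffle does not guarantee that attendees meet only new people, so a real implementation would also track past pairings.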
For workshops specifically, breakout rooms support collaborative features: shared whiteboards (via screen sharing), live polling, and moderated Q&A. The facilitator has host controls to mute participants, manage screen sharing permissions, and pin speakers. These are API-level controls that you integrate into your application's UI however you choose.
Post-Event: AI Summary, Highlights, and Clips
The value of a conference does not end when the event ends. Post-event content — recordings, transcripts, highlights, and summaries — extends the event's reach and justifies the production investment. V100 automates the post-event content pipeline with AI.
Post-event AI pipeline
- AI summary. Within minutes of a session ending, V100 generates a structured summary: key points, action items, notable quotes (with speaker attribution), and topics discussed. The summary is available via API and can be emailed to attendees automatically.
- Highlight clips. V100 identifies the highest-engagement moments (based on audience reactions, Q&A activity, and content analysis) and auto-generates 30-90 second highlight clips. These are ready for social media distribution within an hour of the event ending.
- Searchable transcript. The full transcript, with speaker diarization and timestamps, is available for text search. Attendees (and people who missed the event) can search for specific topics and jump directly to that moment in the recording.
- Chapter markers. V100 segments each session recording into chapters based on topic changes, speaker transitions, and slide changes. Viewers can jump to specific chapters in the on-demand recording without scrubbing through the entire session.
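The highlight-clip step in the pipeline above can be pictured as a search for the most engaging stretch of a recording. The sketch below assumes a hypothetical per-second engagement score (for example, reactions plus Q&A messages) and finds the best fixed-length window with a sliding sum. It illustrates the idea only and does not describe V100's scoring model.

```python
def best_window(scores, length):
    """Return (start, end) second offsets of the contiguous window of
    `length` seconds with the highest total engagement score.

    scores: list of per-second engagement values (hypothetical input).
    Ties go to the earliest window.
    """
    if not 0 < length <= len(scores):
        raise ValueError("length must be between 1 and len(scores)")
    window = sum(scores[:length])
    best, best_start = window, 0
    for start in range(1, len(scores) - length + 1):
        # Slide by one second: add the entering score, drop the leaving one.
        window += scores[start + length - 1] - scores[start - 1]
        if window > best:
            best, best_start = window, start
    return best_start, best_start + length
```

Running this for several lengths between 30 and 90 seconds, then snapping the boundaries to sentence breaks in the transcript, would give clips that start and end cleanly.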
DVR: On-Demand Replay During and After the Event
At a multi-track conference, attendees cannot attend every session. DVR (Digital Video Recording) lets them rewind and replay sessions they missed, even while the conference is still running. V100's DVR maintains a rolling buffer of all active sessions and converts them to on-demand recordings when the session ends.
DVR also handles the timezone problem. For a global conference with attendees across 20+ timezones, some sessions happen at 3 AM for parts of your audience. DVR ensures those attendees can watch on their own schedule without feeling like second-class participants. The experience is identical to live viewing — same player, same captions, same chapter markers — just time-shifted.
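A rolling buffer is straightforward to model: keep the most recent media segments available for rewinding while the session is live, and keep a full archive to publish when it ends. The sketch below assumes fixed-duration segments identified by URI. The `RollingBuffer` class and the separate archive list are illustrative assumptions, not V100's storage design.

```python
from collections import deque


class RollingBuffer:
    """Live rewind window plus a full archive for on-demand replay."""

    def __init__(self, max_segments):
        # deque with maxlen drops the oldest segment automatically.
        self._window = deque(maxlen=max_segments)
        self._archive = []

    def append(self, start_s, duration_s, uri):
        """Record one finished media segment."""
        segment = (start_s, duration_s, uri)
        self._window.append(segment)
        self._archive.append(segment)

    def playlist_from(self, seek_s):
        """URIs still in the live window that end after seek_s, oldest first."""
        return [uri for start, dur, uri in self._window if start + dur > seek_s]

    def finalize(self):
        """Session ended: return every segment URI for the on-demand recording."""
        return [uri for _, _, uri in self._archive]
```

A viewer who joins late and seeks back further than the window allows simply starts at the oldest buffered segment; once the session ends, the full archive makes every minute reachable.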
White-Label: Your Brand, Your Domain
V100 is infrastructure. There is no V100 branding visible to your attendees. The video player, the conference interface, and the attendee experience are entirely your design, running on your domain. V100 provides the SDKs (JavaScript, iOS, Android) and the APIs. You build the UI. Your brand. Your domain. Your data.
This matters for two reasons. First, brand consistency. A Fortune 500 company hosting an investor day does not want their event running on "hopin.com" or "zoom.us." They want it on "events.theircompany.com" with their design system, their color palette, and their user experience. Second, data ownership. With V100, attendee data, engagement analytics, and content recordings live in your infrastructure. You are not a tenant in someone else's platform.
V100 vs. Hopin vs. Zoom Events vs. Goldcast
The comparison between V100 and turnkey event platforms is not apples-to-apples, and it is important to be honest about that. Hopin, Zoom Events, and Goldcast are complete applications. V100 is infrastructure. Here is what that means in practice.
| Feature | V100 | Hopin | Zoom Events | Goldcast |
|---|---|---|---|---|
| Type | API / Infrastructure | Turnkey platform | Turnkey platform | Turnkey platform |
| White-label | Full (your domain, UI) | Limited branding | Zoom-branded | Partial branding |
| Max participants (SFU) | 200 active video | 100 on stage | 100 panelists | 25 speakers |
| Max audience (broadcast) | Unlimited (CDN) | 100,000 | 50,000 | 100,000 |
| AI transcription | 40+ languages, live | English + limited | English + limited | English primarily |
| AI Director | Yes (20Hz switching) | No | No | No |
| Post-event AI summary | Yes (auto-generated) | No | Zoom AI Companion | Basic |
| PPV / ticketing | Full (session-level) | Event-level | Event-level | Event-level |
| Breakout rooms | Yes (API-managed) | Yes | Yes | Limited |
| DVR / on-demand | Yes (during + after) | After event only | After event only | After event only |
| Custom dev required | Yes (UI is yours) | No | No | No |
The honest tradeoff: V100 requires development effort to build the attendee-facing experience. Hopin, Zoom Events, and Goldcast give you a working event platform immediately. If you need to host a conference next week, use a turnkey platform. If you are building a conference product, a webinar SaaS, or a branded event experience that you will run repeatedly, V100 gives you the infrastructure to build exactly what you want without being constrained by someone else's UI decisions.
When V100 Is the Right Choice
Choose V100 when:
- You are building a conference or webinar product. If virtual events are your business (not a one-off), V100 is the infrastructure layer that lets you differentiate on UX, features, and brand.
- Brand control is non-negotiable. Enterprise organizations hosting investor days, sales kickoffs, or partner summits want their domain and their design. No "Powered by Zoom" footer.
- You need global multilingual support. 40+ language transcription and translation is not available on most turnkey platforms, or is limited to post-event processing.
- You monetize events directly. Session-level PPV, VIP tiers, and granular access control are infrastructure problems. V100 solves them at the API level.
- You want AI features that do not exist elsewhere. AI Director for speaker switching, auto-generated highlight clips, and post-event AI summaries with speaker attribution are all V100-native capabilities.
And the honest flip side: if you run 2-3 events per year and your team does not include developers, a turnkey platform is the pragmatic choice. V100 is infrastructure for teams that build on infrastructure.
Build your conference platform on V100
Get a free API key and create your first conference session. SFU, AI transcription, AI Director, DVR, and breakout rooms are available on all plans.