The AI meeting assistant market is growing at 40% year-over-year. Otter.ai, Fireflies, Fathom, and Granola have proven that people will pay $15-30 per user per month to never take meeting notes again. But the incumbents are all built on the same fragile stack: Deepgram or AssemblyAI for transcription, OpenAI for summaries, and a patchwork of integrations held together by duct tape. Their margins are thin because they are paying retail for every API call.
V100 changes the economics. It bundles transcription at $0.006 per minute (versus $0.01-0.05/min from standalone ASR providers), built-in speaker diarization, AI-powered highlight extraction via Claude Haiku, and a conferencing API that handles the recording layer. You build the product experience and the integrations; V100 handles the audio intelligence pipeline.
This guide walks through every step of building an AI meeting assistant on V100: what users actually want, the technical architecture, a step-by-step MVP build, revenue model, and an honest comparison of what V100 handles versus what you need to build yourself.
The Market: Why AI Meeting Assistants Are a $4B+ Opportunity
Knowledge workers spend an average of 31 hours per month in meetings. Most of that time produces no written record. Action items get forgotten. Decisions get re-litigated in the next meeting because nobody remembers what was agreed upon. The meeting assistant category exists because meetings are broken, and AI is finally good enough to fix them.
Otter.ai has over 25 million users. Fireflies processes over 300,000 meetings per day. Fathom was acquired by Notion for a reported $2B. The market is large, but it is not winner-take-all. Different segments want different things. Sales teams want CRM sync and deal intelligence. Engineering teams want Jira ticket creation from action items. Legal teams want verbatim transcripts with speaker attribution. Executives want 30-second summaries they can skim between meetings.
The opportunity for new entrants is vertical specialization. Build the best meeting assistant for sales teams, or for healthcare, or for education, or for legal. The horizontal players cannot go deep on every vertical. You can.
What Users Actually Want From a Meeting Assistant
After analyzing the feature sets of every major meeting assistant and hundreds of user reviews, here is what matters, ranked by how often users mention it.
Feature priority (based on user demand)
1. Auto-join meetings without friction. The bot should join every meeting automatically. No manual start, no browser extension clicks, no forgetting. Calendar integration is table stakes.
2. Accurate transcription with speaker identification. Users need to know who said what. A transcript without speaker labels is nearly useless for follow-up. Speaker diarization is not optional.
3. Action items extracted automatically. "John will send the proposal by Friday" should become an action item assigned to John with a Friday deadline, without anyone manually typing it.
4. Post-meeting summary delivered immediately. Within 60 seconds of the meeting ending, a structured summary should hit Slack or email: key decisions, action items, open questions, and next steps.
5. Search across all past meetings. "What did we decide about the pricing model in the Q3 planning meeting?" Full-text search across every transcript your team has ever generated.
6. Integrations with existing tools. Summaries to Slack. Action items to Jira or Linear. Contact intelligence to HubSpot or Salesforce. Meeting notes to Notion or Google Docs. The assistant must live where the team already works.
What V100 Provides for Meeting Assistants
V100 is not a meeting assistant. It is the infrastructure layer that makes building one dramatically faster and cheaper. Here is exactly what V100 handles and what you build on top.
| V100 Handles | You Build |
|---|---|
| Real-time transcription (40+ languages) | Calendar integration (Google/Outlook) |
| Speaker diarization | Meeting bot join mechanism |
| AI highlights (Claude Haiku) | CRM/PM tool integrations |
| Action item extraction | User dashboard and search UI |
| Sentiment analysis | Notification delivery (Slack/email) |
| Post-meeting summary generation | Billing and subscription management |
| Meeting recording + storage | Team management and permissions |
| Edit-by-transcript (edit recording via text) | Onboarding and customer support |
Architecture: How the Pieces Fit Together
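The architecture of an AI meeting assistant has five layers. V100 handles layers 2 and 3 (capture and intelligence), with the exception of the platform-specific bot join, which you build. You also build layers 1, 4, and 5 (scheduling, delivery into your integrations, and search and dashboard).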
Layer 1: Scheduling (You build)
Google Calendar API / Outlook API
↓ detect upcoming meetings
↓ extract meeting link (Zoom/Meet/Teams)
Layer 2: Capture (V100)
Meeting bot joins via link
↓ V100 conferencing API records audio/video
↓ real-time audio stream to transcription
Layer 3: Intelligence (V100)
Real-time transcription (40+ languages)
↓ speaker diarization (who said what)
↓ AI highlights (Claude Haiku)
↓ action item extraction
↓ sentiment analysis per speaker
↓ post-meeting summary
Layer 4: Delivery (You build)
Summary → Slack channel / email
Action items → Jira / Linear / Asana
Contact insights → HubSpot / Salesforce
Full transcript → Notion / Google Docs
Layer 5: Search & Dashboard (You build)
Full-text search across all transcripts
Meeting library with filters
Speaker analytics and talk-time ratios
Team usage and billing dashboard
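As a rough illustration of the "extract meeting link" step in Layer 1, the sketch below scans a calendar event's free text for a Zoom, Google Meet, or Teams URL. The function name and the URL patterns are assumptions made for this example; real invites vary (vanity domains, dial-in-only events), so treat the patterns as a starting point.

```javascript
// Hypothetical Layer 1 helper: find a meeting link in calendar event text.
// Platforms are checked in the order listed, not by position in the text.
const MEETING_LINK_PATTERNS = [
  { platform: 'zoom', pattern: /https:\/\/[\w.-]*zoom\.us\/j\/\d+[^\s"<>]*/i },
  { platform: 'meet', pattern: /https:\/\/meet\.google\.com\/[a-z]{3}-[a-z]{4}-[a-z]{3}/i },
  { platform: 'teams', pattern: /https:\/\/teams\.microsoft\.com\/l\/meetup-join\/[^\s"<>]+/i },
];

function extractMeetingLink(eventText) {
  for (const { platform, pattern } of MEETING_LINK_PATTERNS) {
    const match = pattern.exec(eventText || '');
    if (match) return { platform, url: match[0] };
  }
  return null; // no recognizable meeting link in this event
}
```

Returning the platform alongside the URL lets the scheduler pick the right join mechanism later, since each platform needs its own bot logic.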
Step-by-Step: Building the MVP
Step 1: Meeting Capture
The first decision is how your assistant captures meeting audio. There are two approaches, and you should start with the simpler one.
Option A: Recording upload (MVP). Users upload meeting recordings after the fact. This sidesteps the complexity of bot-joining live meetings entirely. V100's upload API accepts MP4, WAV, WebM, and common audio/video formats. The user records on Zoom (which saves a local file), uploads it to your app, and V100 processes it. This is how you validate demand before building the harder path.
Option B: Live bot join (production). Your bot joins meetings automatically, and V100's conferencing API records and transcribes once the bot is in the call. This is what users expect from a mature meeting assistant, but it requires calendar integration and platform-specific bot join logic that you build yourself. Build this after you have validated the product with Option A.
```javascript
import { V100 } from 'v100-sdk';

const v100 = new V100('YOUR_API_KEY');

// Option A: Process uploaded recording
const meeting = await v100.assets.upload({
  file: recordingFile,
  transcription: {
    enabled: true,
    language: 'auto',                 // Auto-detect language
    diarization: true,                // Speaker identification
    speaker_labels: participantNames  // Map speakers to names
  },
  ai_highlights: true,                // Key moments extraction
  ai_summary: {
    enabled: true,
    format: 'structured',             // Decisions, actions, questions
    model: 'claude-haiku'             // Fast, cost-effective
  },
  sentiment: true                     // Per-speaker sentiment tracking
});

// Webhook fires when processing completes (~30-90 seconds)
// meeting.transcript, meeting.summary, meeting.action_items
// meeting.highlights, meeting.sentiment all available via API
```
Step 2: Transcription with Speaker Identification
V100's transcription engine produces timestamped, speaker-attributed text. Each segment includes the speaker ID, start time, end time, and confidence score. When you provide participant names from the calendar invite, V100 maps speaker IDs to names using voice fingerprinting within the first 30-60 seconds of the meeting.
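To make the segment data concrete, here is a minimal sketch of two things a dashboard typically does with it: merging consecutive segments from one speaker into readable turns, and computing talk-time ratios. The field names (`speaker`, `start`, `end`, `text`) are assumptions for illustration and may not match V100's actual response schema.

```javascript
// Assumed segment shape (illustrative, not V100's documented schema):
// { speaker: 'Sarah', start: 12.4, end: 15.9, text: '...', confidence: 0.97 }

// Merge consecutive segments from the same speaker into one turn.
function toSpeakerTurns(segments) {
  const turns = [];
  for (const seg of segments) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === seg.speaker) {
      last.end = seg.end;
      last.text += ' ' + seg.text;
    } else {
      turns.push({ speaker: seg.speaker, start: seg.start, end: seg.end, text: seg.text });
    }
  }
  return turns;
}

// Fraction of total speaking time attributed to each speaker.
function talkTimeRatios(segments) {
  const seconds = new Map();
  let total = 0;
  for (const seg of segments) {
    const duration = Math.max(0, seg.end - seg.start);
    seconds.set(seg.speaker, (seconds.get(seg.speaker) || 0) + duration);
    total += duration;
  }
  const ratios = {};
  for (const [speaker, value] of seconds) {
    ratios[speaker] = total > 0 ? value / total : 0;
  }
  return ratios;
}
```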
Accuracy matters here. V100's transcription accuracy exceeds 95% for clear audio with distinct speakers. It degrades with overlapping speech (multiple people talking simultaneously), heavy background noise, or strong accents in less-common languages. For your MVP, set honest expectations in your product: show confidence scores alongside transcript segments, and let users correct errors. Those corrections improve the speaker model for future meetings with the same participants.
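One way to surface confidence scores and accept corrections is sketched below. It assumes each segment carries a numeric `confidence` between 0 and 1 and a `text` field; both names and the 0.85 threshold are illustrative choices, not V100 defaults.

```javascript
// Mark segments the UI should highlight for human review.
// A missing confidence score is treated as "needs review".
function flagForReview(segments, threshold = 0.85) {
  return segments.map((seg) => ({
    ...seg,
    needsReview: typeof seg.confidence !== 'number' || seg.confidence < threshold,
  }));
}

// Apply a user's correction without mutating the stored transcript,
// so the original text stays available for auditing.
function applyCorrection(segments, index, correctedText) {
  return segments.map((seg, i) =>
    i === index
      ? { ...seg, text: correctedText, corrected: true, needsReview: false }
      : seg
  );
}
```

Keeping corrections as new objects rather than overwriting the original makes it easier to send a before/after pair to whatever feedback mechanism you use.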
The transcription supports 40+ languages with automatic language detection. For multilingual meetings (common in global teams), V100 detects language switches mid-meeting and transcribes each segment in its detected language. This is not perfect. Code-switching within a single sentence sometimes confuses the detector. But for meetings that are primarily in one language with occasional switches, it works well.
Step 3: AI Highlights and Action Items
This is where V100's Claude Haiku integration transforms a raw transcript into meeting intelligence. V100 identifies five categories of highlights automatically.
AI-extracted meeting intelligence
- Decisions: "We decided to launch the beta on April 15th" (extracted with context and attributed to the decision-maker).
- Action items: "Sarah will prepare the investor deck by next Tuesday" (extracted with assignee, task description, and deadline).
- Questions (unanswered): "What is our fallback plan if AWS costs exceed the budget?" (flagged for follow-up).
- Key topics: Topic boundaries detected throughout the meeting, creating navigable chapter markers in the recording.
- Sentiment shifts: Moments where the conversation tone shifted significantly (disagreement, excitement, concern), useful for sales calls and customer meetings.
The structured output from V100's AI layer is JSON, making it straightforward to render in your dashboard or push to integrations. Action items include an assignee field, a description field, and a deadline field (when mentioned). Your application maps these to Jira tickets, Linear issues, or Asana tasks via the respective APIs.
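The mapping step might look like the sketch below, which turns one action item into generic issue-tracker fields. The `assignee`, `description`, and `deadline` fields come from the description above; the function name, the output field names, and the `directory` lookup (spoken name to tracker account ID) are hypothetical and would be adapted to the tracker you target.

```javascript
// Hypothetical mapper: V100 action item -> generic issue fields.
// directory is a Map from lowercase speaker name to tracker account ID.
function toIssueFields(item, meetingTitle, directory) {
  const fields = {
    summary: item.description,
    description: `From meeting: ${meetingTitle}`,
  };
  const accountId = directory.get((item.assignee || '').toLowerCase());
  if (accountId) fields.assignee = accountId; // leave unassigned if unknown
  if (item.deadline) fields.dueDate = item.deadline; // only present when mentioned
  return fields;
}
```

Omitting `assignee` and `dueDate` when they are unknown, rather than sending empty values, avoids creating tickets with a wrong owner or a bogus date.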
Step 4: Summary Generation
V100 generates a structured post-meeting summary that includes an executive overview (2-3 sentences), a list of key decisions, a list of action items with owners, open questions that were not resolved, and suggested next steps. The summary is generated within 30-90 seconds of the meeting ending, depending on meeting length.
The summary quality depends on the meeting content. For well-structured meetings with clear agendas and distinct speakers, summaries are excellent. For freeform brainstorming sessions with overlapping conversation, the AI sometimes misattributes statements or misses implicit action items. This is a limitation of every AI meeting assistant, not specific to V100. Your product should let users edit and correct summaries, and those corrections should feed back into the AI to improve over time.
Step 5: Integrations
Integrations are where your meeting assistant becomes indispensable. A standalone transcript viewer is useful. A transcript that automatically creates Jira tickets, updates HubSpot contact records, and posts summaries to the right Slack channel is a product people cannot live without. Here is the integration priority for your MVP.
```javascript
// After V100 webhook fires with processed meeting data.
// slack and notion are assumed to be @slack/web-api and @notionhq/client
// instances; jira and hubspot are thin wrappers you write yourself.
async function postMeetingWorkflow(meetingData) {
  const { summary, action_items, transcript } = meetingData;

  // 1. Post summary to Slack
  await slack.chat.postMessage({
    channel: meetingData.slack_channel,
    text: summary.executive_overview, // plain-text fallback for notifications
    blocks: formatSummaryBlocks(summary, action_items)
  });

  // 2. Create Jira tickets from action items
  for (const item of action_items) {
    await jira.createIssue({
      summary: item.description,
      assignee: mapToJiraUser(item.assignee),
      dueDate: item.deadline, // undefined when no deadline was mentioned
      description: `From meeting: ${meetingData.title}\n\n${item.context ?? ''}`
    });
  }

  // 3. Update HubSpot contacts (for sales meetings)
  if (meetingData.type === 'sales_call') {
    await hubspot.updateContact(meetingData.contact_id, {
      last_meeting_summary: summary.executive_overview,
      sentiment: meetingData.sentiment.overall,
      next_steps: summary.next_steps
    });
  }

  // 4. Save full transcript to Notion.
  // Property names (Name, Date) must match your database schema.
  await notion.pages.create({
    parent: { database_id: MEETINGS_DB },
    properties: {
      Name: { title: [{ text: { content: meetingData.title } }] },
      Date: { date: { start: meetingData.date } }
    },
    children: formatTranscriptBlocks(transcript)
  });
}
```
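The workflow above calls `formatSummaryBlocks` without defining it. One possible version, built from Slack Block Kit `header` and `section` blocks, is sketched here. `summary.executive_overview` appears in the workflow code; `summary.key_decisions` is an assumed field name for the list of decisions.

```javascript
// Sketch of formatSummaryBlocks: returns an array of Slack Block Kit blocks.
function formatSummaryBlocks(summary, actionItems) {
  const bullets = (lines) => lines.map((line) => `• ${line}`).join('\n');
  const section = (text) => ({ type: 'section', text: { type: 'mrkdwn', text } });

  const blocks = [
    { type: 'header', text: { type: 'plain_text', text: 'Meeting summary' } },
    section(summary.executive_overview),
  ];

  const decisions = summary.key_decisions || []; // assumed field name
  if (decisions.length > 0) {
    blocks.push(section(`*Decisions*\n${bullets(decisions)}`));
  }

  if (actionItems.length > 0) {
    const lines = actionItems.map((item) => {
      const owner = item.assignee || 'Unassigned';
      const due = item.deadline ? ` (due ${item.deadline})` : '';
      return `${owner}: ${item.description}${due}`;
    });
    blocks.push(section(`*Action items*\n${bullets(lines)}`));
  }
  return blocks;
}
```

Slack limits the length of a section's text, so a long meeting with many action items may need the list split across several section blocks.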
Step 6: Search Across All Meetings
Full-text search across all meeting transcripts is the feature that makes your assistant more valuable over time. Every meeting your team records adds to a searchable knowledge base. "What did the customer say about the pricing in the January call?" becomes a search query, not a 45-minute re-watch.
V100 provides the indexed transcript data. Your application builds the search interface. Use Elasticsearch, Typesense, or Postgres full-text search on your backend. For the best user experience, return search results with clickable timestamps that jump to the exact moment in the recording. V100's timestamped transcript makes this straightforward to implement.
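The timestamp-linking idea can be shown with a small in-memory sketch. In production the matching would be delegated to Elasticsearch, Typesense, or Postgres; only the result shaping would stay in application code. The segment fields and the `?t=` deep-link format are assumptions for this example.

```javascript
// Naive substring search over timestamped segments, returning results
// that link to the matching moment in the recording.
function searchTranscript(segments, query, recordingUrl) {
  const needle = query.trim().toLowerCase();
  if (!needle) return [];
  return segments
    .filter((seg) => seg.text.toLowerCase().includes(needle))
    .map((seg) => ({
      speaker: seg.speaker,
      text: seg.text,
      timestamp: formatTimestamp(seg.start),
      link: `${recordingUrl}?t=${Math.floor(seg.start)}`, // hypothetical deep link
    }));
}

// Render seconds as m:ss, or h:mm:ss for long meetings.
function formatTimestamp(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  const pad = (n) => String(n).padStart(2, '0');
  return hours > 0
    ? `${hours}:${pad(minutes)}:${pad(seconds)}`
    : `${minutes}:${pad(seconds)}`;
}
```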
The Differentiator: Edit-by-Transcript
V100's edit-by-transcript feature is your single biggest competitive advantage against incumbents. Users can edit the meeting recording by editing the transcript text. Delete a sentence from the transcript, and the corresponding audio/video is removed from the recording. This is the same technology that Descript charges $24-33 per user per month for as a standalone product.
For meeting assistants, this unlocks powerful use cases. Users can remove off-the-record segments by deleting them from the transcript. Sales managers can create highlight reels of customer calls by selecting the best segments. Trainers can extract teaching moments from long meetings and share just the relevant 2-minute clips. All of this happens by selecting text in the transcript, not by scrubbing a video timeline.
Otter.ai, Fireflies, and Fathom do not offer edit-by-transcript. They offer transcripts and recordings as separate, read-only artifacts. V100's edit-by-transcript makes them interactive. This is a defensible product differentiator that is difficult for competitors to replicate because it requires deep integration between the transcription engine and the video processing pipeline.
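Conceptually, edit-by-transcript works because every transcript segment carries a time range, so deleting text identifies which parts of the recording to cut. The sketch below illustrates only that bookkeeping and is not V100's implementation, which the source does not describe. Given segments with `start` and `end` times in seconds (assumed field names) and the indexes a user deleted, it computes the time ranges of the recording to keep.

```javascript
// Conceptual sketch: turn deleted transcript segments into keep-ranges.
function keepRanges(segments, deletedIndexes, recordingDuration) {
  const deleted = new Set(deletedIndexes);
  const cuts = segments
    .filter((_, i) => deleted.has(i))
    .map((seg) => ({ start: seg.start, end: seg.end }))
    .sort((a, b) => a.start - b.start);

  const keep = [];
  let cursor = 0; // start of the next stretch we intend to keep
  for (const cut of cuts) {
    if (cut.start > cursor) keep.push({ start: cursor, end: cut.start });
    cursor = Math.max(cursor, cut.end); // handles adjacent or overlapping cuts
  }
  if (cursor < recordingDuration) keep.push({ start: cursor, end: recordingDuration });
  return keep;
}
```

A real pipeline would also need word-level timing, handling for overlapping speakers, and re-encoding of the kept ranges, which is where most of the engineering effort lies.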
Revenue Model
The meeting assistant market has settled on per-user-per-month pricing with a freemium entry point. Here is a proven pricing structure based on what incumbents charge and what the V100 cost structure supports.
| Tier | Price | Limits | Features |
|---|---|---|---|
| Free | $0 | 5 meetings/month | Transcription, summary, 7-day retention |
| Pro | $15/user/mo | Unlimited meetings | + action items, search, Slack, edit-by-transcript |
| Team | $29/user/mo | Unlimited + team features | + CRM sync, Jira, analytics, custom workflows, API |
At $15/user/month with 1,000 paying users, you generate $15,000/month in revenue. Your V100 cost at that scale (assuming 10 meetings/user/month, 30 minutes average) is approximately $1,800/month in transcription ($0.006/min x 300,000 minutes). That leaves a gross margin of roughly 88% before hosting, support, and payment fees, which is healthy for SaaS.
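To rerun this arithmetic with your own assumptions, a small calculator like the following can help. It counts only per-minute transcription cost; hosting, support, payment fees, and free-tier usage are left out, so the margin it reports is an upper bound. The function and parameter names are illustrative.

```javascript
// Back-of-the-envelope unit economics for a per-seat meeting assistant.
function unitEconomics({ users, pricePerUser, meetingsPerUser, minutesPerMeeting, ratePerMinute }) {
  const revenue = users * pricePerUser;
  const minutes = users * meetingsPerUser * minutesPerMeeting;
  const transcriptionCost = minutes * ratePerMinute;
  const grossMargin = revenue > 0 ? (revenue - transcriptionCost) / revenue : 0;
  return { revenue, minutes, transcriptionCost, grossMargin };
}

// The scenario from the paragraph above.
const scenario = unitEconomics({
  users: 1000,
  pricePerUser: 15,
  meetingsPerUser: 10,
  minutesPerMeeting: 30,
  ratePerMinute: 0.006,
});
console.log(scenario);
```

Varying `meetingsPerUser` and `minutesPerMeeting` shows how sensitive the margin is to heavy users, which matters when the paid tiers offer unlimited meetings.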
Cost: V100 vs. Building From Scratch
| Component | Build From Scratch | V100 |
|---|---|---|
| Transcription (ASR) | $0.01-0.05/min (Deepgram/AssemblyAI) | $0.006/min |
| Speaker diarization | $0.005-0.02/min (add-on) | Included |
| AI summaries (LLM) | $500-2,000/mo (OpenAI/Anthropic) | Included (Claude Haiku) |
| Recording storage | $200-1,000/mo (S3) | Included |
| Edit-by-transcript | 6-12 months eng (custom build) | API call |
| Monthly cost (10K meetings, 30 min each) | $5,000-$15,000 | $1,800-$3,000 |
Honest cost comparison
If you only need transcription and do not need speaker diarization, AI summaries, edit-by-transcript, or sentiment analysis, Deepgram at $0.01/min is a proven and cost-effective option. Their API is mature, well-documented, and battle-tested at scale.
V100's advantage is in the bundled intelligence layer. If you need transcription plus diarization plus AI extraction plus edit-by-transcript, V100 provides all of these as a single API instead of stitching together Deepgram + OpenAI + custom video editing + S3 storage. The cost savings compound as you scale, and the development time reduction is significant: weeks instead of months to reach feature parity.
What V100 Does Not Do
- Calendar integration. V100 does not connect to Google Calendar or Outlook. You build the scheduling layer that detects upcoming meetings, extracts meeting links, and triggers the bot to join.
- Platform-specific bot joining. Joining a Zoom meeting programmatically requires the Zoom Meeting SDK. Joining Google Meet requires the Calendar API plus a headless browser approach. Joining Teams requires the Microsoft Graph API. V100 provides the recording and transcription layer once your bot is in the meeting, but the join mechanism is your responsibility.
- CRM and PM tool integrations. V100 outputs structured JSON (transcript, summary, action items). Mapping action items to Jira tickets or updating HubSpot contacts is your application logic.
- Real-time coaching. Some meeting tools provide live suggestions to sales reps during calls ("mention the competitor's recent outage"). V100 does not provide real-time coaching. The AI analysis happens post-meeting or with a short delay during the meeting.
- Compliance recording with legal chain-of-custody. While V100 provides PQ-signed transcripts for integrity verification, it is not a certified compliance recording platform for regulated industries like financial services (MiFID II) or healthcare (HIPAA). If you need certified compliance recording, you may need additional infrastructure.
Ready to build your AI meeting assistant?
Start with V100's free tier. Upload a test meeting recording, see the transcription and AI highlights in action, and prototype your integration workflow before committing to a paid plan. No credit card required.