blopker 1 days ago [-]
Nice! I really like how many variations on this idea are coming out. MacWhisper used to be great, but is kind of a buggy mess now.
I'm making my own, for personal use. I did a survey of many and they all (that I could find) skip the fundamentals.
The major issues that I've run into:
- Crash recovery. Most of these apps are incredibly buggy and crash all the time, taking the recorded audio with them. MacWhisper is incredibly bad at this.
- Disk space. Many of these apps save WAV files to disk. After a few hours of meetings, you may end up with gigabytes eaten.
- Microphone bleed. People don't always use headphones, so the system mic picks up the speaker sounds, causing (approximately) duplicate transcriptions.
I've yet to find a solution that handles all these correctly, let alone having high quality transcriptions.
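To put rough numbers on the disk space point, here is a back-of-the-envelope sketch. The WAV parameters (48 kHz, 16-bit, stereo) and the 64 kbps compressed bitrate are assumptions chosen for illustration, not what any particular app records; container overhead is ignored.

```python
def bytes_per_hour_pcm(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM payload for one hour of audio (WAV header ignored)."""
    return sample_rate_hz * (bit_depth // 8) * channels * 3600


def bytes_per_hour_bitrate(kbps: int) -> int:
    """One hour of a constant-bitrate stream (container overhead ignored)."""
    return kbps * 1000 // 8 * 3600


wav = bytes_per_hour_pcm(48_000, 16, 2)  # assumed: 48 kHz, 16-bit, stereo
compressed = bytes_per_hour_bitrate(64)  # assumed: 64 kbps compressed stream

print(f"WAV:     {wav / 1e6:.1f} MB/hour")
print(f"64 kbps: {compressed / 1e6:.1f} MB/hour")
```

Under those assumptions an hour of WAV is about 691 MB against roughly 29 MB compressed, so three hours of meetings is already past 2 GB uncompressed.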
Crash recovery is definitely something that I want to spend a bit more time on. I'm not entirely sure how Trace handles crashing right in the middle of a recording, so I'm going to put a bit of time aside in the next few days to properly explore this and see if I can come up with an elegant solution to it.
I think I've got the other two bits covered. I pushed an update yesterday that adds active echo cancellation so that audio playing through the speakers (or leaky headphones) won't get transcribed twice if it is picked up by the microphone. It can be disabled in preferences, but it's on by default.
The disk space issue is one that I considered as well. By default, Trace deletes the actual audio recordings as soon as transcription is successfully completed, so the idea is you keep just the markdown transcript rather than the gigabytes of raw audio. If you want, there's a preference to disable the auto-deletion. There's a bit more on the support page here https://traceapp.info/support (search for "Auto-deletion of audio").
FluidAudio is a big part of this and is actually used in two places during a session. It runs the Parakeet EOU model for the instant recap (which isn't hugely accurate, but it's good enough for the job) and after the call it's also used to transcribe the recording, depending on which engine you've selected (Trace offers a fast and an accurate one). If the fast engine is selected, we use FluidAudio with the Parakeet-TDT 0.6b v3 model for transcription, which then goes through Pyannote and WeSpeaker for diarization. If the accurate engine is selected, we use WhisperKit with the Whisper large-v3-turbo model for transcription, and SpeakerKit for diarization.
kstenerud 17 hours ago [-]
For crash resilient data, you have a few options:
- Journaling file structures (telegraph what you're about to write, then write it, then signal completion)
- memmap your important data structures to a file (they will be flushed to disk no matter how your app dies - short of a power loss)
- post-crash dump (put last-minute writers in a crash handler to save it to disk)
A journaling file structure is the most secure, because it's designed with the assumption that writing will eventually fail. memmapped structs are easy and cheap, and get you 99% of the way there (only power loss will lose your data). Crash-time writing is doable with a crash handler like KSCrash, but there are many ways an app can crash without triggering a crash handler (thermal kill, exceeding quota, memory jetsam, etc). You also need to write your data in a signal-safe manner.
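As a rough illustration of the journaling idea, here is a minimal sketch of a simpler relative: a checksummed append-only log rather than a full intent/commit journal. The record layout (length plus CRC32 header) and the names `append_record` and `recover` are made up for this example. A record only counts if its header and checksum both verify, so a write torn by a crash is dropped on recovery.

```python
import os
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def append_record(path: str, payload: bytes) -> None:
    """Append one length+CRC framed record and force it to disk."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())


def recover(path: str) -> list[bytes]:
    """Return every intact record; stop at the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    records = []
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # incomplete or corrupted tail left by a crash
        records.append(payload)
        pos = start + length
    return records
```

Calling `os.fsync` per record is the expensive part; a real recorder would probably batch a few hundred milliseconds of audio per record to keep the write rate sane.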
jamesbagley849 4 hours ago [-]
[flagged]
scosman 21 hours ago [-]
I had the same experience so started building my own. All problems are solvable, just working on the polish.
- crash recovery: part one is to use ADTS AAC (even if the process crashes, audio is saved up until it does). Part two is isolating the transcription/summaries in separate XPC services.
- disk space: AAC at 64 kbps mono solves it. Could use Opus for further reduction, but both are small.
- speaker bleed: macOS voice isolation processing solves this. It’s a nightmare to get set up, but works great once done.
- library: using argmax SDK - by a bunch of ex-Apple on device AI folks.
If it wasn’t for CoreAudio, I’d say it was easy to make. Argmax, Whisper, and llama.cpp - wrapped in the right architecture, mostly just work.
I’m having fun nerding out on the details like custom vocabulary (getting the names of the people in the meeting right), inferring speaker names from the transcript, calendar integration, nice UI, etc.
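For anyone wondering why ADTS helps with crash recovery: every ADTS frame carries its own small header with a sync word and a frame length, and there is no trailing index that has to be written when recording stops. A truncated file is therefore still playable up to the last complete frame. A minimal sketch of finding that point (the function name is made up; the header layout is the standard 7-byte ADTS header):

```python
def count_complete_adts_frames(data: bytes) -> tuple[int, int]:
    """Return (complete_frames, usable_bytes) for a possibly truncated ADTS stream."""
    pos = 0
    frames = 0
    while pos + 7 <= len(data):
        # A 12-bit syncword (0xFFF) marks the start of every ADTS frame.
        if data[pos] != 0xFF or (data[pos + 1] & 0xF0) != 0xF0:
            break
        # The 13-bit frame length (header included) spans bytes 3..5.
        frame_len = (
            ((data[pos + 3] & 0x03) << 11)
            | (data[pos + 4] << 3)
            | (data[pos + 5] >> 5)
        )
        if frame_len < 7 or pos + frame_len > len(data):
            break  # corrupt length, or the last frame was cut off by the crash
        frames += 1
        pos += frame_len
    return frames, pos
```

Truncating the file to `usable_bytes` after a crash leaves a clean stream; anything past that offset is a partial frame.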
victorbjorklund 7 hours ago [-]
Handy works well with crash recovery (mostly from me turning off the computer mid-recording because I forgot about the recording)
jv22222 1 days ago [-]
Nice tip on FluidAudio that's the kind of thing I've been looking for. Thanks!
highmastdon 1 days ago [-]
I’m using MacParakeet these days. If your language is supported, definitely give it a try. It’s much faster and lower footprint
Folcon 22 hours ago [-]
> I've yet to find a solution that handles all these correctly, let alone having high quality transcriptions.
Wait really? I honestly would have thought this was a solved problem by now, especially high quality transcriptions bit, just out of curiosity, is the problem that the quality isn't high enough?
blopker 21 hours ago [-]
There are still a few unsolved problems that require tuning for specific applications. Applications that own the video call have a much easier time, since they have access to each individual audio stream. Applications like this, however, have to deal with overlapping voices from a single stream. If it's trying to attribute each utterance to an individual, separating the voices is tough and can lead to confusing transcripts. There are many little problems like this which make it a tough problem in real-world usage. Domain-specific terms and proper nouns are another source of inaccuracy.
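To make the attribution problem concrete, here is a minimal sketch of one common approach: give each transcript segment to the diarization turn it overlaps most in time. The `Span` type and `attribute` function are hypothetical names for this example and not taken from any of the apps discussed.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: float  # seconds
    end: float    # seconds
    label: str    # speaker name for turns, text for transcript segments


def overlap(a: Span, b: Span) -> float:
    """Length in seconds of the time range shared by two spans."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def attribute(segments: list[Span], turns: list[Span]) -> list[tuple[str, str]]:
    """Pair each transcript segment with the speaker turn overlapping it most."""
    out = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        speaker = best.label if best and overlap(seg, best) > 0 else "unknown"
        out.append((speaker, seg.label))
    return out


turns = [Span(0.0, 5.0, "A"), Span(4.0, 8.0, "B")]  # A and B talk over each other
segments = [Span(0.0, 3.0, "hello"), Span(3.5, 6.0, "wait"), Span(9.0, 10.0, "bye")]
print(attribute(segments, turns))
```

The middle segment overlaps A for 1.5 s and B for 2.0 s, so it goes to B even though both were speaking. That is exactly the kind of call that produces confusing transcripts when voices overlap.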
sofixa 12 hours ago [-]
> Wait really? I honestly would have thought this was a solved problem by now, especially high quality transcriptions bit, just out of curiosity, is the problem that the quality isn't high enough?
If I had to guess, all of those apps are probably vibecoded, hence the variable quality.
scimonk 15 hours ago [-]
The App looks really interesting and I’d love to try it out. How well does it work in other languages than English? For me, German would be important.
Due to audio quality, transcription sometimes produces garbled output or misunderstands something. FluidVoice offers the option to use an LLM to „interpret“ the text to rescue garbled audio through context. Do you also plan to support something like this? This would be a great feature!
> Which languages does Trace support?
English only, for now. Both transcription models, Fast and Accurate, are built for English audio. A recording in another language will still produce a transcript, but it won’t be accurate: the model maps whatever it hears onto English words, so the result comes out garbled rather than failing outright.
> If transcribing other languages matters to you, get in touch (see Contact below).
scimonk 13 hours ago [-]
Thanks! Somehow I missed that part. Didn't look for language support in the "preferences" section I guess.
denbyc 1 days ago [-]
I'd love to have a purchase option not tied to the App Store if possible. I don't use an Apple account with my Mac, but I would love to try Trace.
AG342 21 hours ago [-]
This is definitely on the to-do list if there’s enough demand for it. The payment/distribution/updates infra required is not insignificant, especially if nobody was that bothered, but by the sounds of it they are so I’ll bump this up the priority list.
addozhang 23 hours ago [-]
Agreed, no need to tie it into Apple either.
tillcarlos 10 hours ago [-]
try amore.computer - that might do the trick?
thenipper 21 hours ago [-]
Also agreed, my work prohibits App Store apps so I have to skip things like this.
lee_ars 4 hours ago [-]
It works well so far, but what's up with the weird non-standard menubar menu? It's very odd, and it doesn't respect system light/dark mode preferences.
zmmmmm 11 hours ago [-]
I've tried so many of these and paid for a lot of them and I still can't find what I want. It sounds like this is closer than most:
- record and separate two sides of the conversation
- save meetings in a simple transcription format in a local folder
- connect with my calendar (Outlook, Google Calendar) and name meeting transcripts accordingly
- for recurring meetings, append rather than create a new transcript
- let me label speaker voices and recognise those voices across different meetings
A tool that did all this and then ALSO built a knowledge base to let me RAG query my meetings would be the holy grail for me.
watchlight 1 days ago [-]
Agreed with JohnBiz, the moment flagging is interesting and unusual, and a nice contrast to passive transcription. I only recently learned about MacWhisper (I'm Windows primarily) and was floored to learn how expensive the Pro option is. Nowadays it's not so hard to have some level of DIY transcription, so it's crazy that it's priced at a premium.
What's your diarization pipeline? Pyannote?
I'd taken a different approach that used an LLM clean-up pass to summarize and progressively compress the transcript for ultra-long content, but I like the idea of targeted "pay attention here" flags.
tillcarlos 19 hours ago [-]
Had the same idea, but have to focus on my main business. This comes at the right time!
I just purchased it. What's the best way to give you feedback? (Do you want any?)
From the top of my head:
- will the mic switch automatically when I am at my office? Or do I have to change settings every time? Maybe a preference of what's available + auto switch would be good.
- I personally don't need the hot key. Menu bar icon would be fine.
- Download the model is a long process. Put it into the installer, not into the bar on the bottom
- Speaker correction would be amazing. If it could "Learn" the speakers based on voice.
- Overall neat app. Good animations and UX
**Speaker 1** [00:00] What if I fell to the floor?
**Microphone** [00:02] Yes, this is Phil, I'm just speaking, this should be my voice, and there's music in the
**Speaker 1** [00:05] Couldn't tell this anymore
AG342 18 hours ago [-]
Thanks for the feedback. Feel free to drop an email to hello@traceapp.info if you want a chat. Happy to hop on a call too if you'd prefer.
For the switching, do you mean if you hot-swap during a call? The mic should auto-switch if you've got System default selected, but feel free to give it a go and report back. If it doesn't do what we expect I can absolutely take a look at changing the behaviour.
Learning speakers is also on the to-do list.
P.S. Great choice in test audio. What a banger.
shireboy 8 hours ago [-]
I tried this but it took over my mic and people couldn't hear me on a Teams call until I turned it off. Nice idea, but it needs to share the mic with Teams to be useful for me. Not sure if it’s Teams' fault or Trace's fault, but either way…
AG342 6 hours ago [-]
Looks like echo cancellation is to blame here. I'm working on a more permanent fix but for now try turning off "Cancel system audio from the microphone" to see if that helps.
AG342 7 hours ago [-]
Someone else has also reported this, which is not coincidental… Have you tried this with other applications too, to see if it’s a Teams-specific issue?
I’ll take a look as my top priority.
littlecranky67 16 hours ago [-]
Would love to use this app, I recently thought about coding something similar myself. I would need to only record my own voice due to privacy laws (here in Germany, you can record yourself without consent). With an over-the-ear headset, the microphone only captures my voice. Would need to store the original audio plus the transcription. Ideally, you can configure it to start as soon as it detects a new window with a given title (e.g. Webex launches meetings in a new window named "Meeting ....").
mrkn1 21 hours ago [-]
The key moments feature is neat. Been working on a free open-source offline transcriber that runs fast on CPU and does diarization too
This is an excellent product and exactly what I've been looking for. But most of my meetings are done on my company Mac, and they definitely won't let me install this kind of software, even though I'd be willing to pay for it myself.
geniium 21 hours ago [-]
And if it ran in the browser without an install, it probably would not be able to record audio from your other browser (or app)
z3ugma 9 hours ago [-]
Maybe we ought to be making little hardware passthroughs that plug into the headphone/mic jack and control them with idk the Caps Lock signal from USB HID to start recording.
It's very cyberpunk eventually...the human operator of the console needs to be able to see and hear the screen and sound, there will always be an interface that can be adapted to a machine, however low-fidelity
A webpage cannot provide a system I/O device (camera, microphone, speaker, etc.). That requires a signed driver on macOS.
robertkarl 23 hours ago [-]
This looks sick. I was going to download it but for $10 I am more willing to attempt asking Claude to implement something like it, than to purchase.
I would be more willing to purchase if it was open source and I could build from source to try it first.
satvikpendem 23 hours ago [-]
It's kinda funny how frontier LLMs change the game when it comes to software. If it becomes so good to make whatever little utility you want, why would I pay 10 dollars when an AI subscription is 20 bucks and I can build way more in a month for that $20? Especially since it's very likely people on show HN have simply used AI anyway, so why would I pay for your prompts?
PufPufPuf 14 hours ago [-]
AI is great at getting you 80% there. But you have to finish the remaining 80% yourself.
addozhang 23 hours ago [-]
I don't really recommend it. If the software is a one-time purchase, there's no need to rewrite it with an LLM. The tokens for a rewrite could cost more than just $10.
anonymouse008 22 hours ago [-]
* full price tokens, yes
Not the subsidized subs
plaguuuuuu 22 hours ago [-]
I'd much rather spend $10 than have to sit at a prompt every day babysitting the thing, after working all day sitting at a prompt babysitting other things
yilugurlu 7 hours ago [-]
Looks awesome, I just bought it. I'd love to see Office/Teams Calendar integration.
Good luck!
AG342 6 hours ago [-]
Thanks. Trace integrates with your Mac calendar, so if you’ve signed into your accounts with those it will pick them up. No support currently for direct Teams/Exchange integration but I’ll add it to the list.
haaz 8 hours ago [-]
Really nice, I will buy now and test out. How did you make the video on your website? And are you based in UK?
AG342 8 hours ago [-]
Thanks for the support. The videos are Vue components with CSS animations. As mentioned in a previous reply I’m happy to share the component if it’s of interest.
Yep, based in Sheffield, UK.
nkmnz 1 days ago [-]
Which Speech-to-Text is used? Is it possible to configure it? This might be crucial for supporting languages other than English - the model that comes built-in with macOS fails completely for German.
AG342 20 hours ago [-]
[dead]
ectoloph 11 hours ago [-]
I've just bought it based on the description.
Minor complaint is that it steals Cmd-Shift-P (Firefox Private Browsing shortcut) by default.
Easy to change in the UI though, so no big deal.
usernametaken29 21 hours ago [-]
I don’t have this particular use case right now but if anything it feels like LLMs and their distilled on prem models are starting to kill SaaS simply because it becomes more and more tenable to build a “complete software” in a short time frame. That’s freaking awesome. Good idea and love the return of the good old you buy, you own it mentality
mushufasa 1 days ago [-]
This looks like a good approach, though I would expect this to be a native macOS feature within 12 months -- this seems totally like it fits into their product roadmap.
nightpool 19 hours ago [-]
Is this legal to use in two-party consent states? It might vary from state to state, which is probably why both Zoom and Meet require users to click through consent screens when meetings are being transcribed. Might be useful to have that on the FAQ page.
AG342 18 hours ago [-]
Recording and consent rules vary quite a bit by country and region, so it's on whoever is doing the recording to get the consent they need and to follow the law where they are. Still, I like the idea of putting a note on the FAQ to make that clear.
Gigachad 19 hours ago [-]
It would be if you tell the person you are recording first.
Myrmornis 23 hours ago [-]
I will be happy to spend £10 on this. One feature question though -- does it continue transcribing the meeting even if I've turned my volume down / muted it?
AG342 22 hours ago [-]
It does indeed. Trace will record your system audio regardless of your speaker volume. You do have the option to mute your own mic temporarily though, via a button on the “pill” or a global keyboard shortcut.
Myrmornis 4 hours ago [-]
Thanks. I've bought it and started using it; it looks great. I was previously using Hyprnote which did work well, but yours appears to fit my "I just want markdown" case better and to generally be more polished.
I'll be wanting to find a good workflow to get the markdown transcripts into a git repo with file names that define a suitable sort order and also indicate what the meeting was. So would welcome your suggestions there. Not blocked of course, you make it easy to copy from the clipboard or from the disk location and rename, but it might be nice to have more control over where and how the .md lands.
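One naming scheme that would give that sort order is a date-and-time prefix followed by a slug of the meeting title. This is only a sketch of the idea; `transcript_filename` is a hypothetical helper and has nothing to do with how Trace names its files.

```python
import re
import unicodedata
from datetime import datetime


def transcript_filename(started: datetime, title: str) -> str:
    """Build a name like 2025-01-31-0930-weekly-sync.md that sorts by start time."""
    # Strip accents and other non-ASCII characters so names stay portable.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-") or "untitled"
    return f"{started:%Y-%m-%d-%H%M}-{slug}.md"


print(transcript_filename(datetime(2025, 1, 31, 9, 30), "Weekly Sync: Q1 Planning!"))
```

Because the prefix is zero-padded and ordered year, month, day, time, a plain alphabetical listing (or `git ls-files`) is also chronological.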
I might email the support address on the off-chance that you're happy to have support/feature conversations like this. Thanks!
AG342 4 hours ago [-]
Please do drop me a line on hello@traceapp.info and let's chat!
iorinu 8 hours ago [-]
I thought it was really interesting
frabia 1 days ago [-]
Super interesting! How accurate is the local model at transcribing audio compared to cloud services? E.g. Google Meet, Otter, Granola, etc.
watchlight 1 days ago [-]
A lot of the available models are Whisper or Faster-Whisper derived and shared across multiple apps. The tier names are often funny... "Tiny" "base" "small" "medium" "large" "large-v2" "large-v3" "large-v3-turbo" -en only variants, etc.
In my experience, medium is often the sweet spot for English accuracy vs speed, especially if following-up with a post-processing pass. The large options are all fine, but can severely slow it down. There are some speed checks on my website if you're curious (link not posted because I don't want to hijack another post's app).
fandorin 18 hours ago [-]
looks great! and good that you decided on the one-time fee instead of a subscription. side question: what did you use to create these videos/gifs on the homepage that shows the app? They look really good!
AG342 18 hours ago [-]
Appreciate it, thanks. They’re just Vue components with CSS animations. Drop me an email and I’ll send you the code.
hello@traceapp.info
fandorin 3 hours ago [-]
thanks! will do
scosman 21 hours ago [-]
Those transcription times are fast fast. What model/library do you use?
AG342 20 hours ago [-]
Trace has two engines that you can choose from. The fast one uses NVIDIA's Parakeet-TDT 0.6b v3 model run through FluidAudio, which surprised me with how fast it was. There is also an accurate engine, which uses Whisper large-v3-turbo via WhisperKit, which is slower but holds up better on accents and jargon.
scosman 5 hours ago [-]
ahh yes. I'm using Whisper v3 turbo via WhisperKit as well. Will play with parakeet
chid 14 hours ago [-]
Will this be available for iOS?
nazca 1 days ago [-]
I've been looking for this exact thing!
overflowy 1 days ago [-]
Does it support multiple languages?
triyambakam 17 hours ago [-]
My stack has been QuickTime and Assembly AI lol
kexelion 7 hours ago [-]
[flagged]
_onecookie 8 hours ago [-]
[dead]
ipotapov 2 days ago [-]
[dead]
beefmumbai 12 hours ago [-]
[dead]
JohnBizBiz 2 days ago [-]
[flagged]
ZoneZealot 24 hours ago [-]
HN is not the place for LLM generated advertisements
satvikpendem 1 days ago [-]
I don't see how this is different to literally the dozens of other offline transcription apps, many open source even unlike this one.
hmokiguess 1 days ago [-]
can you share them? I'm looking for a decent open source one
Anyway, most of these apps are built around https://github.com/FluidInference/FluidAudio, if anyone is curious. Their readme has a big list of similar apps as well.
https://github.com/kouhxp/yapsnap
Add "open source" if you wish as well.
https://handy.computer/
https://hn.algolia.com/?q=macOS+transcription