A captions-based search engine and ingestion pipeline for YouTube videos.
Search videos by their captions, and generate shareable GIFs and memes instantly.
Yoogle is a pet project of mine, and something I have wanted to build for my friend circle since 2016. We all have very similar tastes in humour and YouTube channels (echo chamber, anyone?), and as a result we frequently quote old videos to each other.
I always wanted a way to search for these clips by quote, and have a meme or GIF (I refuse to pronounce it as jif) generated from the result.
This project is finally going to realise those abilities for us.
I have a couple of goals for this project…
I am using Terraform (TF) more and more in my development life these days. I’m really enjoying having the ability to share, scrutinise and re-use infrastructure easily through code. I’m not a fan of CloudFormation, and really enjoy the ability to spin up infra from multiple providers in the same config set.
If you’re not aware, I am a co-founder of Communiti Labs, an AI-powered data analytics company. Working on the challenges there is mentally stimulating, to say the least. However, I have been longing for a bit of a break on my weekends. This is a nice way to still flex my development muscles, and experiment with some tools and techniques that I can take back into Communiti Labs.
By the way, if you’re interested in Communiti Labs, ask me about it!
While I am experienced with most of the technologies that I plan on using for this project, I may not have used them in this style. For instance, we use Clerk as our authentication platform for Communiti Labs, but mostly through server-side authentication. I’m excited to try using it for frontend auth too.
I’m a massive Postgres fanboy, though I have never used the Neon DB platform; and with the latest changes from PlanetScale, there’s never been a better time to look for alternatives that offer a free tier.
And if you have any experience with Elysia, pnpm workspaces or Kysely… feel free to give me some feedback. This should be fun.
Sometimes you don’t need any more reason other than no one else has built it yet, and I can, so I must.
I wanted to keep myself on-task by giving this project some pretty simple tenets and milestones. I also wanted to set out an overview of the technologies used in the project, and some justification for the tools.
(Most) Technological choices for this project have been made with these goals in mind…
This is a pet project, and is never going to make money. I have made SaaS applications like Airproxy (a dead-simple proxy for Airtable APIs), where I am fine with spending dosh on services. This is not the same, and should incur minimal, if any, on-going costs.
Here’s hoping that this is useful for you and others! For peace of mind, for good practice and just because I like to, I want to make sure it scales (if need be).
I want to detect errors in the connection between the backend and the frontend as fast as possible; ideally before it’s shipped. I’d rather not use a tool like Sentry to monitor errors, as I want to keep time on this project as minimal as possible.
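As a rough sketch of what catching these errors before shipping can look like (this is not the project's actual code), the backend and the frontend can both be typed against one shared contract, so a drift between them becomes a compile error instead of a runtime surprise. Every name below (`CaptionHit`, `SearchResponse`, `searchHandler`, `renderHits`) and the sample data are hypothetical, and the sketch assumes the contract would live in a package that both apps import.

```typescript
// Hypothetical shared contract. In a monorepo this would sit in a shared
// package imported by both the backend and the frontend.
interface CaptionHit {
  videoId: string;
  startSeconds: number;
  text: string;
}

interface SearchResponse {
  query: string;
  hits: CaptionHit[];
}

// Backend side (illustrative): the return type annotation makes the compiler
// reject a handler whose response shape drifts from the contract.
function searchHandler(query: string): SearchResponse {
  // Made-up in-memory captions standing in for a real data store.
  const captions: CaptionHit[] = [
    { videoId: "abc123", startSeconds: 12, text: "I can, so I must" },
    { videoId: "def456", startSeconds: 40, text: "something else entirely" },
  ];
  const needle = query.toLowerCase();
  return {
    query,
    hits: captions.filter((c) => c.text.toLowerCase().includes(needle)),
  };
}

// Frontend side (illustrative): consumes the same type, so renaming or
// removing a field on the backend breaks this function at build time.
function renderHits(response: SearchResponse): string[] {
  return response.hits.map(
    (hit) => `${hit.videoId}@${hit.startSeconds}s: ${hit.text}`
  );
}
```

If `startSeconds` were renamed in `CaptionHit`, both `searchHandler` and `renderHits` would fail to compile, which is the fast feedback this tenet is after.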
As mentioned, I want to keep this fairly simple. While some tech choices are made for learning opportunities, most are made because the tool is tested, tried and true (and cheap!).
Purpose | Technology | Justification |
---|---|---|
File storage | S3 | Come on, what else? (yes, CF is decent) |
Website hosting | S3 | Dirt cheap. |
Functions | Lambda | Scales to zero, and is dead simple. |
Authentication | Clerk | It’s awesome, and free! |
DNS | Cloudflare | I already use it, and love it. |
IaC | Terraform | I want to get better at it for other projects. |
Database | Neon DB | Cheap, and Postgres! |
Embedding Model | Unsure | Still investigating the most cost efficient (read: ‘cheap’) tool here. |
Repo tooling | PNPM Workspaces | I want to be able to share types between frontend & backend |
*I’ve got nothing against tRPC. On the contrary, I love it when I’m not using Next.js Server Actions. I wanted to try this kind of monorepo/project configuration.
2024-04-25
- … (`/apps/web`) project
- … `yoogle-frontend` S3 bucket
- … `web` app and upload to `yoogle-frontend` bucket
- … `www.yoogle.app` to the `yoogle-frontend` bucket

2024-0X-XX
- … `/infra` dir

Backlog
- … `s3_user` IAM user's S3 permissions