AI WITH Rithesh
Open-Source OCR, Audio Models, and Developer Tools with a focus on live implementation and stress-testing.
Nutrition Label
Rithesh delivers practical, code-first evaluations of emerging AI models, prioritizing live demonstrations in Google Colab over marketing slides. He rigorously stress-tests tools against difficult edge cases—such as handwriting or complex audio—often revealing failure modes that official benchmarks miss. While his technical testing is grounded and authentic, his video titles sometimes oversell a model's capabilities compared to his actual, more nuanced findings.
Notes
- Titles sometimes promise revolutionary performance that the video's actual testing proves to be merely incremental.
- Watch for specific hardware metrics like GPU RAM usage, which he consistently verifies during live runs.
Rating Breakdown
Breakdown across the key dimensions we rate.
Recent Videos

Qwen 3.5 Just Dropped And It Claims to Outperform GPT-5.2, Gemini & Claude at 60% the Cost!

SaarasV3 Next Generation Indic Languages Speech Recognition model beats Gemini 3 Pro

Sarvam Vision SOTA OCR for 22 Indian Languages + English beats Frontier Models

moltbook : Social Network for AI Agents ABSOLUTE CHAOS

PaddleOCR-VL-1.5 New SOTA OCR Underwhelming?

DeepSeek OCR 2 — A Tiny 3B Model Beating the Best 🤯

Microsoft’s New ASR Transcribes 60-Minute Audio in One Shot—with Speakers & Timestamps Open Source

Qwen3-TTS Can Clone Any Voice and It’s Scarily Good Open Source Too

Pocket TTS CPU Only Lightweight TTS Voice Cloning

Inside the 2025 AI Shock: The Chinese Labs Outpacing the West

Microsoft VibeVoice-Realtime: Lightning-Fast TTS for Live Streams & Instant Speech from Any Model!

HunyuanOCR Best Free OCR from China blows away the competition Extensive Testing Colab Demo

Google Nano Banana Pro 🍌🍌 : Ultimate AI for image generation + editing

Gemini 3 New Era of Intelligence Begins — First Tests, Shock Results, and FREE Access

Kimi K2 Thinking vs Qwen 3 Max Thinking Battle of the Heavyweight Reasoning models
Why this rating
Evidence receipts showing why each dimension is rated the way it is.
“Let's start with our usual set of images... I want to do this simple tabular data.” [3:42]
The creator demonstrates direct engagement by uploading his own diverse dataset to the tool's playground for live testing.
“Here is a small mistake over here. It is actually some 22 percentage... so it says 2 over here.” [6:37]
The analyst identifies a specific data extraction error in a chart, proving he is verifying the output rather than just accepting it.
“Some other OCRs actually fail on this document altogether. For example, Chandra OCR I've seen... they fail because this language is Kannada.” [5:38]
He contextualizes the performance by comparing it to specific competitors and their known limitations with Indian languages.
“I find this OCR on par with other OCRs like OLMOCR 2... even though they claim a really high percentage.” [8:17]
The title claims the model 'blows away the competition,' but the video conclusion is much more measured, stating it is merely 'on par' with existing tools.