<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:blog="https://jonesrussell.github.io/blog/ns"><channel><title>Build-in-Public on Web Developer Blog</title><link>https://jonesrussell.github.io/blog/tags/build-in-public/</link><description>Recent content in Build-in-Public on Web Developer Blog</description><image><title>Web Developer Blog</title><url>https://jonesrussell.github.io/blog/images/og-default.png</url><link>https://jonesrussell.github.io/blog/images/og-default.png</link></image><generator>Hugo -- 0.163.3</generator><language>en-us</language><lastBuildDate>Thu, 25 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://jonesrussell.github.io/blog/tags/build-in-public/feed.xml" rel="self" type="application/rss+xml"/><item><title>The ingest side of a sovereign language platform</title><link>https://jonesrussell.github.io/blog/minoo-ingest-sovereign-language-platform/</link><pubDate>Thu, 25 Jun 2026 00:00:00 +0000</pubDate><guid>https://jonesrussell.github.io/blog/minoo-ingest-sovereign-language-platform/</guid><category>ai</category><blog:tag>waaseyaa</blog:tag><blog:tag>language-tech</blog:tag><blog:tag>ai-agents</blog:tag><blog:tag>build-in-public</blog:tag><description>How an Elder&amp;rsquo;s whiteboard video becomes a searchable Anishinaabemowin lesson, on infrastructure the community owns end to end.</description><content:encoded><![CDATA[<p>Ahnii!</p>
<p>I just shipped the ingest side of <a href="https://minoo.live">Minoo</a>, the Anishinaabemowin language platform I am building. The short version: an Elder posts a video of himself holding a whiteboard with a word on it, and that teaching becomes a published, searchable lesson, with a human reviewing every step and the community owning the whole stack. This post walks through how it actually works and the decisions underneath it.</p>
<h2 id="the-problem">The problem</h2>
<p>Anishinaabemowin teaching happens constantly, but it is scattered. Elders share words on Facebook, in notebooks, in classrooms, and almost none of it flows into anything a learner can search tonight. The few deep digital resources that do exist are owned by institutions, not by the communities whose language they hold.</p>
<p>So I set two goals that have to be true at the same time. Turn everyday teaching into structured, reusable data. And keep that data under community control end to end. Not control as a policy promise bolted on afterward, but control built into where the data lives, who can read it, and who can change it.</p>
<h2 id="the-pipeline">The pipeline</h2>
<p>The source material is real. Steven Bennett, an Elder from Sagamok, posts short videos in which he holds up a whiteboard with one word, the Anishinaabemowin on top and the English gloss below. The pipeline turns one of those into a lesson in stages: ingest, vision, then three human gates.</p>
<p><strong>Ingest.</strong> A reel comes in, by upload or through a URL importer backed by a swappable media-fetcher interface. The system pulls a keyframe and the audio, stages the media, and creates a draft tagged with its community provenance.</p>
<p><strong>Vision.</strong> The keyframe goes to a vision model through the framework&rsquo;s provider abstraction, which returns a small JSON object: the Ojibwe and the English read straight off the whiteboard. Today that provider is Claude vision. The binding is swappable by config, and the sovereign-stack goal is a local model before any public beta.</p>
<p><strong>Transcribe, Curate, Publish.</strong> Each of these is a human gate, not an automated hop. The model drafts, a person confirms. Curate promotes the entry into the dictionary. Publish puts it on the live site inside a lesson. Nothing reaches the public without a human pass.</p>
<p>The design principle is that the model is an assistant that fills a draft, never an authority that publishes.</p>
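<p>To make that principle concrete, here is a minimal Python sketch of a draft that only advances when a named person signs off. The gate names follow the pipeline above. The <code>Draft</code> class, its fields, the <code>drafted</code> starting state, and the <code>advance</code> method are all hypothetical illustrations, not the Waaseyaa API (the real service is PHP).</p>

```python
from dataclasses import dataclass, field
from typing import Optional

# Gate order follows the post; "drafted" is an assumed name for the state
# a draft is in once the vision model has filled it.
STAGES = ["drafted", "transcribed", "curated", "published"]


@dataclass
class Draft:
    """A hypothetical draft entry filled in by the vision model."""

    ojibwe: str
    english: str
    provenance: str  # community tag, e.g. "oj-x-sagamok"
    stage: str = "drafted"
    approvals: list = field(default_factory=list)

    def advance(self, reviewer: Optional[str]) -> str:
        """Move to the next stage, but only with a named human reviewer."""
        if not reviewer:
            raise PermissionError("a human reviewer is required at every gate")
        position = STAGES.index(self.stage)
        if position == len(STAGES) - 1:
            raise ValueError("draft is already published")
        self.stage = STAGES[position + 1]
        # Record who approved which gate so the history stays auditable.
        self.approvals.append((self.stage, reviewer))
        return self.stage
```

<p>The model never calls <code>advance</code> in this sketch. It can only produce the initial field values, which is one way to keep "assistant, not authority" from depending on convention alone.</p>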
<h2 id="language-tags-bcp-47-three-layers">Language tags: BCP 47, three layers</h2>
<p>One of the core decisions was how to tag the language so it can federate across the 21 Robinson Huron Treaty nations without flattening their dialects into one another. I use <a href="https://www.rfc-editor.org/info/bcp47">BCP 47</a> with three layers and a fallback chain.</p>
<p>There is the macrolanguage, <code>oj</code>, always displayed with the autonym Anishinaabemowin rather than the ISO exonyms. There is an optional dialect layer in the middle. And there is community provenance as a private-use subtag, for example <code>oj-x-sagamok</code>.</p>
<p>Translation memory keys on the full tag, never on a dialect-only code, so each community keeps its own granularity. Dialect groupings (Nishnaabemwin spans two ISO codes) are derived from the community tag, not stored as the source of truth. A tag like <code>oj-x-sagamok</code> falls back in order: first <code>oj-x-sagamok</code>, then <code>oj</code>, then <code>en</code>. That needed a small fix to the framework&rsquo;s i18n fallback chain so it would resolve private-use subtags at all. That fix shipped upstream.</p>
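<p>A fallback chain like that can be sketched in a few lines of Python. This is not the framework&rsquo;s actual fix, just an illustration of the truncation approach described in RFC 4647 lookup: drop the last subtag each time, and never leave a single-character subtag such as <code>x</code> dangling at the end. The function name and the <code>default</code> parameter are assumptions.</p>

```python
def fallback_chain(tag: str, default: str = "en") -> list:
    """Return the lookup order for a BCP 47 tag, most specific first."""
    subtags = tag.lower().split("-")
    chain = []
    while subtags:
        # A trailing singleton like "x" cannot end a tag, so skip "oj-x".
        if len(subtags[-1]) != 1:
            chain.append("-".join(subtags))
        subtags.pop()
    # Finish on the site default unless the chain already contains it.
    if default not in chain:
        chain.append(default)
    return chain


print(fallback_chain("oj-x-sagamok"))  # ['oj-x-sagamok', 'oj', 'en']
```

<p>Because the chain is derived from the tag itself, adding a new community tag needs no extra fallback configuration in this sketch.</p>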
<h2 id="the-translation-side">The translation side</h2>
<p>Alongside transcription there is a translation memory exposed at <code>/api/lang</code>. It tries an exact match first, then a fuzzy match, and logs the gap when there is no entry yet, so the backlog fills itself as the API gets used.</p>
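<p>Here is a minimal sketch of that lookup order in Python. The post does not say how fuzzy matching is implemented, so this uses the standard library&rsquo;s <code>difflib.get_close_matches</code> as a stand-in. The <code>TranslationMemory</code> class, its in-memory dictionary, and the <code>cutoff</code> value are all assumptions.</p>

```python
import difflib
from typing import Optional


class TranslationMemory:
    """Hypothetical in-memory stand-in for the store behind /api/lang."""

    def __init__(self, entries: dict, cutoff: float = 0.8):
        # entries maps (full_tag, lowercased_english_source) to a translation.
        self.entries = entries
        self.cutoff = cutoff
        self.gaps = {}  # (tag, source) mapped to the number of misses

    def lookup(self, tag: str, source: str) -> Optional[dict]:
        key = (tag, source.strip().lower())
        # 1. Exact match on the full tag.
        if key in self.entries:
            return {"match": "exact", "target": self.entries[key]}
        # 2. Fuzzy match, restricted to entries under the same full tag.
        candidates = [s for (t, s) in self.entries if t == tag]
        close = difflib.get_close_matches(key[1], candidates, n=1, cutoff=self.cutoff)
        if close:
            target = self.entries[(tag, close[0])]
            return {"match": "fuzzy", "source": close[0], "target": target}
        # 3. No entry yet: count the miss so the backlog reflects real demand.
        self.gaps[key] = self.gaps.get(key, 0) + 1
        return None
```

<p>Restricting fuzzy candidates to the same full tag keeps one community&rsquo;s entries from answering for another, which matches the rule that memory keys on the full tag.</p>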
<p>To seed it with real demand instead of guesses, I crawled the public English interface strings off the 21 RHT nation websites and ranked them by how many sites each one appears on. The result is a demand-ordered list of the words communities actually put on their own sites, things like Governance, Education, Membership, Chief and Council, a few hundred of them, waiting on speaker-verified translations. That ranked list is the backlog, highest demand first.</p>
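<p>The ranking step itself is small. A sketch in Python, assuming the crawl has already produced a mapping from site to the strings found on it (the crawler is not shown, and the function name and input shape are hypothetical):</p>

```python
from collections import Counter


def rank_by_site_coverage(strings_by_site: dict) -> list:
    """Rank interface strings by how many sites each one appears on."""
    coverage = Counter()
    for strings in strings_by_site.values():
        # Count each string once per site, ignoring case and repeats.
        coverage.update({s.strip().lower() for s in strings if s.strip()})
    # Highest coverage first; ties broken alphabetically for stable output.
    return sorted(coverage.items(), key=lambda item: (-item[1], item[0]))
```

<p>Counting sites rather than raw occurrences keeps one site with a repetitive menu from dominating the backlog.</p>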
<h2 id="sovereign-by-construction">Sovereign by construction</h2>
<p>The part I care about most: this runs on infrastructure the community controls. The app is a PHP service built on the <a href="https://github.com/waaseyaa/framework">Waaseyaa framework</a>, running in Docker behind Caddy on a Raspberry Pi the community runs, not on someone else&rsquo;s cloud. The corpus stays local. The AI provider is swappable by config. The model assists. It does not own the language, the data, or the hosting.</p>
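<p>"Swappable by config" can be pictured as a small registry behind one interface. The Python sketch below is an illustration only: the <code>VisionProvider</code> protocol, both stub classes, the <code>vision_provider</code> config key, and the registry are hypothetical, and neither stub calls a real model.</p>

```python
from typing import Protocol


class VisionProvider(Protocol):
    """Anything that can read a whiteboard keyframe into a draft."""

    def read_whiteboard(self, keyframe: bytes) -> dict: ...


class HostedStub:
    """Stands in for a hosted vision API; makes no network call here."""

    def read_whiteboard(self, keyframe: bytes) -> dict:
        return {"ojibwe": "", "english": "", "provider": "hosted"}


class LocalStub:
    """Stands in for a model running on community hardware."""

    def read_whiteboard(self, keyframe: bytes) -> dict:
        return {"ojibwe": "", "english": "", "provider": "local"}


# Hypothetical registry: config names mapped to provider classes.
PROVIDERS = {"hosted": HostedStub, "local": LocalStub}


def provider_from_config(config: dict) -> VisionProvider:
    """Pick the provider named in config, failing loudly on unknown names."""
    name = config.get("vision_provider")
    if name not in PROVIDERS:
        raise KeyError(f"unknown vision provider: {name!r}")
    return PROVIDERS[name]()
```

<p>With this shape, moving from the hosted model to a local one is a change to one config value, and the rest of the pipeline only ever sees <code>read_whiteboard</code>.</p>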
<p>That boundary shows up in the API too. The public <code>/api/lang</code> surface is read-only and validated, returning a 422 on a malformed tag rather than guessing. The admin pipeline and the corpus behind it are staff-gated. Reading is open. The language itself is governed.</p>
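<p>The "validate, do not guess" behaviour might look like the Python sketch below. The pattern is a deliberately simplified approximation of BCP 47 syntax (a language subtag, optional middle subtags, optional private-use section), not the full grammar, and the handler name and response shapes are assumptions rather than the real <code>/api/lang</code> contract.</p>

```python
import re

# Simplified shape: "oj", "oj-<middle>", or "oj-x-<community>".
# This is an approximation of BCP 47, not a complete validator.
TAG_PATTERN = re.compile(
    r"[a-z]{2,3}(-[a-z0-9]{2,8})*(-x(-[a-z0-9]{1,8})+)?",
    re.IGNORECASE,
)


def handle_lang_request(tag: str) -> tuple:
    """Return (status, body) for a read-only lookup by language tag."""
    if not TAG_PATTERN.fullmatch(tag):
        # Refuse rather than repair: a malformed tag is the caller's error.
        return 422, {"error": "malformed language tag", "tag": tag}
    return 200, {"tag": tag.lower()}
```

<p>Rejecting <code>oj-x</code> or <code>oj_x_sagamok</code> outright, instead of normalizing them, means a typo can never silently resolve to the wrong community&rsquo;s entries.</p>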
<h2 id="what-is-honest-about-it">What is honest about it</h2>
<p>Build-in-public should include the rough parts. Facebook keeps video behind a login wall, so reliable ingest is upload-first for now. The vision provider is a hosted model, which is the interim and not the destination. And &ldquo;published in the admin&rdquo; has to actually mean &ldquo;visible on the public site,&rdquo; which is exactly the kind of seam you only find by walking the whole pipeline on camera. Finding those is the point of demoing it for real.</p>
<p>The language has been taught this way for a long time, one word at a time, by people willing to stand in front of a camera and share it. The software&rsquo;s only job is to catch those teachings and hand them back to the community in a form a learner can use, without taking ownership of them along the way.</p>
<p>Watch the walkthrough: <a href="https://youtu.be/zfx7CHs_Ec0">youtu.be/zfx7CHs_Ec0</a>. The framework is open source at <a href="https://github.com/waaseyaa/framework">github.com/waaseyaa/framework</a> and on <a href="https://packagist.org/packages/waaseyaa/framework">Packagist</a>.</p>
<p>Baamaapii</p>
]]></content:encoded></item></channel></rss>