Building Ridgeline, part 1: I have too many dashboards
I run a handful of side projects. A couple of open-source tools, a few smaller things, some experiments. Each one has its own little ecosystem of data: Umami tells me who visits, Google Search Console tells me what ranks, GitHub tells me who stars, Hacker News occasionally tells me someone posted a link.
None of these systems talk to each other.
Every Monday I open Umami in one tab, GSC in another, check GitHub traffic, maybe search Hacker News. Each dashboard answers one question about one project. None of them answer the question I actually care about: across everything I run, what’s working and what isn’t?
I’ve been doing this for years.
The tools that almost work
The data-engineering world has plenty of options if you’re willing to run infrastructure. Airbyte can sync data from hundreds of sources into a warehouse. Fivetran does the same thing but managed and expensive. Segment routes events. Metabase and Superset visualize whatever’s in your database.
The problem is the stack. To run Airbyte you need Docker, a Postgres database, and ideally a separate warehouse. Fivetran is priced for companies, not individuals. Segment is a CDP for marketing teams, not a solo dev checking if anyone read their blog post.
PostHog is closer to what I want. Self-hosted, open-source, generous free tier. But it’s product analytics for one app at a time. I don’t need funnels and retention curves. I need “did anyone notice that thing I shipped last Tuesday, and if so, where did they come from?”
Datadog is in a different universe entirely. Infra monitoring, APM, log management. Incredible product. Also $50K a year for a mid-size company. I’m not a mid-size company. I’m a person with a Mac mini and some side projects.
Nothing I’ve found is built for someone who runs their own Umami, their own Immich, their own llama.cpp instance, and wants one tool that pulls usage data from all of it into a place where they can ask questions in SQL.
What I actually want
I keep coming back to the same set of constraints:
One binary. Not a Docker stack, not three services and a reverse proxy. One thing I download, run, and forget about. Like how SQLite is one file and DuckDB is one binary. That shape.
No cloud. The data stays on my machine. I already self-host everything else; I’m not going to ship my analytics data to someone else’s servers to ask “how many pageviews did my blog get?”
Config-driven. I want to describe my setup in a YAML file. Here are my products, here are the sources for each one, here’s where to write the data. Then ridgeline sync does the rest.
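To make that concrete, here's the shape of config I have in mind. Every field name here is illustrative — the actual ridgeline.yaml schema may differ:

```yaml
# Hypothetical ridgeline.yaml -- field names are illustrative, not the real schema.
output: ./out
state: ./ridgeline.db

products:
  myapp:
    sources:
      - type: hackernews
        query: "myapp"
      - type: umami
        url: https://umami.example.com
        website_id: abc123
```

One file, products at the top level, sources nested under each product, and a single sync command that walks the whole thing.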
SQL is the interface. Not a custom query language, not a drag-and-drop dashboard builder, not a GraphQL API. SQL. I know SQL. DuckDB speaks SQL. My data lands in Parquet files. SELECT * FROM read_parquet('./out/*/*.parquet') just works.
Pluggable. I don’t want to wait for someone to build a connector for the niche API I use. I want to write a 30-line Python script that speaks JSON lines to stdin/stdout and have the system treat it like any other source.
Multi-product. One install covers everything I run. Shared connectors, per-product data, one config file.
CLI-first. This one matters more than it might seem. A CLI is the simplest interface that both humans and AI agents can use from day one. I already manage my Home Assistant setup through Claude Code by giving it SSH access and two shell scripts. The same pattern works here: ridgeline sync, ridgeline query "SELECT ...", ridgeline status are commands an AI agent can run without an SDK, without browser automation, without an API wrapper. The CLI is the API. Any tool that can call a shell command can integrate with Ridgeline today, no plugin system required.
So I’m building it
Ridgeline is a single Go binary. You point it at a ridgeline.yaml that describes your products and their data sources, and it does the ETL: extract records from each source, write them to Parquet (or JSON lines), checkpoint state in SQLite so it knows where it left off, and expose a ridgeline query command that runs SQL against the output via an embedded DuckDB.
No Docker. No Postgres. No Redis. No cloud account. The state is one SQLite file. The output is Parquet files in a directory. You can read them with pandas, pyarrow, the DuckDB CLI, or ridgeline query itself.
Today it has native connectors for Hacker News (via Algolia’s public API) and Umami (with both API-key and username/password login). It has an external runner that lets you wire any executable that speaks JSON lines as a connector, so if your source isn’t natively supported, you write a script and point the config at it. Credentials are stored in AES-256-GCM encrypted entries in the same SQLite database.
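The runner side of that external protocol is also small. Here's a sketch in Python of how a runner might invoke an arbitrary executable, feed it the previous state, and split its output into records and a new checkpoint — again, the envelope format is my illustration, not the published spec:

```python
import json
import subprocess
import sys

# A stand-in external connector, inlined for the example; in practice this
# would be any executable on disk (Python, Bash, a compiled binary...).
FAKE_CONNECTOR = r"""
import json, sys
state = json.loads(sys.stdin.readline() or "{}")
c = state.get("cursor", 0)
for i in range(c, c + 2):
    print(json.dumps({"type": "record", "data": {"id": i}}))
print(json.dumps({"type": "state", "data": {"cursor": c + 2}}))
"""

def run_external(cmd, prev_state):
    """Run an external connector, send the previous state on stdin,
    and split its stdout into records plus a new checkpoint."""
    proc = subprocess.run(
        cmd, input=json.dumps(prev_state) + "\n",
        capture_output=True, text=True, check=True,
    )
    records, new_state = [], prev_state
    for line in proc.stdout.splitlines():
        msg = json.loads(line)
        if msg["type"] == "record":
            records.append(msg["data"])
        elif msg["type"] == "state":
            new_state = msg["data"]
    return records, new_state
```

Because the boundary is a subprocess with line-delimited JSON, the runner never needs to link against the connector's language or runtime.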
$ ridgeline sync --config ridgeline.yaml
loaded ridgeline.yaml
state: ./ridgeline.db
myapp/hackernews: 47 records, 2 states saved
myapp/umami: 312 records, 1 state saved
done: 359 records total
It’s early. The roadmap has Google Search Console, a Bubble Tea TUI, goreleaser for Homebrew distribution, and a bunch more connectors. But the ETL core works end to end, the state survives restarts, the credential store is encrypted, and the query layer reads everything DuckDB can read.
Why I’m writing about it now
I’m building this in public. The repo is at github.com/xydac/ridgeline, MIT licensed, and every commit compiles and passes tests before it gets pushed. The design decisions are documented, the connector interface is stable enough to write against, and the external plugin protocol has a spec.
I’m writing about it now because the project just crossed the threshold from “proof of concept” to “thing I actually use.” I sync my own Umami data through it. I query Hacker News mentions of my projects through it. The state checkpoints work, the Parquet output is real, and ridgeline query is genuinely faster than opening three browser tabs.
This is part 1 of a series. Part 2 covers the architecture: why DuckDB and not Postgres, why Parquet and not a database, why JSON lines over stdin/stdout for external connectors, and why the pipeline flushes data before saving state (and what happens if it doesn’t). Part 3 is a hands-on tutorial: clone, build, sync, query. I’ll link them here as they go up.
If you run multiple projects and you’re tired of opening six dashboards every Monday morning, the repo is here. It’s a Go binary. It’s one file. It’s yours.