Projects

Reverse Wiktionary

Live demo · Blog post · GitHub repository

A multilingual reverse dictionary that uses Wiktionary data, semantic embeddings, and vector search to find words by meaning across thousands of languages.

Keywords: semantic search, vector databases, embeddings, Qdrant, language data

Built a multilingual semantic search system that maps natural-language descriptions to Wiktionary entries across more than 4,000 languages.
Generated sentence-transformer embeddings offline for Wiktionary definition data, then restored the prepared vector snapshot into Qdrant for serving.
Implemented a FastAPI web service with a stable search API, Jinja/HTMX browser UI, Redis-backed session state, and language and part-of-speech (POS) filtering.
Added a navigable language-family taxonomy so users can browse and filter thousands of languages without relying only on text search.
Deployed the serving stack with Docker Compose on Azure, using Nginx, Cloudflare Tunnel, Qdrant, Redis, and repeatable restore/smoke-test scripts.
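
As an illustration only (not code from the repository), the core retrieval idea of filtering by language and part of speech and then ranking by embedding similarity might be sketched as follows. The `Entry` type, the `search` function, and the three-dimensional toy vectors are all assumptions made for this sketch; the real system uses sentence-transformer embeddings served from Qdrant.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    word: str
    lang: str            # language code, e.g. "la"
    pos: str             # part of speech, e.g. "noun"
    vector: list         # ASSUMPTION: toy stand-in for a real embedding


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, entries, lang=None, pos=None, limit=3):
    """Filter by language/POS first, then rank the survivors by similarity."""
    candidates = [
        e for e in entries
        if (lang is None or e.lang == lang) and (pos is None or e.pos == pos)
    ]
    ranked = sorted(
        candidates, key=lambda e: cosine(query_vector, e.vector), reverse=True
    )
    return [e.word for e in ranked[:limit]]


# ASSUMPTION: hand-written vectors; a real query would be embedded by the model.
ENTRIES = [
    Entry("aqua", "la", "noun", [0.9, 0.1, 0.0]),
    Entry("ignis", "la", "noun", [0.0, 0.9, 0.1]),
    Entry("wæter", "ang", "noun", [0.8, 0.2, 0.1]),
    Entry("currere", "la", "verb", [0.1, 0.1, 0.9]),
]

latin_nouns = search([1.0, 0.0, 0.0], ENTRIES, lang="la", pos="noun")
any_language = search([1.0, 0.0, 0.0], ENTRIES, limit=2)
```

In a vector database the same two steps are typically expressed as a payload filter plus a nearest-neighbour query, so the filter narrows the candidate set before similarity ranking.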

IPA to meSpeak

GitHub repository

A compiler/runtime project for converting Unicode IPA inputs into eSpeak/meSpeak phoneme strings. The system separates source-backed rule authoring from generated Rust scanner tables and exposes the runtime to Python for integration with Reverse Wiktionary.

Keywords: Rust, PyO3, text-to-speech, IPA, language data, validation systems

Built a Rust runtime scanner exposed through a Python extension with PyO3/maturin for in-process server integration.
Separated source-backed rule authoring, validation, table generation, and runtime execution into repeatable build stages.
Added validation tooling for encoder outputs, rule-source references, exact alternatives, and generated Rust table contracts.
Supported both Python-extension and standalone Rust executable integration modes for server use, probes, and shell pipelines.
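
A table-driven scanner of this general kind can be pictured with a small greedy longest-match loop. The sketch below is in Python for brevity, whereas the project's runtime is Rust. `TOY_TABLE` and `scan` are hypothetical names, and the handful of mappings is a toy placeholder, not the project's generated rule tables.

```python
# ASSUMPTION: a tiny illustrative table; the real tables are generated from
# source-backed rules and are far larger.
TOY_TABLE = {
    "tʃ": "tS",
    "ʃ": "S",
    "θ": "T",
    "ŋ": "N",
    "t": "t",
    "i": "i",
    "n": "n",
    "s": "s",
}
MAX_KEY_LEN = max(len(key) for key in TOY_TABLE)


def scan(ipa):
    """Convert an IPA string by always taking the longest matching table key."""
    out = []
    pos = 0
    while pos < len(ipa):
        # Try the longest possible chunk first so "tʃ" wins over "t" + "ʃ".
        for length in range(min(MAX_KEY_LEN, len(ipa) - pos), 0, -1):
            chunk = ipa[pos:pos + length]
            if chunk in TOY_TABLE:
                out.append(TOY_TABLE[chunk])
                pos += length
                break
        else:
            # No rule covers this position: fail loudly rather than guess.
            raise ValueError(f"no rule for {ipa[pos]!r} at offset {pos}")
    return "".join(out)
```

For example, `scan("tʃin")` consumes the two-character key `tʃ` as a unit before falling back to single characters, and an input with no matching rule raises `ValueError` instead of being passed through silently.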

MusicBrainz Query Server

GitHub repository

A Spark SQL query service for analyzing a local PostgreSQL mirror of the MusicBrainz database. The system exposes MusicBrainz analytics over gRPC, with Docker-based local development and generated protocol buffer interfaces for client/server communication.

Keywords: PySpark, PostgreSQL, gRPC, asynchronous jobs, Docker

Built a PySpark query engine for large-scale MusicBrainz analysis, including top-artist queries by genre and year range.
Exposed query results through a gRPC API for remote, language-agnostic access.
Implemented asynchronous query jobs with job IDs, status polling, and result retrieval for long-running Spark workloads.
Tracked jobs in a concurrency-safe in-memory store with automatic TTL cleanup of completed or stale query jobs.
Added Docker and Docker Compose configuration for repeatable local development against a MusicBrainz PostgreSQL mirror.
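
The job-tracking pattern described above (job IDs, status polling, result retrieval, TTL cleanup) can be sketched with the standard library alone. `JobStore`, its method names, and the status strings are assumptions for illustration and are not taken from the repository.

```python
import threading
import time
import uuid


class JobStore:
    """Lock-protected in-memory job registry with TTL-based expiry."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._lock = threading.Lock()
        self._jobs = {}

    def submit(self):
        """Register a new job and return its ID for later polling."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {
                "status": "PENDING", "result": None, "touched": self._clock()
            }
        return job_id

    def complete(self, job_id, result):
        """Record the result of a finished job and refresh its timestamp."""
        with self._lock:
            self._jobs[job_id].update(
                status="DONE", result=result, touched=self._clock()
            )

    def status(self, job_id):
        with self._lock:
            job = self._jobs.get(job_id)
            return job["status"] if job else "UNKNOWN"

    def result(self, job_id):
        with self._lock:
            job = self._jobs.get(job_id)
            return job["result"] if job and job["status"] == "DONE" else None

    def purge_expired(self):
        """Drop jobs untouched for longer than the TTL; return how many."""
        now = self._clock()
        with self._lock:
            expired = [
                job_id for job_id, job in self._jobs.items()
                if now - job["touched"] > self._ttl
            ]
            for job_id in expired:
                del self._jobs[job_id]
        return len(expired)
```

In a service, `purge_expired` would typically be called from a background thread or timer, and a worker would call `complete` when the long-running query finishes.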

Album Cover Art Image Classifier

GitHub repository

An end-to-end TensorFlow computer vision pipeline for predicting an album’s genre or release decade from cover art. The project combines dataset construction, image ingestion, transfer learning, fine-tuning, and evaluation artifacts in a reproducible training workflow.

Keywords: TensorFlow, computer vision, transfer learning, data pipelines

Trained ImageNet-pretrained backbones (DenseNet201 by default) in a two-stage process: frozen-backbone training followed by fine-tuning.
Built a tf.data input pipeline for JPEG decoding, resizing, batching, prefetching, optional caching, and train-only shuffling/repetition.
Added stratified train/validation/test splitting, run configuration hashes, model checkpoints, training logs, confusion matrices, classification reports, and summary metadata.
Included a MusicBrainz and Cover Art Archive utility to query a local MusicBrainz PostgreSQL mirror, download front-cover JPEGs, and write per-genre metadata files.
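
Two of the reproducibility pieces above, stratified splitting and run configuration hashes, are easy to illustrate without TensorFlow. The following is a minimal sketch under assumed names (`config_hash`, `stratified_split`) and an assumed 70/15/15 split; the project's actual ratios and hashing scheme are not stated here.

```python
import hashlib
import json
import random
from collections import defaultdict


def config_hash(config):
    """Short, order-independent fingerprint of a JSON-serializable config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def stratified_split(items, labels, percents=(70, 15, 15), seed=0):
    """Split per label so each class keeps roughly the same proportions.

    ASSUMPTION: integer percentages for train/validation; whatever remains
    in each class goes to the test set.
    """
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    rng = random.Random(seed)  # seeded so the split is repeatable
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = len(group) * percents[0] // 100
        n_val = len(group) * percents[1] // 100
        train += [(x, label) for x in group[:n_train]]
        val += [(x, label) for x in group[n_train:n_train + n_val]]
        test += [(x, label) for x in group[n_train + n_val:]]
    return train, val, test


# Hypothetical file names and genre labels, for illustration only.
items = [f"cover_{i}.jpg" for i in range(40)]
labels = ["rock"] * 20 + ["jazz"] * 20
train, val, test = stratified_split(items, labels)
run_id = config_hash({"backbone": "DenseNet201", "seed": 0})
```

Hashing the canonical JSON form means two runs with the same settings share a fingerprint regardless of key order, which makes it straightforward to tie checkpoints and reports back to the configuration that produced them.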

Word-Hoarder

GitHub repository

An earlier Python project for building searchable local dictionaries from extracted Wiktionary data, focused on Latin, Ancient Greek, and Old English. The system supports local lookup, saved practice sets, and machine-readable Anki flashcard export.

Keywords: Wiktionary, trie indexes, Anki export, language tools

Parsed Wiktionary/Wiktextract JSON into normalized Python dictionary structures for local storage and lookup.
Organized language data into trie-backed indexes for fast prefix search and dictionary access.
Supported curation of saved practice sets for language learners.
Built export tools for generating HTML-formatted Anki flashcard import files from saved word collections.
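
For readers unfamiliar with trie-backed prefix search, a minimal sketch looks like the following. `Trie`, `insert`, and `with_prefix` are illustrative names, and the entries are placeholder glosses rather than the project's normalized dictionary structures.

```python
class TrieNode:
    __slots__ = ("children", "entry")

    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.entry = None    # dictionary entry if a headword ends here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, entry):
        """Store an entry under a headword, creating nodes as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.entry = entry

    def with_prefix(self, prefix):
        """Return all stored headwords that start with the prefix, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(prefix, node)]
        while stack:
            word, current = stack.pop()
            if current.entry is not None:
                found.append(word)
            for ch, child in current.children.items():
                stack.append((word + ch, child))
        return sorted(found)


# ASSUMPTION: placeholder entries for illustration.
latin = Trie()
for headword, gloss in [
    ("aqua", "water"),
    ("aquila", "eagle"),
    ("amor", "love"),
    ("bellum", "war"),
]:
    latin.insert(headword, {"gloss": gloss})

matches = latin.with_prefix("aqu")
```

Lookup cost depends on the length of the prefix plus the size of the matching subtree, not on the total number of headwords, which is what makes the structure attractive for interactive prefix search.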

SymPy Linear Algebra Calculator

GitHub repository

An interactive Python shell for working with SymPy matrix objects. The project provides a terminal-based workflow for creating, storing, manipulating, and rendering symbolic matrices.

Keywords: SymPy, symbolic computation, linear algebra

Managed named SymPy MutableDenseMatrix objects through a terminal menu interface.
Supported matrix creation from symbolic expressions or CSV-style row input.
Added operations for row reduction, determinants, matrix manipulation, symbolic substitution, and exact expression output.
Supported LaTeX-rendered output for reviewing and reusing matrix calculations.