Reverse Wiktionary

(Coming soon) A semantic reverse dictionary planned for the secondary domain reverse-wiktionary.com. Planned topics include vector search, lexical data modeling, Qdrant, and practical search UX.

Keywords: semantic search, vector databases, embeddings, search UX

MusicBrainz Query Server

GitHub repository

A Spark SQL query service for analyzing a local PostgreSQL mirror of the MusicBrainz database. The system exposes MusicBrainz analytics over gRPC, with Docker-based local development and generated protocol buffer interfaces for client/server communication.

Keywords: PySpark, PostgreSQL, gRPC, asynchronous jobs, Docker

  • Built a PySpark query engine for large-scale MusicBrainz analysis, including top-artist queries by genre and year range.
  • Exposed query results through a gRPC API for remote, language-agnostic access.
  • Implemented asynchronous query jobs with job IDs, status polling, and result retrieval for long-running Spark workloads.
  • Tracked query jobs in a concurrency-safe in-memory store that automatically evicts completed or stale jobs once their TTL expires.
  • Added Docker and Docker Compose configuration for repeatable local development against a MusicBrainz PostgreSQL mirror.
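
The asynchronous job flow described above (job IDs, status polling, result retrieval, TTL cleanup) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the names JobStore, Job, submit, status, result, and evict_expired are hypothetical, plain threads stand in for Spark workloads, and cleanup is an explicit method that a background timer would call periodically.

```python
import threading
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Job:
    """State for one query job (hypothetical structure)."""
    status: str = "PENDING"  # PENDING -> RUNNING -> DONE or FAILED
    result: Any = None
    error: Optional[str] = None
    updated_at: float = field(default_factory=time.monotonic)


class JobStore:
    """In-memory job registry guarded by a single lock."""

    def __init__(self, ttl_seconds: float = 600.0) -> None:
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._jobs: Dict[str, Job] = {}

    def submit(self, work: Callable[[], Any]) -> str:
        """Register a job, start it on a worker thread, return its ID."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = Job()
        threading.Thread(
            target=self._run, args=(job_id, work), daemon=True
        ).start()
        return job_id

    def _update(self, job_id: str, **changes: Any) -> None:
        with self._lock:
            job = self._jobs.get(job_id)
            if job is None:  # already evicted; drop the update
                return
            for name, value in changes.items():
                setattr(job, name, value)
            job.updated_at = time.monotonic()

    def _run(self, job_id: str, work: Callable[[], Any]) -> None:
        self._update(job_id, status="RUNNING")
        try:
            result = work()
        except Exception as exc:  # record the failure instead of losing it
            self._update(job_id, status="FAILED", error=str(exc))
        else:
            self._update(job_id, status="DONE", result=result)

    def status(self, job_id: str) -> Optional[str]:
        """Return the job status, or None for unknown or evicted jobs."""
        with self._lock:
            job = self._jobs.get(job_id)
            return job.status if job else None

    def result(self, job_id: str) -> Any:
        """Return the result of a finished job, otherwise None."""
        with self._lock:
            job = self._jobs.get(job_id)
            return job.result if job and job.status == "DONE" else None

    def evict_expired(self, now: Optional[float] = None) -> int:
        """Drop jobs not updated within the TTL; return how many were removed."""
        now = time.monotonic() if now is None else now
        with self._lock:
            expired = [
                job_id for job_id, job in self._jobs.items()
                if now - job.updated_at > self._ttl
            ]
            for job_id in expired:
                del self._jobs[job_id]
        return len(expired)
```

A client would call submit once, poll status with the returned ID until it reads DONE or FAILED, then fetch result. Keying eviction on the last update time, not on creation time, means a long-running job that is still reporting progress is not dropped mid-flight.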

Album Cover Art Image Classifier

GitHub repository

An end-to-end TensorFlow computer vision pipeline for predicting an album’s genre or release decade from cover art. The project combines dataset construction, image ingestion, transfer learning, fine-tuning, and evaluation artifacts in a reproducible training workflow.

Keywords: TensorFlow, computer vision, transfer learning, data pipelines

  • Trained ImageNet-pretrained backbones (DenseNet201 by default) in two stages: frozen-backbone training followed by fine-tuning.
  • Built a tf.data input pipeline for JPEG decoding, resizing, batching, prefetching, optional caching, and train-only shuffling/repetition.
  • Added stratified train/validation/test splitting, run configuration hashes, model checkpoints, training logs, confusion matrices, classification reports, and summary metadata.
  • Included a utility that queries a local MusicBrainz PostgreSQL mirror, downloads front-cover JPEGs from the Cover Art Archive, and writes per-genre metadata files.
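
The stratified train/validation/test splitting mentioned above can be illustrated with a small standard-library sketch. The function name stratified_split, its parameters, and the default fractions are assumptions made for illustration; the project's actual splitting code may differ.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_split(
    labels: Sequence[Hashable],
    val_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Split example indices so each class keeps roughly the same share
    in the train, validation, and test sets."""
    # Group example indices by class label.
    by_label: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)  # seeded so the split is reproducible
    splits: Dict[str, List[int]] = {"train": [], "val": [], "test": []}

    # Visit classes in a fixed order so the result depends only on the seed.
    for label in sorted(by_label, key=repr):
        indices = by_label[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        splits["val"].extend(indices[:n_val])
        splits["test"].extend(indices[n_val:n_val + n_test])
        splits["train"].extend(indices[n_val + n_test:])
    return splits
```

Splitting within each class, instead of shuffling the whole dataset once, keeps rare genres or decades represented in the validation and test sets, which matters when reading per-class confusion matrices.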

Word-Hoarder

GitHub repository

An earlier Python project for building searchable local dictionaries from extracted Wiktionary data, focused on Latin, Ancient Greek, and Old English. The system supports local lookup, saved practice sets, and machine-readable Anki flashcard export.

Keywords: Python, Wiktionary, trie indexes, Anki export, language tools

  • Parsed Wiktionary/Wiktextract JSON into normalized Python dictionary structures for local storage and lookup.
  • Organized language data into trie-backed indexes for fast prefix search and dictionary access.
  • Supported curation of saved practice sets for language learners.
  • Built export tools for generating HTML-formatted Anki flashcard import files from saved word collections.
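
For readers unfamiliar with trie-backed indexes, the prefix search described above works roughly as in the sketch below. The class and method names (Trie, TrieNode, insert, get, with_prefix) and the shape of the stored entries are hypothetical and are not taken from the Word-Hoarder source.

```python
from typing import Dict, List, Optional


class TrieNode:
    __slots__ = ("children", "entry")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.entry: Optional[dict] = None  # set only where a headword ends


class Trie:
    """Character trie mapping headwords to dictionary entries."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str, entry: dict) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.entry = entry

    def _find(self, text: str) -> Optional[TrieNode]:
        node: Optional[TrieNode] = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, word: str) -> Optional[dict]:
        """Exact lookup: return the entry for word, or None."""
        node = self._find(word)
        return node.entry if node else None

    def with_prefix(self, prefix: str) -> List[str]:
        """Return every stored headword starting with prefix, in code-point order."""
        start = self._find(prefix)
        if start is None:
            return []
        words: List[str] = []
        stack = [(prefix, start)]
        while stack:
            text, node = stack.pop()
            if node.entry is not None:
                words.append(text)
            # Push children in reverse so the smallest character pops first.
            for ch in sorted(node.children, reverse=True):
                stack.append((text + ch, node.children[ch]))
        return words


index = Trie()
for headword in ("amo", "amor", "amicus", "bellum"):
    index.insert(headword, {"lang": "la", "word": headword})

matches = index.with_prefix("am")
```

Finding the starting node costs time proportional to the prefix length, not to the size of the dictionary, which is what makes as-you-type lookup practical on large extracted word lists.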

SymPy Linear Algebra Calculator

GitHub repository

An interactive Python shell for working with SymPy matrix objects. The project provides a terminal-based workflow for creating, storing, manipulating, and rendering symbolic matrices.

Keywords: Python, SymPy, symbolic computation, linear algebra

  • Managed named SymPy MutableDenseMatrix objects through a terminal menu interface.
  • Supported matrix creation from symbolic expressions or CSV-style row input.
  • Added operations for row reduction, determinants, matrix manipulation, symbolic substitution, and exact expression output.
  • Supported LaTeX-rendered output for reviewing and reusing matrix calculations.