I keep a lot of notes in markdown files, and I’d like an LLM to assist.
I regularly use Open WebUI with with inference routed through huggingface. Open WebUI kind of has this functionality like you can upload a markdown file and prompt it to improve it in whatever way, but of course that’s a fairly clunky workflow.
I really want something built into the editor, that can use RAG to consider other files in context.
I also don’t want to be locked in to a specific LLM or provider, I’d like to be able to link it to OpenRouter or similar.


I’m out of my depth here but trying to piece this together.
If I understand correctly the first component of this workflow is to use an inference API (like huggingface or so) to convert each file from your notes into semantic vectors and store them in chromadb, ready to be used in future prompts.
Are you using any software to do that or have you written some code to load the files from disk, call the API, and store the response?
So my notes are just a directory of thousands of MD files. I wrote some code that watches the files in this dir to see when anything changes and when it does it will do the following:
My ai agent is a separate component (just another docker container, with the notes dir mounted as a volume) using pi which uses an LLM via remote api (openrouters). I have a custom tool for that agent where the agent can write a text search that returns the top n most semantically similar chunks of text (along with some metadata notably the filename and line numbers where this chunk came from). The vectors are never seen by the LLM they exists purely for the search ranking. The agent also has file editing capabilities so it can then go read that file or modify that file like any coding agent. The agent also has a tool to send messages via matrix.
I have a service that watches a specific matrix chat and if a message is recieved does 1 of 2 things: Option 1: if an agent is already running it will pass the message into the existing agent as a user message. Option 2: if no agent is running it will start a new agent instance and pass the message into the agent as the user message. This agent manager service is the same docker image that runs the agent. This is the same docker container that runs the agent when the agent finishes running it takes and final agent output and sends that to the matrix chat as the agents matrix user.
I got an agent to write all this code so its probably dodgy as shit with all sorts of security holes hence I haven’t published it on github (security through obscurity etc etc lol).
I also have a searxng instance running accessible to the agent via MCP. And I have a chrome MCP allowing the agent to do things from inside a virtual chrome browser.