Justin Abrahms justinabrahms

Okay, we're going to make a delivery plan based on this system I only kinda understand. Here's where we are.

https://thrivemarket.atlassian.net/wiki/spaces/EN/pages/4080009244/Future+autoship+how+we+might+get+there - the big idea
https://thrivemarket.atlassian.net/wiki/spaces/EN/pages/4097310741/Autoship+Migration+Phase+1+Airflow+DAGs+to+Temporal+Workflows - the current plan

My misconception: I misunderstood the autoship thing we were building. Right now, we've got a bunch of DAGs that were migrated over to Temporal workflows. Those workflows call into the CMS API. The logic is that every ~20 minutes, we wake up, query for a list of users that have things that need to be done, then execute the activities (a Temporal concept, roughly equivalent to a function) to process those things.
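The polling loop described above can be sketched in plain Python. This is only an illustration of the shape, not the actual workflow code: `PendingWork`, `run_poll_cycle`, `find_pending`, and the step names are all hypothetical, and a real version would register these as Temporal activities and run the cycle on a ~20 minute schedule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class PendingWork:
    """One unit of work found by the periodic query (hypothetical shape)."""
    user_id: int
    step: str  # e.g. "t-3" or "prefill"; names are assumptions


def run_poll_cycle(
    find_pending: Callable[[], Iterable[PendingWork]],
    activities: Dict[str, Callable[[int], None]],
) -> List[str]:
    """Run one wake-up: query for pending work, then call one activity per item.

    Returns a log of "step:user_id" strings for the items processed.
    """
    processed = []
    for work in find_pending():
        handler = activities[work.step]  # an "activity" is roughly a function
        handler(work.user_id)
        processed.append(f"{work.step}:{work.user_id}")
    return processed
```

Note that nothing here ties the separate steps for one customer together; each cycle only sees whatever the query returns at that moment.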

I thought we had a single long lived workflow per customer, with a run per "schedule", so you'd see the t-3, prefill, etc events under a single "Justin's August 14th order" type thing. That's apparently phase 3, whi

first under-load deploy of catalog happened today. Shot up to 100+ pods. We got a bunch of errors. Deploy happened b/c tf/reliability change (aka zero code change). Why?

⏺ Let me investigate. I'll look at the deployment configuration and recent changes to understand what might have caused this.

⏺ Task(Investigate deploy scaling issue) ⎿ The user doesn't want to proceed with this tool use. The tool use was rejected (eg. if it was a file edit, the new_string was NOT written to the file). STOP what you are doing and wait for the user to tell you how to proceed.

coworker suggested: After the E2E test passes on 1 pod, the rollout immediately shifts to 100% traffic weight and scales all pods at once. With 36 replicas currently running, this means ~36 new pods come up while 36 old pods are terminated.
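As a rough check on the coworker's numbers, the overlap between old and new pods can be worked out directly. This is back-of-the-envelope arithmetic only: `peak_pods` and `surge_fraction` are hypothetical names, and the sketch assumes the new pods are created before the old ones finish terminating.

```python
import math


def peak_pods(replicas: int, surge_fraction: float) -> int:
    """Upper bound on pods alive at once during the cutover.

    surge_fraction is the share of `replicas` created as new pods while
    the old set is still running (1.0 = all new pods at once).
    """
    if replicas < 0 or surge_fraction < 0:
        raise ValueError("replicas and surge_fraction must be non-negative")
    return replicas + math.ceil(replicas * surge_fraction)


print(peak_pods(36, 1.0))   # 72: all 36 new pods overlap all 36 old pods
print(peak_pods(36, 0.25))  # 45: only a quarter of the new pods at a time
```

With 36 replicas and everything scaled at once, the overlap tops out at 72, which is still below the 100+ pods observed, so the overlap may not be the whole story.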

That seems weird/bad. This is definitely a k8s thing, not a java thing.

https://github.com/ThriveMarket/autoship-service/pull/14 Review those specs and consider how autoship works in this repository. What seems to be missing?

⏺ mcp__github__pull_request_read(...)
⏺ mcp__github__pull_request_read(...)
⏺ Glob(dags/autoship/**/*.py)
⎿ /Users/justinabrahms/src/github.com/thrivemarket/be-airflow/dags/autoship/autoship_single_email_process.py
  /Users/justinabrahms/src/github.com/thrivemarket/be-airflow/dags/autoship/autoship_prefill_recs_master_cart.py
  /Users/justinabrahms/src/github.com/thrivemarket/be-airflow/dags/autoship/alert_autoship_slow_queries.py
  /Users/justinabrahms/src/github.com/thrivemarket/be-airflow/dags/autoship/autoship_schedule_single_cart_process_items.py
  /Users/justinabrahms/src/github.com/thrivemarket/be-airflow/dags/autoship/alert_autoship_anomaly_detection.py

https://github.com/ThriveMarket/autoship-service/pull/14 Review those specs and consider how autoship works in this repository. What seems to be missing?

⏺ I'll fetch the PR details and explore how autoship works in this repository.

⏺ mcp__github__pull_request_read(...) ⏺ mcp__github__pull_request_read(...) ⏺ Task(Explore autoship in codebase) ⎿ [{'type': 'text', 'text': '[{"sha":"c454a8188d53935d3c2bad0f17713413bc90ef50","filename":"openspec/project.md","additions":76,"deletions":0,"changes":76,"status":"added","patch":"@@ -0,0 +1,76 @@\n+# Project Context\n+\n+## Purpose\n+\n+Autoship Service manages Thrive Market's subscription-based recurring order\n+system. It enables members to automatically receive products at configurable\n+intervals with per-item frequency control.\n+\n+## Tech Stack\n+\n+- Python (primary language)\n+- Temporal (workflow orchestration)\n+- PostgreSQL (data persistence)\n+- Integration with Magento 1 (Thrive_Autoship module) during migration\n+\n+## Project C

don't run this yet, but know about it. uv run audit-logs.py --enterprise thrivemarket --org ThriveMarket --from-date 2025-10-01 --to-date 2025-10-31

First, I want that to have a --quiet mode of operation that doesn't dump the giant preamble. Do that and commit. Then run the command above w/ this quiet mode. What you'll see is there are things logged which are outside of the from/to dates. I think that's a bug and I want you to address it.
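A minimal sketch of the kind of inclusive date filter the fix would need follows. The contents of audit-logs.py are not shown here, so everything in it is an assumption: `in_date_range` and `filter_entries` are hypothetical helpers, and the entry shape (a dict with an `@timestamp` key holding milliseconds since the epoch, compared in UTC) is assumed for illustration.

```python
from datetime import date, datetime, timezone
from typing import Dict, Iterable, List


def in_date_range(timestamp_ms: int, from_date: date, to_date: date) -> bool:
    """True if the event's UTC calendar day falls within [from_date, to_date]."""
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).date()
    return from_date <= day <= to_date


def filter_entries(
    entries: Iterable[Dict], from_date: date, to_date: date
) -> List[Dict]:
    """Drop entries outside the requested window, whatever the API returned."""
    # "@timestamp" is an assumed field name for the event time in ms.
    return [
        e for e in entries
        if in_date_range(e["@timestamp"], from_date, to_date)
    ]
```

Filtering on the client like this is a safety net; if the server-side query is what lets out-of-range events through, the query would need fixing as well.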

⏺ Let me first read the audit-logs.py file to understand its structure.

⏺ Read(/Users/justinabrahms/src/github.com/thrivemarket/sox-automation/github-audit/audit-logs.py) ⎿ 1→#!/usr/bin/env python3 2→# /// script

use openspec. Make a pain relief timer. It should run in a browser, plain js/css. I enter the last time I took ibuprofen or tylenol. It calculates the next available time to take one. I click the button when I've administered it. It resets the clock. It will allow entering a time directly if I forget to click the button. Plays a loud sound when the timer goes off. Should be hosted on github pages.
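The core calculation behind the timer is small enough to sketch. The request asks for plain js/css; this sketch uses Python only to keep the examples here in one language, and the same arithmetic ports directly to JavaScript `Date` values. `next_dose_time` and `time_remaining` are hypothetical names, and the interval is a caller-supplied parameter, not a dosing recommendation.

```python
from datetime import datetime, timedelta


def next_dose_time(last_taken: datetime, interval: timedelta) -> datetime:
    """Earliest time the next dose is allowed, given the last one."""
    return last_taken + interval


def time_remaining(
    last_taken: datetime, interval: timedelta, now: datetime
) -> timedelta:
    """Time left on the clock; zero once the timer has gone off."""
    return max(next_dose_time(last_taken, interval) - now, timedelta(0))
```

Clicking the button and typing a time in by hand both reduce to the same thing: replacing `last_taken`, after which the clock restarts from that value.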

⏺ I'll create a pain relief timer app. Let me start by checking the current state of the repository and then build this out.

⏺ Bash(ls -la) ⏺ Bash(git remote -v 2>/dev/null || echo "No git repo yet") ⎿ No git repo yet

⎿ drwxr-xr-x justinabrahms staff 64 B Mon Jan 26 17:29:28 2026 .
  drwxr-xr-x justinabrahms staff 448 B Mon Jan 26 17:29:28 2026 ..

we're going to migrate some capybara tests to integration style tests. The end state is a pull request for each file.

We'll start by surveying the tests to be migrated. For them, we'll inspect which object factories they'll need to work. If the necessary factories don't exist, we'll create them. Then, we'll use them in our tests to author the tests correctly.

relevant docs live at docs/factory-testing.md and docs/testing-api-responses.md

The files in question are in the capybara-tests/ folder

api_cart_payment_check.feature

the /rop/quiz/new endpoint is erroring since our HTTP status codes change landed on the master branch. Can you poke around newrelic to see if you can find the cause? It started just after 11:45am local time.

⏺ I'll use the NewRelic skill to investigate this error.

⏺ Skill(...) ⎿ Launching skill: newrelic

⏺ Let me query NewRelic to investigate the /rop/quiz/new endpoint errors. I'll start by looking for error spans around that time.

⏺ Bash(newrelic nrql query --query "SELECT count(*) FROM Span WHERE name LIKE '%quiz...)

We're going to be debugging a k8s thing in trellis. We've deployed to the dev cluster for 'magento'. I think if we hit an endpoint on it.. we'll get a 503. I think the root cause will be b/c of missing envvars or secrets. I think I've created them, but haven't populated them.. but hoping to get an actual error message to start from. Can you investigate?

⏺ I'll investigate the magento deployment in the dev cluster. Let me use the trellis skill to help with this.

⏺ Skill(...) ⎿ Launching skill: trellis

⏺ Let me investigate the magento deployment in the dev cluster. I'll check pods, logs, and events to find the error.

⏺ Bash(kubectl get pods -n magento --context dev-eks)