Commit 5456806

chore: update documentation and dependencies for CLI improvements
- Improved error handling in the CLI entry point to catch configuration errors gracefully.
- Refactored imports in various files to align with updated schema paths.
- Adjusted task execution methods to ensure proper workflow handling.
1 parent c6db4b3 commit 5456806

19 files changed

Lines changed: 190 additions & 2537 deletions

ARCHITECTURE.md

Lines changed: 18 additions & 4 deletions
@@ -19,6 +19,7 @@ Index (discover CIKs)
 **Goal:** Discover which companies (CIKs) have new filings.

 **CLI commands:**
+
 - `daily-index [date]` — fetch a single day's index
 - `quarterly-index [date]` — fetch a single quarter's index
 - `quarterly-index-range [start] [end]` — fetch a range of quarters
@@ -36,6 +37,7 @@ Index (discover CIKs)
 **Result:** The `cik_last_update` table now knows which CIKs have activity and when their most recent filing was.

 **Key files:**
+
 - `src/task/index/FetchDailyIndexTask.ts`
 - `src/task/index/FetchQuarterlyIndexTask.ts`
 - `src/task/index/StoreCikLastUpdatedTask.ts`
@@ -48,6 +50,7 @@ Index (discover CIKs)
 **Goal:** For each CIK with new activity, fetch its full company submission data (metadata + list of all filings).

 **CLI commands:**
+
 - `submissions <cik>` — fetch a single company's submissions
 - `update-all-submissions` — batch process all CIKs with new activity

@@ -74,6 +77,7 @@ Index (discover CIKs)
 **Result:** The `filings` table now contains a row for every filing by every active CIK, including the `form` type and `primary_doc` filename needed to fetch the actual document.

 **Key files:**
+
 - `src/task/submissions/FetchSubmissionsTask.ts`
 - `src/task/submissions/StoreSubmissionsTask.ts`
 - `src/task/submissions/StoreSubmissionFilingsTask.ts`
@@ -87,6 +91,7 @@ Index (discover CIKs)
 **Goal:** Fetch individual filing documents from SEC Archives, parse their XML/HTML content into structured data, and store the results.

 **CLI commands:**
+
 - `form <cik> <form> [docid]` — process forms for a single company
 - `update-all-forms <form1,form2,...>` — batch process all unprocessed filings of given form types

@@ -100,9 +105,11 @@ Index (discover CIKs)

 **Step 2 — Fetch the document:**
 `SecFetchAccessionDocTask` downloads the document from:
+
 ```
 https://www.sec.gov/Archives/edgar/data/{cik}/{accession-no-dashes}/{filename}
 ```
+
 Results are cached to disk (filings are immutable once submitted).

 **Step 3 — Parse and store:**
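
The URL template under Step 2 in this hunk can be filled in with a few lines of TypeScript. This is a minimal sketch, not the code in `SecFetchAccessionDocTask`: the helper name `buildAccessionDocUrl` and its signature are assumptions, and the CIK, accession number, and filename are made-up sample values.

```typescript
// Hypothetical helper; the name and signature are assumptions for illustration.
function buildAccessionDocUrl(cik: number, accessionNumber: string, filename: string): string {
  // Accession numbers are commonly written with dashes (e.g. 0000320193-24-000001);
  // the archive path uses the same digits with the dashes removed.
  const accessionNoDashes = accessionNumber.replace(/-/g, "");
  return `https://www.sec.gov/Archives/edgar/data/${cik}/${accessionNoDashes}/${encodeURIComponent(filename)}`;
}

// Sample values only; they do not refer to a real filing.
const url = buildAccessionDocUrl(320193, "0000320193-24-000001", "primary_doc.xml");
console.log(url);
```

Because a filing does not change once submitted, a URL built this way is also a reasonable cache key for the on-disk cache the text describes.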
@@ -114,6 +121,7 @@ Index (discover CIKs)
 **Result:** Structured, normalized data from the filing is stored across multiple tables (entities, persons, companies, addresses, phones, investment offerings, etc.).

 **Key files:**
+
 - `src/task/forms/ProcessAccessionDocFormTask.ts`
 - `src/task/forms/SecFetchAccessionDocTask.ts`
 - `src/task/forms/FetchAndStoreFormsTask.ts`
@@ -134,6 +142,7 @@ src/sec/forms/all-forms.ts
 ```

 Each form category directory (e.g., `exempt-offerings/`, `insider-trading/`) exports:
+
 - A `FORM_NAMES_MAP` array of `[formName, FormClass]` tuples
 - A `FORM_NAMES` array of just the form name strings
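
The two exports listed in this hunk can be pictured roughly as follows. This is a sketch under assumptions: `FormAlpha`, `FormBeta`, and `findFormClass` are hypothetical placeholders, and the real category modules may type or build these arrays differently.

```typescript
// Placeholder classes standing in for real form classes (assumed names).
class FormAlpha {}
class FormBeta {}

// [formName, FormClass] tuples; a form and its amendment can share one class.
const FORM_NAMES_MAP: ReadonlyArray<readonly [string, new () => object]> = [
  ["ALPHA", FormAlpha],
  ["ALPHA/A", FormAlpha],
  ["BETA", FormBeta],
];

// Just the form name strings, derived from the tuples so the two stay in sync.
const FORM_NAMES: string[] = FORM_NAMES_MAP.map(([name]) => name);

// Hypothetical lookup: find the class that handles a given form name.
function findFormClass(formName: string): (new () => object) | undefined {
  return FORM_NAMES_MAP.find(([name]) => name === formName)?.[1];
}

console.log(FORM_NAMES, findFormClass("ALPHA/A") === FormAlpha);
```

Deriving `FORM_NAMES` from `FORM_NAMES_MAP` is one design option; whether the project derives it or lists it by hand is not shown in this diff.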

@@ -148,11 +157,13 @@ The storage layer uses a **repository pattern** with TypeBox schemas for runtime
 ### Repository Pattern

 Each domain has:
+
 - **Schema** (`*Schema.ts`) — TypeBox schema defining the table structure, primary keys, and a DI token
 - **Repo** (`*Repo.ts`) — domain-specific class wrapping one or more repositories, providing save/query methods
 - **Normalization** (`*Normalization.ts`, optional) — functions to clean and standardize input data (e.g., address parsing, name splitting, hash generation)

 Repos get their underlying storage via dependency injection:
+
 - **Production:** `SqliteTabularRepository` registered in `src/config/DefaultDI.ts`
 - **Testing:** `InMemoryTabularRepository` registered in `src/config/TestingDI.ts`
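
The repository pattern described in this hunk, with storage swapped through dependency injection, might look like the sketch below. Every name here (`TabularStore`, `MemoryStore`, `register`, `resolve`, `CompanyRepo`, the token string) is hypothetical and chosen for illustration; the project's actual DI container and repository interfaces are not visible in this diff.

```typescript
// All names below are hypothetical; they only mirror the shape described above.
interface TabularStore<T> {
  put(row: T): void;
  all(): T[];
}

class MemoryStore<T> implements TabularStore<T> {
  private rows: T[] = [];
  put(row: T): void {
    this.rows.push(row);
  }
  all(): T[] {
    return [...this.rows];
  }
}

// A tiny token -> factory registry standing in for a DI container.
const registry = new Map<string, () => unknown>();
function register<T>(token: string, factory: () => T): void {
  registry.set(token, factory);
}
function resolve<T>(token: string): T {
  const factory = registry.get(token);
  if (!factory) throw new Error(`No registration for token: ${token}`);
  return factory() as T;
}

interface Company {
  cik: number;
  name: string;
}
const COMPANY_STORE_TOKEN = "storage.company";

class CompanyRepo {
  private readonly store: TabularStore<Company>;
  constructor() {
    // The repo asks the registry for its storage instead of constructing it.
    this.store = resolve<TabularStore<Company>>(COMPANY_STORE_TOKEN);
  }
  save(company: Company): void {
    this.store.put(company);
  }
  findByCik(cik: number): Company | undefined {
    return this.store.all().find((c) => c.cik === cik);
  }
}

// A test setup registers the in-memory store; a production setup would
// register a SQLite-backed implementation under the same token.
register<TabularStore<Company>>(COMPANY_STORE_TOKEN, () => new MemoryStore<Company>());
const repo = new CompanyRepo();
repo.save({ cik: 1, name: "Example Co" });
console.log(repo.findByCik(1));
```

The point of the indirection is that `CompanyRepo` never names a concrete storage class, so tests and production differ only in what is registered.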

@@ -189,7 +200,7 @@ Define the TypeBox schema that mirrors the XML structure of the SEC filing.
 // src/sec/forms/<category>/Form_X.schema.ts

 import { Type, Static } from "typebox";
-import { /* reusable types */ } from "../FormSchemaUtil";
+import {} from /* reusable types */ "../FormSchemaUtil";

 // Define sub-types for nested XML elements
 const SOME_NESTED_TYPE = Type.Object({
@@ -207,12 +218,13 @@ export type FormX = Static<typeof FormXSchema>;

 // XML wrapper schema (matches the root XML element)
 export const FormXSubmissionSchema = Type.Object({
-  edgarSubmission: FormXSchema, // or whatever the root XML tag is
+  edgarSubmission: FormXSchema, // or whatever the root XML tag is
 });
 export type FormXSubmission = Static<typeof FormXSubmissionSchema>;
 ```

 **Key points:**
+
 - Use `Type.Array()` for elements that can repeat in XML — the base `Form` class uses `extractArrayPaths()` to automatically detect these from the schema and configure the XML parser's `isArray` callback
 - Use `Type.Optional()` for elements that may be absent
 - Import shared types from `FormSchemaUtil.ts` (e.g., `TRUE_FALSE_LIST`, `CIK_TYPE`, `STATE_COUNTRY_CODE`)
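
The first key point, deriving array paths from the schema, can be sketched without any library. TypeBox schemas are plain JSON Schema objects, so a recursive walk over `type`, `properties`, and `items` is enough to find them. The function `collectArrayPaths` and the simplified `SchemaNode` type below are assumptions for illustration and are not the project's `extractArrayPaths()`; the callback follows the `(name, jpath)` shape that fast-xml-parser documents for its `isArray` option.

```typescript
// Simplified JSON-Schema-like node; real TypeBox schemas carry more fields.
type SchemaNode =
  | { type: "object"; properties: Record<string, SchemaNode> }
  | { type: "array"; items: SchemaNode }
  | { type: "string" | "number" | "boolean" };

// Hypothetical stand-in for extractArrayPaths(): returns dotted paths of array fields.
function collectArrayPaths(node: SchemaNode, prefix = ""): string[] {
  if (node.type === "array") {
    // Record this path, then keep walking into the element schema.
    return [prefix, ...collectArrayPaths(node.items, prefix)];
  }
  if (node.type === "object") {
    return Object.entries(node.properties).flatMap(([key, child]) =>
      collectArrayPaths(child, prefix ? `${prefix}.${key}` : key),
    );
  }
  return [];
}

// Made-up example schema with one nested array inside another.
const exampleSchema: SchemaNode = {
  type: "object",
  properties: {
    edgarSubmission: {
      type: "object",
      properties: {
        issuers: {
          type: "array",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              phones: { type: "array", items: { type: "string" } },
            },
          },
        },
        total: { type: "number" },
      },
    },
  },
};

const arrayPaths = new Set(collectArrayPaths(exampleSchema));
// Callback in the shape an XML parser's isArray option would expect.
const isArray = (_name: string, jpath: string): boolean => arrayPaths.has(jpath);
console.log([...arrayPaths]);
```

With a callback like this, a repeating element is parsed as an array even when a particular filing contains it only once, which keeps downstream code from branching on "one or many".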
@@ -232,7 +244,7 @@ import { FormX, FormXSchema, FormXSubmission, FormXSubmissionSchema } from "./Fo
 export class Form_X extends Form {
   static readonly name = "Human-Readable Form Name";
   static readonly description = "Brief description of what this form is";
-  static readonly forms = ["X", "X/A"] as const; // form name and amendment variant
+  static readonly forms = ["X", "X/A"] as const; // form name and amendment variant

   static async parse(form: (typeof Form_X.forms)[number], xml: string): Promise<FormX> {
     if (!Form_X.forms.includes(form)) {
@@ -250,6 +262,7 @@ export type { FormX };
 ```

 **How parsing works:**
+
 1. `Form.getParser(schema)` creates an `XMLParser` (from `fast-xml-parser`) configured with `isArray` callbacks derived from the TypeBox schema — any field defined as `Type.Array()` will be treated as an array even if the XML has only one element
 2. `parser.parse(xml)` converts XML to a plain JS object
 3. `Value.Convert(schema, obj)` uses TypeBox to coerce values to the correct types (e.g., string `"123"` to number `123`)
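
The effect of steps 1 and 3 on the parsed object can be illustrated with plain TypeScript. The helpers `asArray` and `toNumber` are hypothetical and only imitate the outcome; the project delegates this work to fast-xml-parser's `isArray` option and TypeBox's `Value.Convert`, whose exact coercion rules are broader than this sketch.

```typescript
// Hypothetical helper: always hand back an array, even for a single element.
function asArray<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Hypothetical helper: turn numeric strings into numbers, leave other values alone.
function toNumber(value: unknown): unknown {
  if (typeof value === "string" && value.trim() !== "" && !Number.isNaN(Number(value))) {
    return Number(value);
  }
  return value;
}

// What a parser without array hints might produce for a filing with one issuer:
// a single object instead of a one-element array, and numbers left as strings.
const rawParsed = { issuer: { name: "Acme" }, total: "123" };

const normalized = {
  issuers: asArray(rawParsed.issuer),
  total: toNumber(rawParsed.total),
};
console.log(normalized);
```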
@@ -293,6 +306,7 @@ export async function processFormX({
 ```

 **Patterns from Form D:**
+
 - Instantiate repos as needed (they get their storage via DI)
 - Use `"form-x:role-name"` relation names for junction records to distinguish data sources
 - Detect companies in person fields with `hasCompanyEnding()` from `CompanyNormalization`
@@ -362,7 +376,7 @@ import { PersonRepo } from "../../../storage/person/PersonRepo";

 describe("Form_X", () => {
   beforeEach(() => {
-    resetDependencyInjectionsForTesting(); // resets all repos to in-memory
+    resetDependencyInjectionsForTesting(); // resets all repos to in-memory
   });

   it("should parse and store form data", async () => {
