chore: update documentation and dependencies for CLI improvements
- Improved error handling in the CLI entry point to catch configuration errors gracefully.
- Refactored imports in various files to align with updated schema paths.
- Adjusted task execution methods to ensure proper workflow handling.
ARCHITECTURE.md
Lines changed: 18 additions & 4 deletions
@@ -19,6 +19,7 @@ Index (discover CIKs)
**Goal:** Discover which companies (CIKs) have new filings.

**CLI commands:**
+
- `daily-index [date]` — fetch a single day's index
- `quarterly-index [date]` — fetch a single quarter's index
- `quarterly-index-range [start] [end]` — fetch a range of quarters
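The three index commands differ mainly in which EDGAR index files they read. The sketch below shows one way to map a date to its calendar quarter and enumerate a quarter range. It assumes the commands use EDGAR's public `full-index` and `daily-index` directory layout; `quarterOf`, `quartersInRange`, `quarterlyIndexUrl`, and `dailyIndexUrl` are hypothetical helpers, not functions from this codebase.

```typescript
// Hypothetical helpers: one way to map dates onto EDGAR's quarterly index
// layout. Not taken from FetchDailyIndexTask or FetchQuarterlyIndexTask.
interface Quarter {
  year: number;
  qtr: number; // 1..4
}

const EDGAR_ARCHIVES = "https://www.sec.gov/Archives/edgar";

// Calendar quarter of a date, computed in UTC so local time zones cannot shift it.
function quarterOf(date: Date): Quarter {
  return { year: date.getUTCFullYear(), qtr: Math.floor(date.getUTCMonth() / 3) + 1 };
}

// All quarters from the one containing `start` through the one containing `end`, inclusive.
// Returns an empty array if `start` is after `end`.
function quartersInRange(start: Date, end: Date): Quarter[] {
  const last = quarterOf(end);
  const result: Quarter[] = [];
  let { year, qtr } = quarterOf(start);
  while (year < last.year || (year === last.year && qtr <= last.qtr)) {
    result.push({ year, qtr });
    if (qtr === 4) {
      year += 1;
      qtr = 1;
    } else {
      qtr += 1;
    }
  }
  return result;
}

// Assumed location of a quarter's master index file.
function quarterlyIndexUrl({ year, qtr }: Quarter): string {
  return `${EDGAR_ARCHIVES}/full-index/${year}/QTR${qtr}/master.idx`;
}

// Assumed location of a single day's master index file (master.YYYYMMDD.idx).
function dailyIndexUrl(date: Date): string {
  const { year, qtr } = quarterOf(date);
  const yyyymmdd = date.toISOString().slice(0, 10).replace(/-/g, "");
  return `${EDGAR_ARCHIVES}/daily-index/${year}/QTR${qtr}/master.${yyyymmdd}.idx`;
}
```

For example, `quartersInRange(new Date("2023-11-15"), new Date("2024-05-01"))` yields 2023 Q4, 2024 Q1, and 2024 Q2. Working in UTC keeps a date such as `2024-01-01` from slipping into the previous quarter on machines west of Greenwich.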
@@ -36,6 +37,7 @@ Index (discover CIKs)
**Result:** The `cik_last_update` table now knows which CIKs have activity and when their most recent filing was.

**Key files:**
+
- `src/task/index/FetchDailyIndexTask.ts`
- `src/task/index/FetchQuarterlyIndexTask.ts`
- `src/task/index/StoreCikLastUpdatedTask.ts`
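The `cik_last_update` result boils down to keeping the newest filing date seen per CIK. A minimal in-memory sketch of that rule, assuming dates arrive as ISO `YYYY-MM-DD` strings; the `IndexEntry` shape and `mergeLastUpdated` function are hypothetical and are not the real `StoreCikLastUpdatedTask`.

```typescript
// Hypothetical shape of one parsed index line; the real index parser may differ.
interface IndexEntry {
  cik: number;
  dateFiled: string; // ISO "YYYY-MM-DD", so plain string comparison orders dates correctly
}

// Fold index entries into a CIK -> most-recent-filing-date map,
// overwriting an existing value only when the new date is later.
function mergeLastUpdated(
  current: Map<number, string>,
  entries: Iterable<IndexEntry>,
): Map<number, string> {
  const next = new Map(current);
  for (const { cik, dateFiled } of entries) {
    const seen = next.get(cik);
    if (seen === undefined || dateFiled > seen) {
      next.set(cik, dateFiled);
    }
  }
  return next;
}
```

Taking the maximum rather than the last value seen means re-processing an older index, or retrying the same day, never moves a CIK's date backwards.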
@@ -48,6 +50,7 @@ Index (discover CIKs)
**Goal:** For each CIK with new activity, fetch its full company submission data (metadata + list of all filings).

**CLI commands:**
+
- `submissions <cik>` — fetch a single company's submissions
- `update-all-submissions` — batch process all CIKs with new activity
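Company submission data is keyed by CIK. Assuming these commands read SEC's public submissions API under `https://data.sec.gov/submissions/`, where each file is named after the CIK zero-padded to 10 digits, URL construction might look like this; `submissionsUrl` is a hypothetical name.

```typescript
// Hypothetical helper, assuming the submissions commands read SEC's public
// submissions API, which names each file after the CIK zero-padded to 10 digits.
function submissionsUrl(cik: number | string): string {
  const digits = String(cik).trim();
  // CIKs are at most 10 digits; reject anything else before building a URL.
  if (!/^\d{1,10}$/.test(digits)) {
    throw new Error(`Invalid CIK: ${String(cik)}`);
  }
  return `https://data.sec.gov/submissions/CIK${digits.padStart(10, "0")}.json`;
}
```

Normalizing in one place would let `320193` and `0000320193` resolve to the same file.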
@@ -74,6 +77,7 @@ Index (discover CIKs)
**Result:** The `filings` table now contains a row for every filing by every active CIK, including the `form` type and `primary_doc` filename needed to fetch the actual document.
Results are cached to disk (filings are immutable once submitted).

**Step 3 — Parse and store:**
@@ -114,6 +121,7 @@ Index (discover CIKs)
**Result:** Structured, normalized data from the filing is stored across multiple tables (entities, persons, companies, addresses, phones, investment offerings, etc.).

**Key files:**
+
- `src/task/forms/ProcessAccessionDocFormTask.ts`
- `src/task/forms/SecFetchAccessionDocTask.ts`
- `src/task/forms/FetchAndStoreFormsTask.ts`
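Fetching a filing's document needs a URL built from data already collected, assuming each `filings` row carries the accession number alongside `primary_doc`. The sketch follows the usual EDGAR Archives path layout (CIK without leading zeros, accession number with its dashes removed); `primaryDocUrl` is a hypothetical name and may not match how `SecFetchAccessionDocTask.ts` actually builds its URLs. Because filings are immutable, such a URL could also serve as a stable cache key.

```typescript
// Hypothetical helper sketching the usual EDGAR Archives layout for a filing's
// primary document; SecFetchAccessionDocTask may build its URLs differently.
function primaryDocUrl(cik: number | string, accession: string, primaryDoc: string): string {
  const cikDigits = String(cik).trim();
  if (!/^\d{1,10}$/.test(cikDigits)) {
    throw new Error(`Invalid CIK: ${String(cik)}`);
  }
  // Standard accession format: 10-digit submitter ID, 2-digit year, 6-digit sequence.
  if (!/^\d{10}-\d{2}-\d{6}$/.test(accession)) {
    throw new Error(`Unexpected accession number format: ${accession}`);
  }
  // Archive paths use the CIK without leading zeros and the accession without dashes.
  const cikPath = String(Number(cikDigits));
  const accessionPath = accession.replace(/-/g, "");
  return `https://www.sec.gov/Archives/edgar/data/${cikPath}/${accessionPath}/${primaryDoc}`;
}
```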
@@ -134,6 +142,7 @@ src/sec/forms/all-forms.ts
```

Each form category directory (e.g., `exempt-offerings/`, `insider-trading/`) exports:
+
- A `FORM_NAMES_MAP` array of `[formName, FormClass]` tuples
- A `FORM_NAMES` array of just the form name strings
@@ -148,11 +157,13 @@ The storage layer uses a **repository pattern** with TypeBox schemas for runtime
### Repository Pattern

Each domain has:
+
- **Schema** (`*Schema.ts`) — TypeBox schema defining the table structure, primary keys, and a DI token
- **Repo** (`*Repo.ts`) — domain-specific class wrapping one or more repositories, providing save/query methods
- **Normalization** (`*Normalization.ts`, optional) — functions to clean and standardize input data (e.g., address parsing, name splitting, hash generation)

Repos get their underlying storage via dependency injection:
+
- **Production:** `SqliteTabularRepository` registered in `src/config/DefaultDI.ts`
- **Testing:** `InMemoryTabularRepository` registered in `src/config/TestingDI.ts`
@@ -189,7 +200,7 @@ Define the TypeBox schema that mirrors the XML structure of the SEC filing.
- Use `Type.Array()` for elements that can repeat in XML — the base `Form` class uses `extractArrayPaths()` to automatically detect these from the schema and configure the XML parser's `isArray` callback
- Use `Type.Optional()` for elements that may be absent
- Import shared types from `FormSchemaUtil.ts` (e.g., `TRUE_FALSE_LIST`, `CIK_TYPE`, `STATE_COUNTRY_CODE`)

1. `Form.getParser(schema)` creates an `XMLParser` (from `fast-xml-parser`) configured with `isArray` callbacks derived from the TypeBox schema — any field defined as `Type.Array()` will be treated as an array even if the XML has only one element
2. `parser.parse(xml)` converts XML to a plain JS object
3. `Value.Convert(schema, obj)` uses TypeBox to coerce values to the correct types (e.g., string `"123"` to number `123`)
@@ -293,6 +306,7 @@ export async function processFormX({
```

**Patterns from Form D:**
+
- Instantiate repos as needed (they get their storage via DI)
- Use `"form-x:role-name"` relation names for junction records to distinguish data sources
- Detect companies in person fields with `hasCompanyEnding()` from `CompanyNormalization`
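The last two Form D conventions can be illustrated in a few lines. This sketch assumes relation names follow a lowercase kebab-case `form-<form>:<role>` pattern, and uses a deliberately naive trailing-suffix check as a stand-in for `hasCompanyEnding()`. `relationName`, `looksLikeCompany`, and `COMPANY_SUFFIXES` are hypothetical and do not reproduce `CompanyNormalization`.

```typescript
// Hypothetical helpers illustrating two Form D conventions; the real
// hasCompanyEnding() in CompanyNormalization may use a different list and rules.

// Lowercase kebab-case slug: runs of non-alphanumerics become single dashes.
function slug(s: string): string {
  return s.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
}

// Junction-record relation names namespaced by form, e.g. "form-d:related-person".
function relationName(form: string, role: string): string {
  return `form-${slug(form)}:${slug(role)}`;
}

// Illustrative suffix list only.
const COMPANY_SUFFIXES = ["inc", "llc", "lp", "llp", "ltd", "corp", "corporation", "company", "co"];

// Naive check: does the last word of the name look like a company suffix?
function looksLikeCompany(name: string): boolean {
  const tokens = name.toLowerCase().replace(/[.,]/g, " ").split(/\s+/).filter(Boolean);
  const last = tokens[tokens.length - 1];
  return last !== undefined && COMPANY_SUFFIXES.includes(last);
}
```

A suffix check like this misses punctuated forms such as `L.L.C.`, so the quality of whatever suffix list the real helper uses matters more than the lookup itself.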
@@ -362,7 +376,7 @@ import { PersonRepo } from "../../../storage/person/PersonRepo";

describe("Form_X", () => {
beforeEach(() => {
-resetDependencyInjectionsForTesting(); // resets all repos to in-memory
+resetDependencyInjectionsForTesting(); // resets all repos to in-memory
});

it("should parse and store form data", async () => {