From eb2095a0af36c46647c12feee7a5ccdf14ee5e12 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Vincent=20Gr=C3=A9goire?=
Date: Tue, 5 May 2026 23:05:28 -0400
Subject: [PATCH] Address JOSS reviewer comments on paper

Fixed typos and updated the supported-formats footnote to reflect current ITCH (2.0-5.0) and IEX DEEP support.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 paper/paper.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/paper/paper.md b/paper/paper.md
index b85bc00..09ee1f3 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -33,7 +33,7 @@ MeatPy is a Python framework specifically developed for processing and analyzing
 High-frequency financial markets now operate at sub-millisecond time scales, with individual order placements, cancellations, and executions occurring in nanoseconds. To understand market microstructure, liquidity provision, and price formation, researchers increasingly rely on historical order book data that record every message sent by the exchange. Several exchanges, including Nasdaq, IEX, and the Australian Stock Exchange and Chi-X Australia, make their high-frequency data feeds available for academic research, often at no cost or a reduced fee.^[Nasdaq data can be obtained through their ["Academic Waiver Policy"](https://www.nasdaqtrader.com/content/AdministrationSupport/Policy/ACADEMICWAIVERPOLICY.pdf); IEX provides free historical data, called HIST, on a T+1 basis on their [website](https://iextrading.com/trading/market-data/#hist-download); ASX and Chi-X Australia data can be accessed via [SIRCA](https://sirca.org.au/) by their academic subscribers] These feeds offer unprecedented granularity, capturing not only trades and their direction, but also order placements, cancellations, halts, circuit breakers, and specialized exchange-specific order types.
As such, they have been used to study a wide range of market microstructure-related questions [see, e.g., @oharaWhatsNotThere2014;@comerton2019inverted;@shkilkoEveryCloudHas2020;@gregoire2022earnings.]
-Despite their common conceptual structure, exchange data feeds differ in message formats, order type definitions, and exchange rules which affect processing logic. A single day of Nasdaq TotalView ITCH data can exceed ten gigabytes in compressed binary form and contain billions of messages. MeatPy addresses these challenges by providing an open-source Python framework for parsing and analyzing high-frequency financial market data. It reconstructs full limit order books from raw feed data, supports multiple exchange-specific message formats,^[MeatPy currently supports TotalView-ITCH versions 4.1 and 5.0 used by exchanges on the INET platform which includes most Nasdaq-operated equity exchanges. Support for IEX DEEP+ is under developement.] and leverages modern Python features such a generators, context managers, and type annotations to improve reliability and developer productivity. By abstracting away the complexity of heterogeneous feed formats and offering efficient data processing primitives, MeatPy enables researchers to focus on designing and executing market microstructure analyses rather than building low-level data engineering infrastructure.
+Despite their common conceptual structure, exchange data feeds differ in message formats, order type definitions, and exchange rules which affect processing logic. A single day of Nasdaq TotalView ITCH data can exceed ten gigabytes in compressed binary form and contain billions of messages. MeatPy addresses these challenges by providing an open-source Python framework for parsing and analyzing high-frequency financial market data.
It reconstructs full limit order books from raw feed data, supports multiple exchange-specific message formats,^[MeatPy currently supports all Nasdaq ITCH versions from 2.0 to 5.0 used by exchanges on the INET platform which includes most Nasdaq-operated equity exchanges, as well as IEX DEEP.] and leverages modern Python features such as generators, context managers, and type annotations to improve reliability and developer productivity. By abstracting away the complexity of heterogeneous feed formats and offering efficient data processing primitives, MeatPy enables researchers to focus on designing and executing market microstructure analyses rather than building low-level data engineering infrastructure.
 # State of the Field
@@ -47,7 +47,7 @@ MeatPy was built as a new framework rather than a contribution to an existing pr
 Processing limit order book (LOB) data poses significant technical and conceptual challenges compared to working with more conventional tabular financial datasets. While stock prices, trades, or aggregated quotes are typically available as simple time series, raw exchange feeds record every event affecting the state of the order book, often at sub-millisecond intervals. Because these events depend on one another, the full state of the market at any given time must be inferred by dynamically applying these events in sequence rather than reading a single row of static information.
-Commercial data feeds such as Nasdaq TotalView ITCH amplify this challenge by optimizing bandwidth through highly efficient message formats. For example, a new order addition may specify a stock symbol, price, and size as in the first row of Table \ref{messages}. Subsequent messages referencing the same order—such as cancellations or executions omit the stock symbol and price entirely, providing only the unique order identifier.
This design is optimized for low-latency transmission but complicates downstream processing: the consumer must maintain a mapping between active order identifiers and their associated attributes to correctly interpret later modifications or removals. Without reconstructing and maintaining this state, it is impossible to know, for example, which stock an execution message pertains to or whether it affects liquidity on the bid or ask side.
+Commercial data feeds such as Nasdaq TotalView ITCH amplify this challenge by optimizing bandwidth through highly efficient message formats. For example, a new order addition may specify a stock symbol, price, and size as in the first row of Table \ref{messages}. Subsequent messages referencing the same order—such as cancellations or executions—omit the stock symbol and price entirely, providing only the unique order identifier. This design is optimized for low-latency transmission but complicates downstream processing: the consumer must maintain a mapping between active order identifiers and their associated attributes to correctly interpret later modifications or removals. Without reconstructing and maintaining this state, it is impossible to know, for example, which stock an execution message pertains to or whether it affects liquidity on the bid or ask side.
 | **Timestamp** | **Message** | **Bid/Ask** | **Shares** | **Price** | **Stock** | **Ref Number** |
 | --------------- | ----------- | ----------- | ---------- | --------- | --------- | -------------- |
@@ -82,11 +82,11 @@ This modular design separates data parsing, message representation, and order bo
 MeatPy has supported peer-reviewed academic publications examining market microstructure. The framework enabled the analysis in @gregoire2022earnings, investigating how earnings news propagates to stock prices using Nasdaq order book data.
It contributed to @comerton2019inverted, examining market quality under inverted fee structures, and @yaali2022hftviz, developing visualization techniques for high-frequency trading data.
-The project has 29 stars and 8 forks on GitHub and was downloaded 263 times in the last month^[According to PyPi Stats, see [https://pypistats.org/packages/meatpy](https://pypistats.org/packages/meatpy).], indicating adoption beyond the original development team. The project has received bug reports and feature requests through GitHub Issues, demonstrating engagement from the research community.
+The project has 29 stars and 8 forks on GitHub and was downloaded 263 times in the last month^[According to PyPI Stats, see [https://pypistats.org/packages/meatpy](https://pypistats.org/packages/meatpy).], indicating adoption beyond the original development team. The project has received bug reports and feature requests through GitHub Issues, demonstrating engagement from the research community.
 # AI Usage Disclosure
-Generative AI tools were used during the development of MeatPy and the preparation of this manuscript. ChatGPT and GitHub Copilot assisted with code implementation during from the time they became available. Claude Code was subsequently used for bug fixes, improving the API interface and documentation, implementing automated release workflows, and copy-editing this paper. Prior to version 0.2, MeatPy contained no AI-generated code. All AI-generated content was reviewed and edited by the authors to ensure accuracy and clarity.
+Generative AI tools were used during the development of MeatPy and the preparation of this manuscript. ChatGPT and GitHub Copilot assisted with code implementation from the time they became available. Claude Code was subsequently used for bug fixes, improving the API interface and documentation, implementing automated release workflows, and copy-editing this paper. Prior to version 0.2, MeatPy contained no AI-generated code.
All AI-generated content was reviewed and edited by the authors to ensure accuracy and clarity.
 # Acknowledgements