Commit c39fc9b

feat: Add useBigIntTimestamp option (#67)
* Add useBigIntTimestamp option
* Update documentation
* Move the flag to Type.Timestamp from Type.Date
* Add tests
1 parent aeddf23 commit c39fc9b

10 files changed

Lines changed: 81 additions & 22 deletions

README.md

Lines changed: 6 additions & 5 deletions
@@ -114,11 +114,12 @@ Data extraction can be customized using options provided to table generation met

 ```js
 const table = tableFromIPC(ipc, {
-  useDate: true,       // map dates and timestamps to Date objects
-  useDecimalInt: true, // use BigInt for decimals, do not coerce to number
-  useBigInt: true,     // use BigInt for 64-bit ints, do not coerce to number
-  useMap: true,        // create Map objects for [key, value] pair lists
-  useProxy: true       // use zero-copy proxies for struct and table row objects
+  useDate: true,            // map dates and timestamps to Date objects
+  useDecimalInt: true,      // use BigInt for decimals, do not coerce to number
+  useBigInt: true,          // use BigInt for 64-bit ints, do not coerce to number
+  useBigIntTimestamp: true, // use BigInt for timestamps, do not coerce to float
+  useMap: true,             // create Map objects for [key, value] pair lists
+  useProxy: true            // use zero-copy proxies for struct and table row objects
 });
 ```

docs/api/data-types.md

Lines changed: 3 additions & 2 deletions
@@ -21,7 +21,7 @@ The table below provides an overview of all data types supported by the Apache A
 | 7 | [Decimal](#decimal) |||| `number`, or scaled integers via the `useDecimalInt` flag |
 | 8 | [Date](#date) |||| `number`, or `Date` via the `useDate` flag. |
 | 9 | [Time](#time) |||| `number`, or `bigint` for 64-bit values via the `useBigInt` flag |
-| 10 | [Timestamp](#timestamp) |||| `number`, or `Date` via the `useDate` flag. |
+| 10 | [Timestamp](#timestamp) |||| `number`, `bigint` via `useBigIntTimestamp` flag, or `Date` via the `useDate` flag. |
 | 11 | [Interval](#interval) |||| depends on the interval unit |
 | 12 | [List](#list) |||| `Array` or `TypedArray` of child type |
 | 13 | [Struct](#struct) |||| `object`, properties depend on child types |
@@ -398,7 +398,8 @@ timeNanosecond()

 Create a Timestamp data type instance. Timestamp values are 64-bit signed integers representing an elapsed time since a fixed epoch, stored in either of four *unit*s: seconds, milliseconds, microseconds or nanoseconds, and are optionally annotated with a *timezone*. Timestamp values do not include any leap seconds (in other words, all days are considered 86400 seconds long).

-Timestamp values are stored in a `BigInt64Array` and converted to millisecond-based JavaScript `number` values (potentially with fractional digits) upon extraction. An error is raised if a value exceeds either `Number.MIN_SAFE_INTEGER` or `Number.MAX_SAFE_INTEGER`. Pass the `useDate` extraction option (e.g., to [`tableFromIPC`](/flechette/api/#tableFromIPC) or [`tableFromArrays`](/flechette/api/#tableFromArrays)) to instead extract timestamp values as JavaScript `Date` objects.
+Timestamp values are stored in a `BigInt64Array` and converted to millisecond-based JavaScript `number` values (potentially with fractional digits) upon extraction. An error is raised if a value exceeds either `Number.MIN_SAFE_INTEGER` or `Number.MAX_SAFE_INTEGER`. Pass the `useDate` extraction option (e.g., to [`tableFromIPC`](/flechette/api/#tableFromIPC) or [`tableFromArrays`](/flechette/api/#tableFromArrays)) to instead extract timestamp values as JavaScript `Date` objects. Alternatively, pass the `useBigIntTimestamp` extraction option to extract timestamp values as JavaScript `bigint` (bypass float conversion).
+

 * *unit* (`number`): The time unit, one of `TimeUnit.SECOND`, `TimeUnit.MILLISECOND` (default), `TimeUnit.MICROSECOND`, or `TimeUnit.NANOSECOND`.
 * *timezone* (`string`): An optional string for the name of a timezone. If provided, the value should either be a string as used in the Olson timezone database (the "tz database" or "tzdata"), such as "America/New_York", or an absolute timezone offset of the form "+XX:XX" or "-XX:XX", such as "+07:30". Whether a timezone string is present indicates different semantics about the data. That said, Flechette does not process the timezone information.
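
To illustrate why a `bigint` extraction path can matter for fine-grained units, here is a minimal sketch in plain JavaScript that does not use Flechette at all. The helper `nanosToMillisFloat` is hypothetical: it only approximates the float-millisecond conversion described in the documentation text above and is not the library's implementation.

```js
// Hypothetical helper (not Flechette code): convert a raw nanosecond
// timestamp to float milliseconds.
function nanosToMillisFloat(ns) {
  const wholeMs = ns / 1000000n; // bigint division truncates toward zero
  const remNs = ns % 1000000n;   // sub-millisecond remainder, in nanoseconds
  return Number(wholeMs) + Number(remNs) / 1e6;
}

// Raw storage as described above: 64-bit signed integers.
// This value is 2002-12-13T07:28:56.564738209Z in nanoseconds since the epoch.
const raw = BigInt64Array.of(1039764536564738209n);

// The raw value lies outside the range in which doubles hold integers exactly.
const beyondSafeRange = raw[0] > BigInt(Number.MAX_SAFE_INTEGER);

// Converting to float milliseconds and back does not recover the stored value.
const asFloatMs = nanosToMillisFloat(raw[0]);
const roundTripped = BigInt(Math.round(asFloatMs * 1e6));
const lossless = roundTripped === raw[0];

console.log({ beyondSafeRange, lossless });
```

Reading the stored `bigint` directly sidesteps this loss, at the cost of values staying in the column's own unit rather than in milliseconds.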

docs/api/index.md

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ By default Flechette assumes input data is uncompressed. If input IPC data conta
 * *data* (`ArrayBuffer` \| `Uint8Array` \| `Uint8Array[]`): The source byte buffer, or an array of buffers. If an array, each byte array may contain one or more self-contained messages. Messages may NOT span multiple byte arrays.
 * *options* (`ExtractionOptions`): Options for controlling how values are transformed when extracted from an Arrow binary representation.
   * *useBigInt* (`boolean`): If true, extract 64-bit integers as JavaScript `BigInt` values. Otherwise, coerce long integers to JavaScript number values (default `false`), raising an error if the integer can not be represented as a double precision floating point number.
+  * *useBigIntTimestamp* (`boolean`): If true, extract timestamps as JavaScript `BigInt` values. Otherwise, coerce timestamps to float milliseconds.
   * *useDate* (`boolean`): If true, extract dates and timestamps as JavaScript `Date` objects. Otherwise, return numerical timestamp values (default `false`).
   * *useDecimalInt* (`boolean`): If true, extract decimal-type data as scaled integer values, where fractional digits are scaled to integer positions. Returned integers are `BigInt` values for decimal bit widths of 64 bits or higher and 32-bit integers (as JavaScript `number`) otherwise. If false, decimals are lossily converted to floating-point numbers (default).
   * *useMap* (`boolean`): If true, extract Arrow 'Map' values as JavaScript `Map` instances. Otherwise, return an array of [key, value] pairs compatible with both `Map` and `Object.fromEntries` (default `false`).

docs/index.md

Lines changed: 6 additions & 5 deletions
@@ -114,11 +114,12 @@ Data extraction can be customized using options provided to table generation met

 ```js
 const table = tableFromIPC(ipc, {
-  useDate: true,       // map dates and timestamps to Date objects
-  useDecimalInt: true, // use scaled ints for decimals, not floating point
-  useBigInt: true,     // use BigInt for 64-bit ints, do not coerce to number
-  useMap: true,        // create Map objects for [key, value] pair lists
-  useProxy: true       // use zero-copy proxies for struct and table row objects
+  useDate: true,            // map dates and timestamps to Date objects
+  useDecimalInt: true,      // use scaled ints for decimals, not floating point
+  useBigInt: true,          // use BigInt for 64-bit ints, do not coerce to number
+  useBigIntTimestamp: true, // use BigInt for timestamps, do not coerce to float
+  useMap: true,             // create Map objects for [key, value] pair lists
+  useProxy: true            // use zero-copy proxies for struct and table row objects
 });
 ```
package.json

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@
     "postbuild": "npm run types",
     "lint": "eslint src test",
     "test:unit": "vitest",
-    "test": "vitest --run",
+    "test": "vitest --run",
     "prepublishOnly": "npm run test && npm run lint && npm run build"
   },
   "devDependencies": {

src/batch-type.js

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ import { invalidDataType } from './data-types.js';
  */
 export function batchType(type, options = {}) {
   const { typeId, bitWidth, mode, precision, unit } = /** @type {any} */(type);
-  const { useBigInt, useDate, useDecimalInt, useMap, useProxy } = options;
+  const { useBigInt, useBigIntTimestamp, useDate, useDecimalInt, useMap, useProxy } = options;

   switch (typeId) {
     case Type.Null: return NullBatch;
@@ -29,7 +29,7 @@ export function batchType(type, options = {}) {
         useDate && DateBatch
       );
     case Type.Timestamp:
-      return wrap(
+      return useBigIntTimestamp ? DirectBatch : wrap(
         unit === TimeUnit.SECOND ? TimestampSecondBatch
         : unit === TimeUnit.MILLISECOND ? TimestampMillisecondBatch
         : unit === TimeUnit.MICROSECOND ? TimestampMicrosecondBatch
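
The change above suggests that when `useBigIntTimestamp` is set, timestamp columns hand back the stored 64-bit values directly instead of going through a unit-specific conversion, and that this check is made before `useDate` is considered. The following is a rough sketch of that dispatch in plain JavaScript. The names `timestampReader` and `toMillis` are hypothetical stand-ins rather than Flechette's batch classes, and the numeric `TimeUnit` values are assumed to follow the Arrow format's enum.

```js
// Assumed to match the Arrow format's TimeUnit enum values.
const TimeUnit = { SECOND: 0, MILLISECOND: 1, MICROSECOND: 2, NANOSECOND: 3 };

// Hypothetical: build a converter from a stored bigint to float milliseconds.
function toMillis(unit) {
  switch (unit) {
    case TimeUnit.SECOND: return v => Number(v) * 1000;
    case TimeUnit.MILLISECOND: return v => Number(v);
    case TimeUnit.MICROSECOND: return v => Number(v / 1000n) + Number(v % 1000n) / 1e3;
    case TimeUnit.NANOSECOND: return v => Number(v / 1000000n) + Number(v % 1000000n) / 1e6;
    default: throw new Error(`Unknown time unit: ${unit}`);
  }
}

// Hypothetical: choose how stored timestamp values are read back.
function timestampReader(unit, { useBigIntTimestamp = false, useDate = false } = {}) {
  if (useBigIntTimestamp) return v => v; // direct access, no conversion
  const ms = toMillis(unit);
  return useDate ? v => new Date(ms(v)) : ms;
}

const stored = 1500n; // 1500 microseconds since the epoch
const asNumber = timestampReader(TimeUnit.MICROSECOND)(stored);
const asBigInt = timestampReader(TimeUnit.MICROSECOND, { useBigIntTimestamp: true })(stored);
const withBoth = timestampReader(TimeUnit.MICROSECOND, { useBigIntTimestamp: true, useDate: true })(stored);

console.log(asNumber, asBigInt, typeof withBoth);
```

In this sketch the `bigint` result stays in the column's own unit (microseconds here), whereas the default path normalizes to milliseconds.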

src/types.ts

Lines changed: 5 additions & 0 deletions
@@ -358,6 +358,11 @@ export interface ExtractionOptions {
    * Otherwise, coerce long integers to JavaScript number values (default).
    */
   useBigInt?: boolean;
+  /**
+   * If true, extract 64-bit timestamps as JavaScript `BigInt` values.
+   * Otherwise, coerce timestamps to float milliseconds.
+   */
+  useBigIntTimestamp?: boolean;
   /**
    * If true, extract Arrow 'Map' values as JavaScript `Map` instances.
    * Otherwise, return an array of [key, value] pairs compatible with

test/column-from-array.test.js

Lines changed: 16 additions & 0 deletions
@@ -179,6 +179,22 @@ describe('columnFromArray', () => {
     test(ms, timestamp(TimeUnit.MILLISECOND));
     test(ms.map(ts => ts + 0.001), timestamp(TimeUnit.MICROSECOND));
     test(ms.map(ts => ts + 0.000001), timestamp(TimeUnit.NANOSECOND));
+
+    // bigint timestamps
+    const msBigInt = ms.map(BigInt);
+    test(msBigInt, timestamp(TimeUnit.MILLISECOND), { useBigIntTimestamp: true });
+    test([...msBigInt, null], timestamp(TimeUnit.MILLISECOND), { useBigIntTimestamp: true });
+
+    const secCol = columnFromArray(ms, timestamp(TimeUnit.SECOND), { useBigIntTimestamp: true });
+    expect(Array.from(secCol)).toStrictEqual(ms.map(t => BigInt(t) / 1000n));
+
+    const usMs = ms.map(t => t + 0.001);
+    const usCol = columnFromArray(usMs, timestamp(TimeUnit.MICROSECOND), { useBigIntTimestamp: true });
+    expect(Array.from(usCol)).toStrictEqual(usMs.map(t => BigInt(Math.round(t * 1000))));
+
+    const ns = [0, 8640000, -8640000];
+    const nsCol = columnFromArray(ns, timestamp(TimeUnit.NANOSECOND), { useBigIntTimestamp: true });
+    expect(Array.from(nsCol)).toStrictEqual(ns.map(t => BigInt(t) * 1000000n));
   });

   it('builds interval year-month columns', () => {
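
The assertions added above imply that millisecond inputs are scaled into the column's unit when stored and are read back unscaled as `bigint`. As a worked example of that arithmetic, the sketch below reproduces the expected values for a single input. The `expectedStored` table is a hypothetical illustration that mirrors the test expressions; it is not a Flechette API, and the `second` and `nanosecond` entries assume integer millisecond inputs, since `BigInt()` throws on non-integer numbers.

```js
// Hypothetical lookup mirroring the expectations in the tests above:
// millisecond number in, stored bigint (in the column's unit) out.
const expectedStored = {
  second: ms => BigInt(ms) / 1000n,                 // bigint division truncates
  millisecond: ms => BigInt(ms),
  microsecond: ms => BigInt(Math.round(ms * 1000)), // rounds away float noise
  nanosecond: ms => BigInt(ms) * 1000000n
};

const ms = 8640000; // one of the nanosecond test inputs above

const stored = {
  second: expectedStored.second(ms),
  millisecond: expectedStored.millisecond(ms),
  microsecond: expectedStored.microsecond(ms + 0.001),
  nanosecond: expectedStored.nanosecond(ms)
};

console.log(stored);
```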

test/table-from-ipc.test.js

Lines changed: 5 additions & 1 deletion
@@ -1,7 +1,7 @@
 import { describe, it, expect } from "vitest";
 import { readFile } from 'node:fs/promises';
 import { tableFromIPC } from '../src/index.js';
-import { binaryView, bool, dateDay, decimal, decimal32, decimal128, decimal256, decimal64, empty, fixedListInt32, fixedListUtf8, float32, float64, int16, int32, int64, int8, intervalMonthDayNano, largeListView, listInt32, listUtf8, listView, map, runEndEncoded32, runEndEncoded64, struct, timestampMicrosecond, timestampMillisecond, timestampNanosecond, timestampSecond, uint16, uint32, uint64, uint8, union, utf8, utf8View } from './util/data.js';
+import { binaryView, bool, dateDay, decimal, decimal32, decimal128, decimal256, decimal64, empty, fixedListInt32, fixedListUtf8, float32, float64, int16, int32, int64, int8, intervalMonthDayNano, largeListView, listInt32, listUtf8, listView, map, runEndEncoded32, runEndEncoded64, struct, timestampMicrosecond, timestampMillisecond, timestampNanosecond, timestampSecond, uint16, uint32, uint64, uint8, union, utf8, utf8View, timestampNanosecondBigInt, timestampMicrosecondBigInt, timestampMillisecondBigInt, timestampSecondBigInt } from './util/data.js';
 import { RowIndex } from '../src/util/struct.js';

 const toBigInt = v => BigInt(v);
@@ -95,6 +95,10 @@ describe('tableFromIPC', () => {
   it('decodes timestamp microsecond data to dates', () => test(timestampMicrosecond, Array, { useDate: true }, toDate));
   it('decodes timestamp millisecond data to dates', () => test(timestampMillisecond, Array, { useDate: true }, toDate));
   it('decodes timestamp second data to dates', () => test(timestampSecond, Array, { useDate: true }, toDate));
+  it('decodes timestamp nanosecond data to bigint', () => test(timestampNanosecondBigInt, BigInt64Array, { useBigIntTimestamp: true }));
+  it('decodes timestamp microsecond data to bigint', () => test(timestampMicrosecondBigInt, BigInt64Array, { useBigIntTimestamp: true }));
+  it('decodes timestamp millisecond data to bigint', () => test(timestampMillisecondBigInt, BigInt64Array, { useBigIntTimestamp: true }));
+  it('decodes timestamp second data to bigint', () => test(timestampSecondBigInt, BigInt64Array, { useBigIntTimestamp: true }));

   it('decodes interval year/month/nano data', () => test(intervalMonthDayNano));

test/util/data.js

Lines changed: 36 additions & 6 deletions
@@ -151,6 +151,36 @@ export function timestampSecond() {
   return loadData(data, 'timestampSecond', vals);
 }

+export function timestampNanosecondBigInt() {
+  const ns = [456789n, 738209n];
+  const ts = ['1992-09-20T11:30:00.123456789Z', '2002-12-13T07:28:56.564738209Z'];
+  const data = [ts, ts.concat(null)];
+  const vals = data.map(v => v.map((d, i) => d === null ? null : BigInt(+new Date(d)) * 1000000n + ns[i]));
+  return loadData(data, 'timestampNanosecond', vals);
+}
+
+export function timestampMicrosecondBigInt() {
+  const us = [457000n, 738000n];
+  const ts = ['1992-09-20T11:30:00.123457Z', '2002-12-13T07:28:56.564738Z'];
+  const data = [ts, ts.concat(null)];
+  const vals = data.map(v => v.map((d, i) => d === null ? null : BigInt(+new Date(d)) * 1000000n + us[i]));
+  return loadData(data, 'timestampMicrosecond', vals);
+}
+
+export function timestampMillisecondBigInt() {
+  const ts = ['1992-09-20T11:30:00.123Z', '2002-12-13T07:28:56.565Z'];
+  const data = [ts, ts.concat(null)];
+  const vals = data.map(v => v.map(d => d === null ? null : BigInt(+new Date(d))));
+  return loadData(data, 'timestampMillisecond', vals);
+}
+
+export function timestampSecondBigInt() {
+  const ts = ['1992-09-20T11:30:00Z', '2002-12-13T07:28:57Z'];
+  const data = [ts, ts.concat(null)];
+  const vals = data.map(v => v.map(d => d === null ? null : BigInt(+new Date(d)) / 1000n));
+  return loadData(data, 'timestampSecond', vals);
+}
+
 export function intervalMonthDayNano() {
   return loadData([
     ['2 years', null, '12 years 2 month 1 day 5 seconds', '1 microsecond']
@@ -215,18 +245,18 @@ export function union() {
 export function map() {
   return loadData([
     [
-      new Map([ ['foo', 1], ['bar', 2] ]),
-      new Map([ ['foo', null], ['baz', 3] ])
+      new Map([['foo', 1], ['bar', 2]]),
+      new Map([['foo', null], ['baz', 3]])
     ]
   ], 'map');
 }

 export function struct() {
   return loadData([
-    [ {a: 1, b: 'foo'}, {a: 2, b: 'baz'} ],
-    [ {a: 1, b: 'foo'}, null, {a: 2, b: 'baz'} ],
-    [ {a: null, b: 'foo'}, {a: 2, b: null} ],
-    [ {a: ['a', 'b'], b: Math.E}, {a: ['c', 'd'], b: Math.PI} ]
+    [{ a: 1, b: 'foo' }, { a: 2, b: 'baz' }],
+    [{ a: 1, b: 'foo' }, null, { a: 2, b: 'baz' }],
+    [{ a: null, b: 'foo' }, { a: 2, b: null }],
+    [{ a: ['a', 'b'], b: Math.E }, { a: ['c', 'd'], b: Math.PI }]
   ], 'struct');
 }
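
The BigInt timestamp fixtures added to `test/util/data.js` build their expected values by parsing an ISO string to milliseconds and then adding the digits that a `Date` cannot hold. A minimal sketch of that construction for the nanosecond case follows. The helper `isoToNanos` is hypothetical and not part of the test utilities. Unlike the fixtures, it passes only millisecond precision to `Date.parse` and supplies the sub-millisecond digits separately, so it does not depend on how an engine handles longer fractional seconds.

```js
// Hypothetical helper: combine a millisecond-precision ISO timestamp with
// the remaining sub-millisecond part, given in nanoseconds.
function isoToNanos(isoMs, subMsNanos) {
  const ms = BigInt(Date.parse(isoMs)); // integer milliseconds since the epoch
  return ms * 1000000n + subMsNanos;    // scale to nanoseconds, add the rest
}

// 1992-09-20T11:30:00.123456789Z split into ".123" and "456789" nanoseconds.
const nanos = isoToNanos('1992-09-20T11:30:00.123Z', 456789n);

// The fractional second survives intact in the bigint value.
const fractionalNanos = nanos % 1000000000n;

console.log(nanos, fractionalNanos);
```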
