
Commit c1e5954

[deploy] Merge pull request #189 from microsoft/dev
[deploy] Dev: data formulator 0.5 release
2 parents bb47180 + fd00d99 commit c1e5954

161 files changed

Lines changed: 18137 additions & 6724 deletions


DEVELOPMENT.md

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -106,5 +106,41 @@ How to set up your local machine.
106106
Open [http://localhost:5000](http://localhost:5000) to view it in the browser.
107107
108108
109+
## Security Considerations for Production Deployment
110+
111+
⚠️ **IMPORTANT SECURITY WARNING FOR PRODUCTION DEPLOYMENT**
112+
113+
When deploying Data Formulator to production, please be aware of the following security considerations:
114+
115+
### Database Storage Security
116+
117+
1. **Local DuckDB Files**: When database functionality is enabled (default), Data Formulator stores DuckDB database files locally on the server. These files contain user data and are stored in the system's temporary directory or a configured `LOCAL_DB_DIR`.
118+
119+
2. **Session Management**:
120+
- When database is **enabled**: Session IDs are stored in Flask sessions (cookies) and linked to local DuckDB files
121+
- When database is **disabled**: No persistent storage is used, and no cookies are set. Session IDs are generated per request for API consistency
122+
123+
3. **Data Persistence**: User data processed through Data Formulator may be temporarily stored in these local DuckDB files, which could be a security risk in multi-tenant environments.
124+
125+
### Recommended Security Measures
126+
127+
For production deployment, consider:
128+
129+
1. **Use `--disable-database` flag** for stateless deployments where no data persistence is needed
130+
2. **Implement proper authentication, authorization, and other security measures** as needed for your specific use case, for example:
131+
- Store DuckDB file in a database
132+
- User authentication (OAuth, JWT tokens, etc.)
133+
- Role-based access control
134+
- API rate limiting
135+
- HTTPS/TLS encryption
136+
- Input validation and sanitization
137+
138+
### Configuration for Production
139+
140+
```bash
141+
# For stateless deployment (recommended for public hosting)
142+
python -m data_formulator.app --disable-database
143+
```
144+
109145
## Usage
110146
See the [Usage section on the README.md page](README.md#usage).
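The session behavior described in the added DEVELOPMENT.md section can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the function names and the `df_<session>.duckdb` file naming are hypothetical, and only the `LOCAL_DB_DIR` setting and the temporary-directory fallback come from the text above.

```python
import os
import tempfile
import uuid
from pathlib import Path
from typing import Optional


def resolve_db_dir() -> Path:
    """Use LOCAL_DB_DIR if it is set, otherwise the system temporary directory."""
    return Path(os.environ.get("LOCAL_DB_DIR") or tempfile.gettempdir())


def resolve_session(db_enabled: bool, cookie_session_id: Optional[str]) -> dict:
    """Decide the session ID, whether to set a cookie, and where the DuckDB file lives."""
    if not db_enabled:
        # Stateless mode: a fresh ID per request, no cookie, nothing written to disk.
        return {"session_id": uuid.uuid4().hex, "set_cookie": False, "db_path": None}
    # Database mode: reuse the ID from the cookie, or create one on first visit.
    session_id = cookie_session_id or uuid.uuid4().hex
    db_path = resolve_db_dir() / f"df_{session_id}.duckdb"  # hypothetical file naming
    return {"session_id": session_id, "set_cookie": True, "db_path": db_path}
```

The sketch makes the multi-tenant concern visible: in database mode every session maps to a file on the server's disk, so anyone with access to that directory can read user data.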

MANIFEST.in

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,4 @@
11
include py-src/data_formulator/dist/*
2-
include py-src/data_formulator/dist/assets/*
2+
include py-src/data_formulator/dist/assets/*
3+
global-exclude .DS_Store
4+
exclude py-src/examples

README.md

Lines changed: 78 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -1,30 +1,61 @@
11
<h1>
2-
<img src="./public/favicon.ico" alt="Data Formulator icon" width="28"> <b>Data Formulator: Create Rich Visualizations with AI</b>
2+
<img src="./public/favicon.ico" alt="Data Formulator icon" width="28"> <b>Data Formulator: Vibe with data, in control</b>
33
</h1>
44

55
<div>
66

77
[![arxiv](https://img.shields.io/badge/Paper-arXiv:2408.16119-b31b1b.svg)](https://arxiv.org/abs/2408.16119)&ensp;
88
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)&ensp;
9-
[![YouTube](https://img.shields.io/badge/YouTube-white?logo=youtube&logoColor=%23FF0000)](https://youtu.be/3ndlwt0Wi3c)&ensp;
9+
[![YouTube](https://img.shields.io/badge/YouTube-white?logo=youtube&logoColor=%23FF0000)](https://www.youtube.com/watch?v=GfTE2FLyMrs)&ensp;
1010
[![build](https://github.com/microsoft/data-formulator/actions/workflows/python-build.yml/badge.svg)](https://github.com/microsoft/data-formulator/actions/workflows/python-build.yml)
1111
[![Discord](https://img.shields.io/badge/discord-chat-green?logo=discord)](https://discord.gg/mYCZMQKYZb)
1212

1313
</div>
1414

15-
Transform data and create rich visualizations iteratively with AI 🪄. Try Data Formulator now!
15+
🪄 Turn data into insights with AI Agents, with the exploration paths you choose. Try Data Formulator now!
1616

17-
Any questions? Ask on the Discord channel! [![Discord](https://img.shields.io/badge/discord-chat-green?logo=discord)](https://discord.gg/mYCZMQKYZb)
17+
- 🤖 New in v0.5: agent model + interactive control [(video)](https://www.youtube.com/watch?v=GfTE2FLyMrs)
18+
- 🔥🔥🔥 Try our online demo at [https://data-formulator.ai](https://data-formulator.ai)
19+
- Any questions or thoughts? Discuss in the Discord channel! [![Discord](https://img.shields.io/badge/discord-chat-green?logo=discord)](https://discord.gg/mYCZMQKYZb)
1820

1921
<!-- [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/data-formulator?quickstart=1) -->
2022

21-
<kbd>
22-
<a target="_blank" rel="noopener noreferrer" href="https://codespaces.new/microsoft/data-formulator?quickstart=1" title="open Data Formulator in GitHub Codespaces"><img src="public/data-formulator-screenshot.png"></a>
23-
</kbd>
23+
https://github.com/user-attachments/assets/8ca57b68-4d7a-42cb-bcce-43f8b1681ce2
2424

25+
<!-- <kbd>
26+
<a target="_blank" rel="noopener noreferrer" href="https://codespaces.new/microsoft/data-formulator?quickstart=1" title="open Data Formulator in GitHub Codespaces"><img src="public/data-formulator-screenshot-v0.5.png"></a>
27+
</kbd> -->
2528

2629

2730
## News 🔥🔥🔥
31+
32+
[11-07-2025] Data Formulator 0.5: Vibe with your data, in control
33+
34+
- 📊 **Load (almost) any data**: load structured data, extract data from screenshots, from messy text blocks, or connect to databases.
35+
- 🤖 **Explore data with AI agents**:
36+
- In agent mode, provide a high-level goal and ask agents to explore data for you.
37+
- To stay in control, directly interact with agents: ask for recommendations or specify chart designs with UI + NL inputs, and AI agents will formulate data to realize your design.
38+
- Use data threads to control branching exploration paths: backtrack, branch, or follow up.
39+
- ✅ **Verify AI generated results**: interact with charts and inspect data, formulas, explanations, and code.
40+
- 📝 **Create reports to share insights**: choose charts you want to share, and ask agents to create reports grounded in data formulated throughout exploration.
41+
42+
## Previous Updates
43+
44+
Here are the milestones that led to the current design:
45+
- **v0.2.2** ([Demo](https://github.com/microsoft/data-formulator/pull/176)): Goal-driven exploration with agent recommendations and performance improvements
46+
- **v0.2.1.3/4** ([Readme](https://github.com/microsoft/data-formulator/tree/main/py-src/data_formulator/data_loader) | [Demo](https://github.com/microsoft/data-formulator/pull/155)): External data loaders (MySQL, PostgreSQL, MSSQL, Azure Data Explorer, S3, Azure Blob)
47+
- **v0.2** ([Demos](https://github.com/microsoft/data-formulator/releases/tag/0.2)): Large data support with DuckDB integration
48+
- **v0.1.7** ([Demos](https://github.com/microsoft/data-formulator/releases/tag/0.1.7)): Dataset anchoring for cleaner workflows
49+
- **v0.1.6** ([Demo](https://github.com/microsoft/data-formulator/releases/tag/0.1.6)): Multi-table support with automatic joins
50+
- **Model Support**: OpenAI, Azure, Ollama, Anthropic via [LiteLLM](https://github.com/BerriAI/litellm) ([feedback](https://github.com/microsoft/data-formulator/issues/49))
51+
- **Python Package**: Easy local installation ([try it](#get-started))
52+
- **Visualization Challenges**: Test your skills ([challenges](https://github.com/microsoft/data-formulator/issues/53))
53+
- **Data Extraction**: Parse data from images and text ([demo](https://github.com/microsoft/data-formulator/pull/31#issuecomment-2403652717))
54+
- **Initial Release**: [Blog](https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/) | [Video](https://youtu.be/3ndlwt0Wi3c)
55+
56+
<details>
57+
<summary><b>View detailed update history</b></summary>
58+
2859
- [07-10-2025] Data Formulator 0.2.2: Start with an analysis goal
2960
- Some key frontend performance updates.
3061
- You can start your exploration with a goal, or, tab and see if the agent can recommend some good exploration ideas for you. [Demo](https://github.com/microsoft/data-formulator/pull/176)
@@ -74,11 +105,13 @@ Any questions? Ask on the Discord channel! [![Discord](https://img.shields.io/ba
74105

75106
- [10-01-2024] Initial release of Data Formulator, check out our [[blog]](https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/) and [[video]](https://youtu.be/3ndlwt0Wi3c)!
76107

108+
</details>
109+
77110
## Overview
78111

79-
**Data Formulator** is an application from Microsoft Research that uses large language models to transform data, expediting the practice of data visualization.
112+
**Data Formulator** is an application from Microsoft Research that uses AI agents to make it easier to turn data into insights.
80113

81-
Data Formulator is an AI-powered tool for analysts to iteratively create rich visualizations. Unlike most chat-based AI tools where users need to describe everything in natural language, Data Formulator combines *user interface interactions (UI)* and *natural language (NL) inputs* for easier interaction. This blended approach makes it easier for users to describe their chart designs while delegating data transformation to AI.
114+
Data Formulator is an AI-powered tool for analysts to iteratively explore and visualize data. Starting with data in any format (screenshot, text, CSV, or database), users work with AI agents through a novel blended interface that combines *user interface interactions (UI)* and *natural language (NL) inputs* to communicate their intent, control branching exploration directions, and create reports to share their insights.
82115

83116
## Get Started
84117

@@ -92,16 +125,13 @@ Play with Data Formulator with one of the following options:
92125
# install data_formulator
93126
pip install data_formulator
94127

95-
# start data_formulator
96-
data_formulator
97-
98-
# alternatively, you can run data formulator with this command
128+
# Run data formulator with this command
99129
python -m data_formulator
100130
```
101131

102132
Data Formulator will be automatically opened in the browser at [http://localhost:5000](http://localhost:5000).
103133

104-
*Update: you can specify the port number (e.g., 8080) by `python -m data_formulator --port 8080` if the default port is occupied.*
134+
*You can specify the port number (e.g., 8080) with `python -m data_formulator --port 8080` if the default port is occupied.*
105135

106136
- **Option 2: Codespaces (5 minutes)**
107137

@@ -111,16 +141,38 @@ Play with Data Formulator with one of the following options:
111141

112142
- **Option 3: Working in the developer mode**
113143

114-
You can build Data Formulator locally if you prefer full control over your development environment and the ability to customize the setup to your specific needs. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).
144+
You can build Data Formulator locally if you prefer full control over your development environment and develop your own version on top. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).
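The `--port` note under Option 1 suggests switching ports when the default 5000 is occupied. A quick way to check this before launching is sketched below, using only the Python standard library. The helper names `is_port_free` and `pick_port` are illustrative and not part of Data Formulator.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if we can bind to (host, port), i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(candidates=(5000, 8080)) -> int:
    """Return the first free port among the candidates, in order."""
    for port in candidates:
        if is_port_free(port):
            return port
    raise RuntimeError("no candidate port is free")
```

The returned value can then be passed along, for example as `python -m data_formulator --port <value>`. Another process could still claim the port between the check and the launch, so this is a convenience rather than a guarantee.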
115145

116146

117147
## Using Data Formulator
118148

119-
Once you've completed the setup using either option, follow these steps to start using Data Formulator:
149+
### Load Data
150+
151+
Besides uploading CSV, TSV, or XLSX files that contain structured data, you can ask Data Formulator to extract data from screenshots, text blocks, or websites, or load data from databases using connectors. Then you are ready to explore.
152+
153+
<img width="1920" alt="image" src="https://github.com/user-attachments/assets/e23cdb47-984c-4ce4-a014-8f36e025e393" />
154+
155+
### Explore Data
156+
157+
There are four levels at which to explore data, depending on whether you want more vibe or more control:
120158

159+
- Level 1 (most control): Create charts with UI via drag-and-drop, if all fields to be visualized are already in the data.
160+
- Level 2: Specify chart designs with UI + NL. Describe how new fields should be visualized in your chart, and AI will automatically transform data to realize the design.
161+
- Level 3: Get recommendations: Ask AI agents to recommend charts directly from NL descriptions, or even directly ask for exploration ideas.
162+
- Level 4 (most vibe): In agent mode, provide a high-level goal and let AI agents automatically plan and explore data in multiple turns. Exploration threads will be created automatically.
163+
164+
https://github.com/user-attachments/assets/164aff58-9f93-4792-b8ed-9944578fbb72
165+
166+
- Level 5: In practice, leverage all of them to keep up with both vibe and control!
167+
168+
### Create Reports
169+
170+
Use the report builder to compose a report in the style you like, based on selected charts. Then share the reports with others!
171+
172+
<!--
121173
### The basics of data visualization
122-
* Provide OpenAI keys and select a model (GPT-4o suggested) and choose a dataset.
123-
* Choose a chart type, and then drag-and-drop data fields to chart properties (x, y, color, ...) to specify visual encodings.
174+
* Set up a model provider; for an agentic experience, a model with reasoning and strong code generation ability is recommended.
175+
* Describe the exploration
124176
125177
https://github.com/user-attachments/assets/0fbea012-1d2d-46c3-a923-b1fc5eb5e5b8
126178
@@ -140,12 +192,19 @@ https://github.com/user-attachments/assets/160c69d2-f42d-435c-9ff3-b1229b5bddba
140192
141193
https://github.com/user-attachments/assets/c93b3e84-8ca8-49ae-80ea-f91ceef34acb
142194
143-
Repeat this process as needed to explore and understand your data. Your explorations are trackable in the **Data Threads** panel.
195+
Repeat this process as needed to explore and understand your data. Your explorations are trackable in the **Data Threads** panel. -->
144196

145197
## Developers' Guide
146198

147199
Follow the [developers' instructions](DEVELOPMENT.md) to build your new data analysis tools on top of Data Formulator.
148200

201+
Help wanted:
202+
203+
* Add more database connectors (https://github.com/microsoft/data-formulator/issues/156)
204+
* Scaling up messy data extractor: more document types and larger files.
205+
* Adding more chart templates (e.g., maps).
206+
* Other ideas?
207+
149208
## Research Papers
150209
* [Data Formulator 2: Iteratively Creating Rich Visualizations with AI](https://arxiv.org/abs/2408.16119)
151210

api-keys.env.template

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,24 +1,24 @@
11
# OpenAI Configuration
22
OPENAI_ENABLED=true
33
OPENAI_API_KEY=#your-openai-api-key
4-
OPENAI_MODELS=gpt-4.5-preview,o3-mini,gpt-4o-mini,o1,o1-mini # comma separated list of models
4+
OPENAI_MODELS=gpt-5,gpt-5-mini,gpt-5-nano,o1,o1-mini # comma separated list of models
55

66
# Azure OpenAI Configuration
77
AZURE_ENABLED=true
88
AZURE_API_KEY=#your-azure-openai-api-key
99
AZURE_API_BASE=https://your-azure-openai-endpoint.openai.azure.com/
1010
AZURE_API_VERSION=2024-02-15-preview
11-
AZURE_MODELS=o3-mini,gpt-4o
11+
AZURE_MODELS=gpt-4.1
1212

1313
# Anthropic Configuration
1414
ANTHROPIC_ENABLED=true
1515
ANTHROPIC_API_KEY=#your-anthropic-api-key
16-
ANTHROPIC_MODELS=claude-3-7-sonnet-latest,claude-3-5-haiku-latest
16+
ANTHROPIC_MODELS=claude-sonnet-4-20250514
1717

1818
# Ollama Configuration
1919
OLLAMA_ENABLED=true
2020
OLLAMA_API_BASE=http://localhost:11434
21-
OLLAMA_MODELS=codellama:7b # models with good code generation capabilities recommended
21+
OLLAMA_MODELS=deepseek-v3.1:latest # models with good code generation capabilities recommended
2222

2323
# if you want to add other models, you can add them with PROVIDER_API_KEY=your-api-key, PROVIDER_MODELS=model1,model2 etc
2424
# (replacing PROVIDER with the provider name like GEMINI, ANTHROPIC, AZURE, OPENAI, OLLAMA etc. as long as they are supported by LiteLLM)
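To make the `PROVIDER_ENABLED` / `PROVIDER_MODELS` convention in this template concrete, here is a small sketch of how such a file could be read. It is an illustration only and not Data Formulator's actual loader: the function names are hypothetical, and it assumes inline comments begin with a space followed by `#`, as in the `OPENAI_MODELS` line.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=value lines, skipping blanks and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: an inline comment starts with " #"; drop it.
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def enabled_models(values: dict) -> dict:
    """Map each enabled provider (lowercased) to its comma-separated model list."""
    providers = {}
    for key, flag in values.items():
        if key.endswith("_ENABLED") and flag.lower() == "true":
            provider = key[: -len("_ENABLED")]
            raw_models = values.get(f"{provider}_MODELS", "")
            providers[provider.lower()] = [m.strip() for m in raw_models.split(",") if m.strip()]
    return providers
```

Feeding the template text through `enabled_models(parse_env_text(...))` gives one entry per provider whose `_ENABLED` flag is `true`, which is a convenient shape for filling a model picker.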

index.html

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,13 @@
99
name="Data Formulator"
1010
content="Concept-driven Visualization Authoring"
1111
/>
12+
13+
<!-- Preload critical images for faster initial load -->
14+
<link rel="preload" as="image" href="/gas_prices-thumbnail.webp" type="image/webp" />
15+
<link rel="preload" as="image" href="/global_energy-thumbnail.webp" type="image/webp" />
16+
<link rel="preload" as="image" href="/movies-thumbnail.webp" type="image/webp" />
17+
<link rel="preload" as="image" href="/unemployment-thumbnail.webp" type="image/webp" />
18+
1219
<!--
1320
manifest.json provides metadata used when your web app is installed on a
1421
user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/

package.json

Lines changed: 13 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -8,45 +8,46 @@
88
"@emotion/styled": "^11.14.0",
99
"@fontsource/roboto": "^4.5.5",
1010
"@mui/icons-material": "^7.1.1",
11+
"@mui/lab": "^7.0.1-beta.18",
1112
"@mui/material": "^7.1.1",
1213
"@reduxjs/toolkit": "^1.8.6",
1314
"@types/dompurify": "^3.0.5",
1415
"@types/validator": "^13.12.2",
15-
"ag-grid-community": "^32.0.2",
16-
"ag-grid-enterprise": "^32.0.2",
17-
"ag-grid-react": "^32.0.2",
16+
"allotment": "^1.20.4",
1817
"d3": "^7.3.0",
1918
"dompurify": "^3.2.4",
19+
"exceljs": "^4.4.0",
20+
"html2canvas": "^1.4.1",
21+
"katex": "^0.16.22",
2022
"localforage": "^1.10.0",
2123
"lodash": "^4.17.21",
2224
"markdown-to-jsx": "^7.4.0",
23-
"mui-markdown": "^1.1.13",
25+
"mui-markdown": "^2.0.3",
2426
"prettier": "^2.8.3",
2527
"prism-react-renderer": "^1.3.5",
26-
"prismjs": "^1.29.0",
28+
"prismjs": "^1.30.0",
29+
"prop-types": "^15.8.1",
2730
"react": "^18.2.0",
2831
"react-animate-height": "^3.0.4",
2932
"react-animate-on-change": "^2.2.0",
30-
"react-diff-viewer": "^3.1.1",
3133
"react-dnd": "^16.0.1",
3234
"react-dnd-html5-backend": "^16.0.1",
3335
"react-dom": "^18.2.0",
36+
"react-katex": "^3.1.0",
3437
"react-redux": "^8.0.4",
3538
"react-router-dom": "^6.22.0",
3639
"react-selectable-fast": "^3.4.0",
3740
"react-simple-code-editor": "^0.13.1",
38-
"react-split-pane": "^0.1.92",
3941
"react-vega": "^7.6.0",
4042
"react-virtuoso": "^4.3.10",
4143
"redux": "^4.2.0",
4244
"redux-persist": "^6.0.0",
4345
"typescript": "^4.9.5",
44-
"validator": "^13.12.0",
46+
"validator": "^13.15.20",
4547
"vega": "^5.32.0",
4648
"vega-embed": "^6.21.0",
4749
"vega-lite": "^5.5.0",
48-
"vm-browserify": "^1.1.2",
49-
"xlsx": "^0.18.5"
50+
"vm-browserify": "^1.1.2"
5051
},
5152
"scripts": {
5253
"lint": "eslint -c eslint.config.js src/**/*.{ts,tsx} --fix",
@@ -73,6 +74,7 @@
7374
"@types/prismjs": "^1.26.0",
7475
"@types/react": "^18.3.3",
7576
"@types/react-dom": "^18.3.0",
77+
"@types/react-katex": "^3.0.4",
7678
"@typescript-eslint/eslint-plugin": "^8.16.0",
7779
"@typescript-eslint/parser": "^8.16.0",
7880
"@vitejs/plugin-react-swc": "^3.7.0",
@@ -82,6 +84,6 @@
8284
"globals": "^15.12.0",
8385
"sass": "^1.77.6",
8486
"typescript-eslint": "^8.16.0",
85-
"vite": "^5.4.19"
87+
"vite": "^5.4.21"
8688
}
8789
}

public/df_gas_prices.json

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

public/df_global_energy.json

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.
