DEVELOPMENT.md
36 additions & 0 deletions
@@ -106,5 +106,41 @@ How to set up your local machine.
 Open [http://localhost:5000](http://localhost:5000) to view it in the browser.


+## Security Considerations for Production Deployment
+
+⚠️ **IMPORTANT SECURITY WARNING FOR PRODUCTION DEPLOYMENT**
+
+When deploying Data Formulator to production, please be aware of the following security considerations:
+
+### Database Storage Security
+
+1. **Local DuckDB Files**: When database functionality is enabled (the default), Data Formulator stores DuckDB database files locally on the server. These files contain user data and are stored in the system's temporary directory or in a configured `LOCAL_DB_DIR`.
+
+2. **Session Management**:
+   - When the database is **enabled**: Session IDs are stored in Flask sessions (cookies) and linked to local DuckDB files.
+   - When the database is **disabled**: No persistent storage is used, and no cookies are set. Session IDs are generated per request for API consistency.
+
+3. **Data Persistence**: User data processed through Data Formulator may be temporarily stored in these local DuckDB files, which could be a security risk in multi-tenant environments.
+
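To make the storage behavior above concrete, here is a minimal sketch of how a per-session DuckDB file path could be resolved. It is not Data Formulator's actual code: the helper names (`resolve_db_dir`, `new_session_id`, `session_db_path`), the `df_<id>.duckdb` file naming, and reading `LOCAL_DB_DIR` from the environment are all assumptions made for illustration.

```python
import os
import secrets
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_db_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: use LOCAL_DB_DIR if set, else the system temp dir."""
    env = os.environ if env is None else env
    configured = env.get("LOCAL_DB_DIR")
    return Path(configured) if configured else Path(tempfile.gettempdir())


def new_session_id() -> str:
    """Hypothetical helper: an unguessable ID of 32 lowercase hex characters."""
    return secrets.token_hex(16)


def session_db_path(session_id: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: map a session ID to its DuckDB file path."""
    # Accept only lowercase hex IDs so a crafted ID cannot escape the directory.
    if not session_id or any(c not in "0123456789abcdef" for c in session_id):
        raise ValueError("session_id must be a non-empty lowercase hex string")
    # The "df_<id>.duckdb" naming scheme is an assumption for this sketch.
    return resolve_db_dir(env) / f"df_{session_id}.duckdb"
```

The point of the sketch is that each session ID corresponds to a file on the server's disk, so anyone who can read that directory can read user data. This is why the shared temporary directory is a concern in multi-tenant environments.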
+### Recommended Security Measures
+
+For production deployment, consider:
+
+1. **Use the `--disable-database` flag** for stateless deployments where no data persistence is needed
+2. **Implement proper authentication, authorization, and other security measures** as needed for your specific use case, for example:
+   - Store DuckDB files in a database
+   - User authentication (OAuth, JWT tokens, etc.)
+   - Role-based access control
+   - API rate limiting
+   - HTTPS/TLS encryption
+   - Input validation and sanitization
+
+### Configuration for Production
+
+```bash
+# For stateless deployment (recommended for public hosting)
+python -m data_formulator.app --disable-database
+```
+
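If the server is started from a Python wrapper (for example, a process supervisor script), the command above could be assembled as in the sketch below. The function name `build_launch_command` is hypothetical. The `--disable-database` flag and the `data_formulator.app` module come from the snippet above. Passing `--port` to `data_formulator.app` is an assumption based on the README's `python -m data_formulator --port 8080` example and should be verified against the CLI before use.

```python
import sys
from typing import List, Optional


def build_launch_command(disable_database: bool = True,
                         port: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build the argv list for a stateless launch."""
    # Use the current interpreter so the server runs in the same environment.
    cmd = [sys.executable, "-m", "data_formulator.app"]
    if disable_database:
        cmd.append("--disable-database")
    if port is not None:
        if not 1 <= port <= 65535:
            raise ValueError("port must be between 1 and 65535")
        # Assumption: data_formulator.app accepts --port like the README example.
        cmd += ["--port", str(port)]
    return cmd


# Example usage (not executed here):
#   import subprocess
#   subprocess.run(build_launch_command(), check=True)
```

Building the command as a list rather than a shell string avoids quoting problems and keeps the stateless flag on by default.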
 ## Usage
 See the [Usage section on the README.md page](README.md#usage).
-Transform data and create rich visualizations iteratively with AI 🪄. Try Data Formulator now!
+🪄 Turn data into insights with AI Agents, with the exploration paths you choose. Try Data Formulator now!

-Any questions? Ask on the Discord channel! [](https://discord.gg/mYCZMQKYZb)
+- 🤖 New in v0.5: agent model + interactive control [(video)](https://www.youtube.com/watch?v=GfTE2FLyMrs)
+- 🔥🔥🔥 Try our online demo at [https://data-formulator.ai](https://data-formulator.ai)
+- Any questions, thoughts? Discuss in the Discord channel! [](https://discord.gg/mYCZMQKYZb)

 <!-- [](https://codespaces.new/microsoft/data-formulator?quickstart=1) -->

-<kbd>
-<a target="_blank" rel="noopener noreferrer" href="https://codespaces.new/microsoft/data-formulator?quickstart=1" title="open Data Formulator in GitHub Codespaces"><img src="public/data-formulator-screenshot.png"></a>
+<a target="_blank" rel="noopener noreferrer" href="https://codespaces.new/microsoft/data-formulator?quickstart=1" title="open Data Formulator in GitHub Codespaces"><img src="public/data-formulator-screenshot-v0.5.png"></a>
+</kbd> -->


 ## News 🔥🔥🔥
+
+[11-07-2025] Data Formulator 0.5: Vibe with your data, in control
+
+- **Load (almost) any data**: load structured data, extract data from screenshots, from messy text blocks, or connect to databases.
+- 🤖 **Explore data with AI agents**:
+  - In agent mode, provide a high-level goal and ask agents to explore data for you.
+  - To stay in control, directly interact with agents: ask for recommendations or specify chart designs with UI + NL inputs, and AI agents will formulate data to realize your design.
+  - Use data threads to control branching exploration paths: backtrack, branch, or follow up.
+- **Verify AI-generated results**: interact with charts and inspect data, formulas, explanations, and code.
+- **Create reports to share insights**: choose charts you want to share, and ask agents to create reports grounded in data formulated throughout exploration.
+
+## Previous Updates
+
+Here are milestones that led to the current design:
+- **v0.2.2** ([Demo](https://github.com/microsoft/data-formulator/pull/176)): Goal-driven exploration with agent recommendations and performance improvements
+- **v0.2.1.3/4** ([Readme](https://github.com/microsoft/data-formulator/tree/main/py-src/data_formulator/data_loader) | [Demo](https://github.com/microsoft/data-formulator/pull/155)): External data loaders (MySQL, PostgreSQL, MSSQL, Azure Data Explorer, S3, Azure Blob)
+- **v0.2** ([Demos](https://github.com/microsoft/data-formulator/releases/tag/0.2)): Large data support with DuckDB integration
+- **v0.1.7** ([Demos](https://github.com/microsoft/data-formulator/releases/tag/0.1.7)): Dataset anchoring for cleaner workflows
+- **v0.1.6** ([Demo](https://github.com/microsoft/data-formulator/releases/tag/0.1.6)): Multi-table support with automatic joins
+- **Model Support**: OpenAI, Azure, Ollama, Anthropic via [LiteLLM](https://github.com/BerriAI/litellm) ([feedback](https://github.com/microsoft/data-formulator/issues/49))
+- **Python Package**: Easy local installation ([try it](#get-started))
+- **Visualization Challenges**: Test your skills ([challenges](https://github.com/microsoft/data-formulator/issues/53))
+- **Data Extraction**: Parse data from images and text ([demo](https://github.com/microsoft/data-formulator/pull/31#issuecomment-2403652717))
 - [07-10-2025] Data Formulator 0.2.2: Start with an analysis goal
 - Some key frontend performance updates.
 - You can start your exploration with a goal, or, tab and see if the agent can recommend some good exploration ideas for you. [Demo](https://github.com/microsoft/data-formulator/pull/176)
@@ -74,11 +105,13 @@ Any questions? Ask on the Discord channel! [ and [[video]](https://youtu.be/3ndlwt0Wi3c)!

+</details>
+
 ## Overview

-**Data Formulator** is an application from Microsoft Research that uses large language models to transform data, expediting the practice of data visualization.
+**Data Formulator** is an application from Microsoft Research that uses AI agents to make it easier to turn data into insights.

-Data Formulator is an AI-powered tool for analysts to iteratively create rich visualizations. Unlike most chat-based AI tools where users need to describe everything in natural language, Data Formulator combines *user interface interactions (UI)* and *natural language (NL) inputs* for easier interaction. This blended approach makes it easier for users to describe their chart designs while delegating data transformation to AI.
+Data Formulator is an AI-powered tool for analysts to iteratively explore and visualize data. Starting with data in any format (screenshot, text, CSV, or database), users work with AI agents through a novel blended interface that combines *user interface interactions (UI)* and *natural language (NL) inputs* to communicate their intents, control branching exploration directions, and create reports to share their insights.

 ## Get Started

@@ -92,16 +125,13 @@ Play with Data Formulator with one of the following options:
 # install data_formulator
 pip install data_formulator

-# start data_formulator
-data_formulator
-
-# alternatively, you can run data formulator with this command
+# Run Data Formulator with this command
 python -m data_formulator
 ```

 Data Formulator will be automatically opened in the browser at [http://localhost:5000](http://localhost:5000).

-*Update: you can specify the port number (e.g., 8080) by `python -m data_formulator --port 8080` if the default port is occupied.*
+*If the default port is occupied, you can specify another port (e.g., 8080) with `python -m data_formulator --port 8080`.*

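As a rough illustration of the port note above, the sketch below checks whether the default port 5000 is free before choosing a value for `--port`. The helpers `is_port_free` and `pick_port` are hypothetical and not part of Data Formulator. The check is also only advisory, since another process can claim the port between the check and the actual launch.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Hypothetical helper: True if this process can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Bind failed, typically because the port is already in use.
            return False
    return True


def pick_port(preferred: int = 5000, fallback: int = 8080) -> int:
    """Hypothetical helper: return the preferred port if free, else the fallback."""
    if is_port_free(preferred):
        return preferred
    if is_port_free(fallback):
        return fallback
    raise RuntimeError(f"neither port {preferred} nor port {fallback} is free")
```

The result could then be passed on the command line, for example `python -m data_formulator --port 8080` when `pick_port()` returns 8080.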
 - **Option 2: Codespaces (5 minutes)**

@@ -111,16 +141,38 @@ Play with Data Formulator with one of the following options:

 - **Option 3: Working in the developer mode**

-You can build Data Formulator locally if you prefer full control over your development environment and the ability to customize the setup to your specific needs. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).
+You can build Data Formulator locally if you prefer full control over your development environment and want to develop your own version on top of it. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).


 ## Using Data Formulator

-Once you've completed the setup using either option, follow these steps to start using Data Formulator:
+### Load Data
+
+Besides uploading csv, tsv or xlsx files that contain structured data, you can ask Data Formulator to extract data from screenshots, text blocks or websites, or load data from databases using connectors. Then you are ready to explore.
+There are four levels of data exploration, depending on whether you want more vibe or more control:

+- Level 1 (most control): Create charts with UI via drag-and-drop, if all fields to be visualized are already in the data.
+- Level 2: Specify chart designs with UI + NL. Describe how new fields should be visualized in your chart, and AI will automatically transform data to realize the design.
+- Level 3: Get recommendations. Ask AI agents to recommend charts directly from NL descriptions, or even ask directly for exploration ideas.
+- Level 4 (most vibe): In agent mode, provide a high-level goal and let AI agents automatically plan and explore data in multiple turns. Exploration threads will be created automatically.