|
1 | 1 | # DataJoint Overview |
2 | 2 |
|
3 | | -DataJoint is a library for interacting with scientific databases integrating computational dependencies as part of the data model. It is an ideal tool for team projects working on shared data-centric computational workflows. |
| 3 | +DataJoint is a library for interacting with scientific databases that support computational dependencies as part of the data model. |
| 4 | +DataJoint serves as a principal framework for organizing data and computations in team projects. |
4 | 5 |
|
5 | | -## Why use databases in scientific studes? |
6 | | - |
7 | | -Many scientists are reluctant to use databases due to their perceived unwieldiness, opting instead to use file repositories for managing their shared data. [Gray, 2005](https://arxiv.org/abs/cs/0502008) |
8 | | - |
9 | | -Yet databases provide several key advantages when it comes to sharing structured dynamic data: |
| 6 | +Databases provide several key advantages when it comes to sharing structured dynamic data: |
10 | 7 |
|
11 | 8 | 1. **Data structure:** databases communicate and enforce structure reflecting the logic of the scientific study. |
12 | 9 | 2. **Concurrent access:** databases support transactions to allow multiple agents to read and write the data concurrently. |
13 | 10 | 3. **Consistency and integrity:** databases provide ways to ensure that data operations from multiple parties are combined correctly without loss, misidentification, or mismatches. |
14 | 11 | 4. **Queries:** Databases simplify and accelerate data queries -- functions on data to obtain precise slices of the data without needing to send the entire dataset for analysis. |
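The four advantages listed above can be illustrated with a minimal sketch that uses Python's built-in `sqlite3` module rather than DataJoint itself. The `subject` and `session` tables, their columns, and the helper function names are hypothetical and exist only for this illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related (hypothetical) tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Data structure: the schema states what a valid record looks like.
    conn.execute(
        "CREATE TABLE subject ("
        " subject_id INTEGER PRIMARY KEY,"
        " species TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE session ("
        " subject_id INTEGER NOT NULL REFERENCES subject(subject_id),"
        " session_idx INTEGER NOT NULL,"
        " duration_s REAL NOT NULL,"
        " PRIMARY KEY (subject_id, session_idx))"
    )
    # Concurrent access: each `with conn` block is one transaction that is
    # committed on success and rolled back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO subject VALUES (?, ?)",
            [(1, "mouse"), (2, "rat")],
        )
        conn.executemany(
            "INSERT INTO session VALUES (?, ?, ?)",
            [(1, 1, 600.0), (1, 2, 1200.0), (2, 1, 900.0)],
        )
    return conn


def insert_orphan_session(conn):
    """Try to insert a session for a subject that does not exist.

    Consistency and integrity: the foreign key rejects the row.
    Returns True if the insert succeeded, False if it was rejected.
    """
    try:
        with conn:
            conn.execute("INSERT INTO session VALUES (?, ?, ?)", (99, 1, 300.0))
    except sqlite3.IntegrityError:
        return False
    return True


def long_sessions(conn, min_duration_s):
    """Queries: fetch only the matching slice instead of the whole dataset."""
    cursor = conn.execute(
        "SELECT s.subject_id, s.session_idx, sub.species"
        " FROM session AS s JOIN subject AS sub USING (subject_id)"
        " WHERE s.duration_s >= ?"
        " ORDER BY s.subject_id, s.session_idx",
        (min_duration_s,),
    )
    return cursor.fetchall()
```

Calling `long_sessions(build_demo_db(), 900.0)` returns only the sessions that lasted at least 900 seconds, and `insert_orphan_session` leaves the table unchanged because the database rejects the inconsistent row.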
15 | 12 |
|
16 | | -## What does DataJoint bring? |
17 | 13 | DataJoint solves several key problems for using databases effectively in scientific projects: |
18 | 14 |
|
19 | 15 | 1. **Complete relational data model:** database programming directly from a scientific computing language such as MATLAB and Python without the need for SQL. |
20 | 16 | 2. **Data definition language:** to define tables and dependencies in simple and consistent ways. |
21 | 17 | 3. **Diagramming notation:** to visualize and navigate tables and dependencies. |
22 | 18 | 4. **Query language:** to create flexible and precise queries with only a few operators. |
23 | 19 | 5. **Serialization framework:** to store and retrieve numerical arrays and other data structures directly in the database. |
24 | | -6. **Support for automated distributed computations:** for computational dependencies in the data. |
| 20 | +6. **Support for automated distributed computations:** for computational dependencies in the data. |
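To make item 5 concrete, the sketch below shows the general idea of round-tripping a numerical array through a database column. It uses only the Python standard library (`sqlite3` and `array`); it is not DataJoint's serialization format or API, and the `trace` table and function names are hypothetical.

```python
import sqlite3
from array import array


def make_trace_db():
    """Create an in-memory database with one (hypothetical) table of traces."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trace ("
        " trace_id INTEGER PRIMARY KEY,"
        " samples BLOB NOT NULL)"
    )
    return conn


def store_trace(conn, trace_id, samples):
    """Serialize a sequence of floats to bytes and store it as a BLOB."""
    # array("d") packs the values as C doubles in native byte order.
    payload = array("d", samples).tobytes()
    with conn:
        conn.execute("INSERT INTO trace VALUES (?, ?)", (trace_id, payload))


def load_trace(conn, trace_id):
    """Fetch the BLOB for one trace and decode it back into a list of floats."""
    row = conn.execute(
        "SELECT samples FROM trace WHERE trace_id = ?", (trace_id,)
    ).fetchone()
    if row is None:
        raise KeyError(trace_id)
    restored = array("d")
    restored.frombytes(row[0])
    return restored.tolist()
```

A serialization framework hides the encode and decode steps shown here, so that arrays can be inserted and fetched like any other attribute value.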