Commit c7f190b

Improve Documentation (#23)
Reorganize the main Readme so that it emphasizes the most important information. Move discussion of TypeTrees to a dedicated topic, and expand it to discuss Player and AssetBundle builds. Add more examples and some missing arguments to the UnityDataTool command line topic. Reorganize so that the experimental reference command is listed last. Fix a few errors and typos (and hopefully not introduce a few new ones)
1 parent 58753bf commit c7f190b

5 files changed

Lines changed: 288 additions & 123 deletions

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
1+
# Overview of Unity Content
2+
3+
This section gives an overview of the core Unity file types and how they are used in different types of builds. It also covers the important concept of "TypeTrees". This gives context for understanding what UnityDataTools can and cannot do.
4+
5+
## File Formats
6+
7+
### SerializedFile
8+
9+
SerializedFile is the name used for Unity's binary file format for serializing objects. It is made up of a file header,
10+
then each Object, serialized one after another. This binary format is also available in the Editor, but typically Editor content uses the Unity YAML format instead.
11+
12+
The SerializedFiles in build output represent the project content, but optimized for the target platform. Unity will combine objects from multiple source assets together into files, exclude certain objects (for example editor-only objects), and potentially split or duplicate objects across multiple output files. This arrangement of objects is called the `build layout`. Because of all this transformation, there is not a one-to-one mapping between the source assets and the SerializedFiles in the build output.
13+
14+
### Unity Archive
15+
16+
A Unity Archive is a container file (similar to a zip file). Unity can `mount` this file, which makes the files inside it visible to Unity's loading system, via the Unity "Virtual File System" (VFS). Unity Archives often apply compression to the content, but it is also possible to create an uncompressed Archive.
17+
18+
## AssetBundles
19+
20+
[AssetBundles](https://docs.unity3d.com/Manual/AssetBundlesIntro.html) use the Unity Archive file format, with conventions for what to expect inside the archive. The [Addressables](https://docs.unity3d.com/Manual/com.unity.addressables.html) package uses AssetBundles, so its build output is also made up of Unity Archive files.
21+
22+
AssetBundles always contain at least one SerializedFile. In the case of an AssetBundle containing Scenes there will be multiple SerializedFiles. AssetBundles can also contain auxiliary files, such as .resS files containing Textures and Meshes, and .resource files containing audio or video.
23+
24+
UnityDataTools supports opening Archive files, so it is able to analyze AssetBundles.
25+
26+
## Player Builds
27+
28+
A player build produces content as well as compiled code (assemblies, executables) and various configuration files. UnityDataTool only concerns itself with the content portion of that output.
29+
30+
The content comprises the scenes in the Scene List, the contents of Resources folders, content from the Project Settings (the "GlobalGameManagers") and all Assets referenced from those root inputs. This translates into SerializedFiles in the build output.
31+
32+
The SerializedFiles are named in a predictable way. This is a very quick summary:
33+
34+
* Each scene in the SceneList becomes a "level" file, e.g. "level0", "level1".
35+
* Referenced Assets shared between the Scenes become "sharedAssets" files, e.g. "sharedAssets0.assets", "sharedAssets1.assets".
36+
* The contents of the Resources folder become "resources.assets".
37+
* The Project Settings become "globalgamemanagers", "globalgamemanagers.assets".
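
The naming summary above can be expressed as a small classifier. This is a minimal Python sketch and not part of UnityDataTools: the function name `classify_player_file` and the category labels are hypothetical, and the patterns only encode the conventions listed above. Real build output can contain other files as well, which fall into the "other" category here.

```python
import re

# Hypothetical category labels; the patterns follow the naming summary above.
# Matching is case-insensitive because casing can differ between platforms.
_PATTERNS = [
    (re.compile(r"^level\d+$", re.IGNORECASE), "scene"),
    (re.compile(r"^sharedassets\d+\.assets$", re.IGNORECASE), "shared assets"),
    (re.compile(r"^resources\.assets$", re.IGNORECASE), "resources"),
    (re.compile(r"^globalgamemanagers?(\.assets)?$", re.IGNORECASE), "global managers"),
]


def classify_player_file(file_name: str) -> str:
    """Return a rough category for a SerializedFile name from a Player build."""
    for pattern, category in _PATTERNS:
        if pattern.match(file_name):
            return category
    return "other"
```

A helper like this can be handy when grouping the files of an extracted Player build before running analysis on them.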
38+
39+
If [compression](https://docs.unity3d.com/6000.2/Documentation/ScriptReference/BuildOptions.CompressWithLz4HC.html) is enabled, the Player build will compress all the serialized files into a single Unity Archive file, called `data.unity3d`.
40+
41+
### Enabling TypeTrees in the Player
42+
43+
UnityDataTools supports Player build output, because it uses the same SerializedFiles and Archives that AssetBundles use. However, the results are often not very useful because, by default, Player builds do not include TypeTrees.
44+
45+
>[!IMPORTANT]
46+
>It is possible to generate TypeTrees for the Player data, starting in Unity 2021.2.
47+
>This makes that output compatible with UnityDataTool, but it is not a recommended flag to enable for your production builds.
48+
49+
To do so, the **ForceAlwaysWriteTypeTrees** Diagnostic Switch must be enabled in the Editor Preferences (Diagnostics->Editor section).
50+
51+
![](./TypeTreeForPlayer.png)
52+
53+
For more information about TypeTrees see the following section.
54+
55+
## TypeTrees
56+
57+
The TypeTree is a data structure exposing how objects have been serialized, i.e. the name, type and
58+
size of their properties. It is used by Unity when loading a SerializedFile that was built by a
59+
previous Unity version. When Unity is deserializing an object it needs to check if the current Type
60+
definition exactly matches the Type definition used when the object was serialized. If they do not match
61+
Unity will attempt to match up the properties as best as it can, based on the property names and structure
62+
of the data. This process is called a "Safe Binary Read" and is somewhat slower than the regular fast binary read path.
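
To make the idea of matching properties by name more concrete, here is a toy Python sketch. It is an illustration only and is not Unity's actual algorithm: the `Property` class, the `remap_by_name` function and the sample property names are all hypothetical, and the real Safe Binary Read also takes the structure of the data into account.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Property:
    """Simplified stand-in for one TypeTree entry: a property name and its type."""
    name: str
    type_name: str


def remap_by_name(stored: list[tuple[Property, Any]],
                  current: list[tuple[Property, Any]]) -> dict[str, Any]:
    """Keep stored values whose name and type still exist in the current
    definition; use the current default value for everything else."""
    stored_lookup = {(prop.name, prop.type_name): value for prop, value in stored}
    result = {}
    for prop, default in current:
        result[prop.name] = stored_lookup.get((prop.name, prop.type_name), default)
    return result


# An object serialized with an older type definition...
old_data = [(Property("m_Speed", "float"), 2.5), (Property("m_Legacy", "int"), 7)]
# ...read with a newer definition that dropped m_Legacy and added m_Name.
new_definition = [(Property("m_Speed", "float"), 0.0), (Property("m_Name", "string"), "")]
remapped = remap_by_name(old_data, new_definition)
```

In this sketch the stored value of `m_Speed` survives, `m_Legacy` is dropped, and `m_Name` falls back to its default. Without the stored description of the old layout there would be nothing to match against, which is why the TypeTree is needed for this kind of read.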
63+
64+
TypeTrees are important in the case of AssetBundles, to avoid rebuilding and redistributing all AssetBundles after each minor upgrade of Unity or after doing minor changes to your MonoBehaviour and ScriptableObject serialization. However, there can be a noticeable overhead to storing the TypeTrees in each AssetBundle, e.g. the header size of each SerializedFile is bigger.
65+
66+
TypeTrees also make it possible to load an AssetBundle in the Editor, when testing game play.
67+
68+
>[!NOTE]
69+
>There is a flag available when building AssetBundles that will exclude TypeTrees, see [BuildAssetBundleOptions.DisableWriteTypeTree](https://docs.unity3d.com/6000.2/Documentation/ScriptReference/BuildAssetBundleOptions.DisableWriteTypeTree.html). This has implications for future redistribution of your content, so use this flag with caution.
70+
71+
For Player Data the expectation is that you always rebuild all content together with each new build of the player.
72+
So the Assemblies and serialized objects will all have matching type definitions. That is why, by default, the types are not included.
73+
74+
UnityDataTools relies on TypeTrees in order to understand the content of serialized objects. Using this approach it does
75+
not need to hard code any knowledge about what exact types and properties to expect inside each built-in Unity type
76+
(for example Materials and Transforms). And it can interpret serialized C# classes (e.g. MonoBehaviours, ScriptableObjects
77+
and objects serialized through the SerializeReference attribute). That also means that UnityDataTools cannot understand
78+
Player built content, unless the Player was built with TypeTrees enabled.
79+
80+
>[!TIP]
81+
>The `binary2text` tool supports an optional argument `-typeinfo` to enable dumping out the TypeTrees in a SerializedFile header. That is a useful way to learn more about TypeTrees and to see exactly how Unity data is represented in the binary format.
82+
83+
### Platform details for using UnityDataTool with Player Data
84+
85+
The output structure and file formats for a Unity Player build are quite platform specific.
86+
87+
On some platforms the content is packaged into platform-specific container files, for example Android builds use .apk and .obb files. So accessing the actual SerializedFiles may involve mounting or extracting the content of those files, and possibly also opening a data.unity3d file inside them.
88+
89+
UnityDataTools directly supports opening the .data container file format used in Player builds that target Web platforms (e.g. WebGL). Specifically the "archive list" and "archive extract" command line options work with that format. Once extracted you can run other UnityDataTool commands on the output.
90+
91+
Android APK files are not difficult to open and expand using freely available utilities. For example on Windows they can be opened using 7-zip. Once the content is extracted you can run UnityDataTool commands on the output.
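
Because an APK is a zip archive, the extraction step can also be scripted. The following is a minimal Python sketch using only the standard library. The function name `extract_apk_data` is hypothetical, and the default `assets/bin/Data/` prefix is an assumption about where the Unity content sits inside a typical APK, so adjust it to match your own build.

```python
import zipfile
from pathlib import Path


def extract_apk_data(apk_path: str, output_dir: str,
                     prefix: str = "assets/bin/Data/") -> list[Path]:
    """Extract the entries under `prefix` from an APK and return their paths.

    The default prefix is an assumption, not something guaranteed by Unity.
    """
    extracted = []
    with zipfile.ZipFile(apk_path) as apk:
        for info in apk.infolist():
            # Skip directory entries and anything outside the content folder.
            if info.is_dir() or not info.filename.startswith(prefix):
                continue
            extracted.append(Path(apk.extract(info, output_dir)))
    return extracted
```

The returned paths point at the extracted files, which can then be passed to UnityDataTool commands.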
92+
93+
## Mapping back to Source Assets
94+
95+
Because Unity rearranges objects in the build into a build layout there is no 1-1 mapping between the output files and the original source assets. Only Scene files have a pretty direct mapping into the build output.
96+
97+
The UnityDataTool only looks at the output of the build, and has no information available about the source paths. This is expected, because the built output is optimized for speed and size, and there is no need to "leak" a lot of details about the source project in the data that gets shipped with the Player.
98+
99+
However, in cases where you want to understand what contributes to the size of your build, or to confirm whether certain content is actually included, you may want to correlate the output back to the source assets in your project.
100+
101+
Often the source of content can be easily inferred, based on your own knowledge of your project, and the names of objects. For example the name of a Shader should be unique, and typically has a filename that closely matches the Shader name.
102+
103+
You can also use the [BuildReport](https://docs.unity3d.com/Documentation/ScriptReference/Build.Reporting.BuildReport.html) for Player and AssetBundle builds (excluding Addressables). The [Build Report Inspector](https://github.com/Unity-Technologies/BuildReportInspector) is a tool to aid in analyzing that data.
104+
105+
For AssetBundles built by [BuildPipeline.BuildAssetBundles()](https://docs.unity3d.com/ScriptReference/BuildPipeline.BuildAssetBundles.html), there is also source information available in the .manifest files for each bundle.
106+
107+
Addressables builds do not produce a BuildReport or .manifest files, but the package offers similar build information in the user interface.

README.md

Lines changed: 57 additions & 74 deletions
@@ -1,98 +1,81 @@
11
# UnityDataTools
22

3-
The UnityDataTool is a set of command line tools showcasing what can be done with the
4-
UnityFileSystemApi native dynamic library. The main purpose of these tools is to analyze the
5-
content of Unity data files. You can directly jump
6-
[here](https://github.com/Unity-Technologies/UnityDataTools/blob/main/UnityDataTool/README.md)
7-
if your goal is to understand how to use the UnityDataTool command-line tool.
3+
The UnityDataTool is a command line tool and showcase of the UnityFileSystemApi native dynamic library.
4+
The main purpose is for analysis of the content of Unity data files, for example AssetBundles and
5+
Player content.
86

9-
The UnityFileSystemApi library is distributed in the Tools folder of the Unity editor (starting in
10-
version 2022.1.0a14). For simplicity, it is also included in this repository. The library is somewhat
11-
backward compatible, which means that it can read data files generated by any previous version of
12-
Unity. Ideally, you should copy UnityFileSystemApi (.dll/.dylib) from Unity Editor install path
13-
`Data/Tools/` subfolder to `UnityDataTool/UnityFileSystem/` of an Engine version that produced
14-
serialized data you want to analyze.
15-
16-
## What is the purpose of the UnityFileSystemApi native library?
17-
18-
The purpose of the UnityFileSystemApi is to expose the functionalities of the WebExtract and
19-
binary2text tools, but in a more flexible way. To fully understand what it means, let's first
20-
discuss how Unity generates the data files in a build. The data referenced by the scenes in a build
21-
is called the Player Data and is contained in SerializedFiles. A SerializedFile is the file format
22-
used by Unity to store its data. In builds, they contain the serialized assets in the target's
23-
platform-specific format.
24-
25-
When using AssetBundles or Addressables, things are slightly different. Firstly, note that
26-
Addressables are AssetBundles on disk so we will only use the term AssetBundle in the remaining of
27-
this document. AssetBundles are archive files (similar to zip files) that can be mounted at
28-
runtime. They contain SerializedFiles, but contrary to those of the Player Data, they include what
29-
is called a TypeTree<sup>[1](#footnote1)</sup>.
30-
31-
> Note: it is possible to generate TypeTrees for the Player data starting in Unity 2021.2.
32-
> To do so, the *ForceAlwaysWriteTypeTrees* Diagnostic Switch must be enabled in the Editor
33-
> Preferences (Diagnostic/Editor section).
34-
35-
The TypeTree is a data structure exposing how objects have been serialized, i.e. the name, type and
36-
size of their properties. It is used by Unity when loading an AssetBundle that was built by a
37-
previous Unity version (so you don't necessarily have to update all AssetBundles after upgrading a
38-
project to a newer version of Unity).
39-
40-
The content of a SerializedFile including a TypeTree can be converted to a human-readable format
41-
using the binary2text tool that can be found in the Tools folder of Unity. In the case of
42-
AssetBundles, the SerializedFiles must first be extracted using the WebExtract tool that is also in
43-
the Tools folder. For the Player Data, there is no TypeTree because it is included in a build and
44-
therefore not sensitive to Unity version upgrades. Skipping TypeTrees yields reduced file size and
45-
improved loading times.
46-
47-
The text file generated by binary2text can be very useful to
48-
diagnose issues with a build, but they are usually very large and difficult to navigate. Because of
49-
this, a tool called the [AssetBundle Analyzer](https://github.com/faelenor/asset-bundle-analyzer)
50-
was created to make it easier to extract useful information from these files in the form of a
51-
SQLite database. The AssetBundle Analyzer has been quite successful but it has several issues. It
52-
is extremely slow as it runs WebExtract and binary2text on all the AssetBundles of a project and
53-
has to parse very large text files. It can also easily fail because the syntax used by binary2text
54-
is not standard and can even be impossible to parse in some occasions.
7+
The [command line tool](./UnityDataTool/README.md) runs directly on Unity data files, without requiring the Editor to be running. It covers functionality of the Unity tools WebExtract and binary2text, with better performance. And it adds a lot of additional functionality, for example the ability to create a SQLite database for detailed analysis of build content. It is designed to scale for large build outputs and has been used to fine-tune big Unity-based games.
558

56-
The UnityFileSystemApi library has been created to expose WebExtract and binary2text
57-
functionalities. This enables the creation of tools that can read Unity data files with TypeTrees.
58-
With it, it becomes very easy to create a binary2text-like tool that can output the data in any
59-
format or a new faster and simpler AssetBundle Analyzer.
9+
The command line tool uses the UnityFileSystemApi library to access the content of Unity Archives and SerializedFiles, which are Unity's primary binary formats. This repository also serves as a reference for how this library could be used as part of incorporating functionality into your own tools.
6010

6111
## Repository content
6212

6313
The repository contains the following items:
64-
* UnityFileSystem: source code of a .NET class library exposing the functionalities or the
65-
UnityFileSystemApi native library.
66-
* UnityFileSystem.Tests: test suite for the UnityFileSystem library.
67-
* UnityFileSystemTestData: the Unity project used to generate the test data.
68-
* TestCommon: a helper library used by the test projects.
69-
* [UnityDataTool](UnityDataTool/README.md): a command-line tool providing several features that can
70-
be used to analyze the content of Unity data files.
14+
* [UnityDataTool](UnityDataTool/README.md): a command-line tool providing access to the Analyzer, TextDumper and other class libraries.
7115
* [Analyzer](Analyzer/README.md): a class library that can be used to extract key information
72-
from Unity data files and output it into a SQLite database (similar to the
73-
[AssetBundle Analyzer](https://github.com/faelenor/asset-bundle-analyzer)).
16+
from Unity data files and output it into a SQLite database.
7417
* [TextDumper](TextDumper/README.md): a class library that can be used to dump SerializedFiles into
7518
a human-readable format (similar to binary2text).
7619
* [ReferenceFinder](ReferenceFinder/README.md): a class library that can be used to find
7720
reference chains from objects to other objects using a database created by the Analyzer
21+
* UnityFileSystem: source code and binaries of a .NET class library exposing the functionalities of the
22+
UnityFileSystemApi native library.
23+
* UnityFileSystem.Tests: test suite for the UnityFileSystem library.
24+
* UnityFileSystemTestData: the Unity project used to generate the test data.
25+
* TestCommon: a helper library used by the test projects.
26+
27+
## Getting the UnityFileSystemApi library
28+
29+
The UnityFileSystemApi library is distributed in the Tools folder of the Unity editor (starting in
30+
version 2022.1.0a14). For convenience this repository includes a copy of the Unity 2022 Windows, Mac and Linux builds of the
31+
library, in the `UnityFileSystem/` directory. The library is somewhat backward compatible,
32+
which means that it can read data files generated by any previous version of
33+
Unity. Ideally, you should use the UnityFileSystemApi library (.dll/.dylib) from the Unity version that produced the
34+
serialized data you want to analyze: copy it from the `Data/Tools/` subfolder of that Editor installation
35+
to `UnityDataTool/UnityFileSystem/`.
7836

7937
## How to build
8038

39+
Currently, we do not host builds of UnityDataTools, so you will need to clone or download this repo and build it yourself.
40+
8141
1) The projects in this solution require the [.NET 9.0 SDK](https://dotnet.microsoft.com/en-us/download/dotnet/9.0).
82-
2) Copy `UnityFileSystemApi` library from UnityEditor installation
83-
`{UnityEditor}/Data/Tools/` to `UnityDataTool/UnityFileSystem/` before building.
42+
2) Copy `UnityFileSystemApi` library from your Unity Editor installation, in
43+
`{UnityEditor}/Data/Tools/` to `UnityDataTool/UnityFileSystem/`. This step is typically optional, because a previously built version of the library is included in the repo that can read the output from most Unity Versions.
8444
3) Build using `dotnet build -c Release`
8545

86-
Note: You can use your favorite IDE to build solution.
87-
Tested Visual Studio and Rider on Windows and Rider on Mac.
46+
Note: Alternatively you can build with your favorite IDE. This was tested with Visual Studio and Rider on Windows and Rider on Mac.
47+
48+
See the documentation page for the [command line tool](./UnityDataTool/README.md) for information about how to run the tool after you have built it.
49+
50+
## What is the purpose of the UnityFileSystemApi native library?
51+
52+
The purpose of the UnityFileSystemApi is to expose the functionalities of the WebExtract and
53+
binary2text tools, but in a more flexible way.
54+
55+
To better understand the files and data formats that Unity supports at runtime, see [this topic](./Documentation/unity-content-format.md).
56+
57+
## Origins
58+
59+
This tool is the evolution of the [AssetBundle Analyzer](https://github.com/faelenor/asset-bundle-analyzer)
60+
written by [Francis Pagé](https://www.github.com/faelenor).
61+
62+
That project was the first to introduce the SQLite database analysis of Unity build output to address
63+
the difficulty of diagnosing build issues through the raw binary2text output, which is large and difficult to navigate.
64+
65+
The AssetBundle Analyzer was quite successful, but it has several issues. It
66+
is extremely slow as it runs WebExtract and binary2text on all the AssetBundles of a project and
67+
has to parse very large text files. It can also easily fail because the syntax used by binary2text
68+
is not standard and can even be impossible to parse on some occasions.
69+
70+
To address those problems [@faelenor](https://www.github.com/faelenor) established this UnityDataTools
71+
repository and the UnityFileSystemApi library was created within Unity, to replace the usage of WebExtract and
72+
binary2text functionalities. With the library, it becomes very easy to create a binary2text-like tool
73+
that can output the data in any format, as well as faster and simpler code for generating the SQLite output.
74+
75+
This tool continues to be useful in recent Unity versions, for example Unity 6.
8876

8977
## Disclaimer
78+
9079
This project is provided on an "as-is" basis and is not officially supported by Unity. It is an
9180
experimental tool provided as an example of what can be done using the UnityFileSystemApi. You can
9281
report bugs and submit pull requests, but there is no guarantee that they will be addressed.
93-
94-
---
95-
*Footnotes*: <a name="footnote1">1</a>: AssetBundles include the TypeTree by default but this can
96-
be disabled by using the
97-
[DisableWriteTypeTree](https://docs.unity3d.com/ScriptReference/BuildAssetBundleOptions.DisableWriteTypeTree.html)
98-
option.
