IEEE 2941-2021
IEEE Standard for Artificial Intelligence (AI) Model Representation, Compression, Distribution, and Management
Published By | Publication Date | Number of Pages |
---|---|---|
IEEE | 2021 | 226 |
New IEEE Standard – Active. This standard specifies the AI development interface, AI model interoperable representation, coding format, and model encapsulated format for efficient AI model inference, storage, distribution, and management.
PDF Catalog
PDF Pages | PDF Title |
---|---|
1 | IEEE Std 2941-2021 Front Cover |
2 | Title page |
4 | Important Notices and Disclaimers Concerning IEEE Standards Documents Notice and Disclaimer of Liability Concerning the Use of IEEE Standards Documents |
5 | Translations Official statements Comments on standards Laws and regulations Data privacy Copyrights |
6 | Photocopies Updating of IEEE Standards documents Errata Patents IMPORTANT NOTICE |
7 | Participants |
8 | Introduction |
9 | Contents |
11 | 1. Overview 1.1 Scope 1.2 Purpose 1.3 Word usage |
12 | 2. Normative references 3. Definitions, acronyms, and abbreviations 3.1 Definitions |
14 | 3.2 Acronyms and abbreviations 4. Symbols and operators 4.1 Arithmetic operators |
15 | 4.2 Logical operator 4.3 Relational operators |
16 | 4.4 Bitwise operators 4.5 Assignment operators 5. Framework of convolutional neural network representation and model compression |
17 | 6. Syntax and semantics of neural network models 6.1 Data structure 6.1.1 Data structure of neural network structure |
18 | 6.1.2 Data structure of neural network parameters |
19 | 6.2 Syntax description 6.2.1 Overview |
20 | 6.2.2 Definition of model structure 6.2.3 Definition of contributor list 6.2.4 Definition of computational graph |
21 | 6.2.5 Definition of operator node 6.2.6 Definition of variable node 6.2.7 Definition of attribute |
22 | 6.2.8 Definition of other type 6.2.9 Definition of tensor type |
23 | 6.2.10 Definition of tensor 6.2.11 Definition of tensor size |
24 | 6.2.12 Definition of dimension 6.3 Semantic description |
71 | 6.4 Definition of training operator 6.4.1 Loss function |
73 | 6.4.2 Definition of inverse operator |
75 | 7. Compression process 7.1 Multiple models 7.1.1 Definition of multiple models technology |
76 | 7.1.2 Compression of multiple models |
77 | 7.1.3 Shared compression operator for weights of multiple model layers 7.1.3.1 Definition 7.1.3.2 Weight aggregation |
79 | 7.1.4 Residual quantization compression 7.1.4.1 Definition of residual quantization for multiple models 7.1.4.2 Weight sharing |
81 | 7.2 Quantization 7.2.1 Definition |
82 | 7.2.2 Basic quantization operator |
83 | 7.2.2.1 Linear quantization |
84 | 7.2.2.2 Codebook quantization |
86 | 7.2.3 Parameter quantization operator 7.2.3.1 Nonlinear function mapping |
88 | 7.2.3.2 INT4 parameter quantization |
90 | 7.2.3.3 Parameter quantization for bounded ReLU |
91 | 7.2.4 Activation quantization operator 7.2.4.1 Trainable alpha quantization 7.2.4.2 INT4 activation quantization |
93 | 7.2.4.3 Activation quantization for bounded ReLU |
95 | 7.2.4.4 Ratio synchronization quantization |
97 | 7.3 Pruning 7.3.1 Overview |
98 | 7.3.2 Pruning operator |
100 | 7.4 Structured matrix 7.4.1 Structured matrix compression |
101 | 7.4.2 Method for the compression of block circulant matrix with signed vectors 7.4.2.1 Block circulant matrix compression operator |
102 | 7.4.2.2 Random vector dimension list and random vector generation operator |
104 | 7.4.3 Method for the low-rank sparse decomposed structured matrix 7.4.3.1 Definition 7.4.3.2 Decomposition compression operator for the convolutional layers in low-rank sparse decomposed structured matrix |
106 | 7.4.3.3 Compression operator of a fully connected or 1 × 1 convolutional layer in a low-rank sparse decomposed structured matrix |
107 | 8. Decompression process 8.1 Multiple models 8.1.1 Decompression for multiple models |
108 | 8.1.2 Decompression operator for weights of multiple model layers 8.1.2.1 Decompression for weights of multiple model layers 8.1.2.2 Decompression output multiple models |
110 | 8.1.2.3 Decompression output specific model |
111 | 8.1.2.4 Decompression output switched specific models |
112 | 8.1.3 Decompression of residual quantization for multiple models 8.1.3.1 Definition of decompression 8.1.3.2 Decompression of the output target model |
113 | 8.2 Dequantization 8.2.1 Definition 8.2.2 Basic dequantization operator |
115 | 8.2.2.1 Linear dequantization |
116 | 8.2.2.2 Codebook dequantization |
117 | 8.2.3 Parameter dequantization operator 8.2.3.1 Nonlinear function mapping dequantization |
118 | 8.2.3.2 INT4 parameter dequantization |
119 | 8.2.4 Activation dequantization operator 8.2.4.1 Trainable alpha value dequantization |
120 | 8.2.4.2 INT4 activation dequantization |
121 | 8.3 Inverse sparsity/inverse pruning operator 8.3.1 Definition |
122 | 8.3.2 Inverse sparsity |
123 | 8.4 Structured matrix 8.4.1 Decompression of structured matrix |
124 | 8.4.2 Method for the decompression of block circulant matrix with signed vectors 8.4.2.1 Block circulant matrix decompression operator |
126 | 8.4.2.2 Disturbance vector generation operator |
128 | 8.4.2.3 Operator on the layers using signed vector and block circulant matrix techniques |
129 | 8.4.3 Methods for the decompression of low-rank sparse decomposed structured matrix 8.4.3.1 Overview 8.4.3.2 Decompression operator for low-rank sparse decomposed structured matrix |
130 | 8.4.3.3 Decompression operator for the fully connected and 1 × 1 layers in low-rank sparse decomposed structured matrix |
131 | 9. Data generation 9.1 Definition 9.2 Training data generation method 9.2.1 Method of generating training data based on real data 9.2.1.1 Overview 9.2.1.2 Data augmentation method |
133 | 9.2.1.3 Generating data using the GAN |
135 | 9.2.2 Data-free training data generation method 9.2.2.1 Overview 9.2.2.2 Generating training data using the GAN |
138 | 9.3 Multiple models 9.3.1 Method for weight generation in multiple models 9.3.1.1 Multiple models weight update operator |
139 | 9.3.1.2 Multiple models weight shared data generation approach 1 |
141 | 9.3.1.3 Multiple models weight shared data generation approach 2 |
143 | 9.3.2 Residual quantization training method for multiple models |
144 | 9.4 Quantization 9.4.1 Parameter quantization 9.4.1.1 Data generation for INT4 parameter quantization |
150 | 9.4.1.2 Interval shrinkage quantization data generation |
154 | 9.4.2 Activation quantization 9.4.2.1 Data generation for INT4 activation quantization |
158 | 9.4.2.2 Trainable alpha quantization training data generation |
159 | 9.5 Pruning 9.5.1 Overview |
160 | 9.5.2 Sparse data generation method |
163 | 9.5.3 Incremental regularization pruning |
167 | 9.6 Structured matrix 9.6.1 Data generation of structured matrix |
168 | 9.6.2 Approach for generating data to be compressed in block circulant matrix with signed vectors |
172 | 9.6.3 Approach for generating the weight in low-rank sparse decomposed structured matrix 9.6.3.1 Overview |
173 | 9.6.3.2 Approaches for determining hyper-parameters R1, R2, groups, and core_size 9.6.3.3 Process for the generation of weights of a low-rank sparse decomposed structured matrix |
175 | 10. Compressed representation of neural network 10.1 Specification of syntax and semantics |
180 | 10.2 Syntax 10.2.1 Neural network compression (NNC) bitstream syntax |
181 | 10.2.2 NNC header syntax |
182 | 10.2.3 NNC layer header syntax |
183 | 10.2.4 NNC 1D array syntax 10.2.5 NNC CTU3D syntax 10.2.6 NNC CTU3D header syntax |
184 | 10.2.7 NNC zdep_array syntax |
185 | 10.2.8 NNC CU3D syntax |
186 | 10.2.9 NNC predicted_codebook syntax 10.2.10 NNC signalled_codebook syntax |
187 | 10.2.11 NNC unitree3d syntax |
188 | 10.2.12 NNC octree3d syntax |
190 | 10.2.13 NNC tagtree3d syntax |
192 | 10.2.14 NNC uni_tagtree3d syntax |
194 | 10.2.15 NNC escape syntax 10.3 Semantics 10.3.1 Initialization 10.3.2 NNC bitstream semantics |
195 | 10.3.3 NNC header semantics 10.3.4 NNC layer header semantics |
196 | 10.3.5 NNC 1D array semantics 10.3.6 NNC CTU3D semantics 10.3.7 NNC CTU3D header semantics 10.3.8 NNC zdep_array semantics 10.3.9 NNC CU3D semantics |
197 | 10.3.10 NNC predicted codebook semantics 10.3.11 NNC signaled codebook semantics 10.3.12 NNC unitree3d semantics |
199 | 10.3.13 NNC octree3d semantics |
200 | 10.3.14 NNC tagtree3d semantics |
202 | 10.3.15 NNC uni_tagtree3d semantics |
203 | 10.3.16 NNC escape semantics |
204 | 10.4 Parsing process 10.4.1 Description 10.4.2 Initialization 10.4.2.1 Initialization of context model 10.4.2.2 Initialization of AEC decoder 10.4.3 Parsing binary string 10.4.3.1 Description |
205 | 10.4.3.2 Determine ctxIdx |
208 | 10.4.3.3 Parsing bins 10.4.3.3.1 Parsing process 10.4.3.3.2 decode_decision |
209 | 10.4.3.3.3 decode_aec_stuffing_bit 10.4.3.3.4 decode_bypass 10.4.3.3.5 update_ctx |
210 | 10.4.3.4 Binarization 10.4.3.4.1 Description |
212 | 10.4.3.4.2 Binarization for fixed-length code (FL) 10.4.3.4.3 Binarization for unary code (U) |
213 | 10.4.3.4.4 Binarization for truncated unary code (TU) 10.4.3.4.5 kth-order Exp-Golomb codes (EGk) |
214 | 10.4.3.4.6 Joint truncated unary code and kth-order Exp-Golomb codes (UEGk) |
215 | 10.5 Decoding process 10.5.1 General decoding process 10.5.2 Decoding NNC header 10.5.3 Decoding NNC layer header 10.5.4 Decoding NNC sublayer |
217 | 10.5.5 Decoding 1D array 10.5.6 Decoding CTU3D header |
218 | 10.5.7 Decoding CTU3D 10.5.8 Decoding ZdepArray 10.5.9 Decoding CU3D |
219 | 10.5.10 Decoding predicted codebook 10.5.11 Decoding signalled codebook 10.5.12 Decoding unitree3d |
220 | 10.5.13 Decoding octree3d 10.5.14 Decoding tagtree3d 10.5.15 Decoding uni_tagtree3d |
221 | 10.5.16 Decoding escape |
222 | 11. Model protection 11.1 Model protection definition |
223 | 11.2 Model encryption process |
224 | 11.3 Model decryption process |
225 | 11.4 Cipher model data structure definition |
226 | Back Cover |