Product Guide of MLU370-X4 Intelligent
Accelerator Card
Release 0.9.4
Preliminary
Cambricon
April 25, 2021
Contents
1. Foreword
1.1. Copyright
1.2. Version Record
1.3. Update History
2. Outline
3. Specifications of MLU370-X4
3.1. Performance Specifications
3.2. Software Specifications
3.3. Specifications of the Use Environment
3.4. Specifications of Structure and Dimension
3.5. Size and Weight of the Package
3.6. Heat Dissipation Specifications
3.6.1. MLU370-X4's Board Card Power Consumption and Temperature Definitions
3.6.2. Resistance Curve of the Radiator of MLU370-X4
3.6.3. MLU370-X4 Supported Card Direction
3.6.4. MLU370-X4 Supported Ambient Temperature for Working and Minimum Airflow Volume Requirements of the Radiator at Different Temperature
3.6.5. A Curve for the Average Temperature of the Inlet and the Minimum Air Flow Requirement through the Radiator of MLU370-X4
3.7. Power Specifications and Electrical Specifications
4. Development Environment of Cambricon NeuWare
Cambricon®
1. Foreword
1.1. Copyright
DISCLAIMER
Cambricon Technologies Corporation Limited (hereinafter referred to as "Cambricon") makes no representation,
warranty (express, implied, or statutory) or guarantee regarding the information contained herein, and expressly
disclaims any and all implied warranties of merchantability, title, noninfringement of intellectual property or fitness
for a particular purpose, and Cambricon DOES NOT assume any liability arising out of the application or use of any
product or services. Cambricon shall have no liability related to any defaults, damages, costs or problems which may
be based on or attributable to: (i) the use of the Cambricon product in any manner that is contrary to this guide, or (ii)
customer product designs.
LIMITATION OF LIABILITY
In no event shall Cambricon be liable for any damages whatsoever (including, without limitation, damages for
loss of profits, business interruption and loss of information) arising out of the use of or inability to use this guide,
even if Cambricon has been advised of the possibility of such damages. Notwithstanding any damages that customer
might incur for any reason whatsoever, Cambricon's aggregate and cumulative liability towards customer for the
product described in this guide shall be limited in accordance with the Cambricon terms and conditions of sale for the
product.
ACCURACY OF INFORMATION
Information provided in this document is proprietary to Cambricon, and Cambricon reserves the right to make
any changes to the information in this document or to any products and services at any time without notice. The
information contained in this guide and all other information contained in Cambricon documentation referenced in
this guide is provided "AS IS." Cambricon does not warrant the accuracy or completeness of the information, text,
graphics, links or other items contained within this guide. Cambricon may make changes to this guide, or to the
products described therein, at any time without notice, but makes no commitment to update this guide.
Performance tests and ratings set forth in this guide are measured using specific chips or computer systems or
components. The results shown in this guide reflect approximate performance of Cambricon products as measured by
those tests. Any difference in system hardware or software design or configuration may affect actual performance.
As set forth above, Cambricon makes no representation, warranty or guarantee that the product described in this
guide will be suitable for any specified use. Cambricon does not represent or warrant that it tests all parameters of
each product.
It is customer's sole responsibility to ensure the product is suitable and fit for the application
planned by the customer and to do the necessary testing for the application in order to avoid a default of the
application or the product.
Weaknesses in customer's product designs may affect the quality and reliability of Cambricon product and may
result in additional or different conditions and/or requirements beyond those contained in this guide.
IP NOTICES
Cambricon and the Cambricon logo are trademarks and/or registered trademarks of Cambricon Corporation in
the United States and other countries. Other company and product names may be trademarks of the respective
companies with which they are associated.
This guide is copyrighted and is protected by worldwide copyright laws and treaty provisions. This guide may
not be copied, reproduced, modified, published, uploaded, posted, transmitted, or distributed in any way, without
Cambricon's prior written permission. Other than the right for customer to use the information in this guide with the
product, no other right or license, either express or implied, is granted by Cambricon under this guide. For the
avoidance of doubt, Cambricon does not grant any right or license (express or implied) to customer under any patents,
copyrights, trademarks, trade secret or any other intellectual property or proprietary rights of Cambricon.
Copyright
© Cambricon Corporation. All rights reserved.
1.2. Version Record
Table 1.1 Version Record
Name of the Document | Version Number | Author | Date
Product Guide of MLU370-X4 Intelligent Accelerator Card | V0.9.4 | Cambricon | 2021.04.25
1.3. Update History
V0.9.4
Update time:
Updated content:
- Preliminary version
2. Outline
Fig. 1.1 MLU370-X4 Intelligent Accelerating Card
Fully Upgraded Data Center AI Accelerator Card Integrating Training and Inference
The MLU370-X4 intelligent accelerator card is based on the new-generation Cambricon SIYUAN 370 chip
and uses a PCIe 4.0 x16 interface. It is a full-height, full-length, single-width (FHFL-SS) standard PCIe
accelerator card, suitable for the latest CPU platforms in the industry, and it can be easily mounted in
the most advanced artificial intelligence servers to quickly deploy AI computing power. With a power
consumption of only 150W, the MLU370-X4 provides powerful computing support for highly diversified
artificial intelligence applications such as computer vision, natural language processing, speech and
traditional machine learning, and achieves AI computing with high energy efficiency.
Cambricon SIYUAN 370 Chip
The Cambricon SIYUAN 370 chip is manufactured using TSMC's advanced 7nm technology, and its
performance indicators are comprehensively improved compared to the previous generation. The SIYUAN 370
chip contains up to 24 MLU-Cores and adopts the MLUv03 architecture to ensure multi-core parallel
efficiency; its 24GB of memory provides 3 times the memory bandwidth of the previous generation, effectively
relieving the bandwidth bottleneck in AI computing; the new vMLU platform supports 8 instances
on one chip, helping customers achieve cloud virtualization and container-level resource isolation; and the
chip provides comprehensive AI precision support for INT16, INT8, INT4, FP32, FP16, BF16, etc., to
meet the computing power requirements of diverse neural networks with both versatility and performance.
Significantly Improved AI Computing Power
Cambricon MLU370-X4 not only greatly improves fixed-point computing power, but also fully upgrades
floating-point computing power, and the built-in hardware video and image codec capabilities are further
enhanced. When INT8 precision is adopted for AI inference computations, the performance of non-sparse
network is 2 times higher than that of the previous generation of the accelerator card. Besides, the computing
power of floating-point precision such as FP32, FP16 and BF16 is also significantly enhanced, where FP16
precision can provide up to 96 TFLOPS peak computing power, which enables MLU370-X4 to be more widely
used in AI scenarios that require floating-point operations. Its brand-new built-in hardware video and image
codec provides 1.4 times the video performance of the previous-generation accelerator card and can
process up to 16 channels of 8K 30fps high-definition video at the same time. For this type of
application, it effectively reduces the CPU pre-processing load and PCIe bandwidth occupation, which
helps improve application performance.
Cambricon NeuWare End-Cloud Integrated Software Stack
The Cambricon NeuWare software stack adopts an end-cloud integrated architecture, which lets the
full range of Cambricon products share the same software interfaces and a complete ecosystem, and
facilitates the development, migration and optimization of AI applications. The Cambricon inference
acceleration engine (MagicMind) dedicated to MLU370 provides end-to-end model representation, model
optimization and deployment capabilities, supports multiple frameworks and algorithm models across multiple
business scenarios, and supports multiple AI computing hardware platforms (MLU&CPU).
New Platform vMLU Brings More Virtualized Instance Support
Cambricon virtualization technology vMLU supports the realization of 8 isolated AI computing instances
on MLU370-X4. Each instance has exclusive computing, memory, and codec resources, and can still maintain a
high efficiency of no less than 90% in a virtualized environment, realize cloud virtualization and container-level
resource isolation, and help customers make full use of hardware resources.
MLU-Link™ and RoCE v2: Set Up Training Clusters Flexibly
The Cambricon MLU-Link™ multi-chip interconnection technology supports interconnection
between SIYUAN chips as well as cross-system interconnection, enabling the vertical expansion of the
computing center to meet the needs of super-large AI model training. MLU370-X4 supports a maximum of
2*200Gbps MLU-Link™ data communication bandwidth between chips and can build a training cluster without
relying on switches; it can also support a separate RoCE v2 network with 2*100Gbps bandwidth, as well as
hybrid networking of MLU-Link™ and RoCE v2, so that large-scale expansion of the training cluster can
be realized.
3. Specifications of MLU370-X4
3.1. Performance Specifications
Table 3.1 MLU370-X4 Intelligent Accelerator Card Hardware Specifications
Item | Value
Type of Board Card | MLU370-X4
Core Architecture | Cambricon MLUv03
Core Frequency | 1 GHz
Computation Accuracy Supporting | INT16, INT8, INT4, FP32, FP16/BF16
Video Decoding | support
Memory Capacity | 24GB
Memory Bit Width | 384-bit
Memory Bandwidth | 300GB/s
System Interface | PCI Express 4.0 x16, support lane reversal
PCI Identifier | PCIe Vendor ID: 0xCABC
 | PCIe Device ID: 0x0370
 | PCIe Sub-Vendor ID: 0xCABC
 | PCIe Sub-System ID: 0x0057
Shape | FHFL Single Slot
TDP Power Consumption | 150W
ECC Protection | yes
Heat Dissipation Scheme | passive
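The PCI identifiers in Table 3.1 can be used to locate the card on a host. The following Python sketch is one possible way to do this on Linux by scanning the standard sysfs PCI tree; the function names `is_mlu370` and `find_cards` are hypothetical and are not part of any Cambricon tool.

```python
from pathlib import Path

# Identifiers taken from Table 3.1.
VENDOR_ID = 0xCABC
DEVICE_ID = 0x0370


def is_mlu370(vendor_text, device_text):
    """Compare sysfs-style hex strings (e.g. '0xcabc') with Table 3.1."""
    try:
        vendor = int(vendor_text.strip(), 16)
        device = int(device_text.strip(), 16)
    except ValueError:
        return False
    return (vendor, device) == (VENDOR_ID, DEVICE_ID)


def find_cards(sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses of devices whose IDs match Table 3.1.

    Assumes a Linux host; returns an empty list if sysfs is unavailable.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    matches = []
    for dev in sorted(root.iterdir()):
        try:
            vendor_text = (dev / "vendor").read_text()
            device_text = (dev / "device").read_text()
        except OSError:
            continue  # entry without readable ID files
        if is_mlu370(vendor_text, device_text):
            matches.append(dev.name)
    return matches
```

Calling `find_cards()` with no arguments scans the live system; passing a different root directory makes the function easy to exercise without hardware.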
3.2. Software Specifications
Table 3.2 describes the software specifications of MLU370-X4 intelligent accelerator card.
Table 3.2 MLU370-X4 Intelligent Accelerator Card Software Specifications
Interface | Description
PCIE Base address | PF (one, 64bit):
 | BAR0: 256 MB prefetchable
 | BAR2: 256 MB prefetchable
 | BAR4: 256 MB prefetchable
 | VF (four, 64bit):
 | BAR0: 256 MB prefetchable
 | BAR2: 256 MB prefetchable
 | BAR4: 256 MB prefetchable
SMBus | 8bit address: 0x8E (Write), 0x8F (Read)
The bit width of the SMBus register is 32 bits. Table 3.3 describes the reading process of the register (S:
Slave, M: Master).
Table 3.3 Reading and Writing Process of SMBus Register
Direction | Bits | Content
M->S | 1 | START
M->S | 8 | SLAVE ADDRESS(Write)
S->M | 1 | ACK
M->S | 8 | REGISTER ADDRESS
S->M | 1 | ACK
M->S | 1 | RE START
M->S | 8 | SLAVE ADDRESS(Read)
S->M | 1 | ACK
S->M | 8 | DATA[7:0]
S->M | 1 | ACK
S->M | 8 | DATA[15:8]
S->M | 1 | ACK
S->M | 8 | DATA[23:16]
S->M | 1 | ACK
S->M | 8 | DATA[31:24]
M->S | 1 | NACK
M->S | 1 | STOP
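The read sequence in Table 3.3 returns four data bytes, least significant byte first. The Python sketch below shows how a host might turn those bytes into a 32-bit register value and, for the registers typed as float, into a number. It does not talk to any bus; the helper names are hypothetical, and treating "float" as IEEE 754 single precision is an assumption that the guide does not state explicitly.

```python
import struct

# 8-bit SMBus addresses from Table 3.2.
WRITE_ADDR_8BIT = 0x8E
READ_ADDR_8BIT = 0x8F


def seven_bit_address(addr_8bit):
    """Drop the R/W bit to get the 7-bit address many I2C/SMBus APIs expect."""
    return addr_8bit >> 1


def assemble_u32(data_bytes):
    """Combine the four data bytes in the order of Table 3.3
    (DATA[7:0] first, DATA[31:24] last) into one 32-bit value."""
    if len(data_bytes) != 4:
        raise ValueError("expected exactly 4 data bytes")
    return int.from_bytes(bytes(data_bytes), "little")


def as_float(raw_u32):
    """Reinterpret a raw register value as a float.

    Assumption: the 'float' data type is IEEE 754 single precision.
    """
    return struct.unpack("<f", raw_u32.to_bytes(4, "little"))[0]
```

Both 0x8E and 0x8F map to the same 7-bit address (0x47); the low bit only selects write or read.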
The definition, address, and description of the SMBus registers are shown in Table 3.4.
Table 3.4 Description of SMBus Register
Definition of the Register | Address | Access | Description
Power Consumption of the Board Card | 0x01 | RO | [31:0] Power Consumption of the Board Card; Data Type: float; Unit: W
Temperature of the Board Card | 0x02 | RO | [31:0] Temperature of the Board Card; Data Type: float; Unit: ℃
Temperature of the Chip | 0x03 | RO | [31:0] Temperature of the Chip; Data Type: float; Unit: ℃
Temperature of DDR Particles | 0x04 | RO | [31:0] Temperature of DDR Particles; Data Type: float; Unit: ℃
Power brake | 0x05 | RW | Writing 0x04: the main frequency is reduced to 25% of the current frequency
 | | | Writing 0x01: restores the level before the frequency reduction
Setting State of the Power Consumption of the Board Card | 0x19 | RO | [31:16] power capping setting power consumption value
 | | | [15:0] TDP Power Consumption
 | | | Data Type: uint16_t; Unit: W
State Information | 0x20 | RO | Bit0: whether the power brake is enabled
 | | | Bit1: over-temperature and frequency reduction state
 | | | Bit[5:2]: reserved
 | | | Bit6: whether the power capping is enabled
 | | | Bit7: whether the IPU frequency capping is enabled
 | | | Bit[17:7]: reserved
 | | | Bit18: power capping not preserved across power-off, in-band. 0: disable; 1: enable
 | | | Bit19: power capping preserved across power-off, in-band. 0: disable; 1: enable
 | | | Bit20: power capping not preserved across power-off, out of band. 0: disable; 1: enable
 | | | Bit21: power capping preserved across power-off, out of band. 0: disable; 1: enable
 | | | Bit[31:22]: reserved
Temperature Threshold Information | 0x23 | RO | [31:16] reserved
 | | | [15:8] over-temperature power-off temperature
 | | | [7:0] over-temperature frequency reduction temperature
 | | | Data Type: uint8_t; Unit: ℃
Power capping | 0x29 | RW | [31:16] reserved
 | | | [15] feature flag of power capping. 0: temporary effect; 1: saved across power-down
 | | | [14:0] Power Capping Value of the Board Card
 | | | Data Type: uint15_t; Unit: W
 | | | (If the value is 0, the power capping is released.)
PCIE Vendor ID and Device ID | 0xA0 | RO | [31:16] Device ID: 0x0370
 | | | [15:0] Vendor ID: 0xCABC
PCIE Sub-Vendor ID and Sub-System ID | 0xA1 | RO | [31:16] Sub-System ID: 0x0057
 | | | [15:0] Sub-Vendor ID: 0xCABC
PCIE_negotiated_speed | 0xA2 | RO | [7:0] displays the PCIE negotiated speed, for example, 0x04 means gen4 16GT/s, 0x03 means gen3 8GT/s, 0x02 means gen2 5GT/s, 0x01 means gen1 2.5GT/s
PCIE_negotiated_link_width | 0xA3 | RO | [7:0] displays the PCIE negotiated width, for example, 0x16 means X16; 0x08 means X8, 0x04 means X4, 0x02 means X2, 0x01 means X1
Type of the Board Card | 0xF0 | RO | [7:0] displays the type of the board card, for example, 0x57 means X4 model.
Equipment Manufacturer | 0xF1 | RO | [3:0] displays the serial number of the equipment manufacturer
Hardware Version Number | 0xF2 | RO | [7:0] displays the hardware version number, for example, 0x11 means hardware version V1.1.
Firmware Version Number | 0xF3 | RO | [11:0] displays the firmware version number, for example, 0x113 means that the main version number is 1, the sub-version number is 1, and the patch number is 3.
Manufacturing Time | 0xF4 | RO | [15:0] displays the manufacturing time, for example, 0x2101 means that the manufacturing time is January, 2021.
Serial Number | 0xF5 | RO | [19:0] displays the serial number of the equipment, for example, 0x00030 means that the serial number is 00030.
Lower SN Number | 0xF6 | RO | [31:0] low 8 digits of the SN number, for example, the low 8 digits of SN: 572101300030 are saved as 0x01300030.
Higher SN Number | 0xF7 | RO | [31:16] reserved
 | | | [15:0] high 4 digits of the SN number, for example, the high 4 digits of SN: 572101300030 are saved as 0x5721.
Part_number_1 | 0xF8 | RO | [31:0] "MLU3": high part of Part_number (the ASCII code corresponding to the character)
Part_number_2 | 0xF9 | RO | [31:0] "70-X": middle part of Part_number (the ASCII code corresponding to the character)
Part_number_3 | 0xFA | RO | [7:0] "4": low part of Part_number (the ASCII code corresponding to the character)
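To make the bit layouts of a few registers in Table 3.4 concrete, here is a minimal Python sketch that decodes raw 32-bit values for registers 0x20, 0x23 and 0xF3 and builds a value for the power capping register 0x29. All function names are hypothetical. Two points are assumptions rather than statements from the guide: that each field of the firmware version occupies one 4-bit nibble (inferred from the 0x113 example), and that only the unambiguous flag bits of 0x20 are worth decoding here.

```python
def decode_status(raw):
    """Decode selected flag bits of the State Information register (0x20)."""
    return {
        "power_brake": bool(raw & (1 << 0)),
        "over_temp_freq_reduction": bool(raw & (1 << 1)),
        "power_capping": bool(raw & (1 << 6)),
    }


def decode_thresholds(raw):
    """Decode the Temperature Threshold Information register (0x23)."""
    return {
        "power_off_c": (raw >> 8) & 0xFF,      # bits [15:8]
        "freq_reduction_c": raw & 0xFF,        # bits [7:0]
    }


def decode_firmware_version(raw):
    """Split bits [11:0] of register 0xF3 into (main, sub, patch).

    Assumption: one nibble per field, as suggested by the 0x113 example.
    """
    value = raw & 0xFFF
    return (value >> 8) & 0xF, (value >> 4) & 0xF, value & 0xF


def encode_power_cap(watts, persist=False):
    """Build a value for the Power capping register (0x29).

    Bits [14:0] hold the cap in watts (0 releases the cap); bit 15 selects
    whether the setting is saved across power-down.
    """
    if not 0 <= watts <= 0x7FFF:
        raise ValueError("power cap must fit in 15 bits")
    return (int(persist) << 15) | watts
```

For example, `encode_power_cap(150, persist=True)` yields 0x8096, and `encode_power_cap(0)` yields the value that releases the cap.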
Table 3.5 shows how to obtain SN information.
Table 3.5 SN Number Decomposition
SN Number: 0x572101300030
Bits | Field | Example
[47:40] | Type of the Board Card | e.g., 0x57
[39:24] | Manufacturing Time | e.g., 0x2101
[23:20] | Equipment Manufacturer | e.g., 0x3
[19:0] | Serial Number | e.g., 0x00030
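The decomposition in Table 3.5 amounts to a few shifts and masks. The following Python sketch, using hypothetical helper names, first joins the Higher and Lower SN Number registers (0xF7 and 0xF6) into the 48-bit SN and then splits it into the four fields.

```python
def combine_sn(higher_reg, lower_reg):
    """Join register 0xF7 (bits [15:0]) and register 0xF6 into the 48-bit SN."""
    return ((higher_reg & 0xFFFF) << 32) | (lower_reg & 0xFFFFFFFF)


def split_sn(sn):
    """Split the 48-bit SN into the fields of Table 3.5."""
    return {
        "board_type": (sn >> 40) & 0xFF,            # bits [47:40]
        "manufacturing_time": (sn >> 24) & 0xFFFF,  # bits [39:24]
        "manufacturer": (sn >> 20) & 0xF,           # bits [23:20]
        "serial_number": sn & 0xFFFFF,              # bits [19:0]
    }


if __name__ == "__main__":
    sn = combine_sn(0x5721, 0x01300030)
    print(hex(sn))
    print({name: hex(value) for name, value in split_sn(sn).items()})
```

With the example values from the guide, the combined SN is 0x572101300030 and the fields come out as 0x57, 0x2101, 0x3 and 0x30.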
3.3. Specifications of the Use Environment
Table 3.6 describes the specifications of the use environment of MLU370-X4 intelligent accelerator card.
Table 3.6 Specifications of the Use Environment of MLU370-X4 Intelligent Accelerating Card
Item | Value
Operating Temperature | 0℃~45℃
Storage Temperature | -40℃~75℃
Operating Humidity | 5%-95% Relative Humidity
Storage Humidity | 5%-95% Relative Humidity
3.4. Specifications of Structure and Dimension
The structure and dimensions of the MLU370-X4 intelligent accelerator card are shown in Fig. 3.1:
Fig. 3.1 Size of MLU370-X4 Intelligent Accelerating Card
Toolless design is applied to the top cover of MLU370-X4. After the bracket is disassembled, the top cover
can be taken off directly for convenient disassembly and assembly.
Fig. 3.2 Toolless Design of the MLU370-X4 Top Cover
3.5. Size and Weight of the Package
The size and weight information of the package of MLU370-X4 intelligent accelerator card is shown in
Table 3.7:
Table 3.7 Size and Weight of the Package of MLU370-X4
Type | Weight | Size | Remark
Single Card | 727g | 266.7mm*111.15mm*18.3mm | NA
Whole Case (Industrial Packaging) | 14.1kg | 600mm*400mm*253mm | 16 Cards Per Box
Remarks: the weight is an actual measured value, tolerance +-10%
3.6. Heat Dissipation Specifications
3.6.1. MLU370-X4’s Board Card Power Consumption and Temperature
Definitions
Table 3.8 Specification of the Use Environment of MLU370-X4 Intelligent Accelerating Card
Items | Parameters
Thermal Design Power (TDP) of Whole Board Card | 150W
Recommended Operating Tj (Junction temperature) of MLU | 0-90℃
Frequency Drop Tj of MLU | 92℃
Frequency Drop Range of MLU | 50%
Shutdown Tj of MLU | 95℃
3.6.2. Resistance Curve of the Radiator of MLU370-X4
The measured airflow resistance curve of the radiator of MLU370-X4 is shown in Fig. 3.3:
Fig. 3.3 Resistance Curve of the Radiator of MLU370-X4
The relationship between the heat dissipation air flow and the pressure drop of the board card is shown
in Table 3.9:
Table 3.9 MLU370-X4 Board Card Air Flow of the Radiator - Pressure Drop
Air Flow (CFM) | Wind Pressure (Pa)
6.3 | 60
7.5 | 85
8.9 | 120
10.3 | 161
14.2 | 306
3.6.3. MLU370-X4 Supported Card Direction
The air inlet direction of MLU370-X4 is shown in Fig 3.4:
Fig. 3.4 Airflow Direction for PCIE Card
3.6.4. MLU370-X4 Supported Ambient Temperature for Working and
Minimum Airflow Volume Requirements of the Radiator at Different
Temperature
MLU370-X4 can work (TDP mode) at an ambient temperature of 0-45℃ (air intake temperature of the
radiator of the board card). The minimum airflow requirements at the main temperature points are shown in
Table 3.10:
Table 3.10 MLU370-X4 Minimum Air Flow Requirement of the Radiator vs Ambient Temperature
Temperature of the Inlet (℃) | Minimum Air Flow Requirement of the Radiator (CFM)
25 | 6.3
30 | 7.5
35 | 8.9
40 | 10.3
45 | 14.2
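For system fan planning, the points of Table 3.10 can be turned into a simple lookup. The Python sketch below is one conservative way to do it: for an inlet temperature between two tabulated points it returns the requirement of the next higher point instead of interpolating. That rounding-up policy, the reuse of the 25℃ value for cooler inlets, and the function name are assumptions made for illustration, not guidance from the guide.

```python
import bisect

# (inlet temperature in degrees C, minimum airflow in CFM) from Table 3.10.
AIRFLOW_POINTS = [(25, 6.3), (30, 7.5), (35, 8.9), (40, 10.3), (45, 14.2)]


def min_airflow_cfm(inlet_c):
    """Return a conservative minimum airflow for the given inlet temperature.

    Uses the tabulated point at or above inlet_c. Inlets below 25 degrees C
    reuse the 25 degrees C value (an assumption; the table starts at 25).
    """
    if not 0 <= inlet_c <= 45:
        raise ValueError("inlet temperature outside the 0-45 degrees C operating range")
    temps = [temp for temp, _ in AIRFLOW_POINTS]
    index = bisect.bisect_left(temps, inlet_c)
    return AIRFLOW_POINTS[index][1]
```

For instance, an inlet temperature of 32℃ falls between the 30℃ and 35℃ rows, so the sketch returns the 35℃ requirement of 8.9 CFM.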
3.6.5. A Curve for the Average Temperature of the Inlet and the Minimum
Air Flow Requirement through the Radiator of MLU370-X4
Fig. 3.5 Inlet Temperature versus Airflow Requirement
3.7. Power Specifications and Electrical Specifications
The input voltage of the power interface and current specifications are shown in Table 3.11 and Table
3.12.
Table 3.11 Power Interface and Input Voltage
Power Interface | Minimum Voltage | Normal Voltage | Maximum Voltage
PCIe Gold Finger (12V) | 11.04V | 12V | 12.96V
CPU 8-pin connector (12V) | 11.04V | 12V | 12.96V
PCIe Gold Finger (3V3) | 3.0V | 3.3V | 3.63V
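The limits in Table 3.11 lend themselves to a simple range check during board bring-up. The following Python sketch stores the three rails and reports whether a measured voltage is within limits, along with the tolerance relative to the normal voltage. The rail keys and function names are hypothetical labels chosen for this example.

```python
# (minimum, normal, maximum) input voltage in volts, from Table 3.11.
RAILS = {
    "pcie_12v": (11.04, 12.0, 12.96),
    "cpu_8pin_12v": (11.04, 12.0, 12.96),
    "pcie_3v3": (3.0, 3.3, 3.63),
}


def rail_in_spec(rail, measured_v):
    """Return True if the measured voltage is within the Table 3.11 limits."""
    low, _, high = RAILS[rail]
    return low <= measured_v <= high


def tolerance_percent(rail):
    """Return (lower, upper) tolerance in percent of the normal voltage."""
    low, normal, high = RAILS[rail]
    lower = round((low - normal) / normal * 100, 1)
    upper = round((high - normal) / normal * 100, 1)
    return lower, upper
```

The 12V limits correspond to a tolerance of -8% to +8% around the normal voltage, which `tolerance_percent("pcie_12v")` reproduces.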
Table 3.12 Current Specification
Power Interface | Peak Current | Moving Average
PCIe Gold Finger (12V) | 20A | 200us
PCIe Gold Finger (12V) | 17A | 1ms
PCIe Gold Finger (12V) | 13A | 5ms
CPU 8-pin (12V) | 33A | 200us
CPU 8-pin (12V) | 30A | 1ms
CPU 8-pin (12V) | 25A | 5ms
The specification of Power Capping is shown in Table 3.13:
Table 3.13 Power Capping
Item | Value
Power Capping Threshold | 150W
Power Capping Response time (typical) | 50ms
Power Capping Response time (max) | 100ms
The specification of Power Brake is shown in Table 3.14:
Table 3.14 Power Brake
Item | Value
PB# PCIe pin assignment | B30
Power Brake response time (typical) | 150us
PB# input assertion low time (min) | 250ms
Power brake hardware slowdown factor | 4x
4. Development Environment of Cambricon NeuWare
NeuWare can fully support all kinds of mainstream programming frameworks, such as TensorFlow, Caffe,
PyTorch, and MXNet. With the above mentioned programming frameworks, users can easily and conveniently
develop and deploy their deep learning applications on Cambricon MLU370-X4. At the same time, NeuWare
provides a complete runtime system and driver software to speed up system integration.
NeuWare further provides a series of tools for application development, function debugging and
performance optimization. The application development tools include a machine learning library, a runtime library,
a compiler, model retraining tools and domain-specific (e.g., video analysis) SDKs; the function debugging tools
cover debugging needs at every level, from the programming frameworks down to the function libraries; the
performance optimization tools include tools for performance analysis and system monitoring.
The Cambricon inference acceleration engine (MagicMind) provides end-to-end model representation,
model optimization and deployment capabilities, supports multiple frameworks, algorithm models in multiple
business scenarios, and supports multiple AI computing hardware platforms (MLU&CPU).
Fig. 4.1 Cambricon NeuWare
For more information, please visit www.cambricon.com
Tel: 86-10-83030003
Email: business@cambricon.com
Address: 11th Floor, Block D, Truth Plaza, No. 7 Zhichun Road, Haidian District, Beijing, China
5. Compliance
The MLU370-X Series is compliant with the regulations listed in this section. Compliance
marks, including the FCC ID numbers, can be found on the labels of each device.
United States
Federal Communications Commission (FCC)
This device complies with Part 15 of the FCC Rules.
Operation is subject to the following two conditions: (1) This device may not cause harmful interference,
and (2) this device must accept any interference received, including interference that may cause undesired
operation.
This equipment has been tested and found to comply with the limits for a Class B digital device, pursuant
to Part 15 of the FCC Rules. These limits are designed to provide reasonable protection against harmful
interference in a residential installation. This equipment generates, uses and can radiate radio frequency energy
and, if not installed in accordance with the instructions, may cause harmful interference to radio
communications. However, there is no guarantee that interference will not occur in a particular installation.
If this equipment does cause interference to radio or television reception, which can be determined by
turning the equipment off and on, the user is encouraged to try to correct the interference by one or more of the
following measures:
- Reorient or relocate the receiving antenna
- Increase the separation between the equipment and receiver
- Connect the equipment into an outlet on a circuit different from that to which the receiver is connected
- Consult the dealer or an experienced radio/TV technician for help
Caution: Any changes or modifications not expressly approved by the party responsible for compliance
could void the user's authority to operate this equipment.
Underwriters Laboratories (UL)
UL Listed Product Logo for MLU370-X Series Intelligent Processing Cards, model name MLU370-X.