mindformers.modules
MindFormers Transformers API.
mindformers.modules.layers
Dropout | A Dropout layer implemented with P.Dropout and P.DropoutDoMask for parallel training.
FixedSparseAttention | Fixed sparse attention layer.
LayerNorm | A self-defined layer norm operation using reduce sum and reduce mean.
Linear | The dense connected layer.
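For orientation, here is a minimal usage sketch of these layers. The class names (Linear, LayerNorm, Dropout) are inferred from the summaries above, and the constructor arguments (in_channels, keep_prob, ...) follow the MindSpore-style API these layers mirror; treat every name here as an assumption to check against the mindformers source.

    # Minimal sketch only. Class and argument names are assumptions inferred
    # from the summaries above; verify against mindformers.modules.layers.
    import numpy as np
    from mindspore import Tensor
    from mindspore import dtype as mstype
    from mindformers.modules.layers import Linear, LayerNorm, Dropout

    hidden = 16
    x = Tensor(np.ones((2, hidden)), mstype.float32)

    dense = Linear(in_channels=hidden, out_channels=hidden)  # dense connected layer
    norm = LayerNorm((hidden,))                              # reduce-sum/mean layer norm
    drop = Dropout(keep_prob=0.9)                            # parallel-friendly dropout

    y = drop(norm(dense(x)))
    print(y.shape)  # (2, 16)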
mindformers.modules.transformer
AttentionMask | Get the lower triangular matrix from the input mask.
EmbeddingOpParallelConfig | The parallel config of VocabEmbedding.
FeedForward | The multilayer perceptron with two linear layers and dropout applied at the final output.
MoEConfig | The configuration of MoE (Mixture of Experts).
MultiHeadAttention | An implementation of the multi-head attention from the paper Attention Is All You Need.
OpParallelConfig | OpParallelConfig for setting data parallel and model parallel.
Transformer | Transformer module including encoder and decoder.
TransformerDecoder | Transformer decoder module built from stacked TransformerDecoderLayer layers, including multi-head self-attention, cross-attention and feed-forward layers.
TransformerDecoderLayer | Transformer decoder layer.
TransformerEncoder | Transformer encoder module built from stacked TransformerEncoderLayer layers, including multi-head self-attention and feed-forward layers.
TransformerEncoderLayer | Transformer encoder layer.
TransformerOpParallelConfig | TransformerOpParallelConfig for setting the parallel configuration, such as data parallel and model parallel.
TransformerRecomputeConfig | TransformerRecomputeConfig for setting recompute attributes for encoder/decoder layers.
VocabEmbedding | The embedding lookup table from the 0-th dim of the parameter table.
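The three configuration classes compose: a TransformerRecomputeConfig and a MoEConfig plug into, or sit alongside, a TransformerOpParallelConfig, which is then handed to the transformer modules. A sketch, with field names assumed from the MindSpore transformer API these summaries mirror:

    # Sketch of composing the configuration classes listed above. Field names
    # (data_parallel, model_parallel, expert_num, ...) are assumptions; check
    # the mindformers.modules.transformer docs for the exact signatures.
    from mindformers.modules.transformer import (
        TransformerOpParallelConfig, TransformerRecomputeConfig, MoEConfig)

    recompute_config = TransformerRecomputeConfig(recompute=True)
    parallel_config = TransformerOpParallelConfig(
        data_parallel=2,             # split batches across 2 devices
        model_parallel=4,            # shard each weight matrix across 4 devices
        recompute=recompute_config)  # recompute activations in the backward pass
    moe_config = MoEConfig(expert_num=4)  # 4 experts per MoE feed-forward block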
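And a minimal end-to-end sketch of the Transformer module itself (a parallel_config such as the one above can be passed to its constructor). The constructor and call signature are assumed to match the mindspore.nn.transformer interface this module derives from; shapes are illustrative.

    # Sketch of a tiny encoder-decoder Transformer. The signature is assumed
    # to mirror mindspore.nn.transformer.Transformer; verify before use.
    import numpy as np
    from mindspore import Tensor
    from mindspore import dtype as mstype
    from mindformers.modules.transformer import Transformer

    model = Transformer(batch_size=2, encoder_layers=1, decoder_layers=1,
                        hidden_size=64, ffn_hidden_size=64,
                        src_seq_length=20, tgt_seq_length=10)

    encoder_input = Tensor(np.ones((2, 20, 64)), mstype.float32)
    encoder_mask = Tensor(np.ones((2, 20, 20)), mstype.float16)  # self-attention mask
    decoder_input = Tensor(np.ones((2, 10, 64)), mstype.float32)
    decoder_mask = Tensor(np.ones((2, 10, 10)), mstype.float16)  # causal mask
    memory_mask = Tensor(np.ones((2, 10, 20)), mstype.float16)   # cross-attention mask

    output, encoder_present, decoder_present = model(
        encoder_input, encoder_mask, decoder_input, decoder_mask, memory_mask)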