mindformers.models

Initialization of the MindFormers models module.

mindformers.models

mindformers.models.BaseConfig

Base config class for all models' configs.

mindformers.models.BaseImageProcessor

This is a base image processor that provides basic image processing functions for sequential and image feature extractors.

mindformers.models.PreTrainedModel

Base class for all models.

mindformers.models.BaseProcessor

Base processor class.

mindformers.models.PreTrainedTokenizer

Base class for all slow tokenizers.

mindformers.models.bert

mindformers.models.bert.BertConfig

BERT config class which defines the model size.

mindformers.models.bert.BertModel

Bidirectional Encoder Representations from Transformers.

mindformers.models.bert.BertForMultipleChoice

BERT with a dense layer for the text classification task.

mindformers.models.bert.BertForPreTraining

Provides the BERT pre-training loss through the network.

mindformers.models.bert.BertForQuestionAnswering

BERT with a dense layer for the question answering task.

mindformers.models.bert.BertTokenizer

Construct a BERT tokenizer.

mindformers.models.bert.BertProcessor

Bert processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input.

Args:
    tokenizer (PreTrainedTokenizerBase): The tokenizer of BertModel.
    max_length (int, optional, defaults to 128): The maximum length (in number of tokens) for the inputs to BertModel.
    padding (str, optional, defaults to "max_length"): Activates and controls padding.
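
A minimal usage sketch follows. It assumes BertTokenizer can be loaded by model name via from_pretrained (the name below is illustrative) and that BertProcessor accepts the documented tokenizer, max_length, and padding arguments.

```python
from mindformers.models.bert import BertTokenizer, BertProcessor

# The model name is illustrative; substitute a name or vocabulary available
# in your MindFormers installation.
tokenizer = BertTokenizer.from_pretrained("bert_base_uncased")

# Build the processor with the documented defaults made explicit.
processor = BertProcessor(
    tokenizer=tokenizer,
    max_length=128,        # maximum number of tokens passed to BertModel
    padding="max_length",  # pad every sample up to max_length
)
```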

mindformers.models.t5

mindformers.models.t5.T5Config

T5 config class which defines the model size.

mindformers.models.t5.T5ForConditionalGeneration

A T5 model with the loss added.

mindformers.models.t5.T5Processor

T5 processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input.

mindformers.models.t5.T5Tokenizer

Construct a T5 tokenizer.

mindformers.models.clip

mindformers.models.clip.CLIPConfig

Config for the CLIP model.

mindformers.models.clip.CLIPVisionConfig

Config for the CLIP vision module.

mindformers.models.clip.CLIPTextConfig

Config for the CLIP text module.

mindformers.models.clip.CLIPModel

CLIP (Contrastive Language-Image Pre-training) model.

mindformers.models.clip.CLIPTokenizer

CLIP tokenizer.

mindformers.models.clip.CLIPProcessor

CLIP processor, consisting of a feature extractor (BaseFeatureExtractor) for image input and a tokenizer (PreTrainedTokenizerBase) for text input.

mindformers.models.clip.CLIPImageProcessor

CLIPImageProcessor.

mindformers.models.mae

mindformers.models.mae.ViTMAEConfig

Config for the MAE model.

mindformers.models.mae.ViTMAEForPreTraining

MAE pre-training module.

mindformers.models.mae.ViTMAEProcessor

ViTMAE processor, consisting of a feature extractor (BaseFeatureExtractor) for image input.

mindformers.models.mae.ViTMAEImageProcessor

ViTMAEImageProcessor.

mindformers.models.swin

mindformers.models.swin.SwinConfig

Swin config class which defines the model size.

mindformers.models.swin.SwinModel

Swin Transformer.

mindformers.models.swin.SwinForImageClassification

Swin Transformer model for image classification.

mindformers.models.swin.SwinImageProcessor

SwinImageProcessor.

mindformers.models.swin.SwinProcessor

Swin processor, consisting of a feature extractor (BaseFeatureExtractor) for image input.

mindformers.models.vit

mindformers.models.vit.ViTConfig

Config for the ViT model.

mindformers.models.vit.ViTModel

Vision Transformer with support for patch or hybrid CNN input stage.

mindformers.models.vit.ViTProcessor

ViT processor, consisting of a feature extractor (BaseFeatureExtractor) for image input and a tokenizer (PreTrainedTokenizerBase) for text input.

mindformers.models.vit.ViTImageProcessor

ViTImageProcessor.

mindformers.models.vit.ViTForImageClassification

Vision Transformer with support for patch or hybrid CNN input stage.

mindformers.models.gpt2

mindformers.models.gpt2.GPT2Config

GPT2 config class which defines the model size.

mindformers.models.gpt2.GPT2Model

The backbone of the GPT network.

mindformers.models.gpt2.GPT2LMHeadModel

Provides GPT training loss or logits through the network.

Args:
    config (GPT2Config): The config of GPT2Model.
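
A minimal construction sketch, assuming GPT2LMHeadModel is instantiated directly from a GPT2Config (the default-constructed config here stands in for a real model-size configuration).

```python
from mindformers.models.gpt2 import GPT2Config, GPT2LMHeadModel

config = GPT2Config()            # defaults; pass keyword arguments to change the model size
model = GPT2LMHeadModel(config)  # yields training loss or logits, per the description above
```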

mindformers.models.gpt2.GPT2Tokenizer

Construct a GPT-2 tokenizer.

mindformers.models.gpt2.GPT2Processor

GPT2 processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input.

Args:
    tokenizer (PreTrainedTokenizerBase): The tokenizer of GPTModel.
    max_length (int, optional, defaults to 128): The maximum length (in number of tokens) for the inputs to GPTModel.
    padding (str, optional, defaults to "max_length"): Activates and controls padding.
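
A minimal sketch of building the processor, assuming GPT2Tokenizer can be loaded by name via from_pretrained (the name is illustrative) and that the documented tokenizer, max_length, and padding arguments apply.

```python
from mindformers.models.gpt2 import GPT2Tokenizer, GPT2Processor

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # name is illustrative
processor = GPT2Processor(
    tokenizer=tokenizer,
    max_length=128,        # documented default
    padding="max_length",  # documented default
)
```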

mindformers.models.glm

mindformers.models.glm.GLMConfig

GLM config class which defines the model size.

Args:
    batch_size (int, optional, defaults to 1): Batch size of the input data, used in predict.
    vocab_size (int, optional, defaults to 130528): Vocabulary size of the GLM model. Defines the maximum number of different tokens that can be represented by the inputs_ids passed when calling [GLMModel].
    hidden_size (int, optional, defaults to 4096): Dimensionality of the embeddings and hidden states.
    num_layers (int, optional, defaults to 28): Number of hidden layers in the Transformer encoder.
    num_heads (int, optional, defaults to 32): Number of attention heads for each attention layer in the Transformer encoder.
    inner_hidden_size (int, optional, defaults to 16384): Dimensionality of hidden states in FeedForward.
    seq_length (int, optional, defaults to 512): The sequence length of input_ids.
    embedding_dropout_prob (float, optional, defaults to 0.0): Dropout rate applied to the embedding probs.
    attention_dropout_rate (float, optional, defaults to 0.1): Dropout rate applied to the attention probs.
    hidden_size_per_attention_head (int, optional, defaults to None): Hidden size per attention head; None means hidden_size / num_heads.
    layernorm_order (str, optional, defaults to "post"): Where layernorm is added in transformer layers; supports "pre", "post" and "sandwich".
    layernorm_epsilon (float, optional, defaults to 1.0e-5): Epsilon value used in layernorm.
    use_final_layernorm (bool, optional, defaults to True): Whether to apply a final layernorm after all layers.
    embed_parallel_config (EmbeddingOpParallelConfig): The parallel configuration. Defaults to default_embedding_parallel_config, an instance of EmbeddingOpParallelConfig with default args.
    parallel_config (TransformerOpParallelConfig): The parallel configuration. Defaults to default_transformer_config, an instance of TransformerOpParallelConfig with default args.
    moe_config (MoEConfig): The configuration of MoE (Mixture of Experts). Defaults to an instance of MoEConfig with default values; please see MoEConfig.
    use_past (bool, optional, defaults to False): Whether the model should use the past last key/value attentions (if applicable to the model) to speed up decoding. Only available for generation.
    activation_func (str, optional, defaults to "GELU"): The activation function used in Linear layers.
    position_encoding_2d (bool, optional, defaults to True): Whether to use the 2D format of position encoding for the GLM model.
    param_init_type (str, optional, defaults to "float16"): Network parameter initialization type.
    layernorm_compute_type (str, optional, defaults to "float32"): Compute dtype for layernorm.
    softmax_compute_type (str, optional, defaults to "float32"): Compute dtype for softmax.
    compute_dtype (str, optional, defaults to "float16"): Compute dtype for the network.
    bos_token_id (int, optional, defaults to 130004): A special token representing the beginning of a sentence.
    eos_token_id (int, optional, defaults to 130005): A special token representing the end of a sentence.
    mask_token_id (int, optional, defaults to 130000): A special token representing a mask token.
    gmask_token_id (int, optional, defaults to 130000): A special token representing a gmask token.
    pad_token_id (int, optional, defaults to 3): A special token used to make arrays of tokens the same size for batching purposes; it will then be ignored by attention mechanisms or loss computation.
    is_enhanced_encoder (bool, optional, defaults to True): GLM-specific branch control, deprecated.
    is_sample_acceleration (bool, optional, defaults to False): Whether to sample in construct to accelerate generation. This can speed up post-processing a bit during generation, but loses the flexibility of the generation config; not recommended.
    checkpoint_name_or_path (str, optional, defaults to ""): Checkpoint path or name used to load into the network.
    max_decode_length (int, optional, defaults to 2048): The maximum length the generated tokens can have.
    top_k (int, optional, defaults to 5): The number of highest-probability vocabulary tokens to keep for top-k filtering.
    top_p (float, optional, defaults to 1.0): If set to a float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
    repetition_penalty (float, optional, defaults to 1.0): The parameter for repetition penalty; 1.0 means no penalty. See [this paper](https://arxiv.org/pdf/1909.05858.pdf) for more details.
    do_sample (bool, optional, defaults to False): Whether or not to use sampling; use greedy decoding otherwise.
    ignore_index (int, optional, defaults to -100): Index that will be ignored in input_ids and labels for training.
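
A minimal sketch that builds a GLMConfig from the documented arguments; only a few are shown, and every value simply restates a default from the list above.

```python
from mindformers.models.glm import GLMConfig

config = GLMConfig(
    batch_size=1,            # used for prediction
    vocab_size=130528,
    hidden_size=4096,
    num_layers=28,
    num_heads=32,
    seq_length=512,
    layernorm_order="post",  # one of "pre", "post", "sandwich"
    use_past=False,          # set True only to speed up incremental decoding
)
```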

mindformers.models.glm.GLMChatModel

Provides GLM chat capability through the network.

Args:
    config (GLMConfig): The config of GLMModel.
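
A minimal construction sketch, assuming the chat model is instantiated directly from a GLMConfig.

```python
from mindformers.models.glm import GLMConfig, GLMChatModel

config = GLMConfig(use_past=True)  # use_past speeds up decoding during generation
model = GLMChatModel(config)
```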

mindformers.models.glm.GLMForPreTraining

Provides GLM training loss or logits through the network.

mindformers.models.glm.ChatGLMTokenizer

Construct a ChatGLM tokenizer.

mindformers.models.glm.GLMProcessor

GLM processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input.

mindformers.models.llama

mindformers.models.llama.LlamaConfig

LLaMA config class which defines the model size.

mindformers.models.llama.LlamaModel

Transformer decoder consisting of config.num_hidden_layers layers, each of which is a [LlamaDecoderLayer].

Args:
    config (LlamaConfig): The config of the network.
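
A minimal construction sketch, assuming the decoder backbone is built directly from a LlamaConfig (default-constructed here; pass keyword arguments to change the model size).

```python
from mindformers.models.llama import LlamaConfig, LlamaModel

config = LlamaConfig()      # defaults; override fields to resize the model
model = LlamaModel(config)  # decoder stack whose depth comes from the config
```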

mindformers.models.llama.LlamaForCausalLM

Provides Llama training loss or logits through the network.

mindformers.models.llama.LlamaTokenizer

Construct a Llama tokenizer.

mindformers.models.llama.LlamaProcessor

Llama processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input.

Args:
    tokenizer (PreTrainedTokenizerBase): The tokenizer of LlamaModel.
    max_length (int, optional, defaults to 128): The maximum length (in number of tokens) for the inputs to LlamaModel.
    padding (str, optional, defaults to "max_length"): Activates and controls padding.
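
As with the processors above, a hedged construction sketch; the tokenizer name is illustrative and the keyword arguments restate the documented defaults.

```python
from mindformers.models.llama import LlamaTokenizer, LlamaProcessor

tokenizer = LlamaTokenizer.from_pretrained("llama_7b")  # name is illustrative
processor = LlamaProcessor(
    tokenizer=tokenizer,
    max_length=128,        # documented default
    padding="max_length",  # documented default
)
```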

mindformers.models.bloom

mindformers.models.bloom.BloomConfig

Bloom config class which defines the model size.

mindformers.models.bloom.BloomModel

The backbone of the Bloom network.

mindformers.models.bloom.BloomLMHeadModel

Provides Bloom training loss or logits through the network.

mindformers.models.bloom.BloomTokenizer

Tokenize the input string and convert it into ids.

mindformers.models.bloom.BloomProcessor

Bloom processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input.

Args:
    tokenizer (PreTrainedTokenizerBase): The tokenizer of Bloom.
    max_length (int, optional, defaults to 128): The maximum length (in number of tokens) for the inputs to Bloom.
    padding (str, optional, defaults to "max_length"): Activates and controls padding.
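
The same hedged construction pattern applies here; the tokenizer name is illustrative and the keyword arguments restate the documented defaults.

```python
from mindformers.models.bloom import BloomTokenizer, BloomProcessor

tokenizer = BloomTokenizer.from_pretrained("bloom_560m")  # name is illustrative
processor = BloomProcessor(
    tokenizer=tokenizer,
    max_length=128,        # documented default
    padding="max_length",  # documented default
)
```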