mindformers.models¶
models init
mindformers.models¶
- Base config class for all models' configs.
- Base image processor, providing basic image processing functions for sequential and image feature extractors.
- Base class for all models.
- Base processor.
- Base class for all slow tokenizers.
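The concrete configs, models, tokenizers, and processors in the modules below inherit the save/load interface defined by these base classes. A minimal sketch of that pattern, assuming the usual from_pretrained / save_pretrained methods, top-level auto classes, and a registered checkpoint name such as "bert_base_uncased" (all assumptions, not taken from this page):

```python
# Hedged sketch of the shared base-class interface; the auto classes and the
# checkpoint name "bert_base_uncased" are assumptions, not taken from this page.
from mindformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("bert_base_uncased")   # builds a concrete config subclass
model = AutoModel.from_pretrained("bert_base_uncased")     # builds the matching model subclass
model.save_pretrained("./my_checkpoint")                   # writes config and weights back out
```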
mindformers.models.bert¶
- BERT config class, which defines the model size.
- Bidirectional Encoder Representations from Transformers.
- BERT with a dense layer for the text classification task.
- Provides the BERT pre-training loss through the network.
- BERT with a dense layer for the question answering task.
- Constructs a BERT tokenizer.
- BERT processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input. Args: tokenizer (PreTrainedTokenizerBase): the tokenizer of BertModel. max_length (int, optional, defaults to 128): the maximum length (in number of tokens) for inputs to BertModel. padding (str, optional, defaults to "max_length"): activates and controls padding.
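A minimal usage sketch for the BERT tokenizer with the max_length / padding options quoted above. The import path, the BertTokenizer class name, and the "bert_base_uncased" checkpoint name are assumptions, not taken from this page.

```python
# Hedged sketch: tokenizing text for BertModel with the quoted defaults.
from mindformers.models.bert import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert_base_uncased")   # assumed checkpoint name
encoded = tokenizer("this is a test", max_length=128, padding="max_length")
print(encoded["input_ids"][:10])    # assumes an HF-style dict return
```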
mindformers.models.t5¶
- T5 config class, which defines the model size.
- A T5 model with the loss added.
- T5 processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input.
- Constructs a T5 tokenizer.
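A short sketch for the T5 classes above; the class names T5Config and T5Tokenizer and the checkpoint name "t5_small" are assumptions based on the summaries.

```python
# Hedged sketch: building a T5 config and tokenizing text for it.
from mindformers.models.t5 import T5Config, T5Tokenizer

config = T5Config()                                  # model-size hyperparameters with default values
tokenizer = T5Tokenizer.from_pretrained("t5_small")  # "t5_small" is an assumed checkpoint name
ids = tokenizer("translate English to German: hello")
```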
mindformers.models.clip¶
- Config for the CLIP model.
- Config for the CLIP vision module.
- Config for the CLIP text module.
- CLIPModel.
- CLIP tokenizer.
- CLIP processor, consisting of a feature extractor (BaseFeatureExtractor) for image input and a tokenizer (PreTrainedTokenizerBase) for text input.
- CLIPImageProcessor.
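A sketch of the two-branch CLIP preprocessing described above: an image processor for the vision module and a tokenizer for the text module. The exact call signatures and the "clip_vit_b_32" checkpoint name are assumptions.

```python
# Hedged sketch: preparing image and text inputs for CLIPModel.
import numpy as np
from mindformers.models.clip import CLIPImageProcessor, CLIPTokenizer

image_processor = CLIPImageProcessor()                      # resize / normalize image input
tokenizer = CLIPTokenizer.from_pretrained("clip_vit_b_32")  # assumed checkpoint name

image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)  # stand-in image
pixel_values = image_processor(image)        # input for the CLIP vision module
text_ids = tokenizer("a photo of a cat")     # input for the CLIP text module
```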
mindformers.models.mae¶
- Config for the MAE model.
- Pretrain MAE module.
- ViTMAEProcessor, consisting of a feature extractor (BaseFeatureExtractor) for image input.
- ViTMAEImageProcessor.
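A sketch of the MAE image pipeline; the image-processor call signature here is an assumption.

```python
# Hedged sketch: preprocessing an image for the pretrain MAE module.
import numpy as np
from mindformers.models.mae import ViTMAEImageProcessor

image_processor = ViTMAEImageProcessor()
image = np.zeros((224, 224, 3), dtype=np.uint8)   # stand-in image
pixel_values = image_processor(image)             # preprocessed input for the MAE encoder
```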
mindformers.models.swin¶
- Swin config class, which defines the model size.
- Swin Transformer.
- Swin Transformer model.
- SwinImageProcessor.
- Swin processor, consisting of a feature extractor (BaseFeatureExtractor) for image input.
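A sketch for the Swin classes; SwinConfig is an assumed name for the "Swin config class" above, and the image-processor call signature is an assumption.

```python
# Hedged sketch: Swin config plus image preprocessing.
import numpy as np
from mindformers.models.swin import SwinConfig, SwinImageProcessor

config = SwinConfig()                   # defines the model size
image_processor = SwinImageProcessor()
pixel_values = image_processor(np.zeros((224, 224, 3), dtype=np.uint8))
```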
mindformers.models.vit¶
- Config for the ViT model.
- Vision Transformer with support for patch or hybrid CNN input stage.
- ViT processor, consisting of a feature extractor (BaseFeatureExtractor) for image input and a tokenizer (PreTrainedTokenizerBase) for text input.
- ViTImageProcessor.
- Vision Transformer with support for patch or hybrid CNN input stage.
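A sketch for the ViT classes; ViTConfig is an assumed name for "Config for the ViT model", and the image-processor call signature is an assumption.

```python
# Hedged sketch: ViT config plus image preprocessing.
import numpy as np
from mindformers.models.vit import ViTConfig, ViTImageProcessor

config = ViTConfig()                    # patch / hybrid CNN input-stage settings
image_processor = ViTImageProcessor()
pixel_values = image_processor(np.zeros((224, 224, 3), dtype=np.uint8))
```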
mindformers.models.gpt2¶
- GPT config class, which defines the model size.
- The backbone of the GPT network.
- Provides the GPT training loss or logits through the network. Args: config (GPT2Config): the config of Gpt2Model.
- Constructs a GPT-2 tokenizer.
- GPT2 processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input. Args: tokenizer (PreTrainedTokenizerBase): the tokenizer of GPTModel. max_length (int, optional, defaults to 128): the maximum length (in number of tokens) for inputs to GPTModel. padding (str, optional, defaults to "max_length"): activates and controls padding.
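A sketch of GPT-2 tokenization; GPT2Config appears in the summaries above, while GPT2Tokenizer, the "gpt2" checkpoint name, and the HF-style return/decode behavior are assumptions.

```python
# Hedged sketch: GPT-2 config and a tokenize/decode round trip.
from mindformers.models.gpt2 import GPT2Config, GPT2Tokenizer

config = GPT2Config()                               # defines the model size
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # "gpt2" is an assumed checkpoint name
ids = tokenizer("hello world")["input_ids"]         # assumes an HF-style dict return
text = tokenizer.decode(ids)                        # assumes an HF-style decode method
```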
mindformers.models.glm¶
- GLM config class, which defines the model size (see the configuration sketch after this list). Args:
  - batch_size (int, optional, defaults to 1): batch size for input data, used in prediction.
  - vocab_size (int, optional, defaults to 130528): vocabulary size of the GLM model; defines the maximum number of different tokens that can be represented by the inputs_ids passed when calling [GLMModel].
  - hidden_size (int, optional, defaults to 4096): dimensionality of the embeddings and hidden states.
  - num_layers (int, optional, defaults to 28): number of hidden layers in the Transformer encoder.
  - num_heads (int, optional, defaults to 32): number of attention heads for each attention layer in the Transformer encoder.
  - inner_hidden_size (int, optional, defaults to 16384): dimensionality of the hidden states in FeedForward.
  - seq_length (int, optional, defaults to 512): the sequence length of input_ids.
  - embedding_dropout_prob (float, optional, defaults to 0.0): dropout rate applied to the embeddings.
  - attention_dropout_rate (float, optional, defaults to 0.1): dropout rate applied to the attention probs.
  - hidden_size_per_attention_head (int, optional, defaults to None): hidden size per attention head; "None" means hidden_size / num_heads.
  - layernorm_order (str, optional, defaults to "post"): where layernorm is added in the transformer layers; supports "pre", "post", and "sandwich".
  - layernorm_epsilon (float, optional, defaults to 1.0e-5): epsilon value used in layernorm.
  - use_final_layernorm (bool, optional, defaults to True): whether to apply a final layernorm after all layers.
  - embed_parallel_config (EmbeddingOpParallelConfig): the parallel configuration; defaults to default_embedding_parallel_config, an instance of EmbeddingOpParallelConfig with default args.
  - parallel_config (TransformerOpParallelConfig): the parallel configuration; defaults to default_transformer_config, an instance of TransformerOpParallelConfig with default args.
  - moe_config (MoEConfig): the configuration of MoE (Mixture of Experts); defaults to an instance of MoEConfig with default values. Please see MoEConfig.
  - use_past (bool, optional, defaults to False): whether the model should use the past last key/value attentions (if applicable to the model) to speed up decoding; only available for generation.
  - activation_func (str, optional, defaults to "GELU"): the activation function used in Linear layers.
  - position_encoding_2d (bool, optional, defaults to True): whether to use the 2D format of position encoding for the GLM model.
  - param_init_type (str, optional, defaults to "float16"): network parameter initialization type.
  - layernorm_compute_type (str, optional, defaults to "float32"): compute dtype for layernorm.
  - softmax_compute_type (str, optional, defaults to "float32"): compute dtype for softmax.
  - compute_dtype (str, optional, defaults to "float16"): compute dtype for the network.
  - bos_token_id (int, optional, defaults to 130004): a special token representing the beginning of a sentence.
  - eos_token_id (int, optional, defaults to 130005): a special token representing the end of a sentence.
  - mask_token_id (int, optional, defaults to 130000): a special token representing a mask token.
  - gmask_token_id (int, optional, defaults to 130000): a special token representing a gmask token.
  - pad_token_id (int, optional, defaults to 3): a special token used to make arrays of tokens the same size for batching purposes; it is then ignored by attention mechanisms or loss computation.
  - is_enhanced_encoder (bool, optional, defaults to True): GLM-specific branch control; deprecated.
  - is_sample_acceleration (bool, optional, defaults to False): whether to sample in construct to accelerate generation. This can accelerate post-processing a bit during generation but loses the flexibility of the generation config; not recommended.
  - checkpoint_name_or_path (str, optional, defaults to ""): checkpoint path or name used to load into the network.
  - max_decode_length (int, optional, defaults to 2048): the maximum length the generated tokens can have.
  - top_k (int, optional, defaults to 5): the number of highest-probability vocabulary tokens to keep for top-k filtering.
  - top_p (float, optional, defaults to 1.0): if set to a float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
  - repetition_penalty (float, optional, defaults to 1.0): the parameter for repetition penalty; 1.0 means no penalty. See [this paper](https://arxiv.org/pdf/1909.05858.pdf) for more details.
  - do_sample (bool, optional, defaults to False): whether to use sampling; greedy decoding is used otherwise.
  - ignore_index (int, optional, defaults to -100): index that will be ignored in input_ids and labels for training.
- Provides GLM chat capability through the network. Args: config (GLMConfig): the config of GLMModel.
- Provides the GLM training loss or logits through the network.
- Constructs a ChatGLM tokenizer.
- GLM processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input.
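The configuration sketch referenced in the GLMConfig entry above: it sets a handful of the documented fields to their stated defaults. Whether GLMConfig is importable from this exact path is an assumption.

```python
# Hedged sketch: building a GLMConfig with fields from the parameter list above.
from mindformers.models.glm import GLMConfig

config = GLMConfig(
    vocab_size=130528,    # maximum number of distinct token ids
    hidden_size=4096,     # embedding / hidden-state width
    num_layers=28,        # Transformer encoder depth
    num_heads=32,         # attention heads per layer
    seq_length=512,       # sequence length of input_ids
    use_past=False,       # incremental decoding disabled
    do_sample=False,      # greedy decoding during generation
)
```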
mindformers.models.llama¶
- LLaMA config class, which defines the model size.
- Transformer decoder consisting of config.num_hidden_layers layers, each of which is a [LlamaDecoderLayer]. Args: config (LlamaConfig): the config of the network.
- Provides the Llama training loss or logits through the network.
- Constructs a Llama tokenizer.
- Llama processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input. Args: tokenizer (PreTrainedTokenizerBase): the tokenizer of LlamaModel. max_length (int, optional, defaults to 128): the maximum length (in number of tokens) for inputs to LlamaModel. padding (str, optional, defaults to "max_length"): activates and controls padding.
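A sketch of Llama tokenization with the max_length / padding options quoted above; LlamaConfig appears in the summaries, while LlamaTokenizer and the "llama_7b" checkpoint name are assumptions.

```python
# Hedged sketch: Llama config and tokenization with the quoted defaults.
from mindformers.models.llama import LlamaConfig, LlamaTokenizer

config = LlamaConfig()                                   # defines the model size
tokenizer = LlamaTokenizer.from_pretrained("llama_7b")   # assumed checkpoint name
encoded = tokenizer("hello world", max_length=128, padding="max_length")
```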
mindformers.models.bloom¶
- Bloom config class, which defines the model size.
- The backbone of the Bloom network.
- Provides the Bloom training loss or logits through the network.
- Tokenizes the input string and converts it into ids.
- Bloom processor, consisting of a tokenizer (PreTrainedTokenizerBase) for text input. Args: tokenizer (PreTrainedTokenizerBase): the tokenizer of Bloom. max_length (int, optional, defaults to 128): the maximum length (in number of tokens) for inputs to Bloom. padding (str, optional, defaults to "max_length"): activates and controls padding.
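A sketch of Bloom tokenization, which "tokenizes the input string and converts it into ids" as noted above; BloomConfig, BloomTokenizer, the "bloom_560m" checkpoint name, and batch input support are assumptions.

```python
# Hedged sketch: Bloom config and tokenization of a small batch.
from mindformers.models.bloom import BloomConfig, BloomTokenizer

config = BloomConfig()                                     # defines the model size
tokenizer = BloomTokenizer.from_pretrained("bloom_560m")   # assumed checkpoint name
encoded = tokenizer(["hello world", "a longer example sentence"],
                    max_length=128, padding="max_length")  # batch input is an assumption
```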