Overview

Phi-3-Mini-128K-Instruct is a state-of-the-art, lightweight model with 3.8 billion parameters, optimized for English-language applications in both commercial and research settings. A member of the Phi-3 family, it supports a context length of up to 128K tokens. Its training data combines synthetic data with high-quality, filtered web content, with a focus on reasoning-dense material.

Training and Performance

Following its initial training, the model was further refined through supervised fine-tuning and preference optimization to improve instruction following and safety. On benchmarks testing common sense, language understanding, mathematics, coding, long-context handling, and logical reasoning, Phi-3-Mini-128K-Instruct showed robust performance among models with fewer than 13 billion parameters.

Intended Uses and Applications

Phi-3 Mini-128K-Instruct is designed for scenarios that are constrained by memory and computing resources or require quick response times. It is particularly effective for tasks requiring strong reasoning abilities, including coding, mathematics, and logic. This model serves as a foundational tool for advancing research on language and multimodal models and is suitable for developing AI-powered generative features.

Considerations for Use

While versatile, the model is not specifically tailored for all potential applications. Developers should evaluate the model thoroughly for accuracy, safety, and fairness, especially in high-risk applications, and ensure compliance with all relevant legal and regulatory requirements.

Technical Integration

The model is supported in the development version (4.40.0) of the Transformers library. Users should pass trust_remote_code=True when loading the model and update their local Transformers installation to that version or newer. The accompanying tokenizer supports a vocabulary size of up to 32064 tokens and can be extended for various applications.
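
As a quick sanity check before loading the model, the installed Transformers version and the tokenizer's reported vocabulary size can be inspected directly. The sketch below assumes that installing the development build from the Transformers main branch is acceptable in your environment.

# Upgrade to a recent Transformers build first (shell command):
#   pip install --upgrade git+https://github.com/huggingface/transformers
import transformers
from transformers import AutoTokenizer

print(transformers.__version__)  # Phi-3 support requires 4.40.0 or newer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
print(len(tokenizer))  # vocabulary size, including any added tokens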

Sample Inference Code

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)  # fix the seed for reproducible outputs

# trust_remote_code=True is required because the checkpoint ships custom modeling code.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

# Multi-turn chat history in the standard role/content format.
messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,  # return only the newly generated text
    "temperature": 0.0,
    "do_sample": False,  # greedy decoding; temperature is ignored when sampling is off
}

output = pipe(messages, **generation_args)
print(output[0]["generated_text"])
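
For lower-level control than the pipeline provides, the chat template can be applied manually and the resulting tensor passed to model.generate. This is a minimal sketch that reuses the model, tokenizer, and messages objects defined above.

# Build the prompt tensor from the chat history using the model's chat template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=500, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))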

This summary outlines Phi-3-Mini-128K-Instruct's capabilities, intended uses, and technical integration details to help users and developers leverage the model effectively.