How Azure OpenAI Model Updates Work

This article explains how Azure OpenAI model updates work and how you should configure your model deployments.

OpenAI and Azure release new AI models regularly and retire older ones every few months to make room for them.

Azure gives us the following options when deploying an OpenAI model from the Azure OpenAI portal:

  • Select a specific model (Preview, for example)
  • Select the default model
  • Auto-update to default

Each of these options suits a different use case. However, if you want to be safe and prevent unwanted outages when a model is retired, your best option is Auto-update to default.
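
For deployments managed outside the portal, the same choice can be written into the deployment definition. The sketch below is a minimal illustration, assuming the Azure Resource Manager deployment resource exposes a `versionUpgradeOption` property with the three values shown. The helper name `build_deployment_body`, the SKU values, and the way the values map to the portal options are assumptions, so verify them against the current Azure documentation. The code only builds a request body and makes no network calls.

```python
import json

# Assumed values of the `versionUpgradeOption` deployment property.
# Verify them against the current Azure documentation before relying on them.
AUTO_UPDATE_TO_DEFAULT = "OnceNewDefaultVersionAvailable"
UPGRADE_WHEN_EXPIRED = "OnceCurrentVersionExpired"
NO_AUTO_UPGRADE = "NoAutoUpgrade"

ALLOWED_OPTIONS = {AUTO_UPDATE_TO_DEFAULT, UPGRADE_WHEN_EXPIRED, NO_AUTO_UPGRADE}


def build_deployment_body(model_name, model_version,
                          upgrade_option=AUTO_UPDATE_TO_DEFAULT):
    """Build a deployment request body (hypothetical helper, no network calls)."""
    if upgrade_option not in ALLOWED_OPTIONS:
        raise ValueError(f"unknown upgrade option: {upgrade_option}")
    return {
        # SKU name and capacity are placeholder values for illustration.
        "sku": {"name": "Standard", "capacity": 1},
        "properties": {
            "model": {
                "format": "OpenAI",
                "name": model_name,
                "version": model_version,
            },
            "versionUpgradeOption": upgrade_option,
        },
    }


if __name__ == "__main__":
    # Defaults to the auto-update behaviour recommended above.
    print(json.dumps(build_deployment_body("gpt-4", "0613"), indent=2))
```

Defaulting the helper to the auto-update value mirrors the recommendation above: a deployment only stays pinned to a version when someone asks for that explicitly.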

The following screenshot shows the options when selecting a model:

If you are using GPT-4 0314, which is no longer available to select, your deployment will be upgraded automatically in July.

Deprecation vs. Retirement

Another essential concept is the difference between the deprecation and the retirement of models in Azure OpenAI.

When Azure OpenAI retires a model, it is no longer available to consume, and REST API calls to it return an error. When a model is deprecated, the API still works for existing customers until the model is retired.
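
The two phases can be pictured as a simple timeline. The following sketch is only an illustration of the distinction described above: the names `Lifecycle`, `lifecycle_state`, and `existing_deployment_works` are hypothetical, and the dates are placeholders rather than real Azure dates.

```python
from datetime import date
from enum import Enum


class Lifecycle(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    RETIRED = "retired"


def lifecycle_state(today, deprecation_date, retirement_date):
    """Classify a model by comparing today's date with its two milestone dates."""
    if today >= retirement_date:
        return Lifecycle.RETIRED
    if today >= deprecation_date:
        return Lifecycle.DEPRECATED
    return Lifecycle.ACTIVE


def existing_deployment_works(state):
    """Existing deployments keep working until the model is retired."""
    return state is not Lifecycle.RETIRED


if __name__ == "__main__":
    # Placeholder dates for illustration only.
    deprecated_on = date(2030, 6, 1)
    retired_on = date(2030, 12, 1)
    for today in (date(2030, 1, 1), date(2030, 7, 1), date(2031, 1, 1)):
        state = lifecycle_state(today, deprecated_on, retired_on)
        print(today, state.value, existing_deployment_works(state))
```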

Understanding how to select Azure OpenAI models is essential for outage-free deployments. If you decide to go with a non-default model, make sure you have a process in place to update the model before it reaches its retirement phase.
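
One possible shape for such a process is a scheduled check that flags pinned deployments whose retirement date is approaching. This is a minimal sketch under stated assumptions: the `Deployment` record, the `needs_manual_update` helper, the 30-day lead time, and the sample data are all hypothetical, and in practice the retirement dates would come from Azure's published schedule.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Deployment:
    name: str
    model: str
    auto_update: bool       # True if the deployment auto-updates to the default
    retirement_date: date   # retirement date of the deployed model version


def needs_manual_update(deployments, today, lead_time_days=30):
    """Return names of pinned deployments retiring within the lead time."""
    cutoff = today + timedelta(days=lead_time_days)
    return [
        d.name
        for d in deployments
        if not d.auto_update and d.retirement_date <= cutoff
    ]


if __name__ == "__main__":
    # Hypothetical inventory with placeholder dates.
    inventory = [
        Deployment("chat-prod", "gpt-4 0613", False, date(2030, 1, 20)),
        Deployment("chat-dev", "gpt-4 0613", True, date(2030, 1, 20)),
        Deployment("summaries", "gpt-4 32K", False, date(2030, 9, 1)),
    ]
    print(needs_manual_update(inventory, today=date(2030, 1, 1)))
```

Deployments that auto-update are skipped on purpose, since Azure handles their upgrade; only pinned deployments need someone to act before the retirement date.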

Preview Models

Preview models offer the latest features and token capacity. A model's token limit is its capacity: the higher the limit, the more input and output the model can process. For example, GPT-4 32K offers a 32K token limit, compared to GPT-4 0613, which has an 8K limit.
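
To make the arithmetic concrete, the sketch below checks whether a request fits a model's token limit, taking 8K as 8,192 tokens and 32K as 32,768 tokens. The `fits_context` helper and the dictionary keys are hypothetical names, and the token counts are made-up numbers; a real application would count tokens with a tokenizer.

```python
# Token limits for the two models compared above (8K and 32K).
CONTEXT_LIMITS = {
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}


def fits_context(model, prompt_tokens, max_output_tokens):
    """Return True if input plus requested output fits within the model's limit."""
    return prompt_tokens + max_output_tokens <= CONTEXT_LIMITS[model]


if __name__ == "__main__":
    # A 10,000-token prompt with room for a 500-token answer.
    print(fits_context("gpt-4", 10_000, 500))      # False: 10,500 > 8,192
    print(fits_context("gpt-4-32k", 10_000, 500))  # True: 10,500 <= 32,768
```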

Using a preview model in a production environment is not recommended.
