GitHub Models
In this section of the workshop you will start to write code against actual LLMs, including those from OpenAI! You're going to use a service called GitHub Models that hosts LLMs. You can use GitHub Models, with limits, for free. All you need is the same free, personal GitHub account that you used to create a GitHub Codespace in the previous section.
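To give you a sense of where we're headed, here is a minimal sketch of what calling a GitHub Models-hosted model from code can look like, using the OpenAI Python SDK. The endpoint URL and model id are assumptions based on GitHub's documentation for Models; check the model's page for the exact values to use.

```python
# A minimal sketch of calling a GitHub Models-hosted LLM with the OpenAI
# Python SDK. The base_url and model id are assumptions -- verify them
# against the model's page on GitHub Models.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://models.github.ai/inference",  # GitHub Models endpoint (assumed)
    api_key=os.environ["GITHUB_TOKEN"],             # a GitHub personal access token
)

response = client.chat.completions.create(
    model="openai/gpt-4.1-mini",  # model id as listed in the catalog (assumed)
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Note that you authenticate with a GitHub token rather than an OpenAI API key, which is what makes the free tier possible with just your GitHub account.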
To get started with GitHub Models go to https://gh.io/models.
This will take you to GitHub Models on the GitHub Marketplace, where you can see a few of the hosted models. Click the link to explore the full model catalog to see the entire list of models hosted on GitHub Models.
You can filter this list. Click the Publisher dropdown and select Azure OpenAI Service. This shows all of the models from the Azure OpenAI Service, including well-known models like GPT-4o and GPT-4.1. Let's take a closer look at one of the models. Click on the card for OpenAI GPT-4.1-mini.
You will see the information page for the GPT-4.1-mini model. In the right sidebar, direct your attention to the Free rate limit tier. Since you are using GitHub Models for free, these limits apply to you. For GPT-4.1-mini, the rate limit tier is Low. Click on it to see what this means.
You'll see a table with the rate limits for different tiers, models, and pricing plans. For GitHub Models, your usage falls under the Copilot Free pricing plan. Notice that the Low tier allows you 15 requests per minute and 150 requests per day. Granted, compared to a production application this is not a lot, but it's quite generous for exploring a model's capabilities and more than enough for this workshop.
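If you do write code against these limits, it's worth handling the moment you exceed them gracefully. Here is a sketch of one common approach, retrying with exponential backoff when the service rejects a request as rate limited; it assumes the `client` from the earlier sketch, and the starting delay of 4 seconds is simply 60 seconds divided by the Low tier's 15 requests per minute.

```python
# A sketch of backing off when rate limited. The openai SDK raises
# RateLimitError on an HTTP 429 response; `client` is assumed to be the
# GitHub Models client configured earlier.
import time

from openai import RateLimitError

def chat_with_backoff(client, messages, model="openai/gpt-4.1-mini", retries=5):
    delay = 4  # seconds; 15 requests/minute works out to one request every 4 seconds
    for attempt in range(retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            time.sleep(delay)
            delay *= 2  # exponential backoff before the next attempt
    raise RuntimeError("still rate limited after retries")
```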
Also notice the tokens per request. For the Low rate limit tier this is 8000 tokens in and 4000 tokens out. The "tokens in" is the maximum number of tokens that may be submitted with a request. Recall that a token is, on average, about 3/4 of a word, so 8000 tokens is approximately 6000 words, which comes out to about 15-20 written pages. The "tokens out" is the maximum number of tokens the LLM may generate in its response, so 4000 tokens would be approximately 3000 words. And you can do this 150 times a day.
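If you want a more precise count than the 3/4-of-a-word rule of thumb, you can tokenize a prompt yourself before sending it. This sketch uses the tiktoken library; the `o200k_base` encoding is an assumption (it is the encoding used by recent OpenAI models, but it may not match every model in the catalog).

```python
# A rough sketch of checking a prompt against the 8000 "tokens in" limit
# before sending it. The o200k_base encoding is an assumption -- pick the
# encoding that matches your model.
import tiktoken

MAX_INPUT_TOKENS = 8000

encoding = tiktoken.get_encoding("o200k_base")

def fits_in_request(prompt: str) -> bool:
    n_tokens = len(encoding.encode(prompt))
    print(f"{n_tokens} tokens (~{int(n_tokens * 0.75)} words)")
    return n_tokens <= MAX_INPUT_TOKENS
```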
Notice that the High rate limit tier is more restrictive; it applies to the larger models. There are also special limits for the embedding models, which we won't use in this workshop, and some models, such as Grok-3 and DeepSeek, have custom rate limits of their own. GPT-5 and the OpenAI reasoning models are not available on the Copilot Free plan at all. Even so, we will have plenty of free models to use for the workshop.
Go back to the model information page. Click on the Playground button in the upper right of the page. This brings up a ChatGPT-like interface where you can interact with the model. Enter a prompt in the text box at the bottom with the placeholder Type your prompt.... Something like:
What are three advantages of OpenAI GPT-4.1 over GPT-3.5?
Again, similar to ChatGPT, the playground will display the generated response and render any markdown. Also notice that in the upper left of the playground you can see the total number of input tokens, output tokens, and the time it took to generate the response.
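The same token counts the playground shows are also reported when you call the model from code. This sketch assumes the `client` from the first example; the `usage` field is part of the standard chat completions response.

```python
# Reading the token counts from an API response. `client` is assumed to be
# the GitHub Models client configured earlier.
response = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[{"role": "user",
               "content": "What are three advantages of OpenAI GPT-4.1 over GPT-3.5?"}],
)
print("input tokens: ", response.usage.prompt_tokens)
print("output tokens:", response.usage.completion_tokens)
```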
In the left sidebar you can again see information about the model. For example, the training cutoff date is May of 2024. The Context is the number of tokens allowed for the input and output per request if you are using a paid pricing plan. For the free plan, you can ignore this because, as we just saw, you are restricted to 8000 tokens in and 4000 tokens out.
At the top of the sidebar, click the Parameters tab. Here you can set values to configure how the model behaves.
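These same parameters can be set when calling the model from code. Here is a sketch, again assuming the `client` from the first example; the prompt and the specific values are illustrative, not recommendations.

```python
# Passing the parameters from the Playground's Parameters tab on an API call.
# `client` is assumed to be the GitHub Models client configured earlier.
response = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[{"role": "user", "content": "Name three uses for a paperclip."}],
    temperature=0.2,   # lower = more focused, deterministic output
    top_p=1.0,         # nucleus sampling cutoff
    max_tokens=500,    # cap on generated ("tokens out") length
)
print(response.choices[0].message.content)
```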