Get Started with Google Gemini Pro Using Python in 5 Minutes

Use powerful large language model capabilities for free!

Google Gemini — Source: Bard becomes Gemini

Introduction

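Google Gemini Pro is part of Gemini, Google's latest AI model, which the company describes as its most capable and general model to date. It marks a significant step in Google's AI development, aiming for state-of-the-art performance across a wide range of tasks and strong results on many leading benchmarks. Gemini Pro was introduced alongside Gemini Ultra and Gemini Nano, marking the start of what Google DeepMind calls the Gemini era, which aims to use AI to open up new opportunities for people around the world.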

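In January 2024, Google partnered with Samsung to integrate Gemini Nano and Gemini Pro into the Galaxy S24 smartphone series, which launched globally. In fact, at the time of writing (February 8, 2024), Bard, Google's assistant app and ChatGPT competitor, has also been rebranded as Gemini. Google has also launched "Gemini Advanced with Ultra 1.0" through the AI Premium plan of the Google One subscription service.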

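A key feature of Gemini Pro is its API, which is designed to let developers quickly build and integrate AI-powered features into their applications. The API supports several programming languages, including Python, and this article uses Python to show you how to get started with the Gemini Pro large language model for free (as of February 2024).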

Gemini Essentials

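Google's Gemini is a family of AI models designed to handle a variety of tasks, including content generation and problem solving with text and image inputs. Here is a brief overview of the different Gemini models that are easily accessible through the API: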

Gemini API Pricing

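At the time of writing (February 13, 2024), the Gemini Pro API is free to use, but my gut feeling is that token-based pricing will be introduced soon, as you can see in the following screenshot from their website.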

Getting Started with Gemini Pro and Python

Let's now build some basic LLM functionality with the Gemini Pro API and Python. We will show you how to get an API key and then use the relevant Gemini LLMs in Python.

Get Your API Key from Google AI Studio

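Google AI Studio is a free, web-based tool that helps you quickly develop prompts and obtain an API key for app development. You can sign in to Google AI Studio with your Google account and get your API key from here.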

Remember to store the key somewhere safe, and do not expose it on public platforms such as GitHub.
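One common way to keep a key out of source code is to read it from an environment variable. The snippet below is a minimal sketch of that idea, not a requirement of the API: the variable name GOOGLE_API_KEY and the helper name load_api_key are assumptions chosen for illustration.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name you exported the key under.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this script."
        )
    return key
```

With the key exported in your shell, calling load_api_key() returns it as a string, and nothing secret needs to be committed to a repository.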

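Google Gemini Pro is not yet accessible in every country, but it is expected to become available everywhere soon. If you cannot use it yet, try using a VPN. Check the available regions here.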

Using the Gemini Pro API with Python for Text Inputs

To get started with the Gemini Pro API, we need to install the google-generativeai package from PyPI or GitHub.

pip install -q -U google-generativeai

I have saved my API key in a YAML file so that I can load it without exposing the key in my code. I load this file and read my API key into a variable as follows.

import yaml

with open('gemini_key.yml', 'r') as file:
    api_creds = yaml.safe_load(file)

GOOGLE_API_KEY = api_creds['gemini_key']

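The next step is to create a connection to the Gemini Pro model through the API. You first set the configuration with your API key and then load the model (or, more precisely, create a connection to the model on Google's servers).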

import google.generativeai as genai

genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel('gemini-pro')

We are now ready to start using Gemini Pro! Let's begin with a basic task to get some information.

response = model.generate_content("Explain Generative AI with 3 bullet points")
to_markdown(response.text)

The to_markdown(...) function renders the text output more nicely; you can get it from the official documentation or from my Colab notebook.
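For readers who want something to run right away, here is a minimal sketch of such a helper. It assumes the helper only needs to turn bullet characters into Markdown list items and quote every line, and it may differ from the version in the documentation or the notebook. The split into quote_markdown and to_markdown, and the plain-string fallback outside notebooks, are choices made for this sketch.

```python
import textwrap


def quote_markdown(text: str) -> str:
    """Convert bullet characters to Markdown list items and prefix each line with '> '."""
    text = text.replace("•", "  *")
    # predicate=lambda _: True makes indent() prefix blank lines as well
    return textwrap.indent(text, "> ", predicate=lambda _: True)


def to_markdown(text: str):
    """Return a rich Markdown object in notebooks, or the quoted string elsewhere."""
    quoted = quote_markdown(text)
    try:
        from IPython.display import Markdown  # present in Colab/Jupyter environments
    except ImportError:
        return quoted
    return Markdown(quoted)
```

In a notebook the returned object renders as formatted text; in a plain script you can simply print the returned string.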

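Let's now try a more practical example. Imagine you are automating IT support across multiple language regions. We will have the LLM detect the source language of each customer issue, translate it into English, and respond in the customer's original language.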

it_support_queue = [
    "I can't access my email. It keeps showing an error message. Please help.",
    "Tengo problemas con la VPN. No puedo conectarme a la red de la empresa. ¿Pueden ayudarme, por favor?",
    "Mon imprimante ne répond pas et n'imprime plus. J'ai besoin d'aide pour la réparer.",
    "Eine wichtige Software stürzt ständig ab und beeinträchtigt meine Arbeit. Können Sie das Problem beheben?",
    "我无法访问公司的网站。每次都显示错误信息。请帮忙解决。"
]

it_support_queue_msgs = "\n"
for i, msg in enumerate(it_support_queue, start=1):
    it_support_queue_msgs += f"\nMessage {i}: {msg}"

prompt = f"""
Act as a customer support agent. Remember to ask for relevant information based on the customer issue to solve the problem.
Don't deny them help without asking for relevant information. For each support message mentioned below
in triple quotes, create a response as a table with the following columns:


  orig_msg: The original customer message
  orig_lang: Detected language of the customer message e.g. Spanish
  trans_msg: Translated customer message in English
  response: Response to the customer in orig_lang
  trans_response: Response to the customer in English


Messages:
'''{it_support_queue_msgs}'''
"""

Now that the prompt is ready, let's send it to the LLM!

response = model.generate_content(prompt)
to_markdown(response.text)

Response to our prompt from Gemini Pro LLM

Pretty neat! I believe the responses would be even more relevant and useful with more detailed information or a RAG system.
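If you want to feed the response into a ticketing workflow rather than just display it, the table has to be turned into structured data. The sketch below assumes the model returned a pipe-delimited Markdown table with a header row, which the prompt asks for but the model does not guarantee. The function name parse_markdown_table is hypothetical, and cells that themselves contain a pipe character are not handled.

```python
def parse_markdown_table(text: str) -> list:
    """Parse the first pipe-delimited Markdown table in text into a list of row dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            if rows:
                break  # the first table has ended
            continue
        # Drop the outer pipes, then split the remaining text into cells
        rows.append([cell.strip() for cell in line[1:-1].split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]

    def is_separator(row):
        # Matches alignment rows such as |---|:---:|
        return all(cell and set(cell) <= set("-:") for cell in row)

    return [
        dict(zip(header, row))
        for row in body
        if not is_separator(row) and len(row) == len(header)
    ]
```

Calling parse_markdown_table(response.text) would then give one dictionary per support message, keyed by column names such as orig_lang and response, provided the model followed the requested layout.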

Using the Gemini Pro Vision API with Python for Text and Image Inputs

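Google has also released Gemini Pro Vision, a multimodal LLM that accepts both text and images as input and returns text as output. Keep in mind that it is still a text-only-output LLM. Let's try a simple use case: understanding a picture and writing a short story based on it!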

We start by loading the image.

import PIL.Image

img = PIL.Image.open('cat_pc.jpg')
img

Next, we load the Gemini Pro Vision model and send it the following prompt to get a response.

model = genai.GenerativeModel('gemini-pro-vision')
prompt = """
Describe the given picture first based on what you see.
Then create a short story based on your understanding of the picture.

Output should have both the description and the short story as two separate items 
with relevant headings
"""
response = model.generate_content(contents=[prompt, img])
to_markdown(response.text)

Response to our prompt from Gemini Pro Vision LLM

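Not bad overall! That said, I believe I have seen GPT-4 with DALL-E identify the game in the picture as Animal Crossing, which is more accurate. Still, pretty good, I would say.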

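You can also use Gemini Pro to build interactive chat experiences. This involves sending messages to the API and receiving responses, with support for multi-turn conversations. Feel free to check the detailed API documentation for examples!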

Conclusion

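In summary, whether you are an experienced AI developer or a beginner just getting started, Google's Gemini Pro and Python offer an intuitive and powerful way to bring advanced AI into your applications and projects. Moreover, the Gemini Pro API is currently free to use, which makes this an opportunity to explore the capabilities of LLMs without any upfront investment. Pricing may change in the future, but the chance to start using such a powerful tool at zero cost is too good to pass up!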

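I hope you now understand how to get an API key from Google AI Studio and run your first Python script with the Gemini Pro API in very little time. Now go ahead and apply it to your own problems and projects!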

To get in touch, reach out to me via my LinkedIn or my website. I do a lot of AI consulting, training, and project work.

Get the complete code in a Google Colab notebook here!