Google Generative AI

The Google Generative AI integration adds a conversation agent, speech-to-text, and text-to-speech entities powered by Google Generative AI to Home Assistant. The conversation agent can optionally be allowed to control Home Assistant.

Controlling Home Assistant is done by providing the AI access to the Assist API of Home Assistant. You can control what devices and entities it can access from the exposed entities page. The AI is able to provide you information about your devices and control them.

This integration does not integrate with sentence triggers.

This integration requires an API key, which you can generate as described in the Generate an API Key section, and you must be located in one of the available regions.

Configuration

To add the Google Generative AI service to your Home Assistant instance, use this My button:

Manual configuration steps

If the above My button doesn’t work, you can also perform the following steps manually:

Browse to your Home Assistant instance.
Go to Settings > Devices & Services.
In the bottom right corner, select the Add Integration button.
From the list, select Google Generative AI.
Follow the instructions on screen to complete the setup.

Generate an API Key

The Google Generative AI API key is used to authenticate requests to the Google Generative AI API. To generate an API key, take the following steps:

Visit the API Keys page to retrieve the API key you’ll use to configure the integration.

On the same page, you can see your plan: free of charge if the associated Google Cloud project doesn’t have billing enabled, or pay-as-you-go if it does. A comparison of the plans is available on the pricing page. The major differences are that the free-of-charge plan is rate limited and that its prompts and responses are used for product improvement.

Options

Options for Google Generative AI can be set via the user interface, by taking the following steps:

Browse to your Home Assistant instance.
Go to Settings > Devices & Services.
If multiple instances of Google Generative AI are configured, choose the instance you want to configure.
Select the integration, then select Configure.

Instructions

Instructions for the AI on how it should respond to your requests. They are written using Home Assistant templating.

Control Home Assistant

Whether the model is allowed to interact with Home Assistant. It can only control or provide information about entities that are exposed to it.

Recommended settings

If enabled, the recommended model and settings are chosen.

If you choose to not use the recommended settings, you can configure the following options:

Model

The model used to generate responses.

Temperature

Creativity allowed in the responses. Higher values produce a more random and varied response. A temperature of zero will be deterministic.

Top P

Probability threshold for top-p sampling.

Top K

Number of top-scored tokens to consider during generation.

Maximum Tokens to Return in Response

The maximum number of tokens (words or pieces of words) that the AI model may generate in a response.

Safety settings

Thresholds for different harmful categories.

Enable Google Search tool

Enables the model to query Google Search. This can only be enabled when the “Control Home Assistant” setting is set to “No control”. See below for a workaround using it with “Assist”.

Google Search

Due to an API limitation, the Google Search tool cannot be enabled together with other tools. A request that combines them fails with 400 INVALID_ARGUMENT:

{'error': {'code': 400, 'message': 'Tool use with function calling is unsupported', 'status': 'INVALID_ARGUMENT'}}

As a workaround, you can expose a script to voice assistants. The script calls a Google Generative AI conversation agent that only has the Google Search tool enabled.

Workaround for Google Search tool

Add a second Google Generative AI service.
Select Configure
In the Control Home Assistant section, uncheck Assist and any other options.
Uncheck Recommended model settings
Select Submit
Check Enable Google Search tool
Increase Maximum tokens to return in response
Select Submit
Create a script (Settings > Automations & scenes > Scripts > Create script)
Select 3 dots > Edit in YAML and enter the following (replace conversation.google_generative_ai_2 with the entity created in the first step):

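sequence:
  # Forward the query to the search-only conversation agent.
  - action: conversation.process
    metadata: {}
    data:
      agent_id: conversation.google_generative_ai_2
      text: "{{ query }}"
    response_variable: result
  # Keep only the plain-text answer from the agent's response.
  - variables:
      result:
        response: "{{ result.response.speech.plain.speech }}"
  # Return the answer to the caller as the script's response.
  - stop: ""
    response_variable: result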
alias: "Assist: Search Google"
description: >-
  Makes a Google search to answer questions that are completely unrelated with
  the smart home and are exclusively about current events or information in
  real-time like the current president, results of last night's game, release
  dates, etc.
fields:
  query:
    selector:
      text: null
    name: Query
    description: The query to search Google for
    required: true

Select Save script
Select 3 dots > Settings > Voice assistants
Check Expose Assist
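
Before relying on the voice assistant to pick up the script, it can help to call the script directly, for example from Developer tools > Actions. The following is a minimal sketch; the entity ID script.assist_search_google is an assumption based on the alias above and may differ on your system, and the query text is only an illustration.

```yaml
# Assumption: the script created above ended up with this entity ID.
action: script.assist_search_google
data:
  # Matches the "query" field defined in the script.
  query: "What were the results of last night's game?"
response_variable: search_result
```

If the script works as written above, the answer should be available under search_result.response.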

Talking to Super Mario

You can use this integration to talk to Super Mario and, if you want, have him control devices in your home.

The tutorial uses OpenAI, but the same can be done with the Google Generative AI integration.

Actions

Generate content

Tip

This action isn’t tied to any integration entry, so it won’t use the model, prompt, or any of the other settings in your options. If you only want to pass text, you should use the conversation.process action.
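
For the text-only case mentioned in the tip, a call to conversation.process might look like the sketch below. The agent ID conversation.google_generative_ai and the prompt text are assumptions; use the conversation entity of your own integration entry.

```yaml
action: conversation.process
data:
  # Assumption: entity ID of your Google Generative AI conversation agent.
  agent_id: conversation.google_generative_ai
  text: "Write one short, friendly welcome-home sentence."
response_variable: reply
```

The reply text can then be read with a template such as {{ reply.response.speech.plain.speech }}.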

The generate content action allows you to ask Gemini Pro or Gemini Pro Vision to generate content from a prompt consisting of text and, optionally, attachments (images, PDFs, etc.). The action populates response data with the generated content.

| Data attribute | Optional | Description | Example |
| --- | --- | --- | --- |
| `prompt` | no | The prompt for generating the content. | Describe this image |
| `filenames` | yes | File names for attachments to include in the prompt. | /tmp/image.jpg |

action: google_generative_ai_conversation.generate_content
data:
  prompt: >-
    Very briefly describe what you see in this image from my doorbell camera.
    Your message needs to be short to fit in a phone notification. Don't
    describe stationary objects or buildings.
  filenames: /tmp/doorbell_snapshot.jpg
response_variable: generated_content

The response data field text will contain the generated content.

Another example with multiple images:

action: google_generative_ai_conversation.generate_content
data:
  prompt: >-
    Briefly describe what happened in the following sequence of images
    from my driveway camera.
  filenames:
    - /tmp/driveway_snapshot1.jpg
    - /tmp/driveway_snapshot2.jpg
    - /tmp/driveway_snapshot3.jpg
    - /tmp/driveway_snapshot4.jpg
response_variable: generated_content
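
To show how the response data can be used, here is a sketch of an automation that takes a doorbell snapshot, asks for a description, and sends the text field as a notification. The entities binary_sensor.doorbell_button and camera.doorbell and the notify.mobile_app_my_phone action are hypothetical placeholders. The snapshot path must be accessible to Home Assistant and may need to be listed under allowlist_external_dirs.

```yaml
alias: "Doorbell snapshot description"
triggers:
  # Hypothetical doorbell sensor; replace with your own entity.
  - trigger: state
    entity_id: binary_sensor.doorbell_button
    to: "on"
actions:
  # Save a still image from the (hypothetical) doorbell camera.
  - action: camera.snapshot
    target:
      entity_id: camera.doorbell
    data:
      filename: /tmp/doorbell_snapshot.jpg
  # Ask the model to describe the saved image.
  - action: google_generative_ai_conversation.generate_content
    data:
      prompt: >-
        Very briefly describe what you see in this image from my doorbell
        camera. Your message needs to be short to fit in a phone notification.
      filenames: /tmp/doorbell_snapshot.jpg
    response_variable: generated_content
  # Hypothetical notify action; the generated text is in the "text" field.
  - action: notify.mobile_app_my_phone
    data:
      title: "Doorbell"
      message: "{{ generated_content.text }}"
```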

Speak

The tts.speak action is the modern way to use TTS. Add the speak action, select the Google Generative AI TTS entity, select the media player entity or group to send the TTS audio to, and enter the message to speak.

Text-to-speech (TTS) generation is controllable, meaning you can use natural language to structure interactions and guide the style, accent, pace, and tone of the audio. You can change the way the text is spoken directly in the message, for example by entering “Say cheerfully: Have a wonderful day” instead of just “Have a wonderful day”.

For more options about speak, see the Speak section on the main TTS building block page.

In YAML, your action will look like this:

action: tts.speak
target:
  entity_id: tts.google_generative_ai_tts
data:
  media_player_entity_id: media_player.tv
  message: "Say cheerfully: Have a wonderful day!"
  options:
    voice: <voice-name>

You can configure the following options:

| Option attribute | Optional | Description | Example |
| --- | --- | --- | --- |
| `voice` | yes | The voice name to be used for the generated speech. The default is `zephyr`. | `achernar` |

The input language is detected automatically. Check the Google AI documentation for the supported languages.
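
The speak action can also be combined with the conversation agent, for example to greet someone with a generated message. The sketch below is only an illustration under several assumptions: person.alex, conversation.google_generative_ai, and media_player.tv are placeholder entities, and the voice is the example value from the table above.

```yaml
alias: "Welcome home greeting"
triggers:
  # Hypothetical person entity; fires when they arrive home.
  - trigger: state
    entity_id: person.alex
    to: "home"
actions:
  # Ask the (assumed) conversation agent for a short greeting.
  - action: conversation.process
    data:
      agent_id: conversation.google_generative_ai
      text: "Write one short, friendly welcome-home sentence."
    response_variable: greeting
  # Speak the greeting, guiding the tone with a natural-language prefix.
  - action: tts.speak
    target:
      entity_id: tts.google_generative_ai_tts
    data:
      media_player_entity_id: media_player.tv
      message: "Say cheerfully: {{ greeting.response.speech.plain.speech }}"
      options:
        voice: achernar
```

The message is quoted because it contains a colon followed by a space, which YAML would otherwise try to read as a key.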

Video tutorial

This video tutorial explains how Google Generative AI can be set up, how you can send an AI-generated message to your smart speaker when you arrive home, and how you can analyze an image taken from your doorbell camera as soon as someone rings the doorbell.

Troubleshooting

To aid in diagnosing issues, it may help to enable verbose logging by adding the following to your configuration.yaml file:

logger:
  logs:
    homeassistant.components.conversation: debug
    homeassistant.components.conversation.chat_log: debug
    homeassistant.components.google_generative_ai_conversation: debug
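
If you prefer not to edit the file and restart, the log level can usually also be raised at runtime with the logger.set_level action. This is a sketch that assumes the logger integration is already loaded; the change does not persist across restarts.

```yaml
action: logger.set_level
data:
  # Same logger names as in the configuration.yaml example above.
  homeassistant.components.conversation: debug
  homeassistant.components.google_generative_ai_conversation: debug
```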
