Mountain View, CA
Customer Service: Call center automation (CCA), Chatbots on web sites (BOT), Through general personal assistants (GPA), Mobile apps (MBL), Messaging applications (MES)
Device and equipment control: In-home device control (DVC)
Core technologies: Speech recognition (SR), Speech synthesis (TTS), Natural Language Processing (NLP), Speaker Recognition (SKR), Translation, Machine Learning (ML)
Scope: Full, Specialized (SPC)
System: Cloud-based (CLD)
- Google Assistant supports outside applications as “actions” that can be reached through Google Assistant. Such actions can be company digital assistants or specialized services.
- Web sites can be marked up to be available through Google Assistant. Google provides tools for developers of actions.
- Google Dialogflow is a cloud-based development suite for creating conversational interfaces for websites, mobile applications, popular messaging platforms, and IoT devices. It can be used to build interfaces such as chatbots and conversational IVR.
- Google has cloud AI and machine learning software, including AI building blocks. AI building blocks include “conversation” options, such as speech recognition, text-to-speech, and natural language processing.
- Speech-to-Text can stream text results, immediately returning text as it’s recognized from streaming audio or as the user is speaking.
- Natural Language Processing can be used to extract information about people, places, and events; better understand social media sentiment and call center conversations; and integrate analyzed text with a document archive on Google Cloud Storage.
Google Assistant supports outside applications as “actions” that can be reached through Google Assistant. Such actions can be company digital assistants or specialized services. Some general actions include a Clock for reminders or alarms and a Weather function. Companies such as American Express and Walmart also have actions. Google Assistant can also control connected home devices.
Google provides tools for developers of actions. One option is to extend an existing Android app to the Google Assistant. “Deep link” connects users directly into a specific activity using App Actions and can surface relevant content on the Assistant with “Slices.”
Google’s cloud AI and machine learning offerings include: (1) AI building blocks that allow developers to add language, conversation, sight, and structured data to their applications; (2) AI Hub, a hosted repository of plug-and-play AI components; and (3) AI Platform, a code-based data science development environment. A number of the options include a free trial. AI building blocks include “conversation” options, such as speech recognition, text-to-speech, and natural language processing.
A cloud Speech-to-Text API supports speech recognition across 120 languages. It enables developers to convert audio to text by applying neural network models through an API. It can be used for voice command-and-control, to transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio using Google’s machine learning technology.
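As a rough illustration of converting prerecorded audio to text through an API, the sketch below builds the JSON body for a synchronous recognition request. It assumes the public v1 REST method (`speech:recognize`) and 16-bit linear PCM audio; the helper name `build_recognize_request` is hypothetical, and authentication and the HTTP call itself are deliberately left out.

```python
import base64

# Assumption: the public v1 REST method for short, prerecorded audio.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="en-US", sample_rate_hertz=16000):
    """Return a JSON-serializable body for a synchronous recognize call.

    Hypothetical helper: it only assembles the payload and sends nothing.
    """
    return {
        "config": {
            "encoding": "LINEAR16",  # assumes uncompressed 16-bit PCM input
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {
            # Inline audio is carried as base64 text inside the JSON body.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }
```

The resulting dictionary could be serialized with `json.dumps` and posted to `RECOGNIZE_URL` with whatever HTTP client and credentials a project already uses.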
Speech-to-Text can identify which language is spoken in an utterance. It can stream text results, returning text immediately as it is recognized from streaming audio or as the user is speaking. Alternatively, it can return recognized text from audio stored in a file. Speech-to-Text can transcribe proper nouns (e.g., names, places) and appropriately format language (e.g., dates, phone numbers). (Google says it supports more than 10 times as many proper nouns as there are words in the entire Oxford English Dictionary.) It is said to handle noisy audio from many environments without requiring additional noise cancellation. In multi-participant recordings where each participant is recorded in a separate channel (e.g., a phone call with two channels or a video conference with four channels), Speech-to-Text recognizes each channel separately and then annotates the transcripts, retaining the order. Speech-to-Text is priced per 15 seconds of audio processed after a 60-minute free tier.
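The per-15-second pricing can be made concrete with a little arithmetic. The sketch below is an estimate under stated assumptions, not Google's billing logic: it assumes each request is rounded up to the next 15-second increment and that the 60-minute free tier is subtracted from the total. The rate is supplied by the caller, since no price per increment is given here.

```python
import math

INCREMENT_SECONDS = 15
FREE_TIER_SECONDS = 60 * 60  # the 60-minute free tier


def billable_increments(request_durations_seconds, free_tier_seconds=FREE_TIER_SECONDS):
    """Count 15-second increments that remain after the free tier.

    Assumption: every request is rounded up to a whole increment.
    """
    used = sum(math.ceil(d / INCREMENT_SECONDS) for d in request_durations_seconds)
    free = free_tier_seconds // INCREMENT_SECONDS
    return max(0, used - free)


def estimated_cost(request_durations_seconds, rate_per_increment):
    """Multiply billable increments by a caller-supplied (hypothetical) rate."""
    return billable_increments(request_durations_seconds) * rate_per_increment
```

For example, one 3,600-second recording plus one 31-second request amounts to 240 + 3 increments; after the 240 free increments, 3 would be billable.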
Speech-to-Text comes with multiple pre-built enhanced models: (1) Command_and_search: For short queries such as voice commands or voice search; (2) Phone_call: For audio that originated from telephony, such as phone calls (typically recorded at an 8 kHz sampling rate); (3) Video: For audio that originated from video or includes multiple speakers, ideally recorded at a 16 kHz or greater sampling rate (this is a premium model that costs more than the standard rate); and (4) Default: For audio that does not fit one of the specific audio models (for example, long-form audio), ideally high-fidelity audio recorded at a 16 kHz or greater sampling rate.
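One way to read the model list is as a simple selection rule. The sketch below is a hypothetical heuristic rather than anything Google prescribes: the `audio_source` labels and both function names are invented for illustration, while the returned strings are assumed to be the lowercase values accepted by the `model` field of a v1 recognition config.

```python
def choose_model(audio_source, is_short_query=False):
    """Map a coarse description of the audio to a model name.

    `audio_source` uses hypothetical labels: "telephony", "video", or anything else.
    """
    if is_short_query:
        return "command_and_search"  # voice commands or voice search
    if audio_source == "telephony":
        return "phone_call"  # typically 8 kHz audio
    if audio_source == "video":
        return "video"  # premium model, multiple speakers
    return "default"  # e.g., long-form audio


def recognition_config(audio_source, sample_rate_hertz, language_code="en-US",
                       is_short_query=False):
    """Build the config portion of a recognition request with the chosen model."""
    return {
        "languageCode": language_code,
        "sampleRateHertz": sample_rate_hertz,
        "model": choose_model(audio_source, is_short_query),
    }
```

A call-center recording sampled at 8 kHz would, under this rule, be configured with `recognition_config("telephony", 8000)`.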
Cloud Text-to-Speech synthesizes natural-sounding speech in more than 180 voices across 30+ languages and variants. It supports any application or device that can send a REST or gRPC request including phones, PCs, tablets, and IoT devices (e.g., cars, TVs, speakers). Cloud Text-to-Speech is priced per 1 million characters of text processed after the free tier.
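Because any device that can send a REST request can use the service, a minimal sketch of the payload may help. It assumes the v1 `text:synthesize` REST method, which returns base64-encoded audio in an `audioContent` field; the helper names are hypothetical, and authentication and the HTTP call are omitted.

```python
import base64

# Assumption: the public v1 REST method for speech synthesis.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, language_code="en-US", audio_encoding="MP3"):
    """Return a JSON-serializable body asking for `text` to be spoken."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": audio_encoding},
    }


def audio_bytes_from_response(response_json):
    """Decode the base64 audio payload of a parsed synthesize response."""
    return base64.b64decode(response_json["audioContent"])
```

The decoded bytes could then be written to a file or streamed to a speaker, depending on the device.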
The language offerings in AI building blocks include Natural Language. Natural Language uses machine learning to reveal the structure and meaning of text. It can be used to extract information about people, places, and events; better understand social media sentiment and call center conversations; and integrate analyzed text with a document archive on Google Cloud Storage. AutoML Natural Language lets you easily build and train your own ML models. Further, Natural Language API’s pre-trained models deliver language understanding features, including content classification and sentiment, entity, and syntax analysis.
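To make the sentiment use case (social media posts, call center conversations) more tangible, the sketch below builds a sentiment request and turns the returned score into a coarse label. It assumes the v1 `documents:analyzeSentiment` REST method and its `documentSentiment.score` field, which ranges from -1.0 to 1.0; the 0.25 threshold and the function names are illustrative assumptions, not part of the API.

```python
# Assumption: the public v1 REST method for document-level sentiment.
ANALYZE_SENTIMENT_URL = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_sentiment_request(text):
    """Return a JSON-serializable body for analyzing plain text."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def label_sentiment(response_json, threshold=0.25):
    """Bucket the document score into a label.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    score = response_json["documentSentiment"]["score"]
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

In practice the threshold would need tuning against labeled examples from the conversations being analyzed.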
Google Dialogflow, originally produced by api.ai (which Google acquired), is an end-to-end, build-once-and-deploy-everywhere development suite for creating conversational interfaces for websites, mobile applications, popular messaging platforms, and IoT devices. It can be used to build interfaces (such as chatbots and conversational IVR) that enable natural interactions between a company and its users. Dialogflow supports 20+ languages and one-click integration with 14 different platforms. Dialogflow can go beyond text to voice interactions.
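A chatbot front end built on Dialogflow typically sends each user utterance to an agent and shows the agent's reply. The sketch below assumes the v2 `detectIntent` REST method and its `queryResult.fulfillmentText` response field; the project and session identifiers are placeholders supplied by the caller, the helper names are hypothetical, and authentication is omitted.

```python
from urllib.parse import quote


def detect_intent_url(project_id, session_id):
    """Build the detectIntent URL for one conversation session."""
    return (
        "https://dialogflow.googleapis.com/v2/projects/"
        + quote(project_id, safe="")
        + "/agent/sessions/"
        + quote(session_id, safe="")
        + ":detectIntent"
    )


def build_detect_intent_request(user_text, language_code="en"):
    """Return a JSON-serializable body carrying one text utterance."""
    return {
        "queryInput": {
            "text": {"text": user_text, "languageCode": language_code},
        }
    }


def reply_text(response_json):
    """Extract the agent's text reply, or an empty string if none is present."""
    return response_json.get("queryResult", {}).get("fulfillmentText", "")
```

Reusing the same session identifier across turns is what lets the agent keep conversational context.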
AI building blocks also include Translation. If you want your website and apps to be able to instantly translate text, you can use Translation API’s pre-trained neural machine translation to deliver fast, dynamic results for more than one hundred languages. Developers and localization experts with limited machine learning expertise can also create production-ready custom models with AutoML Translation.
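For a website that wants to translate text on the fly, the request and response shapes might look like the sketch below. It assumes the basic (v2) Translation REST endpoint and its `data.translations[].translatedText` response structure; the helper names are hypothetical, and authentication and the HTTP call are not shown.

```python
# Assumption: the basic (v2) Translation REST endpoint.
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(texts, target_language, source_language=None):
    """Return a JSON-serializable body for translating one or more strings."""
    body = {"q": list(texts), "target": target_language, "format": "text"}
    if source_language is not None:
        # When the source is omitted, the service is expected to detect it.
        body["source"] = source_language
    return body


def translated_texts(response_json):
    """Pull the translated strings out of a parsed response, in order."""
    return [t["translatedText"] for t in response_json["data"]["translations"]]
```

Sending several strings in one request keeps the translations aligned with their inputs by position.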