Speakly.PRO

Speech & Audio

Use text-to-speech, speech recognition, and audio recording in lessons.

The Lex Editor includes speech and audio tools that bring language lessons to life. Students can listen to text read aloud, practice pronunciation by recording themselves, and interact with speech-to-text exercises.

Text-to-Speech (TTS)

Text-to-speech converts written text into spoken audio. This is valuable for:

  • Modeling correct pronunciation
  • Creating listening exercises without recording your own voice
  • Helping students hear unfamiliar words

AI-Generated Audio (Lesson Builder)

In the AI lesson builder, listening sections can include AI-generated audio. The AI writes a dialogue or monologue script, and the TTS service converts it into natural-sounding speech.

Voice configuration:

  • Multiple voices are available per language
  • Each speaker in a dialogue can have a different voice
  • Voices are selected in the lesson builder configuration

Emotional tags for natural delivery:

The AI can add bracketed tags to a script to control emotion, delivery, pace, and sound effects:

Tag Type    Examples                                   Best For
Emotions    [calm], [excited], [nervous], [happy]      Natural conversation feel
Delivery    [whispering], [shouting], [laughing]       Expressive dialogue
Pace        [slowly], [quickly], [cautiously]          Speed variation
Effects     [applause], [footsteps], [door closing]    Scene setting

Tags are adapted by level:

  • A1/A2: calm, slow, clear delivery only
  • B1/B2: natural conversation with moderate emotional variation
  • C1/C2: full range of emotions, delivery styles, and sound effects

Emotional tags direct the voice synthesis engine; they are not spoken aloud. Tags are automatically removed from the transcript shown to students.
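To illustrate what removing tags from a transcript involves, here is a minimal sketch. It is not the Lex Editor's actual implementation: the function name `strip_tags` and the tag pattern (a short bracketed phrase of lowercase letters and spaces) are assumptions made for this example.

```python
import re

# Assumed tag shape: lowercase words in square brackets, e.g. [calm] or [door closing].
# The real tag grammar used by the TTS service may differ.
TAG_PATTERN = re.compile(r"\[[a-z][a-z ]*\]")


def strip_tags(script: str) -> str:
    """Return the script with bracketed tags removed and spacing tidied."""
    without_tags = TAG_PATTERN.sub("", script)
    # Collapse runs of spaces or tabs left behind by removed tags.
    collapsed = re.sub(r"[ \t]{2,}", " ", without_tags)
    # Trim leading and trailing whitespace on each line.
    return "\n".join(line.strip() for line in collapsed.splitlines())


script = (
    "Anna: [excited] I got the job!\n"
    "Ben: [whispering] Really? [laughing] That's great."
)
print(strip_tags(script))
# Anna: I got the job!
# Ben: Really? That's great.
```

The version sent to the voice engine would keep the tags; only the student-facing transcript is cleaned.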

Audio Player Widget

The generated audio appears in the lesson as an audio player widget with:

  • Playback controls (play, pause, scrub)
  • Speed adjustment (0.5x to 1.5x)
  • Transcript toggle (hidden, shown after listening, or always visible)
  • No limits on replaying
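The player options above can be modeled in a few lines. This is a hypothetical sketch, not the widget's real code: the names `clamp_speed`, `TranscriptMode`, and `transcript_visible` are invented for illustration, and the reading of the three transcript settings is an assumption based on the list above.

```python
from enum import Enum

# Speed range taken from the list above.
MIN_SPEED = 0.5
MAX_SPEED = 1.5


def clamp_speed(requested: float) -> float:
    """Keep a requested playback speed inside the supported range."""
    return max(MIN_SPEED, min(MAX_SPEED, requested))


class TranscriptMode(Enum):
    # Assumed interpretation of the three transcript settings.
    HIDDEN = "hidden"
    AFTER_LISTENING = "after_listening"
    ALWAYS = "always"


def transcript_visible(mode: TranscriptMode, finished_listening: bool) -> bool:
    """Decide whether the transcript should be shown to the student."""
    if mode is TranscriptMode.ALWAYS:
        return True
    if mode is TranscriptMode.AFTER_LISTENING:
        return finished_listening
    return False  # HIDDEN


print(clamp_speed(2.0))  # 1.5
print(transcript_visible(TranscriptMode.AFTER_LISTENING, False))  # False
```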

Speech-to-Text Recognition

The speech-to-text plugin allows students to dictate text using their microphone. The browser's speech recognition engine converts spoken words into written text.

Use Cases

  • Dictation exercises -- students hear a phrase and speak it; the system transcribes their speech
  • Voice input for essays -- students can dictate their writing instead of typing
  • Accessibility -- provides an alternative input method for students who have difficulty typing

Speech recognition quality depends on the browser and the student's microphone. Chrome offers the best speech recognition support. Results may vary for less common languages.

Audio Recording (Speech Recorder)

The speech recorder widget lets students record audio responses directly in the lesson. See the Speech Recorder widget for full details on creating these exercises.

Recording Workflow for Students

  1. Read the prompt: the widget displays instructions (e.g., "Describe your favorite holiday").
  2. Grant microphone access: the browser asks for microphone permission the first time a student records.
  3. Record: click the record button. A timer shows the elapsed time. Recording stops when the student clicks stop or when the maximum duration is reached.
  4. Review: students can play back their recording and decide whether to keep it or re-record.
  5. Submit: click submit to save the recording. Teachers can listen to it later for evaluation.

Configuration Options

  • Maximum duration: 30, 60, or 120 seconds
  • Read-aloud mode: provide reference text that the student reads
  • Free-response mode: student speaks freely in response to a prompt
  • Re-recording: students can re-record as many times as needed before submitting
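As a rough model of these options, the sketch below validates a recorder configuration. The class `RecorderConfig` and its field names are hypothetical and do not reflect the editor's real settings format; only the allowed durations and the two modes come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Durations listed in the configuration options above.
ALLOWED_DURATIONS = (30, 60, 120)


@dataclass(frozen=True)
class RecorderConfig:
    prompt: str
    max_duration_seconds: int = 60
    # Assumption: reference text present means read-aloud, absent means free-response.
    reference_text: Optional[str] = None

    def __post_init__(self) -> None:
        if self.max_duration_seconds not in ALLOWED_DURATIONS:
            raise ValueError(
                f"max_duration_seconds must be one of {ALLOWED_DURATIONS}, "
                f"got {self.max_duration_seconds}"
            )

    @property
    def mode(self) -> str:
        return "read-aloud" if self.reference_text else "free-response"


free = RecorderConfig(prompt="Describe your favorite holiday")
aloud = RecorderConfig(
    prompt="Read the sentence",
    max_duration_seconds=30,
    reference_text="The train leaves at nine.",
)
print(free.mode)   # free-response
print(aloud.mode)  # read-aloud
```

Rejecting unsupported durations at construction time means an invalid value such as 45 seconds fails immediately rather than when a student starts recording.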

Combining Speech and Audio in Lessons

A well-designed listening and speaking lesson might include:

  1. Audio player with a dialogue (TTS-generated or uploaded)
  2. True/false or multiple choice comprehension questions about the audio
  3. Speech recorder where students repeat key phrases or answer questions orally
  4. Fill-in-the-blank exercises testing vocabulary from the audio

This combination tests listening comprehension, pronunciation, and vocabulary recall in a single cohesive section.

Frequently Asked Questions

What audio formats are supported for upload?

MP3, WAV, M4A, and WebM formats are supported. MP3 is recommended for the best balance of quality and file size.
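If you prepare audio files in bulk, a quick check like the one below can flag unsupported files before you upload them. It is a sketch only: the function name `is_supported_audio` is made up for this example, and it looks at the file extension, not the actual audio encoding.

```python
from pathlib import Path

# Extensions for the formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".webm"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension matches a supported upload format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["dialogue.MP3", "interview.webm", "notes.ogg"]:
    print(name, is_supported_audio(name))
# dialogue.MP3 True
# interview.webm True
# notes.ogg False
```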

Can I use my own voice instead of TTS?

Yes. You can upload your own audio recordings instead of using AI-generated TTS. Use the audio upload option in the Media tools.

Can I grade student recordings?

Yes, manually. Student recordings are saved with the student's submission, where teachers can listen to them and provide feedback or grades.

Is there automatic pronunciation scoring?

Currently, speech recordings are evaluated manually by teachers. Automatic pronunciation scoring may be available in future updates.