Sound editor

The core component provides tools for examining speech signals, analysing and displaying visible speech, transcribing and segmenting recordings, and identifying speakers automatically or semi-automatically, along with many other functions.

Examination of audio recordings for speaker identification from speech samples
(analysis of formants and fundamental pitch)


• File hex view
• Word search in signal markup
• Speech to Text plugin
• Connection of Sound Cleaner to SIS
• Visualization
• Editing and processing
• Speech and noise detection
• Transcription and segmentation
• Speaker diarization in a dialogue/polylogue
• Multiwindow interface
• Signals comparison
• Signal properties calculation

• Managing projects, creating reports
• Identification
• Automatic comparison
• Speech features comparison
• Formants comparison
• Fundamental pitch comparison
• “Methodology”
• Common solution
• Analysis of the audio recording extracted from the video file
• EdiTracker and diagnostic module

File hex view

This operation lets the expert examine the binary content of audio file headers in hexadecimal format.
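A hex view of this kind can be sketched in a few lines. The example below is only an illustration in Python, not the SIS implementation; the function name `hex_dump` and the sample header bytes are assumptions made for the sketch.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format raw bytes as offset, hex column and printable-ASCII column."""
    pad = width * 3 - 1  # width of a full hex column, used for alignment
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as dots, as hex viewers commonly do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


# First 12 bytes of a hypothetical WAV file (RIFF container header).
header = b"RIFF\x24\x08\x00\x00WAVE"
print(hex_dump(header))
```

Reading the header this way shows the container signature and size fields directly, independent of what the file extension claims.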

Searching for words in signal markup

This operation searches for all matching words in two preset signals, or finds words from a list selected by the operator.

Speech to Text plugin

User dictionaries allow an expert familiar with phonetics to add dictionaries with transcriptions to the speech recognition module. This is useful when the recorded speech contains slang, that is, words that are not common in formal language. In this case the automatic recogniser “does not know” how these words are pronounced, so an additional dictionary can help.

Connecting Sound Cleaner to SIS (using VST plugin)

The “Processing” menu in SIS contains a “Sound Cleaner” operation. It applies Sound Cleaner signal processing schemes to a signal opened in SIS.


Visualization

The spectral representation algorithms used in SIS are designed for maximum quality and clarity of visible speech. The user selects the appropriate representation or uses presets for different types of spectral analysis.

  • Oscillograms
  • FFT and LPC spectrograms
  • Long-time-average and instantaneous spectrum
  • Cepstrogram
  • Autocorrelogram

  • Fundamental pitch extractor
  • Formant extractor
  • Energy
  • Histogram and histogram correlation

Editing and processing

SIS provides a wide variety of expert editing and signal processing tools that improve the intelligibility of recorded speech and prepare audio recordings for further analysis.

  • Amplitude normalisation
  • Linear transformation
  • DC Offset Suppression
  • Mixing
  • Modulation
  • Tempo correction*
  • Resampling
  • Bit depth conversion
  • Stereo separation and merging of two mono signals into stereo
  • Phase change
  • Adaptive inverse filter
  • Adaptive tone suppressor
  • Adaptive broadband noise filters
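Two of the simpler operations in this list, DC offset suppression and amplitude normalisation, can be sketched as follows. This is a minimal Python illustration on a list of float samples; the function names and the peak-based normalisation rule are assumptions, since the source does not describe how SIS implements them.

```python
def remove_dc_offset(samples):
    """Subtract the mean so the signal is centred on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def normalise_peak(samples, target_peak=1.0):
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silent signal: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


raw = [0.1, 0.3, 0.5]             # toy signal with a DC offset of 0.3
centred = remove_dc_offset(raw)   # approximately [-0.2, 0.0, 0.2]
print(normalise_peak(centred))
```

Removing the offset before normalising matters: otherwise the offset itself consumes part of the available amplitude range.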

Detecting speech and noises

The speech detector automatically marks speech fragments in the audio signal that are
suitable for identification. The module can also be configured to detect noisy areas: dial
tones, clipped fragments and clicks.
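As an illustration of one kind of noisy area, clipped fragments can be found by looking for runs of consecutive samples stuck at full scale. The sketch below is a simplified Python example and not the detector used in SIS; the function name, the `min_run` length and the `tolerance` value are assumptions.

```python
def find_clipped_runs(samples, full_scale=1.0, min_run=3, tolerance=1e-3):
    """Return (start, end) index pairs where |sample| stays at full scale.

    `end` is exclusive. Runs shorter than `min_run` samples are ignored,
    because a single full-scale sample is not necessarily clipping.
    """
    runs = []
    start = None
    for i, s in enumerate(samples):
        clipped = abs(s) >= full_scale - tolerance
        if clipped and start is None:
            start = i
        elif not clipped and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that reaches the end of the signal.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples)))
    return runs


signal = [0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.4, -1.0, 0.2]
print(find_clipped_runs(signal))
```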

Text transcription and speech segmentation

The speech-to-text plugin automatically transcribes the speech in an audio recording in Russian, English, Spanish, Kazakh, and Arabic. The transcription is accompanied by word-by-word segmentation indicating the location of each spoken word. This functionality allows the expert to work effectively with large volumes of audio recordings.

In manual mode, selected audio fragments can easily be assigned to particular categories (e.g., different speakers, sounds or noises) with text comments, and the full text can be exported to MS Word. If there are two files of transcribed text, the programme can automatically search for all matching words in the audio recordings being compared.

Automatic text transcription with the segmentation of lines spoken by speakers

Separating speakers in a dialogue/polylogue

The module automatically marks lines according to speakers. Its reliability is up to 95% when the signal-to-noise ratio is at least 20 dB and each speaker contributes at least 16 seconds of speech.

Using built-in algorithms, the module can segment the lines spoken by up to 5 speakers.

Multi-window interface

SIS allows several audio files to be opened in one or several windows at the same time. The windows can be positioned according to a particular task: vertically for identification purposes or horizontally to compare copies of audio recordings or the various sound cleaning options.
Signals can be opened in several layers in one window, and their colours and transparency can be changed for better visualisation.

Working with audio recordings in a multi-window interface

Signal comparison

Windows can be linked in the time and spectral domains, which makes measurement with vertical and horizontal cursors easier. The instantaneous spectrum can be overlaid for better visual comparison.
Pitch histograms can be compared visually or numerically using values of minimum, maximum, median, asymmetry and general correlation.
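A numerical comparison of two pitch histograms can be sketched as below. This is a hedged Python illustration only: the bin range of 50 to 400 Hz, the use of Pearson correlation for the "general correlation", and all function names are assumptions, not details taken from SIS.

```python
import statistics


def pitch_histogram(pitches_hz, lo=50.0, hi=400.0, bins=35):
    """Normalised histogram of pitch values (bin edges are assumed)."""
    counts = [0] * bins
    step = (hi - lo) / bins
    for p in pitches_hz:
        if lo <= p < hi:
            counts[min(int((p - lo) / step), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def pearson(a, b):
    """Pearson correlation between two equally long histograms."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def summarise(pitches_hz):
    """Minimum, maximum and median pitch of one speaker."""
    return {
        "min": min(pitches_hz),
        "max": max(pitches_hz),
        "median": statistics.median(pitches_hz),
    }


# Toy pitch tracks in hertz for two hypothetical speakers.
speaker_a = [110.0, 115.0, 120.0, 118.0, 125.0, 130.0]
speaker_b = [112.0, 117.0, 119.0, 121.0, 127.0, 133.0]
print(summarise(speaker_a), summarise(speaker_b))
print(pearson(pitch_histogram(speaker_a), pitch_histogram(speaker_b)))
```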

Signal analysis

SIS automatically calculates signal characteristics that help the expert decide whether the recording is suitable for identification analysis.

  • Frequency response
  • Signal-to-noise ratio
  • Reverberation time
  • Clipping and tonal noises
  • Clear speech duration
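Of these characteristics, the signal-to-noise ratio has the simplest definition: ten times the base-10 logarithm of the ratio of speech power to noise power. The Python sketch below assumes that a speech fragment and a noise-only fragment have already been marked; how SIS itself selects these fragments is not described here, and the function names are illustrative.

```python
import math


def mean_power(samples):
    """Average power of a fragment (mean of squared samples)."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels from two marked fragments."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


# Toy fragments: speech amplitude 1.0, noise amplitude 0.1.
speech_fragment = [1.0, -1.0] * 100
noise_fragment = [0.1, -0.1] * 100
print(round(snr_db(speech_fragment, noise_fragment), 2), "dB")
```

A tenfold amplitude ratio corresponds to a hundredfold power ratio, which is 20 dB.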

Signal characteristics assessment

Working with projects and creating reports

IKAR Lab 3 organises the expert’s workflow efficiently. A project opens files related to the examination directly from SIS, whether they are audio, text, video or photographic files. These files and identification results can be saved in a structured way, as can reports created in MS Word. A report can be supplemented with the settings used for illustrations and visible representations of speech, and with screenshots of the whole working screen or a selected area.


Identification

This unique tool based on biometric algorithms and expert modules is made to automate and formalise the processes involved in audio forensics identification research: searching for comparable words and sounds, selecting sounds and melodic fragments to be compared, comparing speakers’ formants and pitches, and performing speech analysis. The results are presented as numerical indicators to contribute to the overall identification conclusion.

Automatic comparison

The module performs 1:1 voice signal comparison. The method it uses depends on the speech signal characteristics of the audio recordings studied. All results are based on the extraction of voice biometric traits and calculations regarding their similarity.

Several comparison methods are available: cxvector (a development of xvector) is the main method, supplemented by smart-speaker and gen6-v3 when an audio recording contains from 1.5 to 5 seconds of clear speech. The new functionality offers faster and more secure identification.

The engine was trained on audio recordings of tens of thousands of speakers of different genders, ages, ethnicities, and languages. The speech material was varied, captured over various channels and in multiple recording sessions. The high reliability of the biometric engine has been confirmed in NIST testing.

Automatic identification results

Comparison of formants

The process of comparing formants with the module involves two stages.

  1. Search and selection of reference sound fragments for the known and unknown speakers:
  • using the scatter plot with vowel triangle and highlighting the search area
  • specifying the frequency range of formants
  • by the position of horizontal marks indicating the limits in hertz and percentage
  • using a graphical vowel chart

  2. Expert comparison. The module automatically calculates FR, FA and LR for the sounds selected and decides whether the outcome of identification is positive, negative or undefined.

Speaker identification using the expert
method of formant comparison

Additional features:

  • Visual comparison of selected sounds on a vowel chart
  • Comparison of the average formant values for selected sounds of two speakers
  • Specifying words or triads as textual comments on reference fragments
  • Exporting tables of reference fragments and results to MS Word

Pitch comparison

The pitch comparison module compares the distinctive features of speakers’ melodic patterns. The expert selects melodic fragments, the module assigns each to one of 18 possible melodic types and compares them on 15 parameters, including maximum, average and minimum pitch values, rate of pitch change, skewness, kurtosis and others.

The algorithm generates results in the form of a match percentage for each parameter and delivers an overall identification/elimination
conclusion or an inconclusive result. All data can be easily exported as text reports.
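Several of the parameters named above are standard descriptive statistics of a pitch contour. The Python sketch below shows how they could be computed for one melodic fragment; it uses population moments and excess kurtosis, which is a choice made for this example, and it covers only a subset of the 15 parameters, whose exact definitions in the module are not given here.

```python
import statistics


def pitch_statistics(pitches_hz):
    """Minimum, maximum, mean, skewness and excess kurtosis of a contour."""
    n = len(pitches_hz)
    mean = statistics.fmean(pitches_hz)
    # Central moments of order 2, 3 and 4 (population form).
    m2 = sum((p - mean) ** 2 for p in pitches_hz) / n
    m3 = sum((p - mean) ** 3 for p in pitches_hz) / n
    m4 = sum((p - mean) ** 4 for p in pitches_hz) / n
    if m2 == 0:
        skewness = kurtosis = 0.0  # flat contour: shape is undefined
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis (normal = 0)
    return {
        "min": min(pitches_hz),
        "max": max(pitches_hz),
        "mean": mean,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Toy melodic fragment: pitch rising evenly from 100 to 140 Hz.
fragment = [100.0, 110.0, 120.0, 130.0, 140.0]
print(pitch_statistics(fragment))
```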

Speaker identification using pitch comparison

Identification wizard

This plugin offers a step-by-step identification process, displays the stages of research, and visualises the results for any comparison made.

Overall conclusion

The outcome of each method can be saved in a given project. The programme is designed to take the results from each module into account when making an overall conclusion. The expert can adjust the relative weight of each method in the overall conclusion, or the weights can be assigned automatically by calculating the qualitative and quantitative characteristics of the audio recordings being compared. Based on the results, the expert can automatically generate a detailed report.
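The idea of weighting each method can be illustrated with a weighted mean of per-method scores. This Python sketch is purely hypothetical: the actual combination rule, the score scale and the method names used by the programme are not stated in this description.

```python
def overall_score(method_scores, weights):
    """Weighted mean of per-method similarity scores in the range 0..1.

    Both arguments are dictionaries keyed by method name. The weighted
    mean is an assumed combination rule chosen for illustration.
    """
    total_weight = sum(weights[name] for name in method_scores)
    if total_weight <= 0:
        raise ValueError("at least one method needs a positive weight")
    weighted = sum(score * weights[name] for name, score in method_scores.items())
    return weighted / total_weight


# Hypothetical scores and expert-assigned weights.
scores = {"automatic": 0.9, "formants": 0.8, "pitch": 0.6}
weights = {"automatic": 2.0, "formants": 1.0, "pitch": 1.0}
print(overall_score(scores, weights))
```

Doubling the weight of one method pulls the overall figure towards that method's result, which is the effect an expert would expect when trusting it more for a given recording.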

Analysis of an audio track extracted from a video

With the new SIS method, the expert gets immediate access to the audio track of a video file without requiring any additional editors. Just upload the video file, and SIS will automatically extract the audio track and open it in a separate window.

The module lets the expert work simultaneously on a video in the video player and
on its audio track in the editor. The video and the sound are synchronised, and the video is
automatically modified while the audio track is being edited.

Extraction and analysis of the audio track from a video


EdiTracker

The plugin performs diagnostics of the authenticity of analogue and digital audio recordings and greatly simplifies expert analysis using SIS by providing the user with manual and automatic analysis methods.

Authenticity check for the use of digital preprocessing of the audio recording

EdiTracker analysis methods

  • Specifying the recording device parameters
  • Identifying traces of previous digital signal processing
  • Auditory analysis
  • Detecting traces of tampering through
    phase shifts in the harmonics and phase scanning
  • Scanning background noise

Specifying the recording device parameters

Every analogue recording device has unique characteristics, such as frequency response, total harmonic distortion, pitch variation, effective frequency range, tempo deviation, etc.

EdiTracker automatically assesses these characteristics using a test signal. A mismatch between recording device parameters and characteristics of a signal allegedly recorded with that unit may be an indication of tampering.

Identifying traces of digital preprocessing

Digital processing of analogue signals always requires a specific sample rate. During digitisation, any components above half the sample rate are folded back onto lower frequencies, a phenomenon known as aliasing. Aliasing degrades the audio quality because these high-frequency components are superimposed on low-frequency ones.

The vast majority of analogue-to-digital and digital-to-analogue converters use anti-aliasing filters. EdiTracker automatically detects traces of such filters, the presence of which may suggest that the audio has been digitised.
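The frequency at which an unfiltered tone reappears after sampling follows directly from the sample rate. The short Python sketch below illustrates that folding rule; the function name is hypothetical and the code is unrelated to how EdiTracker detects filter traces.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone sampled without an anti-aliasing filter."""
    # The tone is indistinguishable from one shifted by any multiple of
    # the sample rate, so fold it onto the nearest multiple.
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


for tone in (3000, 5000, 9000):
    print(tone, "Hz sampled at 8000 Hz appears at", alias_frequency(tone, 8000), "Hz")
```

At 8000 Hz, a 3000 Hz tone is below half the sample rate and is preserved, while 5000 Hz and 9000 Hz tones fold down to 3000 Hz and 1000 Hz. An anti-aliasing filter removes such components before sampling, which leaves a steep roll-off near half the sample rate.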

Detecting traces of tampering through phase shifts in the harmonics

EdiTracker automatically scans audio for technical narrow-band signals which normally come from an electrical network (ENF), batteries, nearby electrical appliances, etc., and estimates their phase continuity. An unjustified phase break can be interpreted as potential evidence of audio editing.
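The principle of a phase continuity check can be shown on a synthetic mains hum. The Python sketch below estimates the phase of a known tone frame by frame against a continuous reference and flags frames where it jumps. It is a simplified illustration under strong assumptions (the tone frequency is known exactly and the frame length spans whole cycles); the function names and threshold are hypothetical, and this is not the EdiTracker algorithm.

```python
import math


def frame_phases(samples, tone_hz, sample_rate, frame_len):
    """Phase of the tone in each frame, relative to a continuous reference."""
    w = 2.0 * math.pi * tone_hz / sample_rate
    phases = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        c = s = 0.0
        for n in range(start, start + frame_len):
            # Correlate with cosine and sine at the absolute sample index.
            c += samples[n] * math.cos(w * n)
            s += samples[n] * math.sin(w * n)
        phases.append(math.atan2(-s, c))
    return phases


def phase_breaks(phases, threshold_rad=0.2):
    """Indices of frames whose phase differs from the previous frame."""
    breaks = []
    for i in range(1, len(phases)):
        # Wrap the difference into the range [-pi, pi).
        diff = (phases[i] - phases[i - 1] + math.pi) % (2.0 * math.pi) - math.pi
        if abs(diff) > threshold_rad:
            breaks.append(i)
    return breaks


# Synthetic 50 Hz hum at 1000 Hz sample rate; the edited copy has
# 205 samples cut out at sample 1000, a quarter-cycle phase jump.
fs, hum = 1000, 50
clean = [math.cos(2.0 * math.pi * hum * n / fs) for n in range(2000)]
edited = clean[:1000] + clean[1205:]
print(phase_breaks(frame_phases(clean, hum, fs, 100)))
print(phase_breaks(frame_phases(edited, hum, fs, 100)))
```

In an untouched recording the hum keeps a steady phase against the reference, so no frame is flagged; removing a span that is not a whole number of cycles leaves a jump at the cut.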

Scanning background noise

Background scanning detects dramatic changes in the spectrum that are unnoticeable on the waveform and which may be signs of audio editing. EdiTracker also automatically scans the integrity of background noises and marks any abrupt change in noise level.

Authenticity check for hidden editing of an audio recording based on the uniformity of the background noise

Auditory analysis

During the playback of an original audio recording, the entirety of the audio communication (verbal and nonverbal speaker output together with background interference) comes together to form a complete and integrated picture of the audio and speech environment. Auditory analysis of these events, based on the known characteristics of the recording equipment and methods used, can reveal possible violations in the integrity of the overall audio picture and identify the location, facts and methods of such violations. EdiTracker provides an extended list of auditory and linguistic indicators that may indicate breaches in the authenticity of a recording. These resources can be used to create a textual report.

Diagnostic module

The diagnostic module is a new SIS module for more reliable assessment of the authenticity and examinability of an audio recording. It detects signal features that reveal the origin of the recording or the processing applied to it, which may be unknown or deliberately hidden. Complementing EdiTracker, it detects specific operations applied to a signal using the following methods:

  • Spoofing detection
  • DC offset analysis
  • Analysis of A/μ encoding traces
  • Analysis of MP3 encoding traces

Spoofing detection

The spoofing detector searches for traces of spoofing attacks in the audio recording, such as replays, speech synthesis and voice disguising. This algorithm is based on a neural network trained on various types of spoofing. As a result, it can conclude whether or not the audio recording is masquerading as the authentic recording of a speaker.

Expert spoofing detection analysis

DC offset analysis

This module analyses the audio recording to identify any dramatic change in DC offset, as this may be a sign of integrity violation. If such a violation is detected, the module highlights the corresponding areas.
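The idea can be sketched by tracking the mean sample value over consecutive windows and flagging positions where it shifts abruptly. The Python example below is a minimal illustration; the window length, the threshold and the function names are assumptions rather than parameters of the SIS module.

```python
def windowed_dc(samples, window):
    """Mean sample value (DC offset) of each consecutive window."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def dc_jumps(samples, window, threshold):
    """Sample positions where the DC offset changes by more than threshold."""
    means = windowed_dc(samples, window)
    return [
        i * window
        for i in range(1, len(means))
        if abs(means[i] - means[i - 1]) > threshold
    ]


# Toy recording: the second half carries an extra DC offset of 0.05,
# as might happen if material from another device were spliced in.
first_part = [0.1 if n % 2 == 0 else -0.1 for n in range(400)]
second_part = [s + 0.05 for s in first_part]
recording = first_part + second_part
print(dc_jumps(recording, window=100, threshold=0.01))
```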

Detecting the disturbance of DC offset uniformity in two areas in the audio recording

Detection of A/μ coding

This module analyses the audio recording to detect areas with signs of A/μ encoding. The recording format alone does not show whether the audio has previously been processed with these codecs. If such coding is detected, the module highlights the corresponding areas or the entire audio recording.
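One trace that companding leaves behind is a restricted set of sample levels: 8-bit μ-law can represent at most 256 distinct values, even after the audio is stored in a wider format. The Python sketch below demonstrates this with the continuous μ-law companding formula. It is an assumption-laden illustration, not the bit-exact G.711 encoding and not the detection method used by the module; A-law is omitted.

```python
import math

MU = 255.0  # companding constant of 8-bit mu-law


def mu_law_encode(x):
    """Compress a sample in [-1, 1] to an 8-bit code (0..255)."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255.0))


def mu_law_decode(code):
    """Expand an 8-bit code back to a sample in [-1, 1]."""
    y = 2.0 * code / 255.0 - 1.0
    return math.copysign((math.pow(1.0 + MU, abs(y)) - 1.0) / MU, y)


def distinct_levels(samples):
    """Number of different sample values present in the signal."""
    return len(set(samples))


# A toy signal before and after a mu-law round trip.
original = [0.9 * math.sin(0.01 * n) for n in range(5000)]
companded = [mu_law_decode(mu_law_encode(s)) for s in original]
print(distinct_levels(original), distinct_levels(companded))
```

The companded copy uses far fewer distinct levels than the original, and their spacing grows with amplitude, a pattern that survives conversion to a linear PCM file.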

Detecting A/μ coding areas

Detection of MP3 coding

This module analyses the audio recording to identify signs of MP3 coding. The recording format alone does not show whether the audio has previously been processed with this codec. If MP3 coding is detected, the module displays a message describing the signs found, together with spectrograms, graphs and histograms that explain the decision made by the algorithm.

Detecting MP3 coding