speechtotext.model.modelWrapper.ModelWrapper

class ModelWrapper(model_version)[source]

Bases: ABC

Abstract Wrapper for model.

If the audio needs to be converted, call convert_sample in get_transcript_of_file.

Wrapper for models.

Parameters: model_version (WhisperModel) – Version of the Whisper model to use.
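A minimal sketch of how a concrete subclass might fit together. The real base class is not reproduced here: ModelWrapperSketch is a stand-in, and the constructor calling get_model, the signature of get_transcript_of_file, and the EchoWrapper subclass are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class ModelWrapperSketch(ABC):
    """Stand-in for the documented base class (assumption, not the real source)."""

    PATH_OF_TEMP_CONVERTED_AUDIO_FILE: str = "converted_audio_file.wav"

    def __init__(self, model_version):
        self.model_version = model_version
        self.model = None
        # Assumption: the constructor asks the subclass to load its model.
        self.get_model()

    @abstractmethod
    def get_model(self):
        """Load the model and set self.model."""

    @abstractmethod
    def get_transcript_of_file(self, path_to_file):
        """Return the transcript of one audio file (signature is assumed)."""

    def convert_sample(self, path_to_sample):
        # Placeholder: no audio is converted here; only the documented
        # temp-file path is returned.
        return self.PATH_OF_TEMP_CONVERTED_AUDIO_FILE


class EchoWrapper(ModelWrapperSketch):
    """Hypothetical subclass whose 'model' only echoes the path it receives."""

    def get_model(self):
        self.model = lambda path: f"transcript of {path}"

    def get_transcript_of_file(self, path_to_file):
        # Convert first, then transcribe the converted file.
        converted = self.convert_sample(path_to_file)
        return self.model(converted)


wrapper = EchoWrapper(model_version="tiny")
print(wrapper.get_transcript_of_file("speech.mp3"))
```

Because get_model is abstract, instantiating the base class directly raises TypeError; only subclasses that implement every abstract method can be created.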

Methods

`benchmark_n_samples`	Benchmark n samples with model.
`benchmark_sample`	Benchmark sample with model.
`benchmark_samples`	Benchmark samples with model.
`convert_sample`	Convert sample to correct format.
`get_model`	Get model.
`get_transcript_of_file`

Attributes

`PATH_OF_TEMP_CONVERTED_AUDIO_FILE`	Path to the temporary file created when converting audio files to an accepted audio format.

PATH_OF_TEMP_CONVERTED_AUDIO_FILE: str = 'converted_audio_file.wav'

Path to the temporary file created when converting audio files to an accepted audio format.

Type: str

_append_error(samples, audio_id, error)[source]

Append error to model_errors.

Parameters:

samples (SampleDataset) – Dataset of audio.
audio_id (str) – Id of the failed sample.
error (str) – Error message.

_benchmark_sample_with_time(dataset, audio_id, with_cleaning=True)[source]

Benchmark sample for model with timer.

Parameters:

dataset (Dataset) – Dataset of audio.
audio_id (str) – Id of the audio file.
with_cleaning (bool, optional) – Set True to clean transcripts. Defaults to True.

Returns:

Metrics of the transcript.

Return type:

Metrics
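A minimal sketch of timing a single transcription, assuming a monotonic timer wraps the model call. TimedResult and transcribe_with_time are hypothetical names; the documented method returns a Metrics object whose fields are not shown on this page.

```python
import time
from dataclasses import dataclass


@dataclass
class TimedResult:
    """Hypothetical container; the real method returns a Metrics object."""

    transcript: str
    seconds: float


def transcribe_with_time(transcribe, path):
    """Call transcribe(path) and record how long the call took."""
    start = time.perf_counter()  # monotonic clock, suited to short intervals
    transcript = transcribe(path)
    elapsed = time.perf_counter() - start
    return TimedResult(transcript=transcript, seconds=elapsed)


# Stand-in "model" that upper-cases the path instead of transcribing audio.
result = transcribe_with_time(lambda p: p.upper(), "sample.wav")
print(result.transcript, f"{result.seconds:.6f}s")
```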

benchmark_n_samples(dataset, number_of_samples, with_cleaning=True)[source]

Benchmark n samples with model.

Parameters:

dataset (Dataset) – Dataset of audio.
number_of_samples (int) – Number of random samples to benchmark.
with_cleaning (bool, optional) – Set True to clean transcripts. Defaults to True.

Returns:

List of metrics for each sample.

Return type:

list
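A minimal sketch of benchmarking n random samples, assuming the ids are drawn without replacement. benchmark_n, its seed parameter, and the stand-in benchmark function are hypothetical; how the real method selects samples is not documented here.

```python
import random


def benchmark_n(audio_ids, number_of_samples, benchmark_one, seed=None):
    """Benchmark a random subset of audio_ids and return one result per sample."""
    ids = list(audio_ids)
    rng = random.Random(seed)
    # Cap k at the population size so random.sample does not raise ValueError.
    chosen = rng.sample(ids, k=min(number_of_samples, len(ids)))
    return [benchmark_one(audio_id) for audio_id in chosen]


all_ids = ["a1", "a2", "a3", "a4", "a5"]
# Stand-in benchmark that only reports the length of the id.
metrics = benchmark_n(all_ids, 3, benchmark_one=len, seed=0)
print(metrics)
```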

benchmark_sample(dataset, audio_id, with_cleaning=True)[source]

Benchmark sample with model.

Parameters:

dataset (Dataset) – Dataset of audio.
audio_id (str) – Id of the audio file.
with_cleaning (bool, optional) – Set True to clean transcripts. Defaults to True.

Returns:

Metrics of the transcript.

Return type:

Metrics

benchmark_samples(samples, with_cleaning=True)[source]

Benchmark samples with model.

Parameters:

samples (SampleDataset) – Samples to benchmark.
with_cleaning (bool, optional) – Set True to clean transcripts. Defaults to True.

Returns:

List of metrics for each sample.

Return type:

list
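A sketch of one way a batch benchmark could record failures instead of aborting, in the spirit of _append_error. Whether benchmark_samples actually behaves this way is an assumption; benchmark_all, fake_benchmark, and the shape of the error records are hypothetical.

```python
def benchmark_all(audio_ids, benchmark_one):
    """Benchmark every id; collect failures rather than stopping at the first."""
    results = []
    model_errors = []  # hypothetical stand-in for the model_errors record
    for audio_id in audio_ids:
        try:
            results.append(benchmark_one(audio_id))
        except Exception as error:
            # Keep the id and message so the failed sample can be inspected later.
            model_errors.append({"id": audio_id, "error": str(error)})
    return results, model_errors


def fake_benchmark(audio_id):
    """Stand-in benchmark that fails for one specific id."""
    if audio_id == "bad":
        raise ValueError("unsupported audio format")
    return {"id": audio_id}


results, model_errors = benchmark_all(["a", "bad", "b"], fake_benchmark)
print(len(results), model_errors)
```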

convert_sample(path_to_sample)[source]

Convert sample to correct format.

Parameters:

path_to_sample (str) – Path to the sample to convert.

Returns:

Path to converted sample.

Return type:

str
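A sketch of what a conversion step could look like, assuming the ffmpeg command-line tool is installed. The tool and the target format used by the real convert_sample are not documented here; 16 kHz mono WAV is chosen only as a common speech-model input, and build_convert_command and convert are hypothetical helpers.

```python
import subprocess

# Same value as the documented PATH_OF_TEMP_CONVERTED_AUDIO_FILE attribute.
TEMP_WAV = "converted_audio_file.wav"


def build_convert_command(path_to_sample, output_path=TEMP_WAV):
    """Build an ffmpeg command that writes a 16 kHz mono file to output_path."""
    return [
        "ffmpeg",
        "-y",            # overwrite the temp file if it already exists
        "-i", path_to_sample,
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        output_path,
    ]


def convert(path_to_sample, output_path=TEMP_WAV):
    """Run the conversion and return the path to the converted file."""
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_convert_command(path_to_sample, output_path), check=True)
    return output_path


# Only the command is built here, so the example runs without ffmpeg installed.
print(build_convert_command("speech.mp3"))
```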

abstract get_model()[source]

Get model. Set self.model.