speechtotext.datasets.DatasetBare

class DatasetBare(path_to_dir, name, file_ext='.wav')

Bases: object

Bare dataset class.

Creates a dataset object. A transcripts.txt file must be located directly in the directory.

Parameters:

path_to_dir (str) – Path to the dataset directory, ending with “/”.

name (str) – Name of the dataset.

file_ext (str) – Extension of the files. Defaults to '.wav'.
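The directory requirements above can be illustrated with a small sketch that builds a throwaway dataset directory. The format of transcripts.txt is not specified in this reference, so the one-line-per-sample "<id> <text>" layout used here is an assumption, and make_example_dataset_dir is a hypothetical helper, not part of the package.

```python
import os
import tempfile
import wave


def make_example_dataset_dir(samples):
    """Create a temporary directory holding transcripts.txt and one WAV per sample.

    Assumption: each line of transcripts.txt is "<id> <text>". The real
    format expected by DatasetBare is not documented here.
    """
    root = tempfile.mkdtemp()
    with open(os.path.join(root, "transcripts.txt"), "w", encoding="utf-8") as f:
        for audio_id, text in samples.items():
            f.write(f"{audio_id} {text}\n")
    for audio_id in samples:
        # Write a short silent mono 16-bit WAV file for each sample id.
        with wave.open(os.path.join(root, audio_id + ".wav"), "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)
            w.setframerate(16000)
            w.writeframes(b"\x00\x00" * 160)
    # The documented constructor expects the path to end with "/".
    return root + "/"


path_to_dir = make_example_dataset_dir(
    {"a001": "hello world", "a002": "good morning"}
)

# With the package installed, the dataset would then be created as:
# from speechtotext.datasets import DatasetBare
# dataset = DatasetBare(path_to_dir, name="example", file_ext=".wav")
```

The constructor call is left as a comment so the sketch runs without the package installed.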

Methods

get_path_of_fragment

Gets path of fragment.

get_text_of_id

Get text of fragment id.

number_of_samples

Get number of samples in dataset.

validate_samples

Validate if samples have a corresponding file.

get_path_of_fragment(audio_id)

Gets path of fragment.

Parameters:

audio_id (str) – Id of file.

Raises:

FileNotFoundError – If audio_id doesn’t exist.

Returns:

Path to fragment.

Return type:

str
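As a rough illustration of the behaviour described for get_path_of_fragment, the sketch below resolves an id to a file path and raises FileNotFoundError when nothing is found. It is a hypothetical stand-in, not the package's implementation. It assumes the path is built as path_to_dir + audio_id + file_ext, and that "doesn't exist" means the file is missing on disk.

```python
import os


def path_of_fragment(path_to_dir, audio_id, file_ext=".wav"):
    """Hypothetical stand-in that mirrors the documented contract."""
    # path_to_dir is documented as ending with "/", so plain
    # concatenation yields the candidate file path (an assumption).
    path = path_to_dir + audio_id + file_ext
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No file for id {audio_id!r}: {path}")
    return path
```

Callers that process many ids may prefer to catch FileNotFoundError per id rather than abort on the first missing file.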

get_text_of_id(audio_id)

Get text of fragment id.

Parameters:

audio_id (str) – Id of fragment.

Returns:

String of spoken text.

Return type:

str

number_of_samples()

Get number of samples in dataset.

Returns:

Number of samples in dataset.

Return type:

int

validate_samples()

Validate if samples have a corresponding file.

Returns:

True if all samples have a file.

Return type:

bool
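The documented methods can be combined as in the sketch below, which pairs each fragment path with its transcript. collect_pairs is a hypothetical helper that relies only on the method names and return types listed on this page. Because the class documents no way to enumerate ids, the caller supplies them.

```python
def collect_pairs(dataset, audio_ids):
    """Return (path, text) tuples for the given ids.

    `dataset` is any object exposing validate_samples(),
    get_path_of_fragment(audio_id) and get_text_of_id(audio_id),
    such as a DatasetBare instance.
    """
    # Stop early if some sample has no corresponding file.
    if not dataset.validate_samples():
        raise ValueError("Some samples have no corresponding file.")
    pairs = []
    for audio_id in audio_ids:
        path = dataset.get_path_of_fragment(audio_id)
        text = dataset.get_text_of_id(audio_id)
        pairs.append((path, text))
    return pairs
```

Raising ValueError on a failed validation is a choice made for this sketch. The class itself only reports the result as a bool.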