Welcome to hmmkay’s documentation!
Hmmkay is a basic library for discrete Hidden Markov Models that relies on numba’s just-in-time compilation. It supports decoding, likelihood scoring, fitting (parameter estimation), and sampling.
Hmmkay accepts sequences of arbitrary length, e.g. 2d numpy arrays or lists
of iterables. Hmmkay internally converts lists of iterables into Numba typed
lists of numpy arrays (you might want to do that yourself with
hmmkay.utils.check_sequences() to avoid repeated conversions).
Scoring and decoding example:
>>> from hmmkay.utils import make_proba_matrices
>>> from hmmkay import HMM
>>> init_probas, transition_probas, emission_probas = make_proba_matrices(
...     n_hidden_states=2,
...     n_observable_states=4,
...     random_state=0
... )
>>> hmm = HMM(init_probas, transition_probas, emission_probas)
>>> sequences = [[0, 1, 2, 3], [0, 2]]
>>> hmm.log_likelihood(sequences)
-8.336
>>> hmm.decode(sequences)  # most likely sequences of hidden states
[array([1, 0, 0, 1], dtype=int32), array([1, 0], dtype=int32)]
Fitting example:
>>> from hmmkay.utils import make_observation_sequences
>>> sequences = make_observation_sequences(n_seq=100, n_observable_states=4, random_state=0)
>>> hmm.fit(sequences)
Sampling example:
>>> hmm.sample(n_obs=2, n_seq=3)  # returns sequences of hidden and observable states
(array([[0, 1],
       [1, 1],
       [0, 0]]), array([[0, 2],
       [2, 3],
       [0, 0]]))
API Reference
HMM class

class hmmkay.HMM(init_probas, transitions, emissions, n_iter=10)
Discrete Hidden Markov Model.
The numbers of hidden and observable states are determined by the shapes of the probability matrices passed as parameters.
Parameters:
- init_probas (array-like of shape (n_hidden_states,)) – The initial probabilities.
- transitions (array-like of shape (n_hidden_states, n_hidden_states)) – The transition probabilities. transitions[i, j] = P(s_{t+1} = j | s_t = i).
- emissions (array-like of shape (n_hidden_states, n_observable_states)) – The probabilities of symbol emission. emissions[i, o] = P(O_t = o | s_t = i).
- n_iter (int, default=10) – Number of iterations to run for the EM algorithm (in fit()).

decode(sequences, return_log_probas=False)
Decode sequences with the Viterbi algorithm.
Given a sequence of observable states, return the sequence of hidden states that most likely generated the input.
Parameters:
- sequences (array-like of shape (n_seq, n_obs) or list (or numba typed list) of iterables of variable length) – The sequences of observable states.
- return_log_probas (bool, default=False) – If True, the log-probabilities of the joint sequences of observable and hidden states are also returned.
Returns:
- best_paths (ndarray of shape (n_seq, n_obs) or list of ndarray of variable length) – The most likely sequences of hidden states.
- log_probabilities (ndarray of shape (n_seq,)) – Log-probabilities of the joint sequences of observable and hidden states. Only present if return_log_probas is True.
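To illustrate what Viterbi decoding computes, here is a minimal pure-Python sketch for a single sequence. It is not hmmkay's numba implementation: the function name viterbi and the toy two-state model in the usage note are assumptions made for this illustration only.

```python
import math


def viterbi(init_probas, transitions, emissions, sequence):
    """Return (best_path, log_proba) for one sequence of observable states.

    Illustrative sketch only; hmmkay's own implementation may differ.
    """
    n_states = len(init_probas)

    def log(p):
        # log(0) is treated as -inf so impossible paths are never selected
        return math.log(p) if p > 0 else -math.inf

    # delta[s]: best joint log-probability of any path ending in state s
    delta = [
        log(init_probas[s]) + log(emissions[s][sequence[0]])
        for s in range(n_states)
    ]
    backpointers = []
    for obs in sequence[1:]:
        new_delta, pointers = [], []
        for j in range(n_states):
            # best predecessor state for state j at this time step
            best_i = max(
                range(n_states),
                key=lambda i: delta[i] + log(transitions[i][j]),
            )
            pointers.append(best_i)
            new_delta.append(
                delta[best_i] + log(transitions[best_i][j]) + log(emissions[j][obs])
            )
        delta = new_delta
        backpointers.append(pointers)

    # backtrack from the best final state
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]
```

With init_probas=[0.6, 0.4], transitions=[[0.7, 0.3], [0.4, 0.6]] and emissions=[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]], decoding the sequence [0, 1, 2] gives the path [0, 0, 1] with joint probability 0.01512.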

fit(sequences)
Fit model to sequences.
The probability matrices init_probas, transitions and emissions are estimated with the EM algorithm.
Parameters: sequences (array-like of shape (n_seq, n_obs) or list (or numba typed list) of iterables of variable length) – The sequences of observable states.
Returns: self
Return type: HMM instance
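As a rough picture of what one EM (Baum-Welch) iteration does, the following pure-Python sketch re-estimates the three matrices from a list of sequences. It is an assumption-laden illustration, not hmmkay's code: the names forward_backward, em_step and total_log_likelihood are hypothetical, probabilities are left unscaled (so it only suits short sequences), and it assumes every hidden state keeps a non-zero posterior weight.

```python
import math


def forward_backward(init, trans, emis, seq):
    """Unscaled forward/backward variables and likelihood of one sequence."""
    n, T = len(init), len(seq)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for s in range(n):
        alpha[0][s] = init[s] * emis[s][seq[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = (
                sum(alpha[t - 1][i] * trans[i][j] for i in range(n)) * emis[j][seq[t]]
            )
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                trans[i][j] * emis[j][seq[t + 1]] * beta[t + 1][j] for j in range(n)
            )
    return alpha, beta, sum(alpha[-1])


def normalize(row):
    total = sum(row)  # assumed non-zero in this sketch
    return [x / total for x in row]


def em_step(init, trans, emis, sequences):
    """One EM iteration: accumulate expected counts, then renormalize rows."""
    n, m = len(init), len(emis[0])
    init_counts = [0.0] * n
    trans_counts = [[0.0] * n for _ in range(n)]
    emis_counts = [[0.0] * m for _ in range(n)]
    for seq in sequences:
        alpha, beta, lik = forward_backward(init, trans, emis, seq)
        for t, obs in enumerate(seq):
            for i in range(n):
                # posterior probability of being in state i at time t
                gamma = alpha[t][i] * beta[t][i] / lik
                if t == 0:
                    init_counts[i] += gamma
                emis_counts[i][obs] += gamma
                if t + 1 < len(seq):
                    for j in range(n):
                        # posterior probability of the transition i -> j at time t
                        trans_counts[i][j] += (
                            alpha[t][i] * trans[i][j]
                            * emis[j][seq[t + 1]] * beta[t + 1][j] / lik
                        )
    return (
        normalize(init_counts),
        [normalize(row) for row in trans_counts],
        [normalize(row) for row in emis_counts],
    )


def total_log_likelihood(init, trans, emis, sequences):
    return sum(math.log(forward_backward(init, trans, emis, s)[2]) for s in sequences)
```

Calling em_step repeatedly (n_iter times, in the terms used above) should never decrease total_log_likelihood on the training sequences, which is the usual guarantee of EM.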

log_likelihood(sequences)
Compute log-likelihood of sequences.
Parameters: sequences (array-like of shape (n_seq, n_obs) or list (or numba typed list) of iterables of variable length) – The sequences of observable states.
Returns: log_likelihood
Return type: array of shape (n_seq,)
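The log-likelihood of a sequence is commonly computed with the forward algorithm. The sketch below shows that computation for a single sequence in pure Python, with per-step rescaling to avoid underflow. The function name sequence_log_likelihood is an assumption for this illustration, and hmmkay's numba implementation may be organized differently.

```python
import math


def sequence_log_likelihood(init_probas, transitions, emissions, sequence):
    """Log-likelihood of one observation sequence via the scaled forward pass."""
    n_states = len(init_probas)
    # alpha[s] is proportional to P(o_1..o_t, s_t = s)
    alpha = [init_probas[s] * emissions[s][sequence[0]] for s in range(n_states)]
    log_lik = 0.0
    for obs in sequence[1:]:
        # rescale alpha to sum to 1 and accumulate the log of the scale factor
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [
            sum(alpha[i] * transitions[i][j] for i in range(n_states))
            * emissions[j][obs]
            for j in range(n_states)
        ]
    return log_lik + math.log(sum(alpha))
```

Because each sequence of discrete symbols has probability at most 1, the value returned is never positive. For the toy model init_probas=[0.6, 0.4], transitions=[[0.7, 0.3], [0.4, 0.6]], emissions=[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]] and the sequence [0, 1, 2], the likelihood is 0.03628.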

sample(n_seq=10, n_obs=10, random_state=None)
Sample sequences of hidden and observable states.
Parameters:
- n_seq (int, default=10) – Number of sequences to sample.
- n_obs (int, default=10) – Number of observations per sequence.
- random_state (int or np.random.RandomState instance, default=None) – Controls the RNG, see the scikit-learn glossary for details.
Returns:
- hidden_states_sequences (ndarray of shape (n_seq, n_obs))
- observable_states_sequences (ndarray of shape (n_seq, n_obs))
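Sampling from an HMM amounts to drawing an initial hidden state, then alternately emitting a symbol and moving to the next hidden state. The sketch below shows that generative process with the standard library only. The function name sample_sequences and the seed argument are assumptions for this illustration; it returns nested lists rather than ndarrays and does not reproduce hmmkay's random stream.

```python
import random


def sample_sequences(init_probas, transitions, emissions, n_seq=10, n_obs=10, seed=None):
    """Draw n_seq sequences of n_obs hidden and observable states."""
    rng = random.Random(seed)
    hidden_ids = range(len(init_probas))
    observable_ids = range(len(emissions[0]))
    hidden_seqs, observable_seqs = [], []
    for _seq in range(n_seq):
        hidden, observed = [], []
        # initial hidden state drawn from init_probas
        state = rng.choices(hidden_ids, weights=init_probas)[0]
        for _obs in range(n_obs):
            hidden.append(state)
            # emit a symbol from the current state's emission row
            observed.append(rng.choices(observable_ids, weights=emissions[state])[0])
            # move to the next hidden state using the transition row
            state = rng.choices(hidden_ids, weights=transitions[state])[0]
        hidden_seqs.append(hidden)
        observable_seqs.append(observed)
    return hidden_seqs, observable_seqs
```

Both returned lists have n_seq entries of length n_obs, mirroring the (n_seq, n_obs) shapes documented above.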
Utils
The utils module contains helpers for input checking, parameter generation and sequence generation.

hmmkay.utils.make_observation_sequences(n_seq=10, n_observable_states=3, n_obs_min=10, n_obs_max=None, random_state=None)
Generate random observation sequences.
Parameters:
- n_seq (int, default=10) – Number of sequences to generate.
- n_observable_states (int, default=3) – Number of observable states.
- n_obs_min (int, default=10) – Minimum length of each sequence.
- n_obs_max (int or None, default=None) – If None (default), all sequences are of length n_obs_min and a 2d ndarray is returned. If an int, the length of each sequence is chosen randomly with n_obs_min <= length < n_obs_max. A numba typed list of arrays is returned in this case.
- random_state (int or np.random.RandomState instance, default=None) – Controls the RNG, see the scikit-learn glossary for details.
Returns: sequences – The generated sequences of observable states.
Return type: ndarray of shape (n_seq, n_obs_min) or numba typed list of ndarray of variable length
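The length rule for n_obs_max can be made concrete with a small standard-library sketch. The function name make_random_sequences and the seed argument are hypothetical, symbols are drawn uniformly as an assumption (the distribution used by hmmkay is not stated here), and plain lists stand in for the ndarray and numba typed list return types.

```python
import random


def make_random_sequences(n_seq=10, n_observable_states=3, n_obs_min=10,
                          n_obs_max=None, seed=None):
    """Generate random observation sequences as a list of lists."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(n_seq):
        if n_obs_max is None:
            # fixed length: every sequence has exactly n_obs_min observations
            length = n_obs_min
        else:
            # variable length: n_obs_min <= length < n_obs_max
            length = rng.randrange(n_obs_min, n_obs_max)
        # assumption: symbols drawn uniformly from 0..n_observable_states - 1
        sequences.append([rng.randrange(n_observable_states) for _ in range(length)])
    return sequences
```

With n_obs_max left at None the result is rectangular, which is the case where the real helper can return a 2d ndarray.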

hmmkay.utils.make_proba_matrices(n_hidden_states=4, n_observable_states=3, random_state=None)
Generate random probability matrices.
Parameters:
- n_hidden_states (int, default=4) – Number of hidden states.
- n_observable_states (int, default=3) – Number of observable states.
- random_state (int or np.random.RandomState instance, default=None) – Controls the RNG, see the scikit-learn glossary for details.
Returns:
- init_probas (array-like of shape (n_hidden_states,)) – The initial probabilities.
- transitions (array-like of shape (n_hidden_states, n_hidden_states)) – The transition probabilities. transitions[i, j] = P(s_{t+1} = j | s_t = i).
- emissions (array-like of shape (n_hidden_states, n_observable_states)) – The probabilities of symbol emission. emissions[i, o] = P(O_t = o | s_t = i).
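Whatever values are drawn, the three outputs must be valid probability distributions: init_probas sums to 1, and so does every row of transitions and emissions. The sketch below shows one simple way to build such matrices by normalizing uniform random weights. The names random_stochastic_rows and make_random_proba_matrices are hypothetical, and this is not necessarily how hmmkay draws its values.

```python
import random


def random_stochastic_rows(n_rows, n_cols, rng):
    """Return n_rows lists of n_cols positive numbers, each summing to 1."""
    rows = []
    for _ in range(n_rows):
        # small offset keeps every weight strictly positive
        weights = [rng.random() + 1e-12 for _ in range(n_cols)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def make_random_proba_matrices(n_hidden_states=4, n_observable_states=3, seed=None):
    """Return (init_probas, transitions, emissions) as nested lists."""
    rng = random.Random(seed)
    init_probas = random_stochastic_rows(1, n_hidden_states, rng)[0]
    transitions = random_stochastic_rows(n_hidden_states, n_hidden_states, rng)
    emissions = random_stochastic_rows(n_hidden_states, n_observable_states, rng)
    return init_probas, transitions, emissions
```

The shapes follow the documented ones: (n_hidden_states,), (n_hidden_states, n_hidden_states) and (n_hidden_states, n_observable_states).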

hmmkay.utils.check_sequences(sequences, return_longest_length=False)
Convert sequences into appropriate format.
This helper is called before any method that uses sequences. It is recommended to convert your sequences once and for all before using the HMM class, to avoid repeated conversions.
Parameters:
- sequences (array-like of shape (n_seq, n_obs) or list/typed list of iterables of variable length) – Lists of iterables are converted to typed lists of numpy arrays, which can have different lengths. 2D arrays are untouched (all sequences have the same length).
- return_longest_length (bool, default=False) – If True, also return the length of the longest sequence.
Returns: sequences – The sequences converted either to ndarray or numba typed list of ndarrays.
Return type: ndarray of shape (n_seq, n_obs) or typed list of ndarray of variable length