An untrained generative model with a few qubits

Welcome back to our tutorial on generative modeling with quantum hardware. Here, we will prepare a generative model with a few qubits. In the previous tutorials, we have learned that, in a generative machine learning task, you first receive data. Our goal is to learn a model of the true distribution underlying the data. Then, we want to be able to generate new data from this model. They have similar characteristics, but are not necessarily inside the training set.

The moving pieces are:

  • The dataset. What type of data is it?
  • The model. Is it flexible enough to approximate the data distribution well? Or is it highly specialized like in the earlier gaussian case?
  • The loss function. How do you estimate the distance between the training data samples and your model distribution?
  • The optimization algorithm. How do you get from (potentially very bad) initial parameters to a good set of parameters (Tutorial 2)?

In this tutorial, we will be focusing on the model component. We replace the gaussian probability density function of the last two tutorials with a quantum circuit. In turn, we will also have a look at how to adapt the dataset such that it is compatible with the quantum circuit model.

To simulate the outcomes of quantum circuits, we will be using the qiskit Python library. See more about qiskit in the qiskit textbook. Let’s first install qiskit and import the necessary Python libraries.

import numpy as np
import matplotlib.pyplot as plt

from qiskit.circuit import QuantumCircuit, Parameter
from qiskit import Aer

A brief reminder of quantum circuits

In our first tutorial on quantum machine learning, we presented this minimal example of a quantum circuit with one qubit and one parameter 𝜃.

# define a symbolic parameter theta
theta = Parameter(r"theta")

# create a quantum circuit with one qubit
circuit = QuantumCircuit(1)
# apply an X rotation with the symbolic parameter theta
circuit.rx(theta, 0)
# define the measurement
# draw the resulting circuit

This diagram represents the what we just coded up using qiskit. We start with one qubit, apply one gate, and then measure it. Let us now find out how the value of 𝜃 modulates the distribution of outcomes. To run the circuit, we need to import and initialize a simulator backend:

simulator = Aer.get_backend("aer_simulator")

# to plot it up
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(22, 4))

for ax, theta in zip((ax1, ax2, ax3), [np.pi * 1 / 4, np.pi * 1 / 2, np.pi * 3 / 4]):

    # bind the parameter theta of the circuit to pi/4
    circuit_with_params = circuit.bind_parameters([theta])

    # run the circuit and get the result
    result =, shots=1000).result()

    # extract the measurement counts as a python dictionary
    count_dict = result.get_counts()

    # for convenience sort by binary order
    count_dict = dict(sorted(count_dict.items()))

    # print the counts
    print(f"The results for theta = {round(theta, 3)} was:", count_dict)
    ax.set_title(rf"theta = {round(theta, 3)}", fontsize=16)
    ax.set_ylabel("Counts", fontsize=16)
    ax.set_xlabel("Outcome", fontsize=16)

It becomes clear that \theta controls the probability distribution over outcomes “0” or “1”. \theta=\frac{\pi}{2} creates an equal superposition of “0” and “1” in the measurement basis, wheras \theta=\frac{\pi}{4} and \theta=\frac{3\pi}{2} had a larger probability for measuring “0” or “1”, respectively. The slight discrepancy in the measurement outcomes from the exact probabilities, for example 50\% “0” and 50\% “1” for the \theta=\frac{\pi}{2} case, are due to the inherent randomness of quantum measurements. If you toss a coin 1000 times, the most likely individual outcome is 500/500, but you are very unlikely to get that perfect balance of outcomes and not any other outcome that is close to it. However, if you take more and more measurements, the observed probabilities of measurements will approach the exact probabilities of 50\%.

Let us now write a function that returns a qiskit circuit with a flexible number of qubits 𝑛, and with varying depth 𝑑. The depth of a quantum circuit is a vague term for the number of gates in a quantum circuit, or, more commonly, how often a fixed structure of gates is repeated. These structures are commonly called circuit layers. Because we will be working with multiple qubits, the quantum circuit should also contain two-qubit CNOT gates. In addition to the 𝜃 values of single-qubit gates controlling the measurement probabilities for each qubit, the CNOT gates correlate the measurement outcomes and create entanglement in the prepared quantum state. See our tutorial on Bell’s Game, which highlights the power of entanglement in light of the recent Nobel Prices in Physics.

def get_circuit(number_of_qubits, number_of_layers):

    number_of_parameters = 2 * number_of_qubits * number_of_layers
    thetas = [Parameter(r"theta" + f"{ii+1}") for ii in range(number_of_parameters)]

    # initialize the qiskit circuit
    circuit = QuantumCircuit(number_of_qubits)

    parameter_counter = 0

    # loop over the layers of the quantum circuit
    for l in range(number_of_layers):

        # apply a layer of RX rotations
        for n in range(number_of_qubits):
            circuit.rx(thetas[parameter_counter], n)
            parameter_counter += 1

        # apply a layer of RZ rotations
        for n in range(number_of_qubits):
            circuit.rz(thetas[parameter_counter], n)
            parameter_counter += 1

        # apply a layer of CNOTs
        for n in range(number_of_qubits - 1):
            circuit.cnot(n, n + 1)

    # instructions to measure all qubits

    return circuit

For 𝑛=4 qubits and 𝑑=2 layers, the quantum circuit looks like this:

number_of_qubits = 4
number_of_layers = 2

circuit = get_circuit(number_of_qubits, number_of_layers)

You can see that there are 16 \theta parameters \theta_1, \dots \theta_{16} in the circuit. In fact, the circuit has 2\cdot n \cdot d parameters in general. Let us select random parameters \theta_i \in [0, 2pi] and observe the measurement outcome distribution.


# the quantum circuit has 2*n*d parameters
number_of_parameters = 2 * number_of_qubits * number_of_layers

# define value for the initial parameters
parameters = np.random.uniform(0, 2 * np.pi, size=number_of_parameters)

# bind the circuit parameters to numbers
circuit_with_params = circuit.bind_parameters(parameters)

# run the circuit and get the result
result =, shots=1000).result()

# extract the measurement counts as a python dictionary
model_dict = result.get_counts()

print("The results was:", model_dict)

# for convenience sort by binary order
model_dict = dict(sorted(model_dict.items()))

# do the plotting
fig, ax = plt.subplots(1, 1)

ax.set_ylabel("Counts", fontsize=16)
ax.set_xlabel("Outcome", fontsize=16)

Lets think about what this plot represents. We start out with four qubits and a quantum circuit acting on it. Then we prepare and measure the quantum state multiple times and observe outcomes of the encoded proabbility distirbution. Does this sound familiar to you? You might think this sounds like a generative model – and you are right! The parametrized quantum state encodes our model distribution, and we can generate data from it by measuring the qubits.

The data now is discrete, i.e., it consists of “0”s and “1”s, instead of continuous numbers in the domain [-1.5, 1.5] as in our earlier tutorials. You can think of quantum circuit generative models as enabling continuous parametrization of a discrete probability distribution with 2^n possible outcomes – something, that is not easily done classically. While it may seem that only being able to model discrete probability distributions is a limitation, but in fact there are many potential use cases for generative models on discrete data, e.g., in finance applications.

A dataset for quantum computers

So lets think about a potential dataset for our generative model with qubits. Last time, the data looked like this:

data_samples = np.array(
      [-3.61078109e-01, -1.29291442e-01, -9.82588388e-02, -2.58432923e-01,
       -4.97863965e-01, -4.78795061e-01, -5.10663163e-01,  4.36223299e-02,
       -1.01858250e-02, -5.57657880e-01, -5.61011437e-01, -2.63656495e-01,
       -5.00136079e-01, -6.30125377e-01, -1.12999295e-01, -3.22838879e-01,
       -7.75881873e-01,  1.68190537e-01, -2.69496934e-01, -3.04754810e-01,
       -2.62099948e-01,  1.36922064e-01,  3.93391710e-01, -8.12181958e-01,
       -3.15903057e-01,  1.10533721e-01, -2.50864695e-01,  6.56386891e-02,
       -2.37497275e-01, -6.04398679e-01, -5.32560555e-01, -1.62669444e-01,
       -4.45701188e-01, -1.67952827e-01, -5.07722157e-01, -8.51854037e-02,
       -1.35821066e+00, -3.39440908e-01, -6.41656480e-01, -9.51452941e-01,
       -8.23083293e-01, -5.69844420e-01, -1.04400576e-01, -3.71350420e-01,
       -8.65458568e-01, -2.64222047e-01,  8.06578890e-04, -5.68337779e-01,
       -6.25077227e-01, -1.00012510e+00,  7.59518951e-01,  2.46827712e-01,
        5.70561107e-01,  6.52003162e-01,  6.73384341e-01,  8.04894781e-01,
        6.34541410e-01,  3.63315273e-01,  4.36242632e-01,  3.31473086e-01,
        5.18170902e-01,  6.00943305e-01,  7.09919182e-01,  5.42156544e-01,
        5.96010036e-01,  6.32350524e-01,  5.11792431e-01,  7.26352234e-01,
        5.24889933e-01,  6.33500447e-01,  7.76251971e-01,  7.53647980e-01,
        3.36153030e-01,  8.15441364e-01,  5.57272462e-01,  1.44661947e-01,
        6.16792976e-01,  6.91086153e-01,  6.87286823e-01,  3.98429153e-01,
        1.07054311e+00,  6.24690432e-01,  6.84406406e-01,  7.26905590e-01,
        3.09651946e-01,  7.78085930e-01,  3.60425786e-01,  6.33481589e-01,
        3.17310855e-01,  6.56363955e-01,  6.10979533e-01,  8.08938177e-01,
        7.71336408e-01,  6.11452930e-01,  5.03192825e-01,  5.66868324e-01,
        7.22434360e-01,  5.64687162e-01,  5.11510302e-01,  7.02255988e-01]

fig, ax = plt.subplots(1, 1)

x_points = np.linspace(-1.5, 1.5, 1000)

    label="Training Data",

ax.set_ylabel("Counts", fontsize=16)
ax.set_xlabel("Data Value", fontsize=16)

We already suggestively plotted a histogram of the continuous data with 16 bins. What we are going to do now is to count the number of data samples in each bin, and assign each bin to one of the 16 possible binary outcomes of a quantum circuit. It is most comforable to assign them in the order of ascending binary numbers, i.e., 0000,0001,0010,…,1110,1111.

# get the data in 16 bins and extract the counts
counts, bins = np.histogram(data_samples, bins=16)

# generate the 2^4 = 16 discrete outcomes as bitstrings
string_outcomes = [
    bin(ii)[2:].zfill(number_of_qubits) for ii in range(2**number_of_qubits)

# group counts and strings into a dictionary
training_data_dict = dict(zip(string_outcomes, counts))

{'0000': 1, '0001': 0, '0010': 2, '0011': 4, '0100': 4, '0101': 10, '0110': 7, '0111': 9, '1000': 7, '1001': 5, '1010': 3, '1011': 8, '1100': 14, '1101': 19, '1110': 6, '1111': 1}

This new form of the data looks very familiar to qiskit practitioners. It represents a discretized form of the continuous dataset, and can in fact be plotted together with our quantum circuit outcomes:

fig, ax = plt.subplots(1, 1)

x_points = np.linspace(-1.5, 1.5, 1000)
    / sum(training_data_dict.values()),  # turn the counts into probabilities
    label="Training Data",
    / sum(model_dict.values()),  # turn the counts into probabilities
    label="Model Data",

ax.set_ylabel("Probability", fontsize=16)
ax.set_xlabel("Outcome", fontsize=16)

While we are able to plot the training data probabilities and our empirical quantum circuit measurement probabilities together, the distributions clearly don’t match at all. In the previous tutorials, we could make educated guesses for how to parametrize the gaussian model to match the gaussian data, but would you know how to do that with a parametrized quantum circuit? No, that is too hard to know in general and for arbitrary datasets. This is where training of the model parameters comes in. For that we need a loss function.

A loss function for generative modeling with qubits

We remember the KL divergence from the last tutorial:

    \[\text{KL}(p_{data}, p_{model}) = \sum_{x\sim \mathcal{D}} \left[ p_{data}(x) log(p_{data}(x)) - p_{data}(x) log(p_{model}(x)) \right]\]

We are now working with discrete x as well as dictionaries of measurement counts. So, we will have to define a KL divergence loss function that accepts these formats:

def kl_divergence(data_distribution, model_distribution):
    # we will require a small offset "e" to avoid possible log(0) singularities
    e = 1e-6

    # make sure the values in the dictionaries add up to 1
    sum_data = sum(data_distribution.values())
    sum_model = sum(model_distribution.values())
    for bin_outcome, count in data_distribution.items():
        data_distribution[bin_outcome] = max(count / sum_data, e)
    for bin_outcome, count in model_distribution.items():
        model_distribution[bin_outcome] = max(count / sum_model, e)

    # calculate the first term in the KL divergence formula
    term1 = 0
    for bin_outcome, prob in data_distribution.items():
        term1 += prob * np.log(prob)

    # calculate the second term in the KL divergence formula
    term2 = 0
    for bin_outcome, prob in data_distribution.items():
        term2 += prob * np.log(model_distribution.get(bin_outcome, e))

    return term1 - term2

And measure our initial loss given the dataset, our quantum circuit, and a random parameter guess.

kl_divergence(training_data_dict, model_dict)

Wrapping it up

As expected, the KL divergence is not zero and in fact quite large. In the next tutorial, we will apply a gradient descent algorithm to reduce the KL divergence. Furthermore, we will study the role that entanglement plays in this algorithm and how it affects training performance.

To summarize what we saw in this tutorial, we implemented a quantum circuit generative model on a few qubits using qiskit. Because of the binary nature of qubit measurements, quantum circuit generative models operate on datasets consisting of discrete data. Interestingly, we were able to transfer the continuous dataset from the earlier tutorials. We simply transform it into a dataset that’s suitable for quantum circuits. However, the initial KL divergence value between the training dataset and samples from our quantum circuit was still quite large. See you soon when we train this quantum circuit model!