An introduction to Quantum-inspired optimization using Azure Quantum

28 min readOct 4, 2022

This article is about quantum-inspired optimization methods available on Microsoft Azure Quantum that enhance classical algorithms by emulating quantum effects. The Microsoft QIO algorithms run on classical hardware like CPUs. Since they don’t need any quantum computers to work, such methods can be used today, before the arrival of fault-tolerant quantum computers, to solve industrial combinatorial optimization problems that can be formulated as PUBO, QUBO or Ising problems. This article offers an introduction, a worked example with Python code and discusses further aspects like relationship to quantum annealing and quantum computing.

Introduction

First let’s define what we mean with optimization.

Optimization

Our goal is to find an optimal solution to a business problem from a set of candidate solutions. Optimal may mean for instance highest profit, lowest cost, biggest savings, lowest resource consumption or more generally, the ‘best value’ of a particular metric.

Example: Imagine a sightseeing holiday trip in a region of France. The goal: minimize total petrol (or diesel or electricity) cost while visiting N sites exactly once. This might be of special value in times of high energy costs! In this case we can define a cost function that depends on factors like petrol price, road distance between towns and villages and the particular travel route.

What’s the motivation?

Optimization problems can be sometimes solved exactly, e.g. using integer linear programming for combinatorial problems. However, solvers like Gurobi [1] have their limitations in terms of the size of problems they can solve exactly. Where exact solutions are not possible anymore or where a brute force search for an optimal solution takes too long or does not come sufficiently close to an optimal solution, approximate solutions based on specific heuristics can come to the rescue.

The question then arises what problems we may want to solve. Some famous types of combinatorial optimization problems are the minimum spanning tree problem (MST)[2], the knapsack problem [3] and the travelling salesman problem (TSP) [4]. For bigger ‘sizes’ of such and related problems, an exhaustive search for the best solution is not tractable (i.e. we might not find such solution in years or in our lifetime -:). However, as already mentioned, we can resort to approximation algorithms. Various real-world problems can be mapped to types of problems as mentioned above. For example, our holiday route optimization problem can be mapped to the TSP problem. Discrete optimization problems arise in multiple industries including telecommunications, with the logistics sector as a prime example.

Why things can take a long time

To find the best permutation of a sequence of 18 items where ‘best’ is measured by some metric, we would have to look at 18! permutations. Typing in 18! / 10⁶ / 60 / 60 / 24 / 365 into Google Search returns 203: If we could evaluate the metric for 1 million permutations per second, it would take us about 203 years to do so if we computed around the clock. Imagine you have 18 books on your bookshelf, and you don’t like the look of the arrangement. You could try a surprisingly long time to find the optimally pleasing arrangement…. With 4 books, we have 24 possible arrangements. With 18 books, we get about 6,402,373,000,000,000.

What are quantum-inspired optimization algorithms?

They are algorithms that run on classical compute resources like CPUs, and they emulate major phenomena from quantum physics. An example is to emulate adiabatic quantum computing [5]. We will use QIO in short to refer to quantum-inspired optimization.

Can we use quantum computing for optimization?

Where it takes far too long to compute an exact solution, future quantum computers can have the potential to speed up solution finding. See e.g. the QAOA framework (Quantum Approximate Optimization Algorithm) [6]. In the absence of fault-tolerant quantum computers of sufficient size (number of logical qubits and quantum volume) as is the case today, quantum-inspired optimization offers a benefit available already now, on classical computers.

Could we use Quantum Annealers for optimization?

The answer is yes (of course), as long as the problem specification is supported by the quantum annealer. For example, using D-Wave is an option for QUBO problems (we come to that later).

This leads me to summarize the arguments why one would use QIO today.

Arguments in favour of using QIO today

QIO is an option when a classical optimization solver starts to get exhausted on the problem at hand.
It could allow us to find a solution faster, all else equal.
QIO could find a better approximation result than other algorithms with the same computation time allotted.
It might also permit us to make a more realistic model of the original problem.
QIO methods can be used now, even for industrial-size problems, as they are not requiring any quantum devices [12].

QIO methods also have limitations

That’s not too surprising: nearly everything else in life has some limitations! The algorithms are heuristics, so not guaranteed to find the optimal solution. Moreover, QIO algorithms may not always outperform other optimization methods.

Who provides QIO

Microsoft Azure Quantum [7] provides cloud-based QIO methods through Azure Quantum ‘providers’. Example providers as of Oct 2022 are 1QBit’s 1Qloud, Toshiba’s SQBM+, and Microsoft QIO. This article is based on experience gained with Microsoft QIO using Azure Quantum.

The Microsoft QIO provider [8] in turn offers different ‘targets’, standing for different target solvers and thus algorithms. Example algorithms (including some classical ones) are Simulated Annealing, Parallel Tempering, Quantum Monte Carlo, Substochastic Monte Carlo and Tabu Search. The list of algorithms is available at [8].

Finding a global minimum for a cost function and useful terminology

Before we look deeper, let’s clarify some language used in this context.

System (or problem) configuration

This is an assignment of values to the problem’s variables which we call for convenience x₀, x₁ …. An array of x variables can also be referred to by a vector x. In the example of our holiday trip, a configuration might represent a particular route taken throughout the holiday where the xᵢ refer to towns and villages. Each problem configuration comes with a particular cost (e.g. total expense regarding petrol, diesel or electricity).

Cost function

This is a function of a system’s configuration. Cost can be low or high, and we want to find the system configuration x that minimizes the total cost H(x). From time to time, we might also want to maximize a ‘profit’ function H’(x), but that is the same as minimizing the negative profit function H(x) = -H’(x). So, in general, we are good with a cost function that we can minimize.
A cost function might consist of one or more components: The mandatory component is an objective function h(x), while optional components are functions that represent constraints cᵤ(x) arising from the business problem.

In the example of the holiday trip, the objective function might measure the total spend on petrol from using a rental car. The constraint function would penalize routes that lead to cities, towns and villages being visited multiple times (once is good enough for us).

Constraint

This is a relation between variables that must hold for a solution to be valid. For example the difference between x₂ and x₁ should always be at least 2: x₂ — x₁ ≧ 2.

Search space

It represents all the feasible, valid solutions to the problem (whether they are optimal or not). The solution with the lowest cost represents the optimal solution at a global minimum of the cost function. A solution will be one of the possible, but also valid system configurations (e.g. a valid driving route for our holiday). Invalid configurations (like a route where we forget to visit one of the desired locations or visit the same places multiple times) can be excluded from the search space.

Optimization landscape

If we imagine a cost (say petrol cost) for each valid problem configuration (holiday driving route or sequence of locations visited), we get an optimization landscape. Getting helicoptered into this landscape, we can start walking down the hills in search of the absolute lowest point in all valleys, a global minimum in the cost function and in the landscape. If our problem description involved just two continuous variables x₁ and x₂, the cost could be plotted as a 3rd dimension on top of the plane in which the two variables live. The optimization landscape would look like a 3D model of a mountainous area. That’s just a rough analogy as in our case, we have discrete configurations, not a continuum of configurations.

3D landscape of cost plotted over two continuous variables with local minima and a global minimum — An optimization landscape with local minima and a global minimum. Source: the author.

Search strategies

Regarding strategies for how to conduct the search for such a global minimum, (lowest point in all valleys of the landscape), one can imagine many: e.g. not just searching as a single person (called a walker), instead use multiple walkers simultaneously. Let them stay in contact via cell phones and coordinate their search. Walkers might use a jetpack to propel them high into the air to glance over a mountain range to see whether there is a deeper valley just across the mountain (called thermal jumps). Or a walker might be able to construct a tunnel through a mountain to get to the other side if it doesn’t seem a good use of resources to speculatively climb up a mountain range just to see whether any deeper valley (it might just be another local minimum) is somewhere in the neighbourhood (emulating quantum tunnelling through an energy barrier).

Classical strategies to navigate in simpler-shaped cost functions (e.g. pretty convex ones) include gradient descent, as used in machine learning to minimize a cost function there.

PUBO and QUBO

First, let’s consider our starting point.

Binary optimization problem

In this form of optimization problem, the values of the variables are restricted to only two values, e.g. 0 and 1. The cost function H(x₀, …) is a function of those N variables and we want to minimize this cost function H, in general subject to some equality or inequality constraints, like g(x₀, … ) = 0, or f(x₀, … ) > 0. Such constraints like x₁ > 0 restrict the values the variables can take.

A hypothetical example of a cost function might be
H(x₀, x₁, x₂) = 6 -2x₀ + 3 x₀x₁ + 4 x₁/(x₂ -x₀ +1) -4x₂, with
f(x₀, x₂) = x₂ — x₀ +1 > 0. This function has a global minimum of
H = 0 at (1, 0, 1).

A table that shows values of the cost function H(x) based on binary value assignments to variables x0 to x2. — Cost function values H(x) for different variable assignments. Binary optimization problem.

Polynomial Unconstrained Binary Optimization (PUBO)

Now we restrict the freedom of this cost function to be of the PUBO type. We define

H(x) = ∑ᵢ wᵢ pᵢ ( x₀ , x₁ , … )

H is a weighted sum of various products of the x variables. In addition, we say, there are no constraints on the x variables like the functions f(x) and g(x) above. H then represents a polynomial in the binary x variables which are not restricted themselves in any way (they are free to take any of the, well, not so many, binary values). This is a Polynomial Unconstrained Binary Optimization problem (or PUBO in short). An example would be:

H(x₀, x₁, x₂) = 4 –3x₀ + 3x₀x₁– 6x₀x₁x₂ + 2x₂.

H(x) is minimal at x = (1, 1, 1) and is a polynomial of degree 3.

This table shows the cost function values H(x) for different assignments of binary values to variables x0 to x2, for a PUBO problem. — Cost function values H(x) for different variable assignments. PUBO problem.

Quadratic Unconstrained Binary Optimization (QUBO)

We can further restrict the form of the cost function to be a polynomial of degree 2 only. Then the problem is called a Quadratic Unconstrained Binary Optimization problem (QUBO).

Ising model

When we have a PUBO formulation where the binary variables take values from (-1, 1), we may think of this as an Ising model.

Which QIO algorithms are best for my problem?

Do we get any guidance on the suitability of particular algorithms for a problem at hand? According to Microsoft, their QIO algorithms are suitable for particular types of optimization problems:

The business problem should be mappable to a discrete combinatorial optimization problem.
The cost function needs to be expressible as polynomial over binary variables to get a PUBO, QUBO or Ising problem (as we just explained).
Moreover, the optimization landscape should be rugged, but structured, i.e. not too simple. The number of variables should not be trivially low, but e.g. considerably larger than 100.

Process of working with QIO

How do we want to go about solving a real business optimization challenge? Next some suggested steps one can follow in a real project.

Workflow for quantum-inspired optimization

1. Formulate the business problem. What shall be actually optimized so that business value gets created from the optimization?

2. Write up the objective and any business constraints of the business problem. This can be done in plain English or using a bit of mathematical notation.

3. Convert above objectives and constraints into mathematical functions of binary decision variables which take binary values (0, 1) or (-1, 1). The objective function should be a function of say N variables (the xᵢ variables we used above). An assignment of values to those variables defines a problem configuration and will have an associated cost, reflected by the objective function h(x). Constraints are modelled e.g. as penalty terms cᵤ(x).

4. Tweak above if necessary, to make all nicely conform to a QUBO, PUBO or Ising mathematical formulation.
E.g. the objective function shall be minimized, penalty terms (which model constraints) shall be minimized for the most desirable cost variable assignments that don’t violate the business constraints (then no penalties shall apply).

5. Combine objective function h(x) and constraint terms cᵤ(x) into an overall cost function H(x) which we want to minimize:

H(x) = h(x) + ∑ᵤ αᵤ cᵤ(x).

Note that we need to combine the constraints cᵤ using some weight parameters, here called αᵤ.

6. Code the total cost function H(x) in Azure Quantum using Python and QIO programming artifacts:

a. Define the terms of the QUBO/PUBO formulation

b. Collect the terms into a QIO Problem.

7. Select a particular solver (thus QIO algorithm like Quantum Monte Carlo), optionally also set parameters for the solver, in case a parameterized version of the solver is chosen.

8. Submit a job to the relevant backend on Azure Quantum.

9. Retrieve the result that will be available in a general, problem-agnostic format.

10. Map the result back into the business domain.

11. Validate the solution. Is the suggested solution a valid one?

12. Interpret results.

Conversion of business problem, objective and business constraints into QUBO/PUBO format

Once we have a description of what the optimization objective is and what business constraints we want to apply, we need to map this to a mathematical format.

Microsoft QIO algorithms expect a binary optimization problem as input. This means that the problem variables should all be binary variables xᵢ.
The influence of such a variable xᵢ on the cost is determined by some associated real-valued weight wᵥ, which is associated with the product term pᵥ in which the variable is showing up.
A term is a product like wᵥx₁, or wᵥx₁x₂ or wᵥx₁x₂x₃ etc. for several variable index values and a weight wᵥ.
The total cost is the sum of all terms. As one can imagine, the total cost is then a polynomial of a particular degree (e.g. 2 for a quadratic polynomial).

Example of a QIO problem and solution

Let’s follow the above numbered process steps using a small example.

1. The business context: selection of projects. We have a set of N candidate projects. Each project requires an (integer) amount of resources rᵢ and promises to deliver a commercial benefit bᵢ. Given we have a total fixed amount of resources R at our disposal, we need to decide which projects to select from the candidate pool in order to maximize the business benefit.

2. Objective: Maximize the total benefit over the possible selection of projects. Constraint: The total amount of required resources shall be ≦ R.

3. We can map this simple problem from the business domain to a knapsack problem with binary variables xᵢ with values from {0, 1}. Often we can use the binary variables as membership indicator variables. Here, for xᵢ:

xᵢ = 1, if project Pᵢ is in the final selection, else 0.

First, we write up the object function h(x).

h(x) = -∑ᵢ bᵢ xᵢ

h(x) is the negative of the total business value from the selected projects. If we later minimize h(x), we automatically maximize -h(x), which is the total business benefit.

Second, we translate our single business constraint c₀ into a PUBO formulation as well. The total amount of required resources shall be ≦ R.

From Andrew Lucas [9], [10], we get a recommendation how to formulate this using some auxiliary binary variables to help us:

This image shows a number of mathematical formulas to define the constraint term c0.

How can we interpret this? Assume our fixed amount of resources R = 17. Log₂17 = 4.087463, rounded down gives M = 4. The highest number that can be represented with 4 bits is 15. If yM is 1, the term (R+1–2⁴) = (17+1–16) = 2. In total we then get W= 2 + 15= 17, which is our resource budget R.

If all auxiliary variables were set to 1 at the same time, the above formula would thus read:

The variable W can therefore take any integer values between 0 and R, just depending on the M+1 auxiliary binary variables y. The polynomial c₀, which represents a non-negative penalty, has the following effect: The optimization can settle at zero penalty in a configuration (variable assignment) such that W is of integer value and in the range [0, R], if the sum over the resources of the selected projects equals that value W. In other words: If the optimization procedure settles in a different configuration, where the actual sum of resource amounts is larger than our fixed budget R, c₀ will turn into a positive-valued penalty (due to the squaring), as long as the weight A > 0.

4. Nothing to do in this step. The reason is that we don’t need to simplify further our cost function H, as Microsoft QIO offers a construct to define squared linear combination terms.

5. Combine everything into the total cost function H:

The image shows the mathematical formula for the complete cost function H(x).

6. Code the cost function on Azure Quantum. Below shows an illustration of how above cost function could be implemented.

Python code to create the knapsack problem.

Next we create the objective function h(x).

Python code to create the objective function h(x). This will be part of the overall cost function H(x).

The following code creates the penalty terms,

Python code to create constraint or penalty terms for the cost function H(x).

and the RHS inside the squared term:

Further Python code to complete the penalty terms in the cost function H(x).

In above code, an instance of a Problem is created based on the terms and then the problem description is expanded with the squared linear combination term, using the weight A. Next we define a simple instance of the project selection scenario:

Specification of an example optimization problem: Optimal selection of projects based on a fixed resource budget. Goal is to maximize benefits.

7. Select a particular solver. In below code, we use 3 different types of solvers, whereby we use Quantum Monte Carlo twice with slightly different parameter settings.

Python code to select the QIO solvers that shall be used to solve the optimization problem.

8. + 9. Submit a job to the backend and retrieve the results.

Python code to call the optimize method for each solver and hand over the problem specification.

10. + 11. Map the results back into the business domain and validate results. Usually, this can be achieved with a bit of helper code like shown below.

Python code to present a summary of the results found by the quantum-inspired optimization algorithms.

Code to print the detailed optimization results.

Final piece of Python code to display the results returned by the optimization solvers.

12. Interpret results: We find for this exercise that Parallel Tempering and Substochastic Monte Carlo find the same solution. They select the same six projects and max out the resource budget of 50. Quantum Monte Carlo also maxes out the resource budget, selects one more project, but falls short 2 or 1 points regarding benefits. Thus, its solutions are not optimal here.

Table of results returned by the algorithm Parallel Tempering.

As per above, the solver ParallelTempering selects 6 projects, achieves a total benefit of 30 and consumes the full resource budget of R = 50.

Table of results returned by the algorithm Substochastic Monte Carlo.

The solver SubstochasticMonteCarlo finds exactly the same solution.
Let’s see what Quantum Monte Carlo with the first parameter settings finds. From below we see it finds a solution that consumes the full resource budget R=50, but produces only benefits of 28.

Table of results returned by the algorithm Quantum Monte Carlo using a first choice of parameter settings.

What happens if we modify the parameter settings for the Quantum Monte Carlo Solver:

Table of results returned by the algorithm Quantum Monte Carlo using a second choice of parameter settings.

The solver returns a solution that again makes use of the full resource budget, and is only 1 point below the maximum benefit of 30 which the first two solvers found.

Other aspects of interest to developers

Is there any help around for converting a business problem to a QUBO/PUBO formulation?

There is a paper often referred to by Andrew Lucas [9]. It gives some tips and hints regarding how to find a formulation for problems of a particular complexity class, namely Non-deterministic Polynomial time (NP).
Some example problems are Set Cover, Knapsack with integer weights, graph colouring and satisfiability.

Microsoft QIO Solvers — which ones to use

Microsoft offers different solvers, for example Substochastic Monte Carlo. Often two versions are offered: The first one is parameter-free, the 2nd one is parameterized and comes with parameters for which the developer can accept default values or set problem-specific values. I understand that the parameter-free solvers automatically determine decent parameter values at runtime, thereby increasing the overall runtime, but reducing the experimental overhead the developer has otherwise to face. Microsoft recommends starting with parameter-free simulated annealing first, then try parameter-free parallel tempering before moving to other solvers or even parameterized solvers.
And to remember: Microsoft QIO solvers support Ising-type problems (variable values +1, -1) and binary unconstrained optimization problems (variable values 0, 1) of quadratic/polynomial nature.

Decisions the modeler has to make

As developer, one has to make a number of decisions when working with QIO algorithms:

Select a QIO algorithm, e.g. Parallel Tempering
Decide whether to use the parameter-free version of an algorithm or a parameterized version of the same
Decide on ‘hyperparameters’ of the algorithms in case of a parameterized solver
Decide on other quasi-hyperparameters in the QUBO/PUBO model itself: weights of cost function terms that represent e.g. penalties for different constraints. We referred to them as αᵤ in above.

A number of decisions a developer or modeler has to make when using QIO.

Challenges regarding quantum-inspired optimization

I note a number of aspects which I consider challenges for application developers:

There are several solvers, similarly to there being multiple optimizers in TensorFlow and PyTorch for machine learning. Which solver to select? The answer is not obvious. Some guidance is provided in Azure Quantum, however, chances are you end up experimenting with several ones.
The best solver for a given business problem is not known in advance. What the best solver is will depend on what we want to achieve (e.g. an approximate solution closest to the theoretical optimal one, or the best solution possible within a fixed permitted runtime for any algorithm, which would be relevant if an optimal solution had to be found in near real-time).
Selecting weights for constraint and penalty terms: In above example, we had to select a value for weight A, which corresponds to parameter α₀ of the constraint c₀. In general, it is not terribly clear, how to best find good values for one or more of the weights αᵤ in

Hardware and software environment for Microsoft QIO

Microsoft QIO solvers run on classical hardware, namely CPU, not on quantum devices. As per Oct 2022, use of FPGAs has been deprecated.
Regarding the software environment, the Microsoft QIO solvers can be used via a Python SDK. To be able to use any of the solvers, a developer needs to get access to Azure Quantum. In there, the Microsoft QIO provider is enabled per default.

Using different QIO solvers

Simulated Annealing in QIO as a starting point

One might say this method is a version of a Metropolis Monte Carlo approach.

Metropolis and others proposed an iterative algorithm to study the behaviour of large amounts of interactive atoms or molecules. For this, they took inspiration from the annealing process in metallurgy. Let’s have a look at that process for a moment.

Annealing in metallurgy is a heat treatment technique that involves heating a metal or alloy up beyond a predetermined, critical temperature and holding it there for a while. This causes property or structural changes to the material. Next, the material is cooled down slowly in a controlled way to retain those newly gained material properties or structural changes. The goal is to increase ductility, reduce hardness and brittleness of the material e.g. to make it more workable.

Heating metal to melting point and then cooling it down, an annealing process in metallurgy. — Metallurgy, Image by UnifiArt from Pixabay

This leads us to the notion of annealing schedule: This is the plan how we change the temperature T as part of the physical process (or algorithm). Sometimes, annealing schedules use the inverse of the temperature T, called beta. A schedule might also include the number of rearrangements of all variables {x} to seek equilibrium at each temperature T.

Back to Metropolis and his suggestions from research: To find the state of lowest energy of a solid substance, first heat it up to melting point, then very slowly cool it down, say in N discrete steps. After each temperature reduction like T₃ → T₄, one needs to wait for the substance to reach an interim equilibrium. Then, when T has dropped to 0, the system will be in a state or configuration of lowest energy. Metropolis and his colleagues, working in statistical mechanics, turned this physical approach into an algorithm to find the lowest energy state for a system of large numbers of atoms or molecules.

We can apply this same principle to combinatorial optimization: For this, we replace the energy of a system by the cost of our problem (represented by a cost function). What about the large number of atoms and molecules? We replace them by a large number of variables in a combinatorial optimization problem. Through this, we end up with an algorithm called Simulated Annealing. The notion of slow cooling is interpreted as a slow decrease in the probability of accepting worse solutions as the solution space (optimization landscape) is explored.

Pseudocode for simulated annealing

It is illustrative to understand simulated annealing and where some quantum-inspired enhancements can be made (in italic), using a bit of pseudo code.

‘Better’ in above code means, that the new candidate solution has a lower cost function value than the current solution. A loop as used in above algorithm is rather typical. In optimization, we aim for iterative improvement and in statistical mechanics this would correspond to microscopic rearrangement processes leading to lower energy states corresponding to lower cost function values in software algorithms.

Sometimes we come across the notion of a Monte Carlo sweep. This can refer e.g. to changing binary variable values one by one (single bit flips) for all variables from a given current configuration. Mₑ total sweeps would then be conducted to get the system, at a given temperature, into an equilibrium state. Only after that one would reduce the temperature according to an annealing schedule.

Above is reflected in the parameterized Simulated Annealing solver which has the following key parameters: (Monte Carlo) sweeps, beta_start (1/maximum starting temperature), beta_stop (1/lowest ending temperature), and restarts (number of annealing schedules with random start configurations to execute in parallel). In contrast, parameter-free solvers come with a single mandatory timeout parameter and stop conditioned on a timeout setting or when convergence on a solution is detected.

The parameterized solver is set up in a simple and elegant way:

from azure.quantum.optimization import SimulatedAnnealing

solver = SimulatedAnnealing(workspace, sweeps=4, beta_start=0.1, beta_stop=1, restarts=72, seed=21)

Meaning of Quantum Inspired

We may now ask: QIO stands for quantum-inspired optimization. What is meant with that? In above, quantum-inspired means that ideas from quantum physics like potential fields or quantum tunnelling across a barrier are used to implement a heuristic algorithm.
QIO takes a standard classical algorithm to begin with such as simulated annealing and then modifies the algorithm by using ideas borrowed and adapted from quantum physics.

Ways to turn algorithms like Simulated Annealing into quantum-inspired algorithms

Researchers and scientists active in this field could give you all the detailed answers. Here just some high-level indications:

The strategy for the evolution of the simulation: e.g. an annealing schedule, how to modify parameters as the simulation evolves (e.g. reducing temperature, increasing other parameters).
The number of simulation steps at each temperature step to achieve a steady state or new equilibrium at that given temperature.
The probability and mechanism to make exploratory transitions in a worse direction.
How many walkers to use to explore the optimization landscape in parallel.
How many replicas of a simulation to run with different random initializations.

Let’s briefly mention a couple of other algorithms available in Microsoft QIO.

Population Annealing

It is built on Simulated Annealing. The intent is to improve over it by using multiple walkers to explore the optimization landscape (i.e. a population). These walkers explore neighbourhoods of their current locations.

An example is:

solver = PopulationAnnealing(workspace, sweeps=300, beta=RangeSchedule(“linear”, 0, 7), population=64, seed=21)

A further algorithm is …

Parallel Tempering

It is also derived from Simulated Annealing, however it has no temperature annealing schedule. Instead, the idea is to create replicas of a system and run each at a different temperature. The algorithm still uses thermal energy jumps to help getting out of local minima.

Below is the solver definition for 4 replicas, 3 sweeps and particularly chosen betas for the replicas.

solver = ParallelTempering(workspace, sweeps=3, all_betas=[1.17, 2.20, 3.50, 4.32], replicas=4, seed=19)

Quantum Monte Carlo

This is a solver that takes inspiration from quantum physics indeed. It is related to quantum annealing, but it starts already at a low temperature. New to the approach is that it can search across cost function barriers emulating quantum tunnelling with a probability that depends on an external perturbation that is reduced in strength over time. The algorithm is based on the Wolff algorithm [11], which is an algorithm for magnetic spin simulation. Quantum Monte Carlo works apparently well, when the cost function has tall, narrow barriers (through which one can successfully tunnel). However, the crux of the problem is how to assess the (most often pretty unknown) cost function structure to take such well-meant advice into account.

An example solver might be defined like this:

solver = QuantumMonteCarlo(workspace, sweeps = 4, trotter_number = 10, restarts = 64, beta_start = 0.2, transverse_field_start = 12, transverse_field_stop = 0.1, seed = 20)

A further quantum-inspired algorithm is …

Substochastic Monte Carlo

It emulates some quantum tunnelling through diffusion and resampling. Like Population Annealing, it uses a population of walkers which is dynamic, i.e. walkers can get removed or duplicated based on the progress they are making.

From a programming point of view, the QIO solvers have a simple API. As we have seen from above code snippets, it provides an optimize method that expects a Problem object. The optimize method then submits a job to the cloud to solve the problem. It also polls the status until the job execution has finished and then returns a JobOutput object. This contains the results in a problem agnostic format. Parameter-free solvers aim for finding best parameter settings on their own without developer input. If we are curious to see what the parameter settings were that the algorithm has used, those settings are revealed in a return JSON data structure. We could then take these settings to perform some further fine-tuning, using the parameterized version of the algorithm.

High level summary of the process

Get an Azure Quantum workspace
Set up a business problem as a quantum-inspired optimization. Cast the business problem into the mathematical format that a QIO solver understands: PUBO, QUBO, or Ising.
Code the mathematical problem using Jupyter notebooks in Azure Quantum, Python and QIO programming artifacts.
Select a solver, set its parameters and if offered, select an execution environment.
Submit the problem to the selected solver for optimization.
Once the solver has finished, take the response configuration and translate it back into the business domain.
(Optionally) validate the result.
Interpret the result.

End-to-end high-level process diagram for quantum-inspired optimization.

How is QIO similar to and different from Quantum Annealing (QA)

We can solve QUBO problems with both QIO and Quantum Annealing.

Temperature in Simulated Annealing and in QIO algorithms derived from it plays a similar role as the transverse tunnelling field in QA. It helps to cross cost function and energy barriers. In addition, the effects of a transverse field in QA can be emulated in QIO.

With QIO, we don’t need to be quantum experts to solve an optimization problem. With QA, I argue, we don’t need to be quantum experts either thanks to great software tools, though having some understanding of quantum mechanics might be helpful. This is different to working with universal quantum computers and quantum algorithms, where a solid understanding of quantum information theory is a prerequisite.

Most importantly, for QIO, we don’t need a quantum device, while for quantum annealing we typically do. Regarding quantum versus classical hardware: Fair to say, that for example D-Wave supports a hybrid solver: part of the code runs on classical hardware, and another part runs on the QPU (quantum processing unit).

QA requires mapping of a QUBO problem onto a fixed hardware structure of qubits, which are interconnected in a very specific way. This embedding into the given hardware can be complex. In QIO, no embedding is required.

As of today, QIO might be better for quick response times (not necessarily real-time optimization challenges) compared to QA. In contrast to general cloud computing resources like CPUs, quantum devices are still a scarce resource and thus jobs will get submitted into a queue and that queue could be long if demand is high.

In QIO, we have a non-physical ‘system’ comprised of binary variables. Every system configuration (assignment of binary values to the variables) has an associated cost. Thus, the cost is a ‘landscape’ over the system configurations and we seek to minimize it. The reason why we want to minimize the cost function is directly related to our business optimization goal.

In QA, we have a physical system comprised of quantum objects (the qubits). Every system state (over the collection of qubits) has an associated energy. The energy is also a ‘landscape’ over quantum states, which we may want to minimize (in quantum terminology akin to finding the system’s ground state). However, this only makes sense, if we associate the ground state of the quantum system to an optimal solution of a business problem, which we hopefully can do.

An optimal solution in QIO is a binary assignment of values to N variables, like 0 and 1. An optimal solution in QA is the settling down of M qubits (which were originally and individually in superposition) into states |0⟩ and |1⟩.

Finding a global minimum versus a quantum system’s ground state

The question then is: How does QIO find a global minimum in a cost function, and how does QA find the ground state of a quantum system, whereby the system configurations at the global minimum and equally in the ground state correspond to the same optimal business solution?

Here is, where things become rather different.

In QIO, we set out on a search, exploring the optimization landscape, essentially by walking and hiking through this landscape. We climb down a hill in the hope to end up in the deepest valley holding up the flag “exploitation”, sometimes we peek over a small mountain holding up a flag “exploration” to see whether lower valleys are behind those mountains. Sometimes, with magical powers and inspiration from quantum-mechanical effects, we might dig a tunnel through a narrow steep mountain face to get to the other side (like tunnelling through energy barriers).

In QA, we apply a very different trick. We use what is called the Adiabatic Theorem. It roughly states, that when we perturbate a quantum system sufficiently slowly over time, it will adapt to this external influence through changing its configuration. Importantly, the system will remain in its original (instantaneous) eigenstate (like a minimum energy ground state) under some further conditions. So we look at an adiabatic process between times t₀ (when we start the annealing) and t₁ (when we finish the computation). For a video from D-Wave explaining this process, see [18].

From above, we said that we will associate an optimal business solution to the ground state the quantum device shall be in at time t₁. But we don’t know the optimal business solution or the desired ground state at time t₁ in advance. However, what we can know in advance is the ground state of a much simpler system configuration.

Thus, the trick is to let at time t₀ our quantum device take a configuration that describes a much simpler system in its ground state. Then we apply the adiabatic process, which slowly changes the system configuration of the same quantum device into the system configuration we target for time t₁. All the while during this process, our quantum system will hopefully remain in its (time-varying) ground state and avoid hopping into any excited states. At time t₁, the system configuration, being in the minimum energy state aka ground state represents the optimal solution to our business problem.

Side note: Hamiltonian

Above we said that there is a degree of analogy between the cost in QIO and the energy in quantum annealing. No wonder, that there is a small step towards drawing an analogy between the cost function and what is called a Hamiltonian H(t). H is a mathematical operator that corresponds to the total energy of a quantum system. We find it in the Schrödinger Equation. If we think of H as a matrix, its set of real-valued eigenvalues is the set of possible outcomes when we measure the total energy of the quantum system. The ground state of a quantum system has total energy that equals the lowest eigenvalue of the system’s Hamiltonian. So, the cost function of a general QUBO problem can be related to a Hamiltonian, which represents the energy of a system.

Final thoughts

Could QIO algorithms in the future also run on quantum devices?

Microsoft would say yes. The algorithms in QIO have typically a random walk in them (a walker explores the optimization landscape). In the future, one could replace such a random walk by what is called a Szegedy quantum walk and implement it on a quantum device. This could also lead to a quadratic speedup for the random walk. Thus, we might hope that QIO algorithms that get developed today can be executed quadratically faster on universal quantum computers in the future.

Where does the effort go when using QIO?

The following activities are likely time consuming:

Finding a good mathematical formulation of the overall cost function H.
Writing the code for post-processing of the solver results to translate back into your problem.
Writing the code to validate any solution suggested by the solver.
Finding (through experimentation) good parameter settings for:
+ The weights of the components in the cost function,
+ The QIO algorithms (hyperparameters) in order to find valid solutions.
Least effort: writing the code for the cost function once that one has been specified.

Before we end

For readers interested in some videos about this subject: See Microsoft on QIO [13], [14], Toshiba about Simulated Bifurcation Machines [15], [16], and 1QBit about QIO [17].

This finishes our high-level introductory excursion into Quantum-Inspired Optimization land!

So then: Give it a go, try out QIO and let the community know how you fared with it.

And: If you have made it through this article, congratulations, and thanks for reading this story.