Active Learning


1. Active Learning

Active Learning is a machine learning approach where the model is allowed to select the most informative data points from an unlabeled dataset to be labeled by an oracle (e.g., a human expert). This strategy is especially useful when labeled data is expensive or time-consuming to obtain.


Key Concepts of Active Learning:

Goal: Achieve high accuracy with fewer labeled samples by prioritizing which examples to label.

Core idea: Not all data points are equally valuable for training.


Two Distinct Objectives of Active Learning

(1) Active Learning for Surrogate Model Construction (Design of Experiments, DOE)

The goal is to improve the accuracy and generalization of a surrogate model with minimal labeled data. This is achieved by selecting unlabeled data points that are expected to most significantly improve model performance when labeled and included in training.

  • Sampling focuses on areas where the model is most uncertain.

  • Useful in scientific computing, engineering simulations, and high-cost data acquisition settings.


(2) Active Learning for Optimal Solution Discovery (Optimization-Oriented)

Here, the focus shifts from model improvement to solution discovery. The objective is to identify unlabeled data points that are expected to have the highest target (label) values - in other words, to find optimal or near-optimal solutions efficiently.

  • Sampling emphasizes targeting regions likely to yield high-performance outcomes.

  • Common in design optimization, materials discovery, and experimental search for extremal properties.

  • Example: Sampling unlabeled process parameters to maximize system productivity, rather than just improving the surrogate model.




2. Active Learning for Surrogate Model Construction

2.1. Surrogate Model

In many real-world problems, obtaining labeled data can be expensive, time-consuming, or resource-intensive. This is particularly true in scientific experiments, engineering design, and simulation-based optimization. To alleviate this burden, surrogate models - also known as response surface models or emulators - are employed as approximations of the true function or system.


A surrogate model is a predictive model that serves as a computationally inexpensive substitute for a more expensive process (e.g., experiments, simulations, or human annotation).

Given:

  • A set of input-output pairs

$$ \mathcal{D} = \{(x_i, y_i)\}_{i=1}^n $$


  • Where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$

The goal is to construct an approximation function $\hat{f}(x)$ such that:


$$\hat{f}(x) \approx f(x)$$


Where $f(x)$ is the unknown true process (e.g., physical experiment or complex simulation).



2.2. Motivation for Active Learning

Suppose we are given a small set of labeled data points:


$$\mathcal{D}_L = \{(x_i, y_i)\}_{i=1}^{n}$$


From this data, we build a surrogate model $\hat{f}(x)$ that approximates the unknown true function $f(x)$.



However, acquiring new labeled data - through physical experiments, expert annotation, or costly simulations - is often expensive and time-consuming. Thus, we cannot afford to label data randomly or indiscriminately.


The process of selecting which unlabeled data point to label may appear simple and intuitive in low-dimensional spaces, as illustrated below.



  • You can visually estimate where information is sparse.

  • This visual intuition makes active learning look natural and easy in 2D or 3D examples.


However, in high-dimensional spaces, the situation changes dramatically:

  • It becomes difficult or impossible to visualize the data distribution or model uncertainty.

  • Human intuition often fails to capture the structure of high-dimensional input spaces.



2.3. Key Idea of Active Learning

Active learning is a strategy that aims to select the most informative unlabeled data points based on the current surrogate (or predictive) model.


This approach is particularly useful in domains where labeling is expensive, as it enables us to:

  • Maximize model improvement per labeled sample
  • Minimize the labeling cost
  • Focus learning on regions where the model is uncertain, inaccurate, or likely to yield high rewards

To accomplish this, it is essential that the surrogate model provides not only predictions but also uncertainty estimates. Many active learning algorithms rely on these uncertainty measures to identify data points that will most effectively improve model performance.

There are various strategies for selecting which data to label next. Before diving into specific algorithms, let's first examine the general iterative workflow of active learning with surrogate models.


2.4. Active Learning with a Surrogate Model: Iterative Workflow

This process proceeds in iterative cycles:



(1) Step 1: Train Surrogate Model

Train the surrogate (or AI) model using the currently available labeled dataset $\mathcal{D}_L$.


(2) Step 2: Predict on Unlabeled Data

Use the trained surrogate to predict labels for the unlabeled dataset $\mathcal{D}_U$.

  • Obtain both the predicted label values and uncertainty estimates.

(3) Step 3: Select Informative Samples

Evaluate the unlabeled points using a utility function, and select the most promising candidates for labeling.

  • Typically prioritizes points with high uncertainty, high predicted label values, or a diverse distribution.

  • The number of points to select can be user-defined or adaptive.


(4) Step 4: Label Selected Data

Acquire ground-truth labels for the selected samples via experiments, simulations, or expert annotation.


(5) Step 5: Update and Iterate

  • Add the newly labeled data to the training dataset.

  • Repeat Steps (1)-(5) until a stopping criterion is met (e.g., the labeling budget is exhausted or the model reaches a target accuracy).


This iterative approach balances exploration (high uncertainty) and exploitation (high predicted value), enabling efficient discovery in expensive or data-scarce domains.
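The five steps above can be sketched as a short loop. This is a minimal illustration, not a reference implementation: the `oracle` function, the one-dimensional pool, the RBF kernel, and the fixed budget of 10 queries are all assumptions made for the example, and the utility used here is simply the predictive standard deviation (pure exploration).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def oracle(X):
    # Hypothetical stand-in for an expensive experiment or simulation.
    return np.sin(3.0 * X).ravel()

rng = np.random.default_rng(0)
pool_X = np.linspace(0.0, 2.0, 200).reshape(-1, 1)      # unlabeled pool D_U
labeled_idx = [int(i) for i in rng.choice(len(pool_X), size=3, replace=False)]
labels = oracle(pool_X[labeled_idx])                    # initial labeled set D_L

for _ in range(10):                                     # stopping criterion: fixed budget
    gpr = GaussianProcessRegressor(kernel=RBF(0.5) + WhiteKernel(1e-4), random_state=0)
    gpr.fit(pool_X[labeled_idx], labels)                # Step 1: train surrogate
    mean, std = gpr.predict(pool_X, return_std=True)    # Step 2: predict with uncertainty
    std[labeled_idx] = -np.inf                          # never re-select labeled points
    next_idx = int(np.argmax(std))                      # Step 3: most uncertain point
    new_label = oracle(pool_X[[next_idx]])              # Step 4: acquire the label
    labeled_idx.append(next_idx)                        # Step 5: update and iterate
    labels = np.append(labels, new_label)

print("Number of labeled points:", len(labeled_idx))
```

Swapping the line that computes `next_idx` for a different utility function changes the sampling strategy without touching the rest of the loop.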


2.5. Which Unlabeled Data Should We Sample?

The success of active learning depends on the strategy used to select unlabeled data. Below are several common approaches:


2.5.1. Uncertainty-Based Sampling

Core idea: Select data points where the model’s prediction is least certain.


Consider a three-class classification task and two unlabeled instances, $d_1$ and $d_2$. Suppose a classifier predicts the following class probabilities:


We now evaluate which instance to label using two uncertainty-based sampling methods.


(1) Least Confidence

This method selects the instance with the lowest maximum predicted class probability.

  • $d_1: 0.9$

  • $d_2: 0.5$

$\implies$ $d_2$ has the lower confidence in its top label prediction and is therefore selected for labeling.


(2) Margin Sampling

This method selects the instance with the smallest difference between the top two predicted probabilities.

  • $d_1: 0.9 - 0.09 = 0.81$

  • $d_2: 0.5 - 0.3 = 0.2$

$\implies$ $d_2$ has the smallest margin, indicating ambiguity between labels A and B, and is thus selected for labeling.


These two examples demonstrate how active learning can quantify and leverage model uncertainty to prioritize the most informative samples for labeling.
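Both criteria reduce to a few array operations. The sketch below uses the top two probabilities quoted in the example; the third probability in each row is an assumption, chosen only so that each row sums to 1, and the columns are ordered by probability rather than by class label.

```python
import numpy as np

# Rows: instances d1, d2. Columns: class probabilities, largest first.
# The third entry in each row is an assumption chosen so the row sums to 1.
probs = np.array([
    [0.90, 0.09, 0.01],   # d1
    [0.50, 0.30, 0.20],   # d2
])

# Least confidence: pick the instance whose top probability is smallest.
top = probs.max(axis=1)
least_confident = int(np.argmin(top))

# Margin sampling: pick the smallest gap between the two largest probabilities.
sorted_probs = np.sort(probs, axis=1)[:, ::-1]
margins = sorted_probs[:, 0] - sorted_probs[:, 1]
smallest_margin = int(np.argmin(margins))

print("Least confidence selects d%d" % (least_confident + 1))
print("Margin sampling selects d%d" % (smallest_margin + 1))
```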


2.5.2. Other Sampling Strategies

Uncertainty is not the only criterion for selecting informative samples. Several alternative approaches have been developed.


(1) Query-By-Committee (QBC)

Query-By-Committee (QBC) is an active learning strategy that selects unlabeled instances on which a group of models - referred to as a committee - most disagree.

Rather than relying on a single model, QBC maintains multiple models trained on the same labeled dataset (but possibly initialized differently or trained with different subsets). The key assumption is that when the committee members disagree on a prediction, the corresponding data point is informative and should be prioritized for labeling.


Intuition

  • For classification tasks: disagreement is measured based on predicted labels.

  • For regression tasks: disagreement is quantified using the variance of predicted values.


Classification Example

Consider a three-class classification problem with two unlabeled instances, $d_1$ and $d_2$, and predictions from three different models:



  • For $d_1$: the predictions are uniformly distributed (1 : 1 : 1 across A, B, C).

  • For $d_2$: the predictions show moderate agreement (2 : 0 : 1, favoring A).

$\implies$ Since $d_1$ shows maximum disagreement among committee members, select $d_1$ for labeling.


Regression Example

Suppose the same models predict continuous outputs for $d_1$ and $d_2$:



  • Variance($d_1$) = 0.0093

  • Variance($d_2$) = 0.0433

$\implies$ Since $d_2$ has greater variance in predictions, indicating higher uncertainty, select $d_2$ for labeling.
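The two disagreement measures can be sketched as follows. Vote entropy is used here as one common way to score classification disagreement; it is an assumption of this example, not necessarily the measure behind the text above. The vote counts come from the classification example, while the regression predictions are hypothetical values that do not reproduce the variances quoted above.

```python
import numpy as np

def vote_entropy(votes):
    # votes: number of committee members voting for each class.
    p = np.asarray(votes, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                      # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

# Vote counts over classes (A, B, C) from the classification example.
votes = {"d1": [1, 1, 1], "d2": [2, 0, 1]}
entropy = {name: vote_entropy(v) for name, v in votes.items()}
pick_cls = max(entropy, key=entropy.get)

# Hypothetical regression outputs from three committee members (illustrative only).
preds = {"d1": [1.00, 1.10, 1.05], "d2": [1.00, 1.40, 1.20]}
variance = {name: float(np.var(p, ddof=1)) for name, p in preds.items()}
pick_reg = max(variance, key=variance.get)

print("Classification: query", pick_cls)
print("Regression: query", pick_reg)
```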


(2) Expected Model Change

Expected Model Change is an active learning strategy that selects unlabeled data points based on how much they are expected to change the current model if their labels were known.


Intuition

  • Some data points, once labeled, would cause the model to learn significantly more, leading to larger updates in its parameters.

How is "Change" Measured?

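  • Change = Magnitude of the gradient of the loss function with respect to the model parameters that the unlabeled point would induce. Because its true label is unknown, this magnitude is typically averaged over the possible labels, weighted by the model's predicted probabilities (the expected gradient length).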

This method is particularly useful in deep learning models where gradients are naturally available and informative. It balances exploration and model-driven prioritization, often leading to faster convergence with fewer labeled samples.
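As a small illustration, the sketch below computes an expected gradient length for binary logistic regression, where the gradient of the log-loss for a single point with label $y$ is $(p - y)\,x$. The weight vector `w` and the two candidate points are hypothetical, and logistic regression is chosen only because its gradient has a simple closed form.

```python
import numpy as np

def expected_gradient_length(w, X):
    # Binary logistic regression: p = sigmoid(w . x).
    p = 1.0 / (1.0 + np.exp(-X @ w))
    x_norm = np.linalg.norm(X, axis=1)
    # Gradient of the log-loss for label y is (p - y) * x.
    grad_if_y1 = np.abs(p - 1.0) * x_norm
    grad_if_y0 = np.abs(p - 0.0) * x_norm
    # Average over both possible labels, weighted by the model's own belief.
    return p * grad_if_y1 + (1.0 - p) * grad_if_y0

w = np.array([2.0, -1.0])            # hypothetical current weights
X_unlabeled = np.array([
    [0.1, 0.2],                      # on the decision boundary (w . x = 0)
    [3.0, -3.0],                     # far from the boundary, confidently class 1
])
scores = expected_gradient_length(w, X_unlabeled)
print("Query index:", int(np.argmax(scores)))
```

The point on the decision boundary receives the larger score, since either label would move the parameters noticeably, whereas the confidently classified point would barely change them.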


(3) Core-Set Selection

Core-Set methods in active learning aim to select a representative subset of the data that provides broad coverage of the input space. The goal is to ensure that the labeled data set captures the underlying distribution of all (labeled and unlabeled) data, so that the learned model generalizes well.


Intuition

  • Instead of focusing only on uncertainty or disagreement, Core-Set methods ask: "Which subset of data, if labeled, would best represent the entire unlabeled dataset?"


The figure above illustrates the geometric intuition behind Core-Set active learning using the greedy k-center algorithm.

  • Gray dots represent the unlabeled data points.

  • Blue dots indicate the currently labeled set (i.e., already selected or queried).

  • Each blue circle shows the coverage radius from a labeled point to its nearest unlabeled neighbors.


The goal is to select the next unlabeled point that is farthest from any labeled point, maximizing coverage of the feature space.

  • This point (on the red arrow's tip) will be selected next to minimize the maximum distance between any unlabeled point and the labeled set.
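The farthest-point rule described above can be sketched in a few lines. The function name `greedy_k_center`, the Euclidean distance, and the five toy points are assumptions for illustration; in practice the distances are often computed in a learned feature space.

```python
import numpy as np

def greedy_k_center(X, labeled_idx, n_select):
    # Distance from every point to its nearest labeled point.
    dists = np.linalg.norm(X[:, None, :] - X[labeled_idx][None, :, :], axis=2)
    min_dist = dists.min(axis=1)
    selected = []
    for _ in range(n_select):
        nxt = int(np.argmax(min_dist))          # farthest point from the labeled set
        selected.append(nxt)
        # A new center can only shrink each point's nearest-center distance.
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [5.0, 5.0], [5.1, 5.0]])
picked = greedy_k_center(X, labeled_idx=[0], n_select=2)
print("Selected indices:", picked)
```

After the first pick covers the distant cluster, the second pick goes to the isolated middle point rather than to a neighbor of an already labeled point.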




3. Active Learning for Optimization

While active learning is often used to build accurate surrogate models with minimal labeled data, another important and practical objective is to use active learning to find optimal solutions - that is, to identify inputs that yield the highest (or lowest) label values.

This is commonly referred to as Bayesian Optimization (BO), and it's especially useful when:

  • Evaluations (experiments, simulations) are expensive or time-consuming,

  • Labels correspond to performance, efficiency, or quality metrics,

  • The goal is not to model the entire space accurately but to quickly find the best input.


Example: Sampling process parameters in a manufacturing system to maximize productivity.



3.1. Key Idea: Which Sample Should We Choose?

Let us examine the decision-making process through a few illustrative cases (as visualized in the figure below):



(1) Deterministic Model

If the model predicts output values deterministically, sample B is chosen because it has the higher predicted value $\hat{y}$.


(2) Probabilistic Model with Uncertainty

If the model provides probabilistic predictions - i.e., both the mean and uncertainty (variance) - we should consider the entire distribution of predictions.

  • In this case, B still appears more favorable based on both its mean and relatively low uncertainty.

(3) High Uncertainty Trade-Off

In this more subtle scenario:

  • Point B has a higher predicted mean value with low uncertainty.

  • Point A has a lower mean but a wider uncertainty distribution, meaning it might yield an even higher value than B in some cases.

The selection depends on the decision-maker's risk preference:

  • B is preferred under a conservative or risk-averse strategy (exploitation).

  • A may be chosen under a risk-taking or exploratory strategy, as it holds potential for higher returns (exploration).

  • The balance between mean prediction and uncertainty is crucial.


Two Key Components of Active Learning for Optimization

Active learning for optimization - such as in Bayesian optimization - relies on two critical components working together to guide the search for the optimal input:

(1) Surrogate model (Probabilistic Model)

  • Returns predicted labels and the uncertainty of those predictions

(2) Utility function (or Acquisition Function)

  • Uses the surrogate model's outputs to score which unlabeled data points are most likely to have label values higher than the best value known so far

  • Several utility functions are available to guide the selection


Before we explore surrogate models and acquisition strategies, it's important to understand the overall optimization loop. Active learning for optimization proceeds in an iterative manner, where each step strategically selects the next input to evaluate based on the current surrogate model and acquisition function.


3.2. Iterative Workflow for Active Learning

We proceed iteratively, using a surrogate model (e.g., Gaussian Process or Bayesian Neural Network) to approximate $f(\mathbf{x})$ and guide the search via an acquisition function.



(1) Step 1: Train the Surrogate Model

  • Train a probabilistic surrogate model (e.g., Gaussian Process or Bayesian Neural Network) using the current labeled dataset.

(2) Step 2: Predict on Unlabeled Data

  • Use the surrogate model to predict label values for all unlabeled data points.

  • Also obtain uncertainty estimates (e.g., standard deviation or confidence intervals) for each prediction.


(3) Step 3: Sample Using a Utility Function

  • Apply a utility function to score each unlabeled point based on the predicted value and uncertainty.

  • Select the top candidates that are most likely to yield higher label values than currently observed.


(4) Step 4: Acquire Labels

  • Conduct physical/computational experiments or consult domain experts to obtain true labels for the selected samples.

(5) Step 5: Update and Repeat

  • Add newly labeled data to the training dataset.

  • If any new labels exceed previously known maximums, record them.

  • Repeat the entire process (Steps 1-5) until a predefined stopping condition is met (e.g., budget, performance, or label threshold).


3.3. Surrogate Model

In active learning for optimization, a surrogate model plays a central role by approximating the true objective function, which is often costly or time-consuming to evaluate.


Purpose:

  • To predict label values for unlabeled inputs.

  • To quantify uncertainty associated with each prediction.


These uncertainty estimates are critical for the acquisition function, which guides the selection of the next input to evaluate. For this reason, surrogate models are typically probabilistic in nature.



Common Surrogate Models:

(1) Gaussian Process Regression (GPR)

  • GPR models a distribution over functions that map input features to output labels.

  • It assumes that any finite set of function values follows a joint Gaussian distribution.

  • The shape and smoothness of the predicted function are controlled by a kernel function (e.g., RBF kernel).

  • GPR provides closed-form expressions for both:

    • The predicted mean $\mu(\mathbf{x})$

    • The predictive variance $\sigma^2(\mathbf{x})$


This makes GPR particularly well-suited for uncertainty-aware sampling strategies in Bayesian optimization.
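A short sketch of how the predictive mean and standard deviation are obtained in practice, assuming scikit-learn's `GaussianProcessRegressor`. The three training points, the RBF length scale, and the query locations are arbitrary choices for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.sin(X_train).ravel()

# optimizer=None keeps the kernel hyperparameters fixed for a deterministic example.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), optimizer=None)
gpr.fit(X_train, y_train)

X_query = np.array([[1.0], [5.0]])     # one observed input, one far from the data
mu, sigma = gpr.predict(X_query, return_std=True)

print("Predicted mean:", mu)
print("Predicted std: ", sigma)
```

The standard deviation is close to zero at the observed input and much larger at the distant one, which is exactly the signal an acquisition function uses to identify unexplored regions.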



(2) Bayesian Neural Network (BNN)

  • A BNN treats the weights of a neural network as probability distributions, rather than fixed parameters.

  • During inference, the model integrates over these distributions to produce:

    • Predictive mean (expected output)

    • Predictive uncertainty (variation across weight samples)

  • This enables BNNs to capture uncertainty in regions where data is sparse.


While less analytically tractable than GPR, BNNs are more scalable and flexible for high-dimensional or large-scale problems.



Note:

  • In this section, we will not go into the detailed algorithms behind GPR and BNN. Instead, we will focus on how to use these surrogate models in Python for active learning applications.

  • This is perfectly acceptable in the context of active learning, where the emphasis is on how these models support the decision-making process.


The key takeaway is:

  • Both GPR and BNN provide not only predictions but also uncertainty estimates, which are essential for guiding sample selection.

3.4. How Do Utility Functions Decide Which Unlabeled Data to Sample?

3.4.1. Two Key Concepts: Exploitation and Exploration

When performing active learning or Bayesian optimization, utility functions guide the selection of the next data point to sample. Two core strategies influence this decision:



(1) Exploitation: In the next step, search the neighborhood of the point with the highest function value observed so far

  • Focuses on areas near the currently known maximum predicted value.

  • Goal: Refine the estimate of where the best result lies.

  • Example: Choose the region around the teddy bear (known high value).

(2) Exploration: In the next step, search the neighborhood of the point with the largest predicted standard deviation

  • Focuses on areas with the highest uncertainty (e.g., large standard deviation in predictions).

  • Goal: Discover potentially better results in less-known regions.

  • Example: Choose the unknown region where uncertainty is higher.


A good utility function balances these two strategies:

  • Too much exploitation risks missing better regions.

  • Too much exploration risks wasting resources on unpromising areas.



3.4.2. Utility Functions

A utility function strategically selects the unlabeled data points that are most likely to have label values higher than the best value known so far, using exploitation, exploration, or a combination of both. Three common choices are:

(1) Probability of Improvement

(2) Upper Confidence Bound

(3) Expected Improvement


(1) Probability of Improvement (PI)

Select the next data point that maximizes the probability of yielding an improvement over the best observed value so far.


Let:

  • $\mu(x)$ = predicted mean of the surrogate model at input $x$
  • $\sigma(x)$ = predicted standard deviation at $x$
  • $y^*$ = best label value observed so far
  • $\xi$ = small positive value to encourage exploration (e.g., $\xi = 0.01$)

Then, the Probability of Improvement (PI) is defined as:


$$ \begin{align*} \text{PI}(x) &= P \left(f(x) \geq y^* + \xi \right) \\ &= \Phi\left( \frac{\mu(x) - y^* - \xi}{\sigma(x)} \right) \end{align*} $$


Where:

  • $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the standard normal distribution

Visual Illustration (Figure shown with $\xi = 0$)



Interpretation

  • PI quantifies the probability that sampling at $x$ will result in a label value greater than $y^* + \xi$.
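  • It is high where $\mu(x)$ is large. A large $\sigma(x)$ raises PI only when $\mu(x) < y^* + \xi$; when $\mu(x)$ already exceeds $y^* + \xi$, a larger $\sigma(x)$ pulls PI down toward 0.5.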
  • The parameter $\xi$ controls the exploration-exploitation tradeoff:
    • Larger $\xi$ favors exploration
    • Smaller $\xi$ favors exploitation

When to Use

  • When you want a conservative approach to improve on the best known result
  • Suitable for problems where risk of sampling bad points must be minimized
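A small numerical sketch of PI for two candidates. The values of `mu`, `sigma`, and `y_best` are hypothetical: candidate A has a lower mean with high uncertainty, and candidate B has a slightly higher mean with low uncertainty.

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, y_best, xi=0.01):
    # PI(x) = Phi((mu(x) - y* - xi) / sigma(x))
    z = (mu - y_best - xi) / sigma
    return norm.cdf(z)

y_best = 10.0                      # hypothetical best observed label
mu = np.array([9.5, 10.2])         # candidates A, B
sigma = np.array([2.0, 0.3])
pi = probability_of_improvement(mu, sigma, y_best)

print("PI(A), PI(B):", pi)
print("PI selects:", "AB"[int(np.argmax(pi))])
```

PI prefers B here: a small but likely improvement scores higher than a large but less likely one, which reflects the conservative character described above.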


(2) Upper Confidence Bound (UCB)

Select the next data point where the upper bound of the predicted value is the highest - balancing mean prediction and uncertainty.


Let:

  • $\mu(x)$ = predicted mean of the surrogate model at input $x$
  • $\sigma(x)$ = predicted standard deviation (uncertainty) at $x$
  • $\kappa$ = a tunable parameter controlling exploration vs. exploitation

Then, the UCB acquisition function is defined as:


$$ \text{UCB}(x) = \mu(x) + \kappa \cdot \sigma(x) $$



Interpretation

  • $\mu(x)$ promotes exploitation: favoring points with high predicted value
  • $\sigma(x)$ promotes exploration: favoring uncertain regions
  • $\kappa$ balances the two:
    • Large $\kappa$: more exploration
    • Small $\kappa$: more exploitation

When to Use

  • UCB is a good general-purpose acquisition function
  • It's particularly useful when you want to explicitly control the trade-off between exploring unknown regions and exploiting known good ones

Intuition

  • Think of UCB as selecting points where the model says:

  • "I'm either quite confident this is good, or I'm uncertain - so it might be even better!"
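The effect of $\kappa$ is easy to see on two hypothetical candidates: A has a lower mean with high uncertainty, and B has a higher mean with low uncertainty. All numbers below are made up for illustration.

```python
import numpy as np

def upper_confidence_bound(mu, sigma, kappa):
    # UCB(x) = mu(x) + kappa * sigma(x)
    return mu + kappa * sigma

mu = np.array([9.5, 10.2])         # candidates A, B
sigma = np.array([2.0, 0.3])

ucb_exploit = upper_confidence_bound(mu, sigma, kappa=0.1)
ucb_explore = upper_confidence_bound(mu, sigma, kappa=2.0)

pick_exploit = "AB"[int(np.argmax(ucb_exploit))]
pick_explore = "AB"[int(np.argmax(ucb_explore))]
print("kappa = 0.1 selects", pick_exploit)
print("kappa = 2.0 selects", pick_explore)
```

With a small $\kappa$ the higher mean of B wins, while a large $\kappa$ lets the uncertainty of A dominate.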



(3) Expected Improvement (EI)

Objective:

Select the next data point that is expected to improve the best observed value, taking both the predicted mean and uncertainty into account.


Let:

  • $\mu(x)$ = predicted mean at input $x$
  • $\sigma(x)$ = predicted standard deviation (uncertainty) at $x$
  • $y^*$ = best observed label value so far
  • $\xi$ = small positive number (exploration parameter, often $\xi = 0.01$)

Then, the Expected Improvement at $x$ is:


$$ \text{EI}(x) = (\mu(x) - y^* - \xi) \cdot \Phi(Z) + \sigma(x) \cdot \phi(Z) $$


Where:

  • $Z = \dfrac{\mu(x) - y^* - \xi}{\sigma(x)}$
  • $\Phi(\cdot)$ = CDF of the standard normal distribution
  • $\phi(\cdot)$ = PDF of the standard normal distribution

Interpretation

  • The first term measures how much better $\mu(x)$ is than $y^*$

$$ \text{EI}(x) = \underbrace{(\mu(x) - y^* - \xi) \cdot \Phi(Z)}_{\text{exploitation} } + \sigma(x) \cdot \phi(Z) $$



  • The second term accounts for the uncertainty, allowing for the chance of improving even if $\mu(x)$ is not the best so far

$$ \text{EI}(x) = (\mu(x) - y^* - \xi) \cdot \Phi(Z) + \underbrace{\sigma(x) \cdot \phi(Z)}_{\text{exploration} } $$


  • $\xi$ increases exploration by encouraging sampling in uncertain regions

When to Use

  • EI is adaptive: it naturally balances exploration and exploitation
  • Useful when sampling is expensive and every new point matters

Intuition

  • "How much do I expect to improve over the current best if I sample here?"

  • Unlike Probability of Improvement (PI), EI considers not just the likelihood, but also the magnitude of the improvement.
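The difference between PI and EI can be checked numerically. In the sketch below, the two candidates and the best observed value are hypothetical: A has a lower mean with a wide distribution, and B has a slightly higher mean with a narrow one.

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, y_best, xi=0.01):
    return norm.cdf((mu - y_best - xi) / sigma)

def expected_improvement(mu, sigma, y_best, xi=0.01):
    # EI(x) = (mu - y* - xi) * Phi(Z) + sigma * phi(Z), with Z = (mu - y* - xi) / sigma
    improvement = mu - y_best - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)

y_best = 10.0                      # hypothetical best observed label
mu = np.array([9.5, 10.2])         # candidates A, B
sigma = np.array([2.0, 0.3])

pi = probability_of_improvement(mu, sigma, y_best)
ei = expected_improvement(mu, sigma, y_best)
print("PI selects:", "AB"[int(np.argmax(pi))])
print("EI selects:", "AB"[int(np.argmax(ei))])
```

PI favors B because an improvement there is more likely, whereas EI favors A because the improvement, if it happens, can be much larger.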


Visual Illustration of EI



4. Lab


Load Python Packages

In [ ]:
import numpy as np
import pandas as pd
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel as C, WhiteKernel as Wht, Matern as matk

from warnings import filterwarnings
filterwarnings('ignore')

Load Data into Pandas Dataframe


In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
train_data = pd.read_csv('/content/drive/MyDrive/ML/ML_data/AL_train.csv')
test_data = pd.read_csv('/content/drive/MyDrive/ML/ML_data/AL_test.csv')

print('Shape of training dataset =', train_data.shape)
print('Shape of test dataset =', test_data.shape)
Shape of training dataset = (500, 9)
Shape of test dataset = (530, 9)
In [ ]:
train_data
Out[ ]:
Cement Blast Furnace Slag Fly Ash Water Superplasticizer Coarse Aggregate Fine Aggregate Age Strength
0 0.342009 0.000000 0.499250 0.194089 0.385093 0.595930 0.767185 0.035714 33.36
1 0.085845 0.582638 0.000000 0.560703 0.000000 0.715116 0.534119 0.016484 14.59
2 0.305936 0.000000 0.000000 0.576677 0.000000 0.485465 0.730055 0.244505 21.95
3 0.541553 0.000000 0.000000 0.510383 0.000000 0.779651 0.402158 0.016484 21.18
4 0.399543 0.000000 0.000000 0.552716 0.000000 0.485465 0.657301 0.035714 21.26
... ... ... ... ... ... ... ... ... ...
495 0.538584 0.525876 0.000000 0.424121 0.295031 0.417733 0.405921 0.247253 67.80
496 0.623288 0.260991 0.000000 0.038339 0.726708 0.148547 1.000000 0.016484 45.70
497 0.664384 0.000000 0.000000 0.560703 0.000000 0.404070 0.411440 0.244505 48.79
498 0.594977 0.525876 0.000000 0.344249 0.360248 0.417733 0.405921 0.005495 35.30
499 0.335845 0.000000 0.493753 0.289936 0.397516 0.543023 0.740090 0.074176 30.85

500 rows × 9 columns

In [ ]:
train_X = train_data.iloc[:, :-1].to_numpy()
test_X = test_data.iloc[:, :-1].to_numpy()
train_Y = train_data.iloc[:, -1].to_numpy()
test_Y = test_data.iloc[:, -1].to_numpy()
In [ ]:
print(train_X.shape)
print(train_Y.shape)
(500, 8)
(500,)
In [ ]:
print(np.max(train_Y))
68.5

Define Surrogate Model

Gaussian Process Regression (GPR)

  • GPR derives a distribution of functions that can map input to label values based on training data
  • Herein, the forms of these functions are defined by a kernel function
  • We will use a kernel built from the Matern kernel, scaled by a constant kernel and combined with a white-noise kernel

Define Utility Function

Expected Improvement

  • EI extends probability of improvement (PI) by also accounting for how large the improvement is expected to be
  • It weights the PI term by the difference between the predicted mean and the current best value, and adds an uncertainty term that rewards exploration
  • The probability of obtaining a label larger than the existing ones is important, but so is the size of the value obtained

$$ \text{EI}(x) = (\mu(x) - y^* - \xi) \cdot \Phi(Z) + \sigma(x) \cdot \phi(Z) $$



In [ ]:
def upperConfidenceBound(xdata, gpr, epsilon):
    # Here `epsilon` plays the role of kappa in UCB(x) = mu(x) + kappa * sigma(x).
    yu_pred, usigma = gpr.predict(xdata, return_std = True)
    # Where sigma is 0, the bound reduces to the predicted mean.
    return yu_pred + epsilon * usigma

def probabilityOfImprovement(xdata, gpr, ybest, epsilon):
    yp_pred, psigma = gpr.predict(xdata, return_std = True)
    poI = np.zeros(yp_pred.size, dtype = float)
    mask = psigma > 0    # PI stays 0 where the model reports no uncertainty
    zzval = (yp_pred[mask] - ybest - epsilon) / psigma[mask]
    poI[mask] = norm.cdf(zzval)
    return poI

def expectedImprovement(xdata, gpr, ybest, epsilon):
    ye_pred, esigma = gpr.predict(xdata, return_std = True)
    expI = np.zeros(ye_pred.size, dtype = float)
    mask = esigma > 0    # EI stays 0 where the model reports no uncertainty
    improvement = ye_pred[mask] - ybest - epsilon
    # Z includes epsilon so that it matches the EI formula above.
    zzval = improvement / esigma[mask]
    expI[mask] = improvement * norm.cdf(zzval) + esigma[mask] * norm.pdf(zzval)
    return expI

4.1. Build the Active Learning Framework


Step 1: Train GPR Model and Make Predictions

In [ ]:
cmean = [1.0] * 8
cbound = [[1e-3, 1e3]] * 8
kernel = C(1.0, (1e-3, 1e3)) * matk(cmean, cbound, 1.5) + Wht(1.0, (1e-3, 1e3))

gp = GaussianProcessRegressor(kernel = kernel, n_restarts_optimizer = 40, normalize_y = False, random_state = 10)
gp.fit(train_X, train_Y)

yt_pred, tsigma = gp.predict(train_X, return_std = True)

Step 2: Evaluate The Utility Function and Suggest Next Data Point for Evaluation

In [ ]:
y_bestloc = np.argmax(yt_pred)
y_best_pred = yt_pred[y_bestloc]

uf_values = expectedImprovement(test_X, gp, y_best_pred, epsilon = 0.01)
#uf_values = probabilityOfImprovement(test_X, gp, y_best_pred, epsilon = 0.01)
#uf_values = upperConfidenceBound(test_X, gp, epsilon = 0.01)

uf_maxloc = np.argmax(uf_values)

print('Index of unlabeled data recommended to be sampled:', uf_maxloc)
print('\nRecommended experiment parameter:', test_X[uf_maxloc])
Index of unlabeled data recommended to be sampled: 513

Recommended experiment parameter: [1.         0.         0.         0.40894569 0.         0.94186047
 0.04766683 0.73901099]

Step 3: Add Real Label Values for The Sampled Unlabeled Data to The Training Data

  • Label values are obtained with experiments
In [ ]:
exp_parameter = test_X[uf_maxloc].reshape(1, -1)
exp_result = test_Y[uf_maxloc].reshape(1, -1)

print(exp_result)
[[74.17]]
In [ ]:
train_X = np.vstack([train_X, exp_parameter])
train_Y = np.append(train_Y, exp_result)
test_X = np.delete(test_X, uf_maxloc, axis = 0)
test_Y = np.delete(test_Y, uf_maxloc, axis = 0)

print('Number of training dataset =', train_X.shape[0])
print('Number of test dataset =', test_X.shape[0])
Number of training dataset = 501
Number of test dataset = 529
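Steps 1 to 3 above cover a single iteration. A minimal sketch of how they might be wrapped into a repeated loop is shown below. The function names `run_active_learning` and `expected_improvement`, the synthetic two-dimensional data, and the simplified kernel settings are assumptions for illustration; in the lab, the concrete dataset and the kernel defined in Step 1 would take their place, and the pool labels would come from real experiments rather than a held-out array.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel as C, WhiteKernel as Wht, Matern as matk

def expected_improvement(mu, sigma, y_best, xi=0.01):
    ei = np.zeros_like(mu)
    ok = sigma > 0
    improvement = mu[ok] - y_best - xi
    z = improvement / sigma[ok]
    ei[ok] = improvement * norm.cdf(z) + sigma[ok] * norm.pdf(z)
    return ei

def run_active_learning(X_lab, y_lab, X_pool, y_pool, n_iter=5):
    for _ in range(n_iter):
        # Step 1: train the GPR surrogate on the current labeled data.
        kernel = C(1.0) * matk(length_scale=1.0, nu=1.5) + Wht(1.0)
        gp = GaussianProcessRegressor(kernel=kernel, random_state=10)
        gp.fit(X_lab, y_lab)
        # Step 2: score the pool with EI and pick the best candidate.
        y_best = gp.predict(X_lab).max()
        mu, sigma = gp.predict(X_pool, return_std=True)
        pick = int(np.argmax(expected_improvement(mu, sigma, y_best)))
        # Step 3: move the selected point (with its label) into the training data.
        X_lab = np.vstack([X_lab, X_pool[pick]])
        y_lab = np.append(y_lab, y_pool[pick])
        X_pool = np.delete(X_pool, pick, axis=0)
        y_pool = np.delete(y_pool, pick, axis=0)
    return X_lab, y_lab, X_pool, y_pool

# Synthetic stand-in data (hypothetical): maximum of the label at (0.5, 0.5).
rng = np.random.default_rng(0)
X_all = rng.uniform(0.0, 1.0, size=(60, 2))
y_all = -np.sum((X_all - 0.5) ** 2, axis=1)

X_lab, y_lab, X_pool, y_pool = run_active_learning(
    X_all[:10], y_all[:10], X_all[10:], y_all[10:], n_iter=5)
print("Labeled:", X_lab.shape[0], "Pool:", X_pool.shape[0])
```

Keeping the pool inputs and pool labels in sync on every deletion is essential; otherwise a later iteration would attach the wrong label to a selected point.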