\chapter{Methods}\label{MethodsChapter}
In this chapter, I will describe my methods for implementing the Kernel Perceptron, Budget Kernel Perceptron, and Description Kernel Perceptron in the MLCert framework. First, Section \ref{MLCertStruct} describes the pipeline in MLCert for building the specifications of machine learning algorithms. Next, Sections \ref{KPCoqImp}, \ref{KPBCoqImp}, and \ref{KPDCoqImp} outline the Coq sections for each implementation, which consist of a prediction section and a section for the entire algorithm. The Coq sections for the proofs of these implementations and their extraction to Haskell follow in Chapter \ref{ResultsChapter}.
\section{Structure of Perceptron Implementations in MLCert}\label{MLCertStruct}
The MLCert framework provides data structures and proofs that can be instantiated with the specifics of a machine learning algorithm, as well as extraction directives which facilitate translating Coq code to Haskell for execution. MLCert requires four Coq sections for the complete implementation of a machine learning algorithm. Before discussing these four sections, I first give the structure and type signature MLCert uses to encode a machine learning algorithm. This module is located in the file ``learners.v'':
\begin{figure}
\caption{Learner Module}
\label{LearnerDef}
\begin{lstlisting}
Module Learner.
  Record t (X Y Hypers Params : Type) :=
    mk { predict : Hypers -> Params -> X -> Y;
         update : Hypers -> X*Y -> Params -> Params }.
End Learner.
\end{lstlisting}
\end{figure}
The Learner module defines the general form of the parameters and functions of a machine learning algorithm. Four types appear in the definition of Learner.t: X, Y, Hypers, and Params, corresponding to the types of training data, training labels, hyperparameters, and parameters, respectively. These four types define the type signatures of the required predict and update functions. The predict function takes Hypers, Params, and an example of type X, and returns a predicted label of type Y. The update function takes Hypers, an example paired with its label, and the current Params, and returns updated Params. This Learner module will later be instantiated with specific types and functions to implement the Perceptron family of algorithms.
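\\To make this signature concrete, the following is a minimal, hypothetical instantiation (it is not part of MLCert): a trivial learner that always predicts true and never changes its parameters, with Hypers and Params both instantiated to the unit type.
\begin{lstlisting}
(* Hypothetical sketch only: a trivial instance of Learner.t.
   X is an arbitrary example type; the label type is bool. *)
Definition trivial_learner (X : Type) : Learner.t X bool unit unit :=
  Learner.mk
    (fun (_:unit) (_:unit) (_:X) => true)    (* predict: always true *)
    (fun (_:unit) (_:X*bool) (p:unit) => p). (* update: return p unchanged *)
\end{lstlisting}
Each Perceptron implementation below follows this same pattern, supplying meaningful prediction and update functions over its own types.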
\\The format of Learner.t is not universal across machine learning algorithms. For example, unsupervised learning algorithms such as k-means clustering do not use training labels. Because the predict function is meant to return a predicted label for classification, unsupervised algorithms cannot be implemented with Learner.t. Learner.t is sufficient for the Perceptron family of algorithms, however, because Perceptrons are supervised, learning from labeled training data, and their prediction and update functions fit the specified type signatures.
\\When implementing a machine learning algorithm, four required Coq sections organize and define its methods and data structures. The first section implements the prediction function according to the type signature of predict in Learner.t. The second section defines the algorithm's update function and instantiates Learner.t with the algorithm's specific types, prediction function, and update function. This instantiation is used directly by the third and fourth sections. The third section contains generalization proofs, which establish the cardinality of the Params used by the algorithm as well as generalization bounds for the entire algorithm. Finally, the fourth section defines how the algorithm is extracted, a process which translates the Coq code into the functional language Haskell so that experimental results can be gathered. The rest of this chapter describes the first two Coq sections for each implementation.
\section{Kernel Perceptron Coq Implementation}\label{KPCoqImp}
The Kernel Perceptron implementation in MLCert is located in the file ``kernelperceptron.v''. Section KernelClassifier contains the predict function for the Kernel Perceptron. The definition of kernel\_predict is shown in Figure \ref{kernel_predictDef}:
\begin{figure}
\caption{kernel\_predict function in KernelClassifier}
\label{kernel_predictDef}
\begin{lstlisting}
Definition kernel_predict (K : float32_arr n -> float32_arr n -> float32)
    (w : KernelParams) (x : Ak) : Bk :=
  let T := w.1 in
  foldable_foldM
    (fun xi_yi r =>
      let: ((i, xi), yi) := xi_yi in
      let: (j, xj) := x in
      let: wi := f32_get i w.2 in
      r + (float32_of_bool yi) * wi * (K xi xj))
    0 T > 0.
\end{lstlisting}
\end{figure}
The kernel\_predict function takes three inputs: a kernel function $K$, the current Kernel Perceptron parameters $w$, and an example $x$ of type Ak. The type Ak, representing training data, is defined in KernelClassifier as an index paired with an array of 32-bit floating-point values of size $n$. The label type Bk is defined as Boolean. The kernel function $K$ can be instantiated with any kernel function of the correct type signature; KernelClassifier provides two, the linear and quadratic kernels, shown in Figure \ref{kernelfunctionsDef}.
\begin{figure}
\caption{Kernel functions in KernelClassifier}
\label{kernelfunctionsDef}
\begin{lstlisting}[mathescape]
Definition linear_kernel {n} (x y : float32_arr n) : float32 :=
f32_dot x y.
Definition quadratic_kernel (x y : float32_arr n) : float32 :=
(f32_dot x y)$^2$.
\end{lstlisting}
\end{figure}
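In standard notation, these compute $K_{\mathrm{lin}}(x, z) = \langle x, z \rangle$ and $K_{\mathrm{quad}}(x, z) = \langle x, z \rangle^2$, the inner product of two examples and its square.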
The KernelParams for the Kernel Perceptron are defined as the training set paired with a float32 array of size $m$, where $m$ is the number of training examples. In the basic Kernel Perceptron, every training example in the training set is a support vector, and each float32 value in the array corresponds to one support vector. The kernel\_predict function folds over the support vectors in KernelParams: for each support vector, the kernel function applied to the support vector and $x$ is multiplied by that support vector's float32 value and its label, and these products are summed. The sum is compared with zero to produce the predicted Boolean label for $x$.
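\\In this notation, kernel\_predict computes the standard Kernel Perceptron decision function
\[
\hat{y}(x) \;=\; \Big[ \sum_{i=1}^{m} \alpha_i \, y_i \, K(x_i, x) > 0 \Big],
\]
where the bracket denotes the Boolean result of the comparison, $x_i$ is the $i$th support vector, $y_i$ the numeric value of its Boolean label (via float32\_of\_bool), and $\alpha_i$ the float32 value stored for it in KernelParams.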
\\The Coq section KernelPerceptron completes the Kernel Perceptron implementation: it contains the kernel\_update function and instantiates Learner.t with the Kernel Perceptron parameters and functions. The kernel\_update function is defined in Figure \ref{kernel_updateDef}. Using the kernel\_predict function and the kernel function $K$, kernel\_update compares the predicted label to the actual label. If the predicted label is correct, the parameters $p$ are returned unchanged. If the predicted label is incorrect, the float32 array is updated so that the float32 value for that training example is incremented by one.
\begin{figure}
\caption{kernel\_update function in KernelPerceptron}
\label{kernel_updateDef}
\begin{lstlisting}
Definition kernel_update
    (K : float32_arr n -> float32_arr n -> float32)
    (h:Hypers) (example_label:A*B) (p:Params) : Params :=
  let: ((i, example), label) := example_label in
  let: predicted_label := kernel_predict K p (i, example) in
  if Bool.eqb predicted_label label then p
  else (p.1, f32_upd i (f32_get i p.2 + 1) p.2).
\end{lstlisting}
\end{figure}
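In standard notation, this is the usual Kernel Perceptron update rule: on a misclassification of training example $i$, its weight is incremented, $\alpha_i \leftarrow \alpha_i + 1$, and all other weights are unchanged.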
With kernel\_update implemented, Learner.t can be instantiated with the necessary types and functions. Because the Kernel Perceptron uses no hyperparameters, the type Hypers is defined as an empty record. The Kernel Perceptron Learner is defined in Figure \ref{kpLearnerDef}. The variable $K$ is the generic kernel function, and $F$ provides a proof that the support vectors are a Foldable type, compatible with the parameter definitions in the Kernel Perceptron functions. This Learner definition is used in the other two Kernel Perceptron sections, which are discussed in Sections \ref{KPProofs} and \ref{DetailsHaskellExtraction}.
\begin{figure}
\caption{Learner Definition in KernelPerceptron}
\label{kpLearnerDef}
\begin{lstlisting}
Definition Learner : Learner.t A B Hypers Params :=
  Learner.mk
    (fun _ => @kernel_predict n m support_vectors F K)
    (kernel_update K).
\end{lstlisting}
\end{figure}
\section{Budget Kernel Perceptron Coq Implementation}\label{KPBCoqImp}
The Budget Kernel Perceptron is also located in the file ``kernelperceptron.v'' in MLCert. Its predict function is located in the section KernelClassifierBudget and modifies Kernel Perceptron prediction so that a budget on the size of the set of support vectors can be enforced. In KernelClassifierBudget, the variable $sv$ is the size of the support set. Whereas KernelParams pairs the entire training set with a float32 array, the parameters for the Budget Kernel Perceptron are an axiomatized vector of size $sv$ whose elements pair a float32 value with a support vector. Axiomatized vectors are defined with extraction to Haskell lists in mind. The definitions of the support vector type and the support set type are given in Figure \ref{KPBsupportDef}.
\begin{figure}
\caption{Support Vector Definitions in KernelClassifierBudget}
\label{KPBsupportDef}
\begin{lstlisting}
Definition bsupport_vector: Type := Akb * Bkb.
Definition bsupport_vectors: Type := AxVec sv (float32 * (bsupport_vector)).
\end{lstlisting}
\end{figure}
The predict function for the Budget Kernel Perceptron, shown in Figure \ref{kernel_predict_budgetDef}, is very similar to Kernel Perceptron prediction; the main difference is the size of the support set. Again, the kernel function $K$ used in prediction can be specified as any kernel function with the correct type signature. Like kernel\_predict, kernel\_predict\_budget folds over the support set using the same per-vector calculation. However, the type of the training data, Akb, is different for the Budget Kernel Perceptron: Akb is a float32 array of size $n$, the dimensionality of the data, and does not pair each example with an index, which Budget classification does not need.
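\\In the notation of Section \ref{KPCoqImp}, kernel\_predict\_budget computes $\left[\sum_{i=1}^{sv} w_i \, y_i \, K(x_i, x) > 0\right]$, where the weight $w_i$ is now stored directly alongside the $i$th support vector rather than in a separate array.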
\begin{figure}
\caption{kernel\_predict\_budget function in KernelClassifierBudget}
\label{kernel_predict_budgetDef}
\begin{lstlisting}
Definition kernel_predict_budget
    (w: bsupport_vectors)
    (x: Akb) : Bkb :=
  foldable_foldM
    (fun wi_xi r =>
      let: (wi, (xi, yi)) := wi_xi in
      r + (float32_of_bool yi) * wi * (K xi x))
    0 w > 0.
\end{lstlisting}
\end{figure}
The Budget Kernel Perceptron implementation is located in the module KernelPerceptronBudget, which contains several functions needed by the budget update rule to maintain the size of the support set. The update rule for the Budget Kernel Perceptron is more complex than that of the Kernel Perceptron. If the current example has been misclassified, we first determine whether the example is already a support vector. If it is, the float32 value associated with that support vector is incremented by one. If it is not, we add the example to the support set and remove an existing support vector. As discussed in Section \ref{BudgetKernelPerceptronSection}, there are several methods for selecting the support vector to remove. In our implementation, we choose the oldest support vector with respect to when it was added to the support set, as removing this support vector will likely have the least impact on the decision hyperplane. When a new example is added to the support set, it is placed at the front of the vector, and the support vector at the end of the vector is removed. This replacement procedure ensures that the oldest support vector is always stored at the end of the vector, ready for removal. The logic of the budget update rule is defined in the function budget\_update, shown in Figure \ref{budget_updateDef}. In the Budget Kernel Perceptron's kernel\_update function, shown in Figure \ref{kernel_update_budgetDef}, the update rule is the generic function $U$, so that the budget update rule can be changed.
\begin{figure}
\caption{budget\_update function in KernelPerceptronBudget}
\label{budget_updateDef}
\begin{lstlisting}
Definition budget_update (p: Params) (yj: A*B): Params :=
  if existing p yj then upd_weights p yj.1
  else add_new p yj.
\end{lstlisting}
\end{figure}
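The budget\_update function relies on three helper functions in KernelPerceptronBudget that are not reproduced here: existing tests whether the example is already in the support set, upd\_weights increments the float32 value of an existing support vector, and add\_new performs the replacement described above. As a rough sketch of the replacement step only, assuming AxVec\_cons prepends an element and AxVec\_init drops the final element (the operations used by des\_update in Figure \ref{des_updateDef}), add\_new could take a form such as:
\begin{lstlisting}
(* Hypothetical sketch: prepend the new example with an initial
   weight of 1 and drop the oldest support vector from the end. *)
Definition add_new (p: Params) (yj: A*B): Params :=
  AxVec_cons (f32_1, yj) (AxVec_init p).
\end{lstlisting}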
\begin{figure}
\caption{kernel\_update function in KernelPerceptronBudget}
\label{kernel_update_budgetDef}
\begin{lstlisting}
Definition kernel_update
    (K : float32_arr n -> float32_arr n -> float32)
    (h:Hypers) (example_label:A*B) (p:Params) : Params :=
  let: (example, label) := example_label in
  let: predicted_label := kernel_predict_budget K p example in
  if Bool.eqb predicted_label label then p
  else (U p example_label).
\end{lstlisting}
\end{figure}
Finally, Learner.t can be instantiated using kernel\_predict\_budget and kernel\_update. No hyperparameters are used for the Budget Kernel Perceptron, so Hypers is again an empty record. $K$ and $F$ are defined in the same way as for the Kernel Perceptron, specifying the kernel function and providing a proof that the support vectors are Foldable. Figure \ref{kpbLearnerDef} shows the Budget Kernel Perceptron's Learner definition, which is used in the proofs of Section \ref{KPBProofs} and the extraction of Section \ref{DetailsHaskellExtraction}.
\begin{figure}
\caption{Learner Definition in KernelPerceptronBudget}
\label{kpbLearnerDef}
\begin{lstlisting}
Definition Learner : Learner.t A B Hypers Params :=
  Learner.mk
    (fun _ => @kernel_predict_budget n (S sv) K F)
    (kernel_update K).
\end{lstlisting}
\end{figure}
\section{Description Kernel Perceptron Coq Implementation}\label{KPDCoqImp}
The Description Kernel Perceptron is the final implementation located in ``kernelperceptron.v''. Many of its functions and definitions modify the Budget Kernel Perceptron, as both implementations use a fixed number of support vectors smaller than the number of training examples. The first major change for the Description Kernel Perceptron is the definition of its parameters. Figure \ref{KPDsupportDef} lists the definitions necessary to create dparams, the Description parameters. The definition of support vectors is the same as for the Budget Kernel Perceptron, relying on Akd and Bkd, which define the types of training examples and labels, respectively. However, the set of support vectors, dsupport\_vectors, is simply an axiomatized vector containing $des$ support vectors, where $des$ is the maximum number of misclassifications. The parameters for the Description Kernel Perceptron pair this set of support vectors with a single float32 value, which keeps track of the number of misclassifications.
\begin{figure}
\caption{Parameter Definition in KernelClassifierDes}
\label{KPDsupportDef}
\begin{lstlisting}
Definition dsupport_vector: Type := Akd * Bkd.
Definition dsupport_vectors: Type := AxVec des dsupport_vector.
Definition dparams: Type := float32 * dsupport_vectors.
\end{lstlisting}
\end{figure}
Prediction for the Description Kernel Perceptron is also very similar to the Budget Kernel Perceptron, but the calculation is simpler. The kernel\_predict\_des function is shown in Figure \ref{kernel_predict_desDef}. No float32 values are paired with individual support vectors in dparams, as each support vector has the same weight; the reason for this will become clear in the update function of Figure \ref{kernel_update_desDef}. Once the float32 value is separated from the set of support vectors, the fold over the support vectors is the same as for the Budget Kernel Perceptron, except that no per-vector weight appears in the product.
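\\In the same notation, kernel\_predict\_des computes $\left[\sum_{i=1}^{des} y_i \, K(x_i, x) > 0\right]$; every stored support vector contributes with an implicit weight of one.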
\begin{figure}
\caption{kernel\_predict\_des function in KernelClassifierDes}
\label{kernel_predict_desDef}
\begin{lstlisting}
Definition kernel_predict_des
    (aw: dparams)
    (x: Akd) : Bkd :=
  let (a, w) := aw in
  foldable_foldM
    (fun wi_xi r =>
      let: (xi, yi) := wi_xi in
      r + (float32_of_bool yi) * (K xi x))
    0 w > 0.
\end{lstlisting}
\end{figure}
The update procedure for the Description Kernel Perceptron is likewise simplified from the Budget Kernel Perceptron. The Description Kernel Perceptron can make a maximum of $des$ mistakes. When a misclassification occurs, the float32 value in dparams is compared to a hyperparameter $alpha$, which is set to the maximum number of misclassifications. If the Description Kernel Perceptron has already made the maximum number of mistakes, the parameters are not updated for the rest of training. If the float32 value does not yet equal $alpha$, the des\_update function runs on the current parameters and the misclassified example. As shown in Figure \ref{des_updateDef}, des\_update increments the float32 value by one to record the misclassification, adds the misclassified example to the front of the support set, and removes a zero example. The parameters are initialized with $des$ zero vectors, and as misclassifications occur, these zero vectors are replaced. Under this update procedure, if an example is misclassified multiple times, multiple copies of it are added to the support set, increasing its influence on the hyperplane without using a float32 value per support vector. The full Description Kernel Perceptron update is shown in Figure \ref{kernel_update_desDef}.
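\\Storing copies in this way has the same effect as per-vector weights: an example $(x, y)$ that has been misclassified $k$ times appears $k$ times in the support set, so its total contribution to the prediction sum is $k \, y \, K(x, \cdot)$, exactly as if it were stored once with weight $\alpha = k$.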
\begin{figure}
\caption{des\_update function in KernelPerceptronDes}
\label{des_updateDef}
\begin{lstlisting}
Definition des_update (ap: Params) (yj: A*B): Params :=
  let (a, p) := ap in
  ((f32_add f32_1 a), (AxVec_cons yj) (AxVec_init p)).
\end{lstlisting}
\end{figure}
\begin{figure}
\caption{kernel\_update function in KernelPerceptronDes}
\label{kernel_update_desDef}
\begin{lstlisting}
Definition kernel_update
    (K : float32_arr n -> float32_arr n -> float32)
    (h:Hypers) (example_label:A*B) (ap:Params) : Params :=
  let: (example, label) := example_label in
  let: (a, p) := ap in
  let: predicted_label := kernel_predict_des K ap example in
  if Bool.eqb predicted_label label then ap
  else if f32_eq a (alpha h) then ap
  else des_update ap example_label.
\end{lstlisting}
\end{figure}
Using these definitions, a Learner instantiation can be created, as shown in Figure \ref{kpdLearnerDef}.
\begin{figure}
\caption{Learner Definition in KernelPerceptronDes}
\label{kpdLearnerDef}
\begin{lstlisting}
Definition Learner : Learner.t A B Hypers Params :=
  Learner.mk
    (fun _ => @kernel_predict_des n (S des) K F')
    (@kernel_update K).
\end{lstlisting}
\end{figure}
\section{Chapter Summary}\label{MethodsChapterSummarySection}
This chapter describes the implementation of the Kernel Perceptron, Budget Kernel Perceptron, and Description Kernel Perceptron in Coq. The Learner.t definitions for each of the algorithms ensure that these implementations are correctly formatted for the rest of the MLCert framework. Using the definitions in this chapter, Chapter \ref{ResultsChapter} will discuss proofs and Haskell extraction.