XML model
A model is stored in XML format.
For example, a model generated by nn-init looks like this:
<transform type="Affine" input-dim="20" output-dim="64" momentum="0.100000" learning-rate="0.100000" >
<weight></weight>
<bias></bias>
</transform>
<transform type="Sigmoid" input-dim="64" output-dim="64" />
<transform type="Affine" input-dim="64" output-dim="64" momentum="0.100000" learning-rate="0.100000" >
<weight></weight>
<bias></bias>
</transform>
<transform type="Sigmoid" input-dim="64" output-dim="64" />
<transform type="Affine" input-dim="64" output-dim="2" momentum="0.100000" learning-rate="0.100000" >
<weight></weight>
<bias></bias>
</transform>
<transform type="Softmax" input-dim="2" output-dim="2" />
Don't worry about indentation (tabs or spaces); I use the 3rd-party library rapidxml, which is very robust about whitespace.
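The whitespace tolerance can be seen with any conforming XML parser. Here is a quick illustration using Python's `xml.etree` as a stand-in for rapidxml (the actual parser nn-init uses):

```python
import xml.etree.ElementTree as ET

# Tabs, spaces, and line breaks between attributes make no difference
# to a conforming XML parser; this messy string parses the same as a
# neatly indented one.
messy = ('<model>\t\t<transform   type="Sigmoid"\n'
         '    input-dim="64" output-dim="64" />\n</model>')
root = ET.fromstring(messy)
print(root[0].attrib)  # {'type': 'Sigmoid', 'input-dim': '64', 'output-dim': '64'}
```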
BUT be careful: rapidxml is a lightweight XML parser. It can only parse
<transform type="Sigmoid" input-dim="2" output-dim="2" />
NOT this (which is common in HTML):
<transform type=Sigmoid input-dim=2 output-dim=2 />
Remember to quote attribute values like "value"!
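This strictness is standard XML behavior, not a rapidxml quirk. A sketch using Python's `xml.etree` (as a stand-in for rapidxml) shows the same rejection of unquoted attributes:

```python
import xml.etree.ElementTree as ET

# Quoted attribute values parse fine; unquoted HTML-style attributes
# are rejected by any strict XML parser, just as they are by rapidxml.
good = '<transform type="Sigmoid" input-dim="2" output-dim="2" />'
bad = '<transform type=Sigmoid input-dim=2 output-dim=2 />'

node = ET.fromstring(good)
print(node.attrib["type"])  # Sigmoid

try:
    ET.fromstring(bad)
    print("parsed (unexpected)")
except ET.ParseError:
    print("rejected: attribute values must be quoted")
```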
Also, empty nodes like <weight></weight> and <bias></bias> are okay.
My program nn-train will fill them with random numbers (drawn from a normalized uniform distribution).
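The random fill can be sketched roughly as follows. Note this is a hypothetical sketch: "normalized uniform" is assumed here to mean values drawn uniformly from [-1/sqrt(input-dim), +1/sqrt(input-dim)]; check nn-train's source for the exact bounds it uses.

```python
import math
import random

def init_affine(input_dim, output_dim, seed=None):
    # Hypothetical sketch of how nn-train might fill empty
    # <weight>/<bias> nodes: uniform values scaled by the layer's
    # input dimension (the exact scheme is an assumption).
    rng = random.Random(seed)
    bound = 1.0 / math.sqrt(input_dim)
    weight = [[rng.uniform(-bound, bound) for _ in range(output_dim)]
              for _ in range(input_dim)]
    bias = [rng.uniform(-bound, bound) for _ in range(output_dim)]
    return weight, bias

# Dimensions match the first Affine layer in the example model above.
w, b = init_affine(20, 64, seed=0)
```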
You can change activation functions by replacing the type attribute in <transform ... /> like this:
<transform type="tanh" input-dim="2" output-dim="2" />
Case is insensitive: either type="tanh" or type="Tanh" is fine. The value is converted to lower case when parsed.
Here's the list of activation functions available now:
- Sigmoid
- Tanh
- ReLU
- Softplus
- Softmax (last layer only)
- Convolution
- SubSample
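The case-insensitive lookup described above can be sketched with a hypothetical dispatch table (the real implementation in the C++ code differs; this only illustrates lower-casing the type attribute before matching):

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical dispatch table: the type attribute is lower-cased before
# lookup, so "Tanh", "tanh", and "TANH" all resolve to the same function.
ACTIVATIONS = {
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
    "softplus": lambda x: math.log(1.0 + math.exp(x)),
}

node = ET.fromstring('<transform type="Tanh" input-dim="2" output-dim="2" />')
act = ACTIVATIONS[node.attrib["type"].lower()]
print(act(0.0))  # 0.0
```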
Take dropout for example: you can add it after an activation function (e.g. Sigmoid) in the model above by inserting:
<transform type="Dropout" input-dim="64" output-dim="64" dropout-ratio="0.3"/>
The dropout-ratio attribute specifies what fraction of the hidden nodes you want to drop.
In this case, dropout-ratio="0.3" means 30% of the hidden nodes will be randomly turned off.
The result will be:
<transform type="Affine" input-dim="20" output-dim="64" momentum="0.100000" learning-rate="0.100000" >
<weight></weight>
<bias></bias>
</transform>
<transform type="Sigmoid" input-dim="64" output-dim="64" />
<transform type="Dropout" input-dim="64" output-dim="64" dropout-ratio="0.3"/>
<transform type="Affine" input-dim="64" output-dim="64" momentum="0.100000" learning-rate="0.100000" >
<weight></weight>
<bias></bias>
</transform>
<transform type="Sigmoid" input-dim="64" output-dim="64" />
<transform type="Dropout" input-dim="64" output-dim="64" dropout-ratio="0.3"/>
<transform type="Affine" input-dim="64" output-dim="2" momentum="0.100000" learning-rate="0.100000" >
<weight></weight>
<bias></bias>
</transform>
<transform type="Softmax" input-dim="2" output-dim="2" />
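The dropout-ratio semantics described above can be sketched as follows (a sketch of the training-time behavior only; nn-train's actual implementation may differ in details such as rescaling):

```python
import random

def dropout(values, ratio, seed=None):
    # Each hidden node is independently turned off (set to 0.0) with
    # probability `ratio`, matching the dropout-ratio="0.3" example:
    # on average 30% of the nodes are dropped per pass.
    rng = random.Random(seed)
    return [0.0 if rng.random() < ratio else v for v in values]

out = dropout([1.0] * 10, 0.3, seed=0)
```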