First, distinguish the two concepts:
1. The loss is the objective of the overall network optimization; it must take part in the optimization algorithm so that the weights W can be updated.
2. Metrics are only used to evaluate the performance of the network, e.g., accuracy. They exist to visualize how well the algorithm is doing, acting as a view only and not participating in the optimization process (see the compile sketch below).
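To make the distinction concrete, here is a minimal sketch of where each one is passed in compile; model here is assumed to be any already-built Keras model:
# a minimal sketch: `model` is assumed to be an already-built Keras model
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',   # drives the weight updates
              metrics=['accuracy'])         # reported only, does not affect training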
There are two ways to implement a custom loss in Keras: one is to write a custom loss function and pass it to compile, and the other is to write a custom layer that adds the loss via add_loss.
Example:
# Mode I
def vae_loss(x, x_decoded_mean):
    xent_loss = objectives.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.mean(1 + z_log_sigma - K.square(z_mean) - K.exp(z_log_sigma), axis=-1)
    return xent_loss + kl_loss

vae.compile(optimizer='rmsprop', loss=vae_loss)
Or you can write a custom Keras layer as the last layer of the model and pass loss=None in the final compile:
# Mode II
# Custom loss layer
class CustomVariationalLayer(Layer):
    def __init__(self, **kwargs):
        self.is_placeholder = True
        super(CustomVariationalLayer, self).__init__(**kwargs)

    def vae_loss(self, x, x_decoded_mean_squash):
        x = K.flatten(x)
        x_decoded_mean_squash = K.flatten(x_decoded_mean_squash)
        xent_loss = img_rows * img_cols * metrics.binary_crossentropy(x, x_decoded_mean_squash)
        kl_loss = - 0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        x_decoded_mean_squash = inputs[1]
        loss = self.vae_loss(x, x_decoded_mean_squash)
        self.add_loss(loss, inputs=inputs)
        # We don't use this output.
        return x

y = CustomVariationalLayer()([x, x_decoded_mean_squash])
vae = Model(x, y)
vae.compile(optimizer='rmsprop', loss=None)
Customizing a metric in Keras is very simple: the custom metric function just takes y_true and y_pred as its input parameters.
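For instance, a minimal sketch of a custom metric (model is assumed to be an already-built Keras model):
from keras import backend as K

def mean_pred(y_true, y_pred):
    # reports the mean predicted value per batch; purely for monitoring
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])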
Caveats:
1. A loss defined in Keras should return a tensor of length batch_size (one loss value per sample), not a scalar as in TensorFlow.
2. In order to save a model with a custom loss and load it smoothly afterwards, the custom loss definition must also be present in the source file that loads the model; otherwise the relevant function cannot be found at load time and Keras will report an error (see the loading sketch below).
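One common way to handle this, shown here as a sketch assuming the model was compiled with the vae_loss function from Mode I above, is to pass the function in custom_objects when loading:
from keras.models import load_model

# 'vae.h5' is a hypothetical file name; vae_loss must be defined/imported in this file
vae = load_model('vae.h5', custom_objects={'vae_loss': vae_loss})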
Sometimes different weights need to be applied to the losses of different samples or classes; this is what sample_weight and class_weight are for, for example:
# Class weights:
# To balance the difference in occurrences of digit class labels.
# 50% of labels that the discriminator trains on are 'fake'.
# Weight = 1 / frequency
cw1 = {0: 1, 1: 1}
cw2 = {i: self.num_classes / half_batch for i in range(self.num_classes)}
cw2[self.num_classes] = 1 / half_batch
class_weights = [cw1, cw2]  # so that both losses carry equal importance
discriminator.train_on_batch(imgs, [valid, labels], class_weight=class_weights)
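If per-sample (rather than per-class) weighting is needed, fit also accepts a sample_weight array. A minimal sketch, assuming model, x_train and y_train already exist and rare_idx is a hypothetical index array of under-represented samples:
import numpy as np

# one weight per training sample
sample_weights = np.ones(len(x_train))
sample_weights[rare_idx] = 2.0   # up-weight the under-represented samples

model.fit(x_train, y_train, batch_size=32, epochs=10,
          sample_weight=sample_weights)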
Additional knowledge: Keras model training, model saving, and callback settings
1. Model training
fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)
Parameters (see the usage sketch after this list):
x: a Numpy array of training data (if the model has only one input), or a list of Numpy arrays (if the model has multiple inputs). If the input layers in the model are named, you can also pass a dictionary mapping the input layer names to Numpy arrays. If feeding from framework-native tensors (such as TensorFlow data tensors), x can be None (the default).
y: a Numpy array of target (label) data (if the model has only one output), or a list of Numpy arrays (if the model has multiple outputs). If the output layers in the model are named, you can also pass a dictionary mapping the output layer names to Numpy arrays. If feeding from framework-native tensors (such as TensorFlow data tensors), y can be None (the default).
batch_size: integer or None. number of samples per gradient update. If not specified, the default is 32.
epochs: integer. The number of epochs to train the model. An epoch is one iteration over the entire x and y data provided. Note that together with initial_epoch, epochs should be understood as the "final epoch": the model is not trained for epochs additional iterations, but only until the epoch with index epochs is reached.
verbose: 0, 1 or 2. Log display mode. 0 = quiet mode, 1 = progress bar, 2 = one line per epoch.
callbacks: a list of keras.callbacks.Callback instances. A list of callbacks to apply during training.
validation_split: float between 0 and 1. The fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling.
validation_data: a tuple (x_val, y_val) or (x_val, y_val, val_sample_weights) on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. This parameter overrides validation_split.
shuffle: Boolean (whether to shuffle the data before each iteration) or string (batch). batch is a special option for dealing with HDF5 data constraints, which shuffles the data inside a batch. This parameter is not valid when steps_per_epoch is not None.
class_weight: optional dictionary to map class indexes (integers) to weight (floating point) values for weighting the loss function (during training only). This may help tell the model to "pay more attention" to samples from underrepresented classes.
sample_weight: optional Numpy weight array of training samples to weight the loss function (during training only). You can pass flat (1D) Numpy arrays of the same length as the input samples (1:1 mapping between weights and samples), or in the case of time-series data, you can pass 2D arrays of size (samples, sequence_length) to apply a different weight to each sample for each time step. In this case, you should make sure to specify sample_weight_mode="temporal" in compile().
initial_epoch: integer. The epoch at which to start training (useful for resuming a previous training run).
steps_per_epoch: integer or None. The total number of steps (batches of samples) before declaring one epoch finished and starting the next. When training with input tensors such as TensorFlow data tensors, the default None is equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined.
validation_steps: only relevant if steps_per_epoch is specified. The total number of steps (batches of samples) to validate before stopping.
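A sketch tying several of these arguments together (model, x_train and y_train are assumed to exist; the numbers are illustrative only):
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=20,
                    verbose=1,
                    validation_split=0.1,           # last 10% held out for validation
                    shuffle=True,
                    class_weight={0: 1.0, 1: 5.0},  # pay more attention to class 1
                    initial_epoch=0)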
fit_generator(generator, steps_per_epoch=None, epochs=1, verbose=1, callbacks=None, validation_data=None, validation_steps=None, class_weight=None, max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True, initial_epoch=0)
Trains the model on data generated batch by batch by a Python generator (or an instance of Sequence).
Parameters:
generator: a generator, or an instance of a Sequence (keras.utils.Sequence) object, in order to avoid duplicate data when using multiprocessing (see the Sequence sketch after this parameter list). The output of the generator must be one of the following:
an (inputs, targets) tuple
an (inputs, targets, sample_weights) tuple.
This tuple (a single output of the generator) makes a single batch. Therefore, all arrays in this tuple must have the same length (equal to the size of this batch). Different batches may have different sizes; for example, the last batch of an epoch is often smaller than the others if the size of the dataset is not divisible by the batch size. The generator is expected to loop over its data indefinitely. An epoch finishes when steps_per_epoch batches have been seen by the model.
steps_per_epoch: The total number of steps (batch samples) generated from the generator before declaring one epoch complete and starting the next. It should normally be equal to the number of samples in your dataset divided by the batch size. For Sequence, it is optional: if not specified, len(generator) will be used as the number of steps.
epochs: integer. The total number of epochs to train the model. An epoch is one iteration over the entire data provided, as defined by steps_per_epoch. Note that together with initial_epoch, epochs should be understood as the "final epoch": the model is not trained for epochs additional iterations, but only until the epoch with index epochs is reached.
verbose: 0, 1 or 2. Log display mode. 0 = quiet mode, 1 = progress bar, 2 = one line per epoch.
callbacks: list of instances. A list of callback functions that are called during training.
validation_data: It can be one of the following:
a generator or Sequence instance for the validation data
an (inputs, targets) tuple
an (inputs, targets, sample_weights) tuple.
Losses and any model metrics are evaluated at the end of each epoch. The model is not trained on this data.
validation_steps: only available if validation_data is a generator. The total number of steps (sample batches) generated by the generator before stopping. For Sequence, it is optional: if not specified, len(generator) will be used as the number of steps.
class_weight: optional dictionary mapping class index (integer) to weight (floating point) values for weighting the loss function (during training only). This can be used to tell the model to 'pay more attention' to samples from underrepresented classes.
max_queue_size: integer. Maximum size of the generator queue. If not specified, max_queue_size will default to 10.
workers: integer. The maximum number of processes to use, if using process-based multithreading. If not specified, workers will default to 1. If 0, the generator will be executed on the main thread.
use_multiprocessing: boolean. If True, process-based multithreading is used. If not specified, use_multiprocessing will default to False. Note that because this implementation relies on multiprocessing, you should not pass non-picklable arguments to the generator, as they cannot easily be passed to child processes.
shuffle: Whether to shuffle the order of the batch before each iteration. Can only be used with Sequence () instances.
initial_epoch: the epoch at which to start training (useful for resuming a previous training run).
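A minimal sketch of a Sequence that yields (inputs, targets) batches, under the assumption that x_set and y_set are NumPy arrays already in memory:
import numpy as np
from keras.utils import Sequence

class DataSequence(Sequence):
    def __init__(self, x_set, y_set, batch_size=32):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        # one (inputs, targets) batch
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        return self.x[lo:hi], self.y[lo:hi]

# model.fit_generator(DataSequence(x_train, y_train), epochs=10, workers=2)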
Both fit and fit_generator return a History object. Its history attribute records how the loss and the other metrics change with each epoch; if a validation set is used, it also contains the corresponding validation metrics. These values can be written to a text file for later inspection.
2. Saving the model structure, trained weights, and optimizer state
ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)
Saves the model after every epoch.
Parameters:
filepath: string, the path to save the model, e.g. epoch1.h5 (or a weights file when save_weights_only=True).
monitor: The data being monitored.
verbose: detail mode, 0 or 1.
save_best_only: if save_best_only=True, the latest best model according to the monitored quantity will not be overwritten.
mode: one of {auto, min, max}. If save_best_only=True, then the decision to overwrite the save file depends on the maximum or minimum value of the data being monitored. For val_acc, the mode would be max, for val_loss, the mode would need to be min, and so on. In auto mode, the direction is automatically determined from the name of the data being monitored.
save_weights_only: if True, only the model's weights are saved (model.save_weights(filepath)); otherwise the whole model is saved (model.save(filepath)).
period: interval (number of epochs) between checkpoints.
Example:
checkpoint = ModelCheckpoint(filepath=model_weight_filepath, monitor='val_acc',
                             verbose=0, save_best_only=True, save_weights_only=True,
                             mode='max', period=1)
model.fit(X_train, Y_train, callbacks=[checkpoint])
3. How to stop training when the validation loss no longer decreases? Stop training when the monitored quantity stops improving.
EarlyStopping callback function:
EarlyStopping(monitor='val_loss', min_delta=0, patience=0, verbose=0, mode='auto', baseline=None, restore_best_weights=False)
Stops training when the monitored quantity has stopped improving.
Parameters:
monitor: The data being monitored.
min_delta: the minimum change in the monitored quantity that counts as an improvement, i.e. an absolute change smaller than min_delta is regarded as no improvement.
patience: the number of epochs with no improvement after which training is stopped.
verbose: Detailed information mode, 0 or 1.
mode: one of {auto, min, max}. In min mode, training stops when the monitored quantity stops decreasing; in max mode, training stops when the monitored quantity stops increasing; in auto mode, the direction is inferred automatically from the name of the monitored quantity.
baseline: baseline value for the monitored quantity. Training stops if the model does not show improvement over the baseline.
restore_best_weights: whether to restore the model weights from the epoch with the best value of the monitored quantity. If False, the model weights obtained at the last step of training are used.
Example:
earlystopping = EarlyStopping(monitor='val_acc', verbose=1, patience=3)
model.fit(X_train, Y_train, callbacks=[earlystopping])
4. Dynamic adjustment of learning rates
ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, verbose=0, mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)
Reduces the learning rate when a monitored metric has stopped improving.
Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This callback monitors a quantity, and if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced.
Parameters:
monitor: The data being monitored.
factor: the factor by which the learning rate is reduced: new_lr = lr * factor.
patience: the number of epochs with no improvement after which the learning rate is reduced.
verbose: integer. 0: quiet, 1: update info.
mode: one of {auto, min, max}. In min mode, the learning rate is reduced when the monitored quantity stops decreasing; in max mode, it is reduced when the monitored quantity stops increasing; in auto mode, the direction is inferred automatically from the name of the monitored quantity.
min_delta: threshold for measuring the new optimum, to focus only on significant changes.
cooldown: the number of epochs to wait before resuming normal operation after the learning rate has been reduced.
min_lr: lower bound of the learning rate.
Example:
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.001)
model.fit(X_train, Y_train, callbacks=[reduce_lr])
5. TensorBoard visualization
TensorBoard(log_dir='./logs', histogram_freq=0, batch_size=32, write_graph=True, write_grads=False, write_images=False, embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None, embeddings_data=None, update_freq='epoch')
This callback writes a log for TensorBoard, which lets you visualize dynamic graphs of your training and test metrics, as well as histograms of the activations of the different layers in your model (a usage sketch follows the parameter list).
If you have installed TensorFlow with pip, you should be able to launch TensorBoard from the command line:
tensorboard --logdir=/full_path_to_your_logs
Parameters:
log_dir: the path of the directory where the log files to be parsed by TensorBoard are saved.
histogram_freq: frequency (in epochs) at which activation and weight histograms are computed for the layers of the model. If set to 0, histograms are not computed. Validation data (or a validation split) must be specified for the histogram visualizations.
write_graph: whether to visualize the graph in TensorBoard. The log file can become quite large when write_graph is set to True.
write_grads: Whether to visualize histograms of gradient values in TensorBoard. histogram_freq must be greater than 0.
batch_size: the size of the batch of inputs fed to the network for the histogram computation.
write_images: Whether to visualize model weights as images in TensorBoard.
embeddings_freq: frequency (in epochs) at which selected embedding layers are saved.
embeddings_layer_names: a list of names of layers that will be monitored. If None or an empty list, then all embeddings will be monitored.
embeddings_metadata: a dictionary mapping layer names to the file names in which the metadata for each embedding layer is saved. See the details about the metadata file format. A single string can be passed if the same metadata file is used for all embedding layers.
embeddings_data: the data to be embedded at the layers specified in embeddings_layer_names. A Numpy array (if the model has a single input) or a list of Numpy arrays (if the model has multiple inputs). Learn more about embeddings.
update_freq: 'batch' or 'epoch' or integer. When 'batch' is used, the loss and evaluation values are written to the TensorBoard after each batch. The same applies to 'epoch'. If you use an integer, such as 10000, this callback writes the loss and evaluation values to the TensorBoard after every 10000 samples. Note that frequent writes to the TensorBoard will slow down your training.
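A minimal usage sketch (the log directory name is only an example; model, X_train and Y_train are assumed to exist):
from keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir='./logs', histogram_freq=1,
                          write_graph=True, update_freq='epoch')
model.fit(X_train, Y_train, validation_split=0.1, epochs=10,
          callbacks=[tensorboard])
# afterwards, from the shell: tensorboard --logdir=./logs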
6. How to record the training/validation loss/accuracy for each epoch?
The fit function returns a History callback whose history attribute contains the lists of successive loss values and metric values.
The code is as follows:
hist = model.fit(X, y, validation_split=0.2)
print(hist.history)
How to save the loss and val_loss values output by Keras to a text file
The fit function in Keras returns a History object whose history attribute stores all of these values; if there is a validation set, it also contains the changes in those metrics for the validation set. They can be written out like this:
hist = model.fit(train_set_x, train_set_y, batch_size=256, shuffle=True,
                 nb_epoch=nb_epoch, validation_split=0.1)
with open('log_sgd_big_32.txt', 'w') as f:
    f.write(str(hist.history))
7. Multiple callbacks are simply separated by commas in the callbacks list
Example:
model_weight_filepath = "./bert_classfition-test_model" + str(i) + ".weight"
earlystopping = EarlyStopping(monitor='val_acc', verbose=1, patience=3)
reducelronplateau = ReduceLROnPlateau(monitor="val_acc", verbose=1, mode='max', factor=0.5, patience=2)
checkpoint = ModelCheckpoint(filepath=model_weight_filepath, monitor='val_acc', verbose=0,
                             save_best_only=True, save_weights_only=True, mode='max', period=1)
model.fit_generator(
    train_D.__iter__(),
    steps_per_epoch=len(train_D),
    epochs=epochs,
    validation_data=valid_D.__iter__(),
    validation_steps=len(valid_D),
    callbacks=[earlystopping, reducelronplateau, checkpoint])
That is everything I wanted to share about Keras custom loss functions, weighting the loss per sample or class, and custom metrics. I hope it gives you a useful reference, and I hope you will continue to support me.