3. Suppose you are using the MNIST dataset to train a neural network model to recognize handwritten digits. Which of the following settings is practical?
(A) To accelerate convergence, you can set the initial learning rate to 256.
(B) To make the model generalize better, you can randomly mirror each image with a probability of 1/2.
(C) To avoid exploding gradients, you can clip the gradient magnitude to 2.
(D) To make the model easier to train, the order of the images should be kept fixed during training.
(E) Since the goal of training is to classify digits from 0 to 9, which is a regression problem, you should set the number of output nodes to 1 and use mean squared error as the loss function.