Deep Learning Tips And Tricks

First, Why Tweak Models?

Deep learning models like the Convolutional Neural Network (CNN) have a massive number of parameters. On top of those sit the hyper-parameters (learning rate, dropout rate, layer sizes, and so on), so called because they are not optimized during training itself. You could gridsearch the optimal values for these hyper-parameters, but you’ll need a lot of hardware and time. So, does a true data scientist settle for guessing these essential parameters?
One of the best ways to improve your models is to build on the design and architecture of the experts who have done deep research in your domain, often with powerful hardware at their disposal. Graciously, they often open-source the resulting modeling architectures and rationale.

Deep Learning Techniques
Here are a few ways you can improve your fit time and accuracy with pre-trained models:

  1. Research the ideal pre-trained architecture: Learn about the benefits of transfer learning, or browse some powerful CNN architectures. Consider domains that may not seem like obvious fits, but share potential latent features.

  2. Use a smaller learning rate: Since pre-trained weights are usually better than randomly initialized weights, modify more delicately! Your choice here depends on the learning landscape and how well the pre-training went, but check errors across epochs for an idea of how close you are to convergence.

  3. Play with dropout: As with Ridge and LASSO regularization for regression models, there is no optimized alpha or dropout for all models. It’s a hyper-parameter that depends on your specific problem, and must be tested. Start with bigger changes (a wider gridsearch span across orders of magnitude, like np.logspace() can provide), then narrow the search, as with the learning rate above.

  4. Limit weight sizes: We can limit the max norm (the L2 norm of the weight vector feeding each unit) of the weights for certain layers in order to help our model generalize.

  5. Don’t touch the first layers: The first hidden layers of a neural network tend to capture universal and interpretable features, like shapes, curves, or interactions that are very often relevant across domains. We should often leave these alone, and focus on optimizing the more abstract latent features in the later layers. This may mean adding hidden layers so we don’t rush the process!

  6. Modify the output layer: Replace model defaults with a new activation function and output size that is appropriate for your domain. However, don’t limit yourself to the most obvious solution. While MNIST may seem like it wants 10 output classes, some numbers have common variations, and allowing for 12–16 classes may allow better settling of these variants and improved model performance! As with the tip above, deep learning models should be increasingly modified and tailored as we near output.
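As a rough illustration of tips 2 through 5 above, here is a minimal sketch in plain Python rather than in any particular deep learning library. The helper names (logspace, clip_max_norm, fine_tune_step), the toy weights, and the chosen values are all assumptions made for illustration. It shows a coarse log-spaced search grid, a max-norm cap applied after each update, and a small-learning-rate step that leaves the earliest layers frozen.

```python
import math

def logspace(start_exp, stop_exp, num):
    """Return `num` values evenly spaced on a log10 scale (similar in spirit to np.logspace)."""
    if num == 1:
        return [10.0 ** start_exp]
    step = (stop_exp - start_exp) / (num - 1)
    return [10.0 ** (start_exp + i * step) for i in range(num)]

def clip_max_norm(weights, max_norm):
    """Rescale a weight vector so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_norm or norm == 0.0:
        return list(weights)
    scale = max_norm / norm
    return [w * scale for w in weights]

def fine_tune_step(layers, grads, lr, frozen, max_norm):
    """One gradient step that skips the first `frozen` layers and caps weight norms."""
    updated = []
    for i, (weights, grad) in enumerate(zip(layers, grads)):
        if i < frozen:
            # Tip 5: leave the early, general-purpose layers untouched.
            updated.append(list(weights))
            continue
        # Tip 2: a small learning rate nudges the pre-trained weights gently.
        stepped = [w - lr * g for w, g in zip(weights, grad)]
        # Tip 4: project back inside the max-norm ball.
        updated.append(clip_max_norm(stepped, max_norm))
    return updated

# Tip 3: coarse candidates spanning orders of magnitude (about 1e-4 up to 1e-1).
coarse_grid = logspace(-4, -1, 4)

# Hypothetical two-layer "model": each layer is a single weight vector.
layers = [[0.5, -0.5], [3.0, 4.0]]
grads = [[1.0, 1.0], [0.0, 0.0]]
new_layers = fine_tune_step(layers, grads, lr=1e-3, frozen=1, max_norm=2.0)
```

In this toy run the first layer is returned unchanged because it is frozen, while the second layer (L2 norm 5) is scaled down to norm 2. A log-spaced grid suits quantities such as a learning rate or a regularization strength; a dropout rate lies between 0 and 1, so a linear grid may be the more natural choice there.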

Future neural networks may not just be "deep", as in deeper than VGGNet; instead, they might process information in a progressive way. An interesting example is the recurrent visual attention model. With proper design of the network, the efficiency of both learning and inference improves significantly, while inference accuracy improves compared to conventional feed-forward networks. This is reasonable because fewer parameters need to be learned, and regularization tricks, like dropout, become less necessary.

Some Considerations About AI Developments

We Still Know Very Little About How AI Thinks

AI is becoming more and more ubiquitous, with reports of advancements or new applications coming almost daily. How much do we know about how it thinks, and how are we trying to find out more?
AI As We Understand It
Most of the AI we know today operates on a principle of deep learning: a machine is given a set of data and a desired output, and from that it produces its own algorithm to map one to the other. The system then repeats, adjusting itself with each pass. The model that learns this way is called a neural network. This method is necessary because a computer can derive such rules far faster than a human could; it would take lifetimes to code them manually.
Professor of Electrical Engineering and Computer Science at MIT Tommi Jaakkola says, "If you had a very small neural network, you might be able to understand it. But once it becomes very large, and it has thousands of units per layer and maybe hundreds of layers, then it becomes quite un-understandable." We are at the stage of these large systems now. So, in order to make these machines explain themselves - an issue that will have to be solved before we can place any trust in them - what methods are we using?
1. Reversing the algorithms.
In image recognition, this involves programming the machine to produce or modify pictures when the computer recognizes a pattern it has learned. Take the example of a Deep Dream modification of The Creation of Adam, where the AI has been told to put dogs in where it recognizes them. From this, we can learn what constitutes a dog for the A.I: firstly, it only produces heads (meaning this is what largely characterizes a dog, according to it) and secondly, the patterns that the computer recognizes as dogs are clustered around Adam (on the left) and God (on the right).
2. Identifying the data it has used.
This approach instructs the AI to record extracts and highlight the sections of text that it used to match the pattern it was told to recognize. Developed first by Regina Barzilay, a Delta Electronics Professor at MIT, this type of understanding applies to AIs that search for patterns in data and make predictions accordingly. Carlos Guestrin, a Professor of Machine Learning at the University of Washington, has developed a similar system that presents the data with a short explanation as to why it was chosen.
3. Monitoring individual neurons.
Developed by Jason Yosinski, a Machine Learning Researcher at Uber AI Labs, this involves using a probe and measuring which image stimulates a given neuron the most. This allows us to infer what the AI looks for the most.
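One simple reading of the neuron-probing idea can be sketched in a few lines of Python. This is not Yosinski's actual tooling; the single ReLU unit, its weights, and the four-pixel "images" below are all made up for illustration. The sketch scores each candidate input by how strongly it activates the unit and reports the winner.

```python
def neuron_activation(weights, bias, inputs):
    """ReLU activation of a single unit for one input vector."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, pre_activation)

def most_stimulating(weights, bias, candidates):
    """Return (name, activation) for the candidate that excites the unit the most."""
    scored = [(name, neuron_activation(weights, bias, x)) for name, x in candidates.items()]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical unit that responds to alternating bright and dark pixels.
unit_weights = [1.0, -1.0, 1.0, -1.0]
unit_bias = 0.0

# Hypothetical four-pixel images (values are pixel intensities in [0, 1]).
images = {
    "flat": [0.5, 0.5, 0.5, 0.5],
    "stripes": [1.0, 0.0, 1.0, 0.0],
    "inverse_stripes": [0.0, 1.0, 0.0, 1.0],
}

best_name, best_activation = most_stimulating(unit_weights, unit_bias, images)
```

Here the "stripes" image wins, which suggests that this particular unit acts as a stripe detector. In a real network the same probe would be run over a large image set, one unit at a time.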
These methods, though, are proving largely ineffective; as Guestrin says, "We haven't achieved the whole dream, which is where AI has a conversation with you, and it is able to explain. We're a long way from having truly interpretable AI."
And Why It Is Important To Know More
It is important to understand how these systems work, as they are already being applied to industries including medicine, cars, finance, and recruitment: areas that have fundamental impacts on our lives. To give this massive power to something we don't understand could be a foolhardy exercise in trust. This is, of course, providing that the AI is honest, and does not suffer from the lapses in truth and perception that humans do.
At the heart of the problem with trying to understand the machines is a tension. If we could predict them perfectly, it would rob AI of the autonomous intelligence that characterizes it. We must remember that we don't know how humans make these decisions either; consciousness remains a mystery, and the world remains an interesting place because of it.
Daniel Dennett warns, though, that one question needs to be answered before AI is introduced: "What standards do we demand of them, and of ourselves?" How will we design the machines that will soon control our world without us understanding them - how do we code our gods?

Understanding The Limits Of Deep Learning

Artificial intelligence has reached peak hype. News outlets report that companies have replaced workers with IBM Watson and algorithms are beating doctors at diagnoses. New A.I. startups pop up every day and claim to solve all your personal and business problems with machine learning.
Ordinary objects like juicers and wifi routers suddenly advertise themselves as "powered by AI". Not only can smart standing desks remember your height settings, they can also order you lunch.
Much of the A.I. hubbub is generated by reporters who've never trained a neural network and startups hoping to be acquihired for engineering talent despite not solving any real business problems. No wonder there are so many misconceptions about what A.I. can and cannot do.
Deep Learning Is Undeniably Mind-Blowing
Neural networks were invented in the 60s, but recent boosts in big data and computational power made them actually useful. A new discipline called "deep learning" arose and applied complex neural network architectures to model patterns in data more accurately than ever before.
The results are undeniably incredible. Computers can now recognize objects in images and video and transcribe speech to text better than humans can. Google replaced Google Translate's architecture with neural networks and now machine translation is also closing in on human performance.
The practical applications are mind-blowing as well. Computers can predict crop yield better than the USDA and indeed diagnose cancer more accurately than elite physicians.
John Launchbury, a Director at DARPA, describes three waves of artificial intelligence: 1) Handcrafted knowledge, or expert systems like IBM's Deep Blue or Watson, 2) Statistical learning, which includes machine learning and deep learning, and 3) Contextual adaptation, which involves constructing reliable, explanatory models for real world phenomena using sparse data, like humans do.
As part of the current second wave of AI, deep learning algorithms work well because of what Launchbury calls the "manifold hypothesis." In simplified terms, this refers to how different types of high-dimensional natural data tend to clump and be shaped differently when visualized in lower dimensions.
By mathematically manipulating and separating data clumps, deep neural networks can distinguish different data types. While neural nets can achieve nuanced classification and prediction capabilities, they are essentially what Launchbury calls "spreadsheets on steroids."
But Deep Learning Also Has Deep Problems

At the recent AI By The Bay conference, Francois Chollet emphasized that deep learning is simply more powerful pattern recognition than previous statistical and machine learning methods. "The most important problem for A.I today is abstraction and reasoning," explains Chollet, an AI Researcher at Google and famed inventor of the widely used deep learning library Keras. "Current supervised perception and reinforcement learning algorithms require lots of data, are terrible at planning, and are only doing straightforward pattern recognition."
By contrast, humans "learn from very few examples, can do very long-term planning, and are capable of forming abstract models of a situation and manipulate these models to achieve extreme generalization."
Even simple human behaviors are laborious to teach to a deep learning algorithm. Let's examine the task of not being hit by a car as you walk down the road. If you go the supervised learning route, you'd need huge data sets of car situations with clearly labeled actions to take, such as "stop" or "move". Then you'd need to train a neural network to learn the mapping between the situation and the appropriate action.
If you go the reinforcement learning route, where you give an algorithm a goal and let it independently determine the ideal actions to take, the computer would need to die thousands of times before learning to avoid cars in different situations.
"You cannot achieve general intelligence simply by scaling up today's deep learning techniques," warns Chollet.
Humans only need to be told once to avoid cars. We're equipped with the ability to generalize from just a few examples and are capable of imagining (i.e. modeling) the dire consequences of being run over. Without losing life or limb, most of us quickly learn to avoid being overrun by motor vehicles.
While neural networks achieve statistically impressive results across large sample sizes, they are "individually unreliable" and often make mistakes humans would never make, such as classifying a toothbrush as a baseball bat.
Your results are only as good as your data. Neural networks fed inaccurate or incomplete data will simply produce the wrong results. The outcomes can be both embarrassing and damaging. In two major PR debacles, Google Photos incorrectly classified African Americans as gorillas, while Microsoft's Tay learned to spew racist, misogynistic hate speech after only hours of training on Twitter.
Undesirable biases may even be implicit in our input data. Google's massive Word2Vec embeddings cover a vocabulary of 3 million words and phrases learned from Google News. The data set makes associations such as "father is to doctor as mother is to nurse" which reflect gender bias in our language. Researchers such as Tolga Bolukbasi of Boston University have turned to human ratings on Mechanical Turk to perform "hard de-biasing" to undo the associations.
Such tactics are essential since, according to Bolukbasi, "word embeddings not only reflect stereotypes but can also amplify them." If the term "doctor" is more associated with men than women, then an algorithm might prioritize male job applicants over female job applicants for open physician positions.
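To make the idea of removing a bias direction concrete, here is a minimal sketch with toy three-dimensional vectors. All of the numbers are made up, and the sketch is a simplification: it estimates the gender direction from a single word pair and applies only a "neutralize" projection, which is an assumption about one step of hard de-biasing rather than a reproduction of the researchers' full procedure.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def neutralize(word_vec, bias_direction):
    """Remove the component of word_vec that lies along bias_direction."""
    g = normalize(bias_direction)
    projection = dot(word_vec, g)
    return [w - projection * gi for w, gi in zip(word_vec, g)]

# Toy embeddings; real Word2Vec vectors have 300 dimensions.
embeddings = {
    "he": [1.0, 0.0, 0.2],
    "she": [-1.0, 0.0, 0.2],
    "doctor": [0.4, 0.9, 0.1],
}

# Assumption: one word pair is enough to estimate the gender direction.
gender_direction = [a - b for a, b in zip(embeddings["he"], embeddings["she"])]
unit_gender = normalize(gender_direction)

bias_before = dot(embeddings["doctor"], unit_gender)
debiased_doctor = neutralize(embeddings["doctor"], gender_direction)
bias_after = dot(debiased_doctor, unit_gender)
```

Before the projection, "doctor" leans toward "he" along the gender axis; afterwards its component along that axis is zero, while the other coordinates are left unchanged.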
Finally, Ian Goodfellow, inventor of generative adversarial networks (GANs), showed that neural networks can be deliberately tricked with adversarial examples. By mathematically manipulating an image in a way that is undetectable to the human eye, sophisticated attackers can trick neural networks into grossly misclassifying objects.
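A minimal sketch of one such attack, the fast gradient sign method, is shown below on a toy logistic-regression "classifier". The weights, input, and epsilon are assumptions chosen so the effect is visible with only four features. In a real image model the same idea is applied across thousands of pixels, which is why a much smaller per-pixel change can be enough.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability that x belongs to class 1 under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def fgsm(weights, bias, x, label, epsilon):
    """Shift each feature by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    # Gradient of the cross-entropy loss with respect to the input features.
    grad_x = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]

# Hypothetical trained model and an input it assigns to class 1.
weights = [2.0, -3.0, 1.0, -1.5]
bias = 0.0
x = [0.6, 0.3, 0.5, 0.2]

p_clean = predict(weights, bias, x)
x_adv = fgsm(weights, bias, x, label=1, epsilon=0.1)
p_adv = predict(weights, bias, x_adv)
```

With these toy values the clean input is scored above 0.5 (class 1), while the perturbed input, whose features each moved by only 0.1, is scored below 0.5 and so flips to class 0.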