Notes from the Wired

This is a website where I write articles on various topics that interest me, carving out a bit of cyberspace for myself.

You shouldn't believe anything I talk about — I use words entirely recreationally.

Most Recent

  • Oct. 28

    Deep Residual Learning for Image Recognition
    Link to Paper: https://arxiv.org/abs/1512.03385
    Date: 10. Dec. 2015
    Paper Type: Architecture, Learning Techniques, Deep Learning
    Short Abstract: Deeper networks are harder to train than shallower networks. In this paper, the authors introduce the technique of adding residual connections to the network, which drastically improves the performance of deeper networks.
    1. Introduction
    Deep neural networks, that is, networks with many layers, have been behind many breakthroughs in machine learning. This raises the question: is learning better networks as easy as stacking more layers? (A minimal sketch of a residual block follows after this list.)
  • Oct. 27

    AdamW: Decoupled Weight Decay Regularization
    Link to Paper: https://arxiv.org/abs/1711.05101
    Date: 14. Nov. 2017
    Paper Type: Optimizer, Learning Techniques, Deep Learning
    Short Abstract: This paper introduces the AdamW optimizer, an improvement on the Adam optimizer that decouples weight decay from the gradient-based update.
    1. Introduction
    Adaptive, gradient-based optimizers such as AdaGrad, RMSprop, and Adam have become the default choice for training feed-forward neural networks. Still, state-of-the-art performance on many image datasets, such as CIFAR-10 and CIFAR-100, is often achieved using SGD. (A sketch of the decoupled update follows after this list.)
  • Oct. 27

    Adam: A Method for Stochastic Optimization
    Link to Paper: https://arxiv.org/abs/1412.6980
    Date: 22. Dec. 2014
    Paper Type: Optimizer, Learning Techniques, Deep Learning
    Short Abstract: In this paper, the famous Adam optimizer is introduced: a first-order, gradient-based optimization method that uses adaptive estimates of lower-order moments. It generalizes well across different architectures and tasks and outperforms many optimizers that came before it.
    1. Introduction
    Many problems in science and engineering can be formulated as the optimization of a scalar objective function that has to be maximized or minimized. (The core update rule is sketched after this list.)
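
To make the residual-connection idea from the ResNet note concrete, here is a minimal sketch of a basic residual block. The PyTorch framing, the class name BasicResidualBlock, and the fixed channel count are illustrative assumptions of mine, not code from the paper or from the post; the only point is that the block outputs F(x) + x, so the identity is always available as a fallback.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x), with F two 3x3 conv layers."""

    def __init__(self, channels: int):
        super().__init__()
        # Both convolutions keep spatial size and channel count,
        # so the input can be added back without a projection.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                              # the skip (shortcut) connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)          # residual addition, then nonlinearity


# Quick shape check: the block maps (N, C, H, W) to the same shape.
x = torch.randn(2, 64, 32, 32)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```

Because the skip path carries x unchanged, a stack of such blocks can at worst learn F(x) = 0 and behave like a shallower network, which is the intuition for why extra depth stops hurting.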
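For the Adam note, a worked sketch of the update rule may help: Adam keeps exponential moving averages of the gradient and of its square per parameter, bias-corrects them, and combines them into a step whose size adapts per coordinate. The function below is a hypothetical NumPy illustration (the name adam_step and the toy objective are mine, not the paper's reference implementation), using the paper's default hyperparameters.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter array w; m, v are the running moment estimates."""
    m = beta1 * m + (1 - beta1) * grad           # first moment: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment: moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction (moments start at zero)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-coordinate adaptive step
    return w, m, v

# Toy usage: minimize f(w) = w^2, whose gradient is 2w.
w, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 3001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
print(w)  # close to 0
```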
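And for the AdamW note, the whole change relative to Adam is where the weight decay term enters. With coupled L2 regularization, weight_decay * w is added to the gradient and then rescaled by the adaptive denominator; AdamW instead shrinks the weights directly after the Adam step. The sketch below is again a hypothetical NumPy illustration (the name adamw_step is mine), with the paper's schedule multiplier folded into the learning rate.

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update: the usual Adam step plus decay applied directly to the weights."""
    # Coupled L2 regularization would instead do: grad = grad + weight_decay * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # Adam step, identical to adam_step above
    w = w - lr * weight_decay * w                # decoupled decay: not rescaled by sqrt(v_hat)
    return w, m, v
```

In PyTorch terms this is roughly the difference between torch.optim.Adam(params, weight_decay=...) (coupled, L2-style) and torch.optim.AdamW(params, weight_decay=...) (decoupled).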