Some Thoughts on Data Science

by Prof. Seungchul Lee
Industrial AI Lab

1. Get Involved in Data Science Now

  • Most researchers are interested in how data science and machine learning techniques can be applied to their own domains,
  • but applying them well requires substantial time spent learning the domain itself

Data Problems We Would Like to Solve

Solving with Deep Learning

  • When you come up against some machine learning problem with “traditional” features (i.e., human-interpretable characteristics of the data),
    • do not try to solve it by applying deep learning methods first
    • Instead, use
      • linear regression/classification,
      • linear regression/classification with non-linear features, or
      • gradient boosting methods
  • If you really want to squeeze out a further 1-2% improvement in performance, then you can apply deep learning
    • That said, deep learning has undeniably made remarkable progress on data with spatial or sequential structure, such as images, audio, or text
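
The baseline-first advice above can be illustrated with a minimal sketch. It assumes NumPy and a synthetic noisy-quadratic dataset invented for this example; the helper names (fit_least_squares, linear_features, quadratic_features) are hypothetical and not from the lecture. It compares plain linear regression with linear regression on one hand-made non-linear feature.

```python
import numpy as np

def fit_least_squares(features, targets):
    """Return the weights that minimize ||features @ weights - targets||^2."""
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights

def mean_squared_error(features, targets, weights):
    residuals = features @ weights - targets
    return float(np.mean(residuals ** 2))

def linear_features(x):
    # Columns: bias term and the raw input.
    return np.column_stack([np.ones_like(x), x])

def quadratic_features(x):
    # Still a linear model in the weights, with one extra non-linear feature x^2.
    return np.column_stack([np.ones_like(x), x, x ** 2])

# Synthetic data (an assumption for illustration): a noisy quadratic.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=200)
y = 0.5 * x ** 2 - x + rng.normal(scale=0.3, size=200)

# Simple hold-out split: first 150 points for training, the rest for testing.
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

w_linear = fit_least_squares(linear_features(x_train), y_train)
w_quadratic = fit_least_squares(quadratic_features(x_train), y_train)

mse_linear = mean_squared_error(linear_features(x_test), y_test, w_linear)
mse_quadratic = mean_squared_error(quadratic_features(x_test), y_test, w_quadratic)

print(f"test MSE, linear features:    {mse_linear:.3f}")
print(f"test MSE, quadratic features: {mse_quadratic:.3f}")
```

On data like this, the model with the quadratic feature should reach a much lower test error than the plain linear model, without any deep learning. A gradient boosting library would be the natural next baseline to try.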

What About “Superhuman” Machine Learning

  • It’s a common misconception that machine learning will outperform human experts on most tasks
    • No: most applied machine learning is supervised learning
    • A supervised model cannot be better than the labels in its training data
  • In reality, the benefit of machine learning usually does not come from superhuman performance;
    • it comes from the ability to scale up expert-level performance extremely quickly

Dealing with Impossible Problems

  • You’ve built a tool to manually classify examples, run through many cases (or had a domain expert run through them), and you get poor performance
  • What do you do?
    • Do not throw more, or bigger, machine learning algorithms at the problem
  • Instead, you need to change the problem by:
    • 1) changing the input (i.e., the features),
    • 2) changing the output (i.e., decomposing it into smaller sub-problems)

Changing Input (i.e., Adding Features)

  • Adding more data is good, but:

    • 1) Do spot checks (visually) to see whether the new features help you distinguish cases you were previously unable to predict

    • 2) Get advice from domain experts and see what sorts of data sources they use in practice (if people are already solving the problem)

Changing Output (i.e., Changing the Problem)

  • Just make the problem easier!
  • Decompose it into smaller sub-problems

Machine Learning vs. Deep Learning

  • State-of-the-art until 2012

  • Deep supervised learning

  • Hyperparameters

    • Learning rate
    • # of iterations
    • # of hidden layers
    • # of hidden units
    • Choice of activation functions
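
To show where each of these hyperparameters enters a training loop, here is a minimal sketch of a small fully connected network trained with full-batch gradient descent. It assumes NumPy and a toy XOR dataset chosen only for illustration; the names Hyperparameters and train_mlp are hypothetical and not from the lecture.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Hyperparameters:
    learning_rate: float = 0.1
    num_iterations: int = 2000
    hidden_layer_sizes: tuple = (8, 8)  # length = # of hidden layers, entries = # of hidden units
    activation: str = "tanh"            # "tanh" or "relu"

# Each entry: (activation function, its derivative written in terms of the activation output).
ACTIVATIONS = {
    "tanh": (np.tanh, lambda a: 1.0 - a ** 2),
    "relu": (lambda z: np.maximum(z, 0.0), lambda a: (a > 0).astype(float)),
}

def train_mlp(inputs, targets, hp, seed=0):
    """Train with full-batch gradient descent on mean squared error; return the loss history."""
    rng = np.random.default_rng(seed)
    act, act_grad = ACTIVATIONS[hp.activation]
    sizes = [inputs.shape[1], *hp.hidden_layer_sizes, targets.shape[1]]
    weights = [rng.normal(scale=np.sqrt(1.0 / m), size=(m, n))
               for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    losses = []
    for _ in range(hp.num_iterations):
        # Forward pass: hidden layers use the chosen activation, the output layer is linear.
        activations = [inputs]
        for w, b in zip(weights[:-1], biases[:-1]):
            activations.append(act(activations[-1] @ w + b))
        predictions = activations[-1] @ weights[-1] + biases[-1]
        error = predictions - targets
        losses.append(float(np.mean(error ** 2)))
        # Backward pass: delta is the loss gradient w.r.t. each layer's pre-activation.
        delta = 2.0 * error / error.size
        for layer in reversed(range(len(weights))):
            grad_w = activations[layer].T @ delta
            grad_b = delta.sum(axis=0)
            upstream = delta @ weights[layer].T  # uses the weights before the update
            weights[layer] -= hp.learning_rate * grad_w
            biases[layer] -= hp.learning_rate * grad_b
            if layer > 0:
                delta = upstream * act_grad(activations[layer])
    return losses

# Toy XOR problem (an assumption for illustration only).
xor_inputs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
xor_targets = np.array([[0.0], [1.0], [1.0], [0.0]])

hp = Hyperparameters(learning_rate=0.1, num_iterations=2000,
                     hidden_layer_sizes=(8, 8), activation="tanh")
losses = train_mlp(xor_inputs, xor_targets, hp)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Changing any field of Hyperparameters (for example activation="relu" or hidden_layer_sizes=(16,)) and rerunning is a simple way to see how sensitive training is to each choice.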

2. Study Materials

Deep Learning for ME

  • Deep learning may matter more to you than to AI researchers.
  • Think about where you could apply this new technology.

How Should You Study AI?

  • Do not start your study of AI with deep learning.
  • Linear algebra, Optimization, Statistics, Probability, Machine Learning
    • Then deep learning
  • (Numerical or Scientific) Computer Programming
    • MATLAB or Python
    • Concepts, equations, code

Useful Study Materials

Most of the lecture content was selectively compiled from the materials of the researchers listed below.

1) Linear Algebra

2) Optimization and Linear Systems

3) Machine Learning

4) Deep Learning

University Lectures on Deep Learning

  • CMU

  • NYU

  • MIT

  • Toronto


Korean-Language Courses
