CS 228 - Probabilistic Graphical Models

Announcements: Important announcements will be posted on Piazza. Calendar: Click here for detailed information on all lectures, office hours, and due dates. Contact: Please use Piazza for all questions related to lectures and coursework.
For SCPD students, please email scpdsupport@stanford.edu. Course Description: Probabilistic graphical models are a powerful framework for representing complex domains using probability distributions, with numerous applications in machine learning, computer vision, natural language processing and computational biology. Graphical models bring together graph theory and probability theory, and provide a flexible framework for modeling large collections of random variables with complex interactions.
This course will provide a comprehensive survey of the topic, introducing the key formalisms and the main techniques used to construct these models, make predictions, and support decision-making under uncertainty. The aim of this course is to develop the knowledge and skills necessary to design, implement and apply these models to solve real problems.
The course will cover: (1) Bayesian networks, undirected graphical models, and their temporal extensions; (2) exact and approximate inference methods; (3) estimation of the parameters and the structure of graphical models.
Prerequisites: Students are expected to have background in basic probability theory, statistics, programming, and algorithm design and analysis.

Reference textbooks: Probabilistic Graphical Models: Principles and Techniques by Daphne Koller and Nir Friedman (MIT Press); Graphical Models, Exponential Families, and Variational Inference by Martin J. Wainwright and Michael I. Jordan (available online).

Lecture notes: Lecture notes are available here and will be periodically updated throughout the quarter.

Homeworks: Each homework is centered around an application and will also deepen your understanding of the theoretical concepts. Homeworks will be posted on Piazza. Written Assignments: Homeworks should be written up clearly and succinctly; you may lose points if your answers are unclear or unnecessarily complicated.
You are encouraged to use LaTeX to write up your homeworks (here is a template), but this is not a requirement. Collaboration Policy and Honor Code: You are free to form study groups and discuss homeworks and projects. However, you must write up homeworks and code from scratch independently, without referring to any notes from the joint session. You should not copy, refer to, or look at solutions from previous years' homeworks in preparing your answers. It is an honor code violation to intentionally refer to a previous year's solutions, either official or written up by another student.
Anybody violating the honor code will be referred to the Office of Judicial Affairs. Submission Instructions: We will be using the Gradescope online submission system. Students can typeset or scan their homeworks. Late Homework: You have 6 late days which you can use at any time during the term without penalty.
For a particular homework, you can use at most two late days. Each late homework should be clearly marked as "Late" on the first page. Regrade Policy: You may submit a regrade request if you believe that the course staff made an error in grading. Any regrade requests should be submitted through Gradescope within one week of receiving your grade. Please try to be as specific as possible with your regrade request.
Location: Nvidia Auditorium and Skilling Auditorium.
Here are some tips for submitting through Gradescope. Course Notes: This year, we have started to compile a self-contained set of notes for this course, in which we will go into greater detail about the material covered by the course.
Hopefully, this makes the content both more accessible and more digestible to a wider audience. Throughout the quarter, we will be updating these lecture notes as we progress through the material.
Please make sure that you have the most updated version by either checking back at this webpage or compiling from source using our GitHub repository.
Since these notes are still under development, there may be some errors. If you find any errors, please see the section below to contribute to the course notes! Contributing to the Course Notes: We are happy to have anyone contribute to the course notes. Students taking the course who make significant contributions to the notes will fulfill the participation portion of the grade and, in very exceptional cases, may receive extra credit.
Any changes should be given as a pull request on GitHub. Please email the course staff if you have any major questions or concerns about the course notes. These notes form a concise introductory course on probabilistic graphical models.
They are based on Stanford CS228, taught by Stefano Ermon, and have been written by Volodymyr Kuleshov with the help of many students and course staff. This course starts by introducing graphical models from the very basics and concludes by explaining from first principles the variational autoencoder.
The compiled version is available here. This material is under construction! Although we have written up most of it, you will probably find several typos. If you do, please let us know, or submit a pull request with your fixes via GitHub. Please add your changes directly to the Markdown source code.
Let us now look at parameter learning in undirected graphical models. Unfortunately, as in the case of inference, the higher expressivity of undirected models also makes them significantly more difficult to deal with.
Fortunately, maximum likelihood learning in these models can be reduced to repeatedly performing inference, which will allow us to apply all the approximate inference techniques that we have seen earlier.
Recall that a Markov random field (MRF) in log-linear form can be written as $$p(x; \theta) = \frac{1}{Z(\theta)} \exp\left(\theta^\top f(x)\right), \qquad Z(\theta) = \sum_{x} \exp\left(\theta^\top f(x)\right),$$ where $f(x)$ is a vector of sufficient statistics (e.g., indicator features over the cliques of the graph). Note that $Z(\theta)$ is different for each set of parameters $\theta$; therefore, we also refer to it as the partition function. Bayesian networks are constructed such that $Z(\theta) = 1$ for all $\theta$; MRFs do not make this modeling assumption, which makes them more flexible, but also more difficult to learn. More generally, distributions of this form are members of the exponential family of distributions, about which you can learn more in CS229. Many other distributions are in the exponential family, including the Bernoulli, Multinomial, Normal, Poisson, and many others.
Exponential families are widely used in machine learning; we refer the reader to the CS229 course notes for a more thorough presentation of this topic. Suppose that you have an exponential family distribution of the form $$p(x; \theta) = \frac{\exp\left(\theta^\top f(x)\right)}{Z(\theta)}.$$ (Actually, exponential families have a slightly more general form, but this one will be enough for our purposes.) Exponential families are also very convenient to work with computationally: their sufficient statistics can summarize arbitrary amounts of i.i.d. data.
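To make this concrete, here is a minimal sketch (not taken from the course notes) of such a log-linear distribution over a few binary variables. The chain-shaped feature function `f` and the parameter values are hypothetical choices, and $Z(\theta)$ is computed by brute-force enumeration, which is only feasible for tiny models:

```python
import itertools
import numpy as np

n = 4  # number of binary variables; small enough to enumerate all 2^n states

def f(x):
    """Sufficient statistics: one feature per edge of a chain x_1 - x_2 - ... - x_n."""
    return np.array([x[i] * x[i + 1] for i in range(n - 1)], dtype=float)

def log_Z(theta):
    """Log-partition function, computed by summing over all 2^n configurations."""
    scores = [theta @ f(np.array(x)) for x in itertools.product([0, 1], repeat=n)]
    return np.log(np.sum(np.exp(scores)))

def log_p(x, theta):
    """log p(x; theta) = theta^T f(x) - log Z(theta)."""
    return theta @ f(x) - log_Z(theta)

theta = np.array([0.5, -1.0, 0.3])       # made-up parameters
x = np.array([1, 1, 0, 1])
print(log_p(x, theta))                    # log-probability of one configuration
```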
In short, it definitely pays off to learn more about exponential families! Suppose now that we are given a dataset $D$, and we want to estimate $\theta$ via maximum likelihood. Since we are working with an exponential family, the maximum likelihood objective will be concave. The log-likelihood is $$\frac{1}{|D|} \log p(D; \theta) = \frac{1}{|D|} \sum_{x \in D} \theta^\top f(x) - \log Z(\theta).$$ Unlike the first term, the second term, $\log Z(\theta)$, does not decompose across data points.
It is not only hard to optimize this term, it is hard to even evaluate it, as we have already seen in previous chapters. Now consider the gradient of the log-likelihood. Obtaining the gradient of the linear part is obviously very easy. However, the gradient of $\log Z(\theta)$ takes a more complicated form: $$\nabla_\theta \log Z(\theta) = \mathbb{E}_{x \sim p(x; \theta)}\left[f(x)\right].$$
Computing the expectation on the right-hand side of the equation requires inference with respect to $p(x; \theta)$. For example, we could sample from $p$ and construct a Monte Carlo estimate of the expectation. However, as we have seen, inference in general is intractable, and therefore so is computing the gradient. One can further show that the Hessian of $\log Z(\theta)$ is the covariance matrix of $f(x)$ under $p(x; \theta)$; since covariance matrices are always positive semi-definite, this proves that $\log Z(\theta)$ is convex and therefore that the log-likelihood is concave. Recall that this was one of the properties of exponential families that we stated earlier.
In summary, even though the log-likelihood objective is concave, optimizing it is hard; usually non-convexity is what makes optimization intractable, but in this case, it is the computation of the gradient. Interestingly, however, maximum-likelihood learning reduces to inference in the sense that we may repeatedly use inference to compute the gradient and determine the model weights using gradient descent (see the sketch below).
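The following toy sketch (an illustration under the same hypothetical chain model as above, with a made-up dataset and step size) implements this loop: each gradient step calls an exact inference routine, here brute-force enumeration, to compute $\mathbb{E}_{x \sim p(x;\theta)}[f(x)]$, and the update $\theta \leftarrow \theta + \eta\,(\mathbb{E}_{\text{data}}[f(x)] - \mathbb{E}_{\text{model}}[f(x)])$ ascends the log-likelihood:

```python
import itertools
import numpy as np

n = 4
states = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=n)]

def f(x):
    return np.array([x[i] * x[i + 1] for i in range(n - 1)])

def model_expectation(theta):
    """E_model[f(x)] by exact inference (enumeration over all states)."""
    scores = np.array([theta @ f(x) for x in states])
    p = np.exp(scores - scores.max())    # unnormalized, shifted for stability
    p /= p.sum()                         # normalize: this is where Z(theta) hides
    return sum(pi * f(x) for pi, x in zip(p, states))

# Made-up dataset of observed configurations.
data = [np.array([1., 1., 0., 0.]), np.array([1., 1., 1., 0.]), np.array([0., 0., 1., 1.])]
emp = np.mean([f(x) for x in data], axis=0)   # E_data[f(x)]

theta = np.zeros(n - 1)
for _ in range(500):                          # gradient ascent on the log-likelihood
    theta += 0.1 * (emp - model_expectation(theta))

# At convergence the model moments match the empirical moments.
print(theta)
print(model_expectation(theta), emp)
```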
This observation lets us apply to learning many of the approximate inference methods that we have seen in previous chapters, such as sampling-based methods (e.g., estimating the model expectation in the gradient with Gibbs samples, as sketched below) and variational approximations of the partition function.
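For instance, when exact enumeration is infeasible, the model expectation in the gradient can be replaced by a Monte Carlo estimate from a Gibbs sampler. This sketch uses the same hypothetical chain model as above; the sample sizes are arbitrary:

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)

def f(x):
    return np.array([x[i] * x[i + 1] for i in range(n - 1)], dtype=float)

def gibbs_expectation(theta, n_samples=5000, burn_in=500):
    """Estimate E_model[f(x)] with Gibbs sampling instead of enumeration."""
    x = rng.integers(0, 2, size=n)
    total = np.zeros(n - 1)
    for t in range(n_samples + burn_in):
        for i in range(n):
            # Unnormalized log-scores with x_i = 0 and x_i = 1; terms not
            # involving x_i cancel in the ratio, so full scores are fine here.
            x[i] = 0
            s0 = theta @ f(x)
            x[i] = 1
            s1 = theta @ f(x)
            # p(x_i = 1 | rest) = exp(s1) / (exp(s0) + exp(s1))
            x[i] = int(rng.random() < 1.0 / (1.0 + np.exp(s0 - s1)))
        if t >= burn_in:
            total += f(x)
    return total / n_samples

print(gibbs_expectation(np.array([0.5, -1.0, 0.3])))
```

Plugging `gibbs_expectation` in place of the exact `model_expectation` in the previous sketch gives a (noisy) stochastic gradient ascent procedure.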
Another popular approach to learning is called pseudo-likelihood. The pseudo-likelihood replaces the likelihood $\frac{1}{|D|} \log p(D; \theta)$ with the following approximation: $$\ell_{\text{PL}}(\theta; D) = \frac{1}{|D|} \sum_{x \in D} \sum_{i} \log p(x_i \mid x_{N(i)}; \theta),$$ where $x_i$ is the $i$-th variable and $N(i)$ is the set of neighbors of $i$ in the graph (i.e., the Markov blanket of $x_i$). Note that each term only involves one variable, and hence its partition function is going to be tractable (we only need to sum over the values of one variable).
However, it is not equal to the likelihood: the correct way to expand the likelihood would involve the chain rule, i.e., terms of the form $\log p(x_i \mid x_1, \dots, x_{i-1}; \theta)$, which condition on the preceding variables rather than on the neighbors alone. Intuitively, the pseudo-likelihood objective assumes that $x_i$ depends mainly on its neighbors in the graph, and ignores the dependencies on other, more distant variables. Observe also that if pseudo-likelihood succeeds in matching all the conditional distributions to the data, a Gibbs sampler run on the model distribution will have the same invariant distribution as a Gibbs sampler run on the true data distribution, ensuring that they are the same. A sketch of the objective for a small model is given below.
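Here is a sketch of the pseudo-likelihood objective for the toy chain model used earlier (again with a hypothetical model and dataset). Each conditional $\log p(x_i \mid x_{N(i)}; \theta)$ is computed by normalizing over just the two values of $x_i$, so no global partition function is needed:

```python
import numpy as np

n = 4

def f(x):
    return np.array([x[i] * x[i + 1] for i in range(n - 1)], dtype=float)

def pseudo_log_likelihood(theta, data):
    total = 0.0
    for x in data:
        x = x.copy()
        for i in range(n):
            xi = x[i]
            x[i] = 0
            s0 = theta @ f(x)            # unnormalized log-score with x_i = 0
            x[i] = 1
            s1 = theta @ f(x)            # unnormalized log-score with x_i = 1
            s = s1 if xi == 1 else s0
            # log p(x_i | x_{N(i)}): factors not touching x_i cancel in the
            # difference, so using the full scores is correct.
            total += s - np.logaddexp(s0, s1)
            x[i] = xi                    # restore the observed value
    return total / len(data)

data = [np.array([1, 1, 0, 0]), np.array([0, 1, 1, 0])]
print(pseudo_log_likelihood(np.array([0.5, -1.0, 0.3]), data))
```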
More formally, we can show that the pseudo-likelihood objective is concave. Assuming that the data is drawn from an MRF with parameters $\theta^*$, we can show that the pseudo-likelihood estimate converges to $\theta^*$ as the number of data points gets large. We will end with an interesting observation about the maximum likelihood estimate of the MRF parameters.
Taking the gradient, and using our expression for the gradient of the partition function, we obtain the expression $$\nabla_\theta \frac{1}{|D|} \log p(D; \theta) = \frac{1}{|D|} \sum_{x \in D} f(x) - \mathbb{E}_{x \sim p(x; \theta)}\left[f(x)\right].$$ Note that this is precisely the difference between the expectations of the sufficient statistics $f(x)$ under the empirical (i.e., data) distribution and the model distribution; at the maximum likelihood estimate, the two sets of moments must therefore match.

In this chapter, we are going to use various ideas that we have learned in the class in order to present a very influential recent probabilistic model called the variational autoencoder.
Variational autoencoders (VAEs) are a deep learning technique for learning latent representations. They have also been used to draw images, achieve state-of-the-art results in semi-supervised learning, as well as to interpolate between sentences. There are many online tutorials on VAEs. Our presentation will probably be a bit more technical than the average, since our goal will be to highlight connections to ideas seen in class, as well as to show how these ideas are useful in machine learning research.
Consider a directed latent-variable model of the form $$p(x, z) = p(x \mid z)\, p(z),$$ with observed $x$ (which may be continuous or discrete) and latent $z \in \mathbb{R}^k$. (Figure: the learned latent space can be used to interpolate between facial expressions.) To make things concrete, you may think of $x$ as being an image (e.g., a human face) and $z$ as latent factors, not seen during training, that explain features of the face. For example, one coordinate of $z$ can encode whether the face is happy or sad, another one whether the face is male or female, etc. We may also be interested in models with many layers, e.g., $p(x \mid z_1)\, p(z_1 \mid z_2) \cdots p(z_{m-1} \mid z_m)\, p(z_m)$.
These are often called deep generative models and can learn hierarchies of latent representations. In this chapter, we will assume for simplicity that there is only one latent layer.
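As a concrete (toy) instance of such a one-layer model, the sketch below draws samples from $p(x, z) = p(x \mid z)\, p(z)$ by ancestral sampling, using a standard Gaussian prior over $z$ and a Bernoulli likelihood over binary $x$ whose logits come from a made-up linear decoder $W$; all dimensions and weights are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 2, 5                       # latent and observed dimensions (hypothetical)
W = rng.normal(size=(d, k))       # "decoder" weights, here just random

def sample(n):
    """Ancestral sampling: draw z ~ p(z), then x ~ p(x | z)."""
    z = rng.normal(size=(n, k))                       # z ~ N(0, I)
    probs = 1.0 / (1.0 + np.exp(-(z @ W.T)))          # Bernoulli means
    x = (rng.random((n, d)) < probs).astype(int)      # x ~ p(x | z)
    return z, x

z, x = sample(3)
print(x)
```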
Suppose now that we are given a dataset $D = \{x^{(1)}, \dots, x^{(n)}\}$. We are interested in the following inference and learning tasks: learning the parameters $\theta$ of $p$; approximate posterior inference over $z$ (given an image $x$, what are its latent factors?); and approximate marginal inference over $x$ (given an $x$ with missing parts, how do we fill them in?). We have learned several techniques in the class that could be used to solve our three tasks. The EM algorithm can be used to learn latent-variable models. Recall, however, that performing the E step requires computing the posterior $p(z \mid x)$, which we have assumed to be intractable. In the M step, we learn $\theta$ by looking at the entire dataset, which is not practical for large datasets (note, however, that there exists a generalization called online EM, which performs the M step over mini-batches).
To perform approximate inference, we may use mean field. Recall, however, that one step of mean field requires us to compute an expectation whose time complexity scales exponentially with the size of the Markov blanket of the target variable. What is the size of the Markov blanket for $z$?
If we assume that at least one component of $x$ depends on each component of $z$, then this introduces a V-structure into the graph of our model (the $x$ variables, which are observed, are explaining away the differences among the $z$ variables). Thus, we know that all the $z$ variables will depend on each other, and the Markov blanket of some $z_i$ will contain all the other $z$-variables; mean field therefore becomes intractable here. (The authors of the variational autoencoder paper also discuss the limitations of EM and sampling methods in their introduction and methods sections.)
An equivalent and simpler explanation is that $p$ will contain a factor $p(x \mid z_1, \dots, z_k)$, in which all the $z$ variables are tied. Another approach would be to use sampling-based methods. Recall, however, that these can require long mixing times, and it can be hard to tell when the chain has converged. In addition, techniques such as Metropolis-Hastings require a hand-crafted proposal distribution, which might be difficult to choose.
We are now going to learn about auto-encoding variational Bayes (AEVB), an algorithm that can efficiently solve our three inference and learning tasks; the variational autoencoder will be one instantiation of this algorithm. AEVB is based on ideas from variational inference.
Recall that in variational inference, we are interested in maximizing the evidence lower bound (ELBO) $$\mathcal{L}(p, q) = \mathbb{E}_{q(z)}\left[\log p(x, z) - \log q(z)\right]$$ over the space of allowed approximating distributions $q$. Since $x$ is fixed, we can define $q(z)$ to be a distribution conditioned on $x$, i.e., $q(z \mid x)$. This means that we are effectively choosing a different $q(z)$ for every $x$, which will produce a better posterior approximation than always choosing the same $q(z)$. How exactly do we optimize over $q(z \mid x)$? We may use mean field, but this turns out to be insufficiently accurate for our purposes.
The assumption that $q$ is fully factored is too strong, and the coordinate descent optimization algorithm is too simplistic. The first important idea used in the AEVB algorithm is a general-purpose approach for optimizing $q$ that works for large classes of $q$ that are more complex than in mean field. We later combine this algorithm with specific choices of $q$. A sketch of the basic building block, a Monte Carlo estimate of the ELBO, is given below.
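The following sketch estimates the ELBO $\mathcal{L} = \mathbb{E}_{q(z \mid x)}[\log p(x, z) - \log q(z \mid x)]$ by Monte Carlo for the toy model sampled earlier, using a Gaussian $q(z \mid x) = \mathcal{N}(\mu, \sigma^2 I)$. The parameters $\mu$ and $\sigma$ here are made-up stand-ins for the result of an optimization over $q$; this is an illustration of the quantity being optimized, not the AEVB algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 2, 5
W = rng.normal(size=(d, k))      # same hypothetical decoder as before

def log_p_xz(x, z):
    """log p(x, z) = log p(z) + log p(x | z) for the toy Gaussian/Bernoulli model."""
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * k * np.log(2 * np.pi)   # N(0, I) prior
    logits = W @ z
    log_px_z = np.sum(x * logits - np.logaddexp(0, logits))        # Bernoulli
    return log_pz + log_px_z

def elbo(x, mu, sigma, n_samples=1000):
    """Monte Carlo estimate of E_q[log p(x, z) - log q(z | x)]."""
    total = 0.0
    for _ in range(n_samples):
        z = mu + sigma * rng.normal(size=k)         # z ~ q(z | x) = N(mu, sigma^2 I)
        log_q = (-0.5 * np.sum(((z - mu) / sigma) ** 2)
                 - k * np.log(sigma) - 0.5 * k * np.log(2 * np.pi))
        total += log_p_xz(x, z) - log_q
    return total / n_samples

x = np.array([1, 0, 1, 1, 0])
print(elbo(x, mu=np.zeros(k), sigma=1.0))           # a lower bound on log p(x)
```

Optimizing this estimate with respect to the parameters of $q$ (and of $p$) is exactly what AEVB does efficiently, via the gradient estimators discussed next.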
This repo is configured without any extra Jekyll plugins so it can be compiled directly by GitHub Pages. Thus, any changes to the Markdown files will be automatically reflected in the live website.
To make any changes to this repo, first fork it. Make the changes you want and push them to your own forked copy of the repo. If you want to test your changes locally before pushing them to the master branch, you can run Jekyll locally on your own machine: from the root of your cloned version of this repo, install the dependencies and start the local server (for a standard Jekyll setup this is something like `bundle exec jekyll serve`, but the exact command depends on your installation). Avoid vertical bars in any inline math equations (i.e., within a paragraph of text); use `\mid` instead.
Time and Location: Monday, Wednesday; links to lecture are on Canvas. Note: This is being updated for Spring; the dates are subject to change as we figure out deadlines. Please check back soon.

Topics: Linear Regression; Logistic Regression; Newton's Method; Perceptron; Exponential Family; Generalized Linear Models; Naive Bayes; Laplace Smoothing; Support Vector Machines; GMM (non-EM); Expectation Maximization; Factor Analysis; Value Function Approximation.

Previous projects: A list of last quarter's final projects can be found here.

Data: Here is the UCI Machine Learning repository, which contains a large collection of standard datasets for testing learning algorithms.

Slides: Introduction slides [pptx], Introduction slides [pdf].

Problem Set 0. Weighted Least Squares. Class Notes: Live lecture notes [pdf].

Problem Set 1. Class Notes: Generative Algorithms [pdf].
Class Notes: Support Vector Machines [pdf]. Notes: Python Tutorial. Class Notes: Deep Learning, Backpropagation. Problem Set 2. Notes: Evaluation Metrics. Class Notes: Regularization and Model Selection.

Notes: Deep Learning. Class Notes: ML Advice. Problem Set 3. Class Notes: Midterm Review. Class Notes: Factor Analysis. Class Notes: Decision Trees (Decision Tree IPython demo); Boosting Algorithms and Weak Learning.