Are "risk factors" the independent variables or the dependent variables in our model?
Risk factors are the independent variables in our model, and are what we will use to predict the dependent variable.
In many situations, a dataset is handed to you and you are tasked with discovering which variables are important. But for the Framingham Heart Study, the researchers had to collect data from patients. In a situation like this one, where data needs to be collected by the researchers, should the potential risk factors be defined before or after the data is collected?
The researchers should first hypothesize potential risk factors, and then collect data corresponding to those risk factors. Of course, they could always define more risk factors later and collect more data, but this data would take longer to collect.