Background: Prisoners are at greater risk of infection than the general population. Drug injection is the main route of HIV transmission, particularly in Iran. It is therefore of interest to identify the variables associated with drug injection among prisoners. However, incomplete national data sets are one of the issues that complicate model building. In this paper, we address the process of model development in the presence of missing data. Methods: Complete data on 2720 prisoners were available. A logistic regression model was fitted to these data and served as the gold standard. We then randomly omitted 20% and 50% of the data. Missing data were imputed 10 times using multiple imputation by chained equations (MICE). Rubin’s rule (RR) was applied to select candidate variables and to combine the results across imputed data sets. In the S1, S2, and S3 methods, variables that remained significant in one, five, and ten imputed data sets, respectively, were candidates for the multifactorial model. Two weighting approaches were also applied. Findings: Age at onset of drug use, recent drug use before imprisonment, being single, and length of imprisonment were significantly associated with drug injection among prisoners. All variable selection schemes detected the significance of these variables. Conclusion: The performance of the simpler variable selection methods was comparable with that of RR. This indicates that the screening step can be used to select candidate variables for the multifactorial model.
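
The pooling and screening steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`pool_rubin`, `select_candidates`), the 0.05 significance threshold, and the reading of S1, S2, and S3 as "significant in at least one, at least five, and all ten imputed data sets" are assumptions made for illustration. The p-values and variable names in the example are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one regression coefficient across m imputed data sets.

    estimates: the coefficient estimated in each imputed data set
    variances: the squared standard error of that coefficient in each data set
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


def select_candidates(p_by_variable, min_count, alpha=0.05):
    """Keep variables significant in at least `min_count` imputed data sets.

    alpha=0.05 is an assumed threshold, not one stated in the abstract.
    """
    return [
        name
        for name, p_values in p_by_variable.items()
        if sum(p < alpha for p in p_values) >= min_count
    ]


# Hypothetical univariate p-values from 10 imputed data sets.
example_p = {
    "var_a": [0.01] * 10,              # significant in all 10
    "var_b": [0.01] * 5 + [0.20] * 5,  # significant in 5
    "var_c": [0.01] + [0.20] * 9,      # significant in 1
    "var_d": [0.20] * 10,              # never significant
}

s1 = select_candidates(example_p, min_count=1)   # assumed reading of S1
s2 = select_candidates(example_p, min_count=5)   # assumed reading of S2
s3 = select_candidates(example_p, min_count=10)  # assumed reading of S3
```

The count-based screens need only the per-data-set p-values, whereas `pool_rubin` needs each coefficient and its variance from every imputed data set. This is the sense in which the S1 to S3 schemes are the simpler alternative.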