Occupations on the map: Using a super learner algorithm to downscale labor statistics

Abstract

Detailed and accurate labor statistics are fundamental to support social policies that aim to improve the match between labor supply and demand, and support the creation of jobs. Despite overwhelming evidence that labor activities are distributed unevenly across space, detailed statistics on the geographical distribution of labor and work are not readily available. To fill this gap, we demonstrated an approach to create fine-scale gridded occupation maps by means of downscaling district-level labor statistics, informed by remote sensing and other spatial information. We applied a super-learner algorithm that combined the results of different machine learning models to predict the shares of six major occupation categories and the labor force participation rate at a resolution of 30 arc seconds (1x1 km) in Vietnam. The results were subsequently combined with gridded information on the working-age population to produce maps of the number of workers per occupation. The super learners outperformed (n 6) or had similar (n 1) accuracy in comparison to best-performing single machine learning algorithms. A comparison with an independent high-resolution wealth index showed that the shares of the four low-skilled occupation categories (91 of the labor force), were able to explain between 28 and 43 of the spatial variation in wealth in Vietnam, pointing at a strong spatial relationship between work, income and wealth. The proposed approach can also be applied to produce maps of other (labor) statistics, which are only available at aggregated levels.

Publication
PLOS ONE, 17(12), pp. e0278120