Occupations on the map: Using a super learner algorithm to downscale labor statistics

Abstract

Detailed and accurate labor market statistics are fundamental to social policies that aim to improve the match between labor supply and demand and to support the creation of jobs. Despite overwhelming evidence that labor activities are distributed unevenly across space, detailed statistics on the geographical distribution of labor and work are not readily available. To fill this gap, we demonstrated an approach to create fine-scale gridded occupation maps by downscaling district-level labor statistics, informed by remote sensing and other spatial information. We applied a super learner algorithm that combines the results of different machine learning models to predict the shares of six major occupation categories and the labor force participation rate at a resolution of 30 arc seconds (approximately 1 × 1 km) in Vietnam. The results are subsequently combined with gridded information on the working-age population to produce maps of the number of workers per occupation. The super learners outperform (n = 4) or have similar accuracy to (n = 3) the best-performing single machine learning algorithms. A comparison with an independent high-resolution wealth index showed that the shares of the four low-skilled occupation categories (91% of the labor force) were able to explain between 27% and 45% of the spatial variation in wealth in Vietnam, pointing to a strong spatial relationship between work, income and wealth. The proposed downscaling approach can also be applied to produce maps of other (labor) statistics that are only available at aggregated levels.
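The super learner idea can be illustrated with a minimal sketch. It is not the implementation used in this study: the two base learners (a mean predictor and a one-covariate linear fit), the grid search over a single convex weight, the district values, and all function names are assumptions chosen to keep the example self-contained. The final helper assumes that workers per grid cell are obtained by multiplying the working-age population by the predicted participation rate and occupation share, which is one plausible reading of the combination step described above.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Base learner 1 (illustrative): always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Base learner 2 (illustrative): least squares on a single covariate."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def out_of_fold(fit, xs, ys, k):
    """Cross-validated predictions: each point is predicted by a model
    fitted without the fold that contains it."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds

def super_learner(xs, ys, k=5, steps=100):
    """Choose the convex weight that minimises cross-validated squared
    error, then refit both base learners on all data."""
    p_lin = out_of_fold(fit_linear, xs, ys, k)
    p_mean = out_of_fold(fit_mean, xs, ys, k)

    def cv_mse(w):
        return mean([(w * a + (1 - w) * b - y) ** 2
                     for a, b, y in zip(p_lin, p_mean, ys)])

    w = min([s / steps for s in range(steps + 1)], key=cv_mse)
    m_lin, m_mean = fit_linear(xs, ys), fit_mean(xs, ys)
    return w, (lambda x: w * m_lin(x) + (1 - w) * m_mean(x))

def workers_per_cell(working_age_pop, participation_rate, occupation_share):
    """Assumed combination step: population x participation x share."""
    return working_age_pop * participation_rate * occupation_share

# Hypothetical district data: a covariate (e.g. a night-light index)
# and the district share of one occupation category.
lights = [i / 10 for i in range(10)]
shares = [0.8 - 0.5 * x for x in lights]

weight, predict = super_learner(lights, shares)
cell_workers = workers_per_cell(1000, 0.8, predict(0.5))
```

Because the weight is chosen on out-of-fold predictions, the ensemble is rewarded for generalisation rather than for fitting the training districts. In practice the base learners would be full machine learning models and the weights would typically be estimated for many learners at once.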