Investigating the Use of Machine Learning in Health Science
The benefits of regular physical activity (PA) participation are well documented, with irrefutable evidence of the effectiveness of regular PA in the prevention of several chronic diseases (e.g., cardiovascular disease, diabetes, obesity, depression) and premature death (Warburton et al., 2006). In Ireland, the Children’s Sport Participation and Physical Activity (CSPPA) study found that just 17% of Irish children engaged in the recommended one hour per day of moderate to vigorous physical activity, a drop from the 19% recorded in 2010 (Woods et al., 2018). It is therefore important to explore the factors that influence PA participation and the means of measuring their impact.
Machine learning (ML) has become a common way to measure PA (Narayanan et al., 2020). Accelerometer data processing techniques based on pattern recognition have been shown to provide accurate predictions of PA type and more accurate assessments of PA intensity (Trost et al., 2012, 2018; Ellis et al., 2016). Nonetheless, the uptake of ML methods by PA researchers has been slow, in part because of the difficulties of implementation and the consistent finding that models trained on accelerometer data from laboratory-based activity trials do not generalize well to free-living environments (Sasaki et al., 2016; Lyden et al., 2014; Bastian et al., 2015). Recently, it has been argued that ML has failed PA research in four important ways: a lack of benchmark data, priority in methods development, limited software integration and absence of training (Fuller, Ferber and Stanley, 2022). To improve the use of ML methods in PA research, it has been proposed that practitioners, as a discipline, must use and publish benchmark datasets to allow for increased open-source methods development.
Many datasets already exist on health-related fitness, wellbeing, confidence and motivation towards PA among Irish youth. A recent example is the Moving Well-Being Well project, a national study that assessed a range of variables linked to PA participation in over 2,000 Irish children. One finding from the study highlighted the low levels of fundamental movement skills (FMS) mastery in Irish primary school children (Behan et al., 2019). FMS proficiency has shown positive associations with increased PA in both children and adolescents (Barnett et al., 2009; Lubans et al., 2010).
The analyses undertaken on current PA-related datasets have been limited by a number of factors, a lack of training being one of them (Fuller, Ferber and Stanley, 2022). Given the size and complexity of these datasets, as well as the societal need to create a healthier Ireland, relevant ML techniques should be explored in an effort to uncover previously unknown patterns, correlations and disparities between specific age groups and genders. This research aims to explore the use of ML in health science, focusing specifically on children and youth, with the further aim of applying ML appropriately to PA data in order to have a societal impact on Irish health.