Accurately predicting faulty software units helps practitioners target and prioritize their efforts to maintain software quality. Prior studies have used machine-learning models to detect faulty code. We revisit these studies, point out potential improvements, and propose a revised benchmarking configuration that considers several new dimensions, such as class distribution sampling, evaluation metrics, and testing procedures. The new study also includes new datasets and models. Our findings suggest that predictive accuracy is generally good. However, predictive power is heavily influenced by the evaluation metrics and the testing procedure (frequentist or Bayesian). Classifier results also depend on the software project. Because it is difficult to choose a single best classifier, researchers should consider multiple dimensions to overcome potential bias.
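The dependence of predictive power on the evaluation metric can be illustrated with a minimal sketch. It assumes a hypothetical project in which 10 of 100 units are faulty, and it compares accuracy with the Matthews correlation coefficient (MCC); both the data and the choice of these two metrics are illustrative assumptions, not the benchmark's actual configuration.

```python
import math


def confusion(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = faulty, 0 = clean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of units classified correctly."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; taken as 0 when a marginal is empty."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical, imbalanced project: 10 faulty units, 90 clean units.
y_true = [1] * 10 + [0] * 90

# Hypothetical classifier A: always predicts "clean".
pred_a = [0] * 100
# Hypothetical classifier B: finds 7 of 10 faulty units, raises 9 false alarms.
pred_b = [1] * 7 + [0] * 3 + [1] * 9 + [0] * 81

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, round(accuracy(y_true, pred), 2), round(mcc(y_true, pred), 2))
```

Under these assumptions, accuracy ranks the trivial classifier A above B (0.90 vs 0.88), while MCC ranks B far above A (about 0.49 vs 0.0), so the conclusion about which classifier is "best" flips with the metric.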
Objective: Although researchers and policy makers have often considered the U.S.–Mexico border region to be at high risk for substance use problems, epidemiological studies of this region have been hard to interpret because of their modest geographic coverage, reliance on self-report, and mixed results. The current study addresses limitations of existing studies and extends the knowledge base by comparing alcohol- and drug-related mortality in counties on versus off the border across all four U.S. border states. Method: Data for all four border states were drawn from the 2008–2017 Centers for Disease Control and Prevention WONDER Multiple Causes of Death data set, the American Community Survey, and the Rural Urban Continuum Codes. Spatial lag models tested differences between on- and off-border counties in total alcohol- and drug-related mortality ("total mortality"), alcohol-related mortality, and drug-related mortality. Results: In multivariate models, mortality rates were significantly higher in off- versus on-border counties for all three outcomes (ps < .05). Rates for total mortality, alcohol-related mortality, and drug-related mortality were 28%, 82%, and 30% higher, respectively, off versus on the border. Border effects were similar when California was excluded, robust over time, and stronger for Latinx than for White decedents. Conclusions: Results suggest a revised understanding of the border, revealing that residents of interior counties of border states are at highest risk of severe substance use consequences. Results are consistent with other research finding that border counties were protected against drug overdose deaths, particularly for Latinx residents.
In this paper we use a semi-supervised learning model to predict whether a person considers it appropriate to buy a specific product online. The model's input is information about the channels a person deems appropriate for finding product information or suppliers. Both online and offline channel preferences are found to be valuable for predicting e-commerce adoption. In practice, (binary) data about a user's preferred channels for information retrieval can help estimate the probability that the person is interested in buying a specific product online, so that advertising for an online shop is shown only to people who actually believe buying that product online is appropriate. The predictive performance of our approach is considerably better than that reported in earlier research. Our results also show that semi-supervised learning has advantages in predictive performance over supervised learning.
Open innovation in data science generally takes the form of public competitions in which teams exchange messages and solutions, competing and collaborating simultaneously. Team behaviours vary widely, both in the performance of their solutions and in their participation in knowledge creation. We present a novel research framework for open innovation that integrates system dynamics and structural topic modelling to extract open factors, and that adopts a machine learning-based difference-in-differences estimator to assess the impact of team behaviour on performance, using Kaggle competition data. Our results identify four categories of team behaviour in data science open innovation competitions (active, learner, lurker, and passive), which depend on the performance of a team's solutions and on its posting and reading of forum messages. Furthermore, model evaluation, community support, and business understanding are the three activities with the strongest significant positive effects on team performance. Our research contributes to the literature by highlighting the value of forum feedback and by examining data science activities in forum discussions in relation to innovation performance, enriching the empirical understanding of open innovation. We also provide implications for researchers and practitioners who participate in, organise, and support data science open innovation activities.
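As background for the estimator mentioned above, the classic two-group, two-period difference-in-differences contrast is shown below. This is only the textbook form, not the machine learning-based estimator used in the study; the "treated" group $T$ (for example, teams exhibiting a given forum behaviour), the control group $C$, and the pre/post periods are illustrative assumptions. The ML-based variant would replace these raw group means with model-based adjustments for covariates.

```latex
\hat{\tau}_{\mathrm{DiD}}
  = \left( \bar{Y}_{T,\mathrm{post}} - \bar{Y}_{T,\mathrm{pre}} \right)
  - \left( \bar{Y}_{C,\mathrm{post}} - \bar{Y}_{C,\mathrm{pre}} \right)
```

Here $\bar{Y}_{g,t}$ denotes the mean team performance in group $g$ during period $t$; subtracting the control group's change removes trends common to both groups, under the usual parallel-trends assumption.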
Asian Americans and Pacific Islanders (AAPIs) are often portrayed as a healthy group with minor substance use problems. Using data from two studies of patients treated at 44 community-based substance use treatment sites in three states, 298 AAPIs and a matched comparison group of 298 non-AAPI patients were compared on demographic characteristics, treatment experiences, and 1-year outcomes. At treatment entry, more AAPIs reported recent drug use, and fewer reported injecting drugs; AAPIs also had less severe medical and alcohol problems and reported worse general health but less desire for medical and alcohol services. After controlling for baseline problem severity, there were no differences in treatment retention, completion, or outcomes. Contrary to the model minority stereotype, AAPIs in drug treatment have treatment needs, experiences, and outcomes mostly similar to those of other racial/ethnic groups.
A meaning-of-life scale was used to investigate the sense of meaning in life among 362 college students. The results showed that the college students' sense of meaning in life was low, with the lowest score on enthusiasm for life; apart from the sense of ownership, all of their scores were significantly lower than those of higher vocational students. Students in poor or average health had a lower sense of meaning in life than students in good or very good health. Among students in good health, boys had a higher sense of meaning in life than girls. Members of student groups had a higher sense of ownership than other students. Juniors had a higher sense of meaning in life than freshmen and sophomores.