In recent years, large-scale international assessments have been increasingly used to evaluate and compare the quality of education across regions and countries. However, measurement variance between different versions of these assessments often poses threats to the validity of such cross-cultural comparisons. In this study, we investigated the cross-language, cross-cultural validity of the Programme for International Student Assessment 2006 Science assessment via three differential item functioning (DIF) analyses: between the USA and Canada, between Chinese Hong Kong and mainland China, and between the USA and mainland China. Furthermore, we explored three plausible causes of DIF via content analysis, namely language, curriculum and cultural differences. Our results revealed that differential curriculum coverage was the most serious cause of DIF among the three factors investigated, and that differential content familiarity also contributed to DIF. We discussed the implications of the findings for future international assessment development and for how best to define 'scientific literacy' for students around the world.

[Copyright of Educational Psychology is the property of Routledge. Full article may be available at the publisher's website: http://dx.doi.org/10.1080/01443410.2014.946890]
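
The abstract does not state which DIF detection procedure the authors applied. As an illustrative sketch only, the following Python example shows one common approach, a Mantel-Haenszel DIF check, run on simulated dichotomous item scores for two hypothetical examinee groups (e.g. a reference country versus a focal country); all names, data and thresholds here are assumptions for illustration, not the study's method.

```python
# Illustrative sketch only: Mantel-Haenszel DIF on simulated 0/1 item scores.
import numpy as np

def mantel_haenszel_dif(responses, group, item, n_strata=5):
    """MH chi-square and ETS delta for one studied item.

    responses : (n_examinees, n_items) array of 0/1 item scores
    group     : (n_examinees,) array, 0 = reference group, 1 = focal group
    item      : column index of the studied item
    """
    # Match examinees on ability: stratify by total score on the other items.
    anchor = np.delete(responses, item, axis=1).sum(axis=1)
    cuts = np.quantile(anchor, np.linspace(0, 1, n_strata + 1))
    labels = np.clip(np.searchsorted(cuts, anchor, side="right") - 1, 0, n_strata - 1)

    A = E = V = num = den = 0.0
    for k in range(n_strata):
        y, g = responses[labels == k, item], group[labels == k]
        a = np.sum((g == 0) & (y == 1))  # reference group, correct
        b = np.sum((g == 0) & (y == 0))  # reference group, incorrect
        c = np.sum((g == 1) & (y == 1))  # focal group, correct
        d = np.sum((g == 1) & (y == 0))  # focal group, incorrect
        T = a + b + c + d
        if T < 2:
            continue
        A += a                              # observed reference-correct count
        E += (a + b) * (a + c) / T          # its expectation under no DIF
        V += (a + b) * (c + d) * (a + c) * (b + d) / (T**2 * (T - 1))
        num += a * d / T                    # pieces of the common odds ratio
        den += b * c / T

    chi2 = (abs(A - E) - 0.5) ** 2 / V      # MH chi-square, 1 df
    delta = -2.35 * np.log(num / den)       # ETS delta; |delta| >= 1.5 flags large DIF
    return chi2, delta

# Simulated example: 1,000 examinees, 20 items, extra difficulty injected into
# item 0 for the focal group so that item should be flagged as showing DIF.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)
ability = rng.normal(size=1000)
difficulty = rng.normal(size=20)
p = 1.0 / (1.0 + np.exp(-(ability[:, None] - difficulty[None, :])))
p[:, 0] *= np.where(group == 1, 0.7, 1.0)
responses = (rng.random((1000, 20)) < p).astype(int)
print(mantel_haenszel_dif(responses, group, item=0))
```

In this sketch, a large MH chi-square together with an ETS delta beyond roughly 1.5 in absolute value would flag the item as functioning differently across the two groups, after matching examinees on overall proficiency.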