Adopting a phenomenographic approach, Learning Study utilises diagnostic approaches at two points: before teaching, to identify students’ ways of understanding so that teaching can be tailored to their needs, and after teaching, to evaluate the effectiveness of the pedagogical designs used in research lessons. Yet existing publications offer little discussion of the methodology for designing and analysing pre- and post-tests in Learning Study so as to ensure test validity and reliability. This paper reports a study that attempted to fill this knowledge gap by analysing the pre- and post-tests adopted in 16 Learning Study projects in Hong Kong (4 from each of four key learning areas, i.e., Chinese, English, mathematics and science; within each area, 2 from primary schools and 2 from secondary schools). It reviewed the frameworks of test validity and reliability by Heffner (2014) and designed a comprehensive checklist of test validity and reliability for analysing the 16 sets of pre- and post-tests in terms of purposes, contents, structures, types of questions and ways of test analysis. It was found that the application of Variation Theory helped identify the critical features and their structure, which enhanced the construct and content validity of the tests. The use of pilot tests and team meetings promoted inter-rater reliability. Observations of teaching and pre- and post-lesson interviews were often employed to triangulate the test data and so increase the credibility of the evaluation. The paper concludes with recommendations for developing valid and reliable pre- and post-tests for diagnostic purposes in schools. Copyright © 2017 World Association of Lesson Studies.