What are raters estimating: How much do ratings on scale criteria really reflect the characteristics of student performances in terms of the various components of the criteria
- 能力評審員如何作出判斷:評審準則的等級如何能在各項評審範疇中反映學生不同的表現
- This paper explores rater behaviour in assessing oral presentations, using verifiable quantitative measures (VQM) as an external validity check on ratings. Twelve raters with a range of academic backgrounds and experience were recruited to rate 115 Secondary 3 student oral performances in 'individual presentations'. The performances were drawn from a sample of 10 schools participating in a pretest conducted for the commencement of the oral component of Hong Kong's Territory-wide System Assessment in 2006. About 20 students were drawn from each school across three ability categories (low, medium and high), selected on the basis of their internal English examination results. Fifty-eight of the 115 performances were transcribed and assessed on VQM for 'ideas and organisation', 'vocabulary and language patterns' and 'pronunciation and delivery'. VQM results were then correlated (Spearman's rho and Pearson's r) with fair average scores derived from Rasch analysis of the ratings. The resulting correlations ranged from 0.6 to 0.9. It was concluded that the raters were estimating values for constructs highly similar to those measured by VQM. [Copyright of Hong Kong Teachers' Centre Journal is the property of Hong Kong Teachers' Centre at http://www.edb.org.hk/hktc]
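The correlation step described in the abstract can be illustrated with a minimal sketch. It assumes each transcribed performance has one VQM value and one Rasch fair average for the same criterion. The function names and the sample numbers below are hypothetical and invented for illustration; they are not data or code from the study, and the Rasch analysis itself is not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical values for six performances (NOT taken from the study).
vqm_scores = [12, 18, 25, 31, 40, 47]            # e.g. a count-based VQM
fair_averages = [1.4, 2.1, 2.0, 3.2, 3.9, 4.6]   # e.g. Rasch fair averages

print("Pearson r:", round(pearson_r(vqm_scores, fair_averages), 3))
print("Spearman rho:", round(spearman_rho(vqm_scores, fair_averages), 3))
```

Reporting both coefficients, as the abstract does, is a reasonable design choice: Pearson's r reflects linear agreement between the two measures, while Spearman's rho only requires that they order the performances similarly.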
