Below, you will find download info, tutorials, readings and links to other useful websites – but first: an analogy that I found helpful in my teaching, for instance when students forget to load a package after installing it, forget to check data sets for typos etc. before carrying out analyses, …
P-Values, Replication, Reproducibility, and Open Science
Currently, there is an ongoing debate about how we can avoid the abuse of statistics and ensure that research is as unbiased as possible. You can find some discussion in the recent articles below, and more references and links on another page of the Experimentalfieldlinguistics-Blog:
- Amrhein, V., Greenland, S., & McShane, B. (2019). Retire statistical significance. Nature, 567, 305–307. https://doi.org/10.1038/d41586-019-00857-9
- Colquhoun, D. (2017). The reproducibility of research and the misinterpretation of p-values. Royal society open science, 4(12), 171085.
- Gigerenzer, G. (2018). Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science, 1(2), 198-218.
- The Open Science Page on this blog, with further readings and links to relevant webpages: https://experimentalfieldlinguistics.wordpress.com/open-science/
R(studio): Downloads and Info
- R: http://www.r-project.org/
- RStudio: http://www.rstudio.com/
- Wikipedia article about R http://en.wikipedia.org/wiki/R_(programming_language)
Resource Sites, Blogs, and Groups
- R Seek: targeted searches for R resources and functions http://rseek.org/
- R-statistics Net: An educational resource for all things related to R language and its applications in advanced statistical computing and machine learning. http://rstatistics.net/
- Quick R / Statsmethods Net: a resource site aimed at people with some statistical background who want to learn R http://www.statmethods.net/
- R-Bloggers: http://www.r-bloggers.com/
- Data Science Central: an online resource for big data practitioners http://www.datasciencecentral.com/
- Shravan Vasishth’s website: http://www.ling.uni-potsdam.de/~vasishth/
- Statistics 545: http://stat545-ubc.github.io/index.html
- Data Sharkie blog: https://datasharkie.com
- R-Tutorials on the LingMethodsHub: https://lingmethodshub.github.io/content/R/
MOOCs, Youtube, Webinars
- Dr. Morton Anne Gernsbacher’s online Intro to Stats class: https://notawfulandboring.blogspot.com/2021/04/dr-morton-anne-gernsbachers-online.html
- Youtube Channels and Playlists:
- Data Camp: https://www.youtube.com/channel/UC79Gv3mYp6zKiSwYemEik9A/featured
- MarinStatsLectures https://www.youtube.com/user/marinstatlectures/featured
- R-Programming: https://www.youtube.com/user/pradeeppandu
- LearnR https://www.youtube.com/user/TheLearnR
- The New Boston https://www.youtube.com/playlist?list=PL6gx4Cwl9DGCzVMGCPi1kwvABu7eWv08P
- Christoph Scherber: https://www.youtube.com/channel/UCREyQL8aE7mLWkb6_KOMKIg
- Data Science Central Webinars: http://www.datasciencecentral.com/video/video/listFeatured
- MOOCs
- EdX (online courses from MITx, HarvardX, BerkeleyX, UTx, etc.) https://www.edx.org/
- Khan Academy: https://www.khanacademy.org/
- Coursera: https://www.coursera.org/
- Udacity: https://www.udacity.com/
- Data Camp: https://www.datacamp.com/
- Searchable MOOC lists:
General Introduction, Cheat Sheets, and Overview
- RStudio Cheat Sheets: https://www.rstudio.com/resources/cheatsheets/
- Datascience Cheat Sheet (including info about data formats, tools, tutorial links, etc.) https://www.datasciencecentral.com/profiles/blogs/20-cheat-sheets-python-ml-data-sciencet
- Intro Tutorial: http://rstatistics.net/r-tutorial-exercise-for-beginners/
- List of Tutorials: http://www.datasciencecentral.com/profiles/blogs/17-short-tutorials-all-data-scientists-should-read-and-practice
- R-Bloggers: a list of books about R
- Absolute Beginners’ Guide to R: https://ajwills72.github.io/rminr/
- PsyBSc 2: Introduction to Statistics with R (course materials in German): https://pandar.netlify.app/lehre/#bsc2
Useful Packages for Linguistics & Psychology
- Tidyverse packages (Hadley Wickham): https://www.tidyverse.org/packages/
Note: the tidyverse is based on the following principles: each variable is a column, each observation is a row, and each type of observational unit is a table. Tidyverse packages include, for instance:
- dplyr for work with dataframes
manual: https://cran.r-project.org/web/packages/dplyr/dplyr.pdf
Introduction to dplyr: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
dplyr tutorial: http://genomicsclass.github.io/book/pages/dplyr_tutorial.htm
- ggplot2 for graphics. manual: https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf
- tidyr for tidying up data, data wrangling: https://www.youtube.com/watch?v=RbUWwuJeUC8
- stringr for work with strings (very useful for corpus work and work on production experiment results; for more info, see below)
- List of useful packages:
- languageR (with psycholinguistic data sets): https://cran.r-project.org/web/packages/languageR/languageR.pdf
- PraatR, a package for controlling Praat: http://allthingslinguistic.com/post/103840914592/praatr-an-r-package-for-controlling-praat
- The childes-db project is an open database storing data from the Child Language Data Exchange System (CHILDES) in an easily accessible, tabular format. Researchers can interface with CHILDES through interactive visualizations or the childesr R package: http://childes-db.stanford.edu/. For some worked examples, see this publication.
- lme4 for mixed effects regression models (for more info, see below)
- prettyR or gmodels (crosstabs, etc. for people with SPSS withdrawal symptoms)
- for importing data: foreign, readr.
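The tidy-data principles mentioned in the note on the tidyverse above (each variable a column, each observation a row) can be illustrated with a small sketch. It uses only base R so that it runs without extra packages; the data, column names, and condition labels are made up for illustration. In the tidyverse, the same reshaping is usually done with tidyr's pivot_longer().

```r
# Hypothetical wide-format data: one row per participant,
# one reaction-time column per condition (not tidy)
wide <- data.frame(
  participant    = c("p1", "p2", "p3"),
  rt_congruent   = c(512, 498, 530),
  rt_incongruent = c(587, 560, 601)
)

# Reshape to long (tidy) format: one row per observation,
# with "condition" and "rt" as separate variables
long <- reshape(
  wide,
  direction = "long",
  varying   = c("rt_congruent", "rt_incongruent"),
  v.names   = "rt",
  timevar   = "condition",
  times     = c("congruent", "incongruent"),
  idvar     = "participant"
)
rownames(long) <- NULL  # drop the automatically generated row names

print(long)
```

In the long format, every row is a single measurement, which is the shape that dplyr and ggplot2 expect.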
Working with Strings and Regular Expressions (RegEx)
- Regular Expression Tutorial: http://www.regular-expressions.info/tutorial.html
- Regular Expressions with The R Language http://www.regular-expressions.info/rlanguage.html
- Handling and Processing Strings in R (Gaston Sanchez):http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf
- Introduction to String Matching and Modification Using R and Regular Expressions (Svetlana Eden): http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SvetlanaEdenRFiles/regExprTalk.pdf
- stringr for string location and manipulation
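As a small taste of what the tutorials above cover, here is a minimal sketch of regular expressions in base R. The word list and the item codes are invented examples. The stringr functions str_detect() and str_replace() do the same jobs with a more uniform interface.

```r
# Hypothetical word list, e.g. from a production experiment
words <- c("walked", "running", "talked", "sing", "jumped")

# grepl() returns TRUE for every element that matches the pattern;
# "ed$" means: the letters "ed" at the end of the string
is_past <- grepl("ed$", words)

# sub() replaces the first match; here it strips the suffix
stems <- sub("ed$", "", words[is_past])
print(stems)

# Hypothetical item codes: keep everything before the underscore
ids <- c("S01_itemA", "S02_itemB")
subjects <- sub("_.*$", "", ids)
print(subjects)
```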
Importing, Visualizing, and Exploring Data
- see info about tidyverse above
- This shows you how to import a data set in txt format and gives hints for other formats. It also demonstrates how you can use Notepad to create a txt file with data in columns that can be easily imported into R: https://www.r-bloggers.com/importing-data-into-r/
- This is a really comprehensive guide to data import (with a link to further tutorials, especially for xls): http://www.r-bloggers.com/this-r-data-import-tutorial-is-everything-you-need/
- Comprehensive Guide For Data Exploration in R | R Tutorial | Learn R: https://www.analyticsvidhya.com/blog/2015/04/comprehensive-guide-data-exploration-r/
- An introduction to sorting, merging, etc.: http://www.r-bloggers.com/working-with-the-data-frame-in-r/
- A course on visualizing data with R: https://wilkelab.org/SDS375/syllabus.html
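The txt import described in the first tutorial above can be sketched as follows. This assumes a tab-separated file with a header row; the file is written to a temporary location first so that the example is self-contained, and its contents and column names are made up.

```r
# Create a small tab-separated text file (hypothetical data),
# as you might type it in a plain text editor
path <- tempfile(fileext = ".txt")
writeLines(
  c("participant\tcondition\trt",
    "p1\tcongruent\t512",
    "p1\tincongruent\t587",
    "p2\tcongruent\t498"),
  path
)

# Import it: header = TRUE uses the first line as column names,
# sep = "\t" tells R that columns are separated by tabs
dat <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Always inspect the result before analysing it
str(dat)
head(dat)
```

For your own data, replace the temporary path with the path to your file; readr's read_tsv() or read_delim() are tidyverse alternatives.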
Saving and Exporting
- In RStudio, you should create a project and save it when you leave RStudio (you will be asked whether you want to save). This will save your workspace and keep the objects that you have created (e.g. through data import). It will also save your history (your list of commands, which you can see in the “history” window). For more info about projects, see: https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects. This tutorial can be helpful: https://www.stat.ubc.ca/~jenny/STAT545A/block01_basicsWorkspaceWorkingDirProject.html#workspace-and-working-directory
- In the R GUI, you can save the console content (your commands and the output) as a text file quite straightforwardly, using the save-file option.
- In RStudio, things are a bit more complicated:
- You can copy and paste the console content to a text editor and save it there.
- You can use the save option in your history window for your list of commands, and you can use “sink” for your outputs, e.g.
    > sink("sink-examp.txt")
    > 3+4
    > sink()
This will create a text file with your output. For this example, this is a single line ([1] 7), but it could also be a list of lines or a table etc.
- You can use R Markdown to publish your data: https://rmarkdown.rstudio.com/.
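The sink approach described above can also be used inside a script. The sketch below writes to a temporary file (an assumption made so that the example does not clutter your working directory) and calls print() explicitly, because results are not printed automatically when a script is run with source().

```r
# Redirect output to a text file (temporary file used for illustration)
out_file <- tempfile(fileext = ".txt")
sink(out_file)

# Everything printed from here on goes to the file, not the console
print(3 + 4)

# Stop redirecting; output goes to the console again
sink()

# Check what was written
readLines(out_file)
```

If you forget the final sink(), your console will appear to stay silent, so it is worth pairing the two calls as soon as you write the first one.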
Power Analysis and Determining Sample Size
Tools for Power Analysis:
- G*Power: https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower
- Power analysis in R: https://www.statmethods.net/stats/power.html
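For a simple design, a power analysis can be run in base R without extra packages. The sketch below uses power.t.test() from the stats package; the effect size (a difference of 0.5 standard deviations), the alpha level, and the target power are assumptions chosen for illustration, not recommendations. For mixed models, the simulation-based approaches in the readings below are more appropriate.

```r
# Sample size for a two-sample t-test:
# assumed difference of 0.5 SD, alpha = .05, power = .80
res <- power.t.test(
  delta       = 0.5,
  sd          = 1,
  sig.level   = 0.05,
  power       = 0.80,
  type        = "two.sample",
  alternative = "two.sided"
)
print(res)

# Round up, since you cannot test a fraction of a participant
n_per_group <- ceiling(res$n)
print(n_per_group)  # 64 participants per group
```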
Readings
*Recommended for beginners: Brysbaert (2019, 2020) and Sullivan & Feinn (2012)
- Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of Experimental Social Psychology. https://doi.org/10.1016/j.jesp.2017.09.004
- Brysbaert, M. (2020). Power considerations in bilingualism research: Time to step up our game. Bilingualism: Language and Cognition, 1-6. https://doi.org/10.1017/S1366728920000437
- Brysbaert, M. (2019). How Many Participants Do We Have to Include in Properly Powered Experiments? A Tutorial of Power Analysis with Reference Tables. Journal of Cognition, 2(1), 1–38. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6640316/
- Brysbaert, M., & Stevens, M. (2018). Power Analysis and Effect Size in Mixed Effects Models: A Tutorial. Journal of Cognition, 1(1), 1–20. https://doi.org/10.5334/joc.10
- Coppock, A. (2013). 10 Things to Know About Statistical Power. Retrieved September 20, 2018, from http://egap.org/methods-guides/10-things-you-need-know-about-statistical-power
- DeBruine, L. M., & Barr, D. J. (2021). Understanding mixed-effects models through data simulation. Advances in Methods and Practices in Psychological Science. https://doi.org/10.1177/2515245920965119
- Kumle, L., Võ, M.LH. & Draschkow, D. 2021. Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R. Behavior Research Methods.
- Green, P., & MacLeod, C. J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498. https://doi.org/10.1111/2041-210X.12504
- Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. American Statistician, 55(1), 19–24. https://doi.org/10.1198/000313001300339897
- Judd, C. M., Westfall, J., & Kenny, D. A. (2017). Experiments with More Than One Random Factor: Designs, Analytic Models, and Statistical Power. Annual Review of Psychology, 68(January), 601–625. https://doi.org/10.1146/annurev-psych-122414-033702
- Kain, M. P., Bolker, B. M., & McCoy, M. W. (2015). A practical guide and power analysis for GLMMs: detecting among treatment variation in random effects. PeerJ, 3, e1226. https://doi.org/10.7717/peerj.1226
- Konstantopoulos, S., & Taylor, P. (2020). Power Analysis in Two-Level Unbalanced Designs, 78(3), 291–317. https://doi.org/10.1080/00220970903292876
- Kumle, L., Võ, M. L.-H., & Draschkow, D. (2018). Mixedpower: a library for estimating simulation-based power for mixed models in R. https://doi.org/10.5281/zenodo.1341047
- LeBeau, B. (2019). Power Analysis by Simulation using R and simglm. Retrieved from https://ir.uiowa.edu/pq_pubs/3/
- Magnusson, K. (2018). Powerlmm: Power analysis for longitudinal multilevel models.
- Martin, J. (2012). PAMM: power analysis for random effects in mixed models.
- Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315. https://doi.org/10.1016/j.jml.2017.01.001
- Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample Size Planning for Statistical Power and Accuracy in Parameter Estimation. Annual Review of Psychology, 59(1), 537–563. https://doi.org/10.1146/annurev.psych.59.103006.093735
- Nakagawa, S., & Foster, T. M. (2004). The case against retrospective statistical power analyses with an introduction to power analysis, 103–108. https://doi.org/10.1007/s10211-004-0095-z
- Nicklin, C., & Vitta, J. P. (2021). Effect‐Driven Sample Sizes in Second Language Instructed Vocabulary Acquisition Research. The Modern Language Journal, 105(1), 218-236.
- Sullivan, G. M., & Feinn, R. (2012). Using Effect Size-or Why the P Value Is Not Enough. Journal of graduate medical education, 4(3), 279–282. https://doi.org/10.4300/JGME-D-12-00156.1
- Szucs, D., & Ioannidis, J. P. A. (2017). Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biology, 15(3), 1–18. https://doi.org/10.1371/journal.pbio.2000797
- von Oertzen, T. (2010). Power equivalence in structural equation modelling. British Journal of Mathematical and Statistical Psychology, 63(2), 257–272. https://doi.org/10.1348/000711009X441021
- Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143(5), 2020–2045. https://doi.org/10.1037/xge0000014
Descriptive Statistics and Basic Test Statistics
- See info about tidyverse above
- Descriptive Statistics: http://www.statmethods.net/stats/descriptives.html
- Frequencies and Crosstabs: http://www.statmethods.net/stats/frequencies.html
- The “apply-family” of grouping functions:
- Basic Statistics: http://rstatistics.net/statistical-tests-with-r/
- Basic Statistics: http://www.statmethods.net/stats/
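The kinds of functions covered in the links above (descriptives, crosstabs, and the apply family) can be sketched in a few lines of base R. The data frame, group labels, and values below are invented for illustration.

```r
# Hypothetical data: two speaker groups, accuracy, and reaction times
dat <- data.frame(
  group   = c("L1", "L1", "L1", "L2", "L2", "L2"),
  correct = c("yes", "yes", "no", "yes", "no", "no"),
  rt      = c(510, 540, 600, 580, 650, 690)
)

# Descriptive statistics for one variable
summary(dat$rt)
sd(dat$rt)

# tapply(): apply a function to a variable, separately for each group
group_means <- tapply(dat$rt, dat$group, mean)
print(group_means)

# Frequencies and crosstabs
crosstab <- table(dat$group, dat$correct)
print(crosstab)

# sapply(): apply a function to each numeric column of a data frame
col_means <- sapply(dat[sapply(dat, is.numeric)], mean)
print(col_means)
```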
(Mixed Effects) Regression Models
- Baayen, R. Harald, Douglas J. Davidson, and Douglas M. Bates. “Mixed-effects modeling with crossed random effects for subjects and items.” Journal of memory and language 59.4 (2008): 390-412.
- Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823.
- Bodo Winter Mixed Effects Regression tutorials: http://www.bodowinter.com/tutorials.html
- Cunnings, I. (2012). An overview of mixed-effects statistical models for second language researchers. Second Language Research, 28(3), 369-382.
- Journal of Memory and Language, special issue, 2008
- Slopes: Barr, D.J., Levy, R., Scheepers, C., & Tily, H.J. (2013) Random effects structure for confirmatory hypothesis testing: keep it maximal. Journal of Memory and Language, 68(3), 255-278.
- Dale Barr’s Funfact: Functions for planning and analyzing factorial experiments in R https://github.com/dalejbarr/funfact
- Kumle, L., Võ, M.LH. & Draschkow, D. Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R. Behav Res (2021).
- Karen Grace-Martin: The Difference Between Random Factors and Random Effects
- lme4 convergence warnings: troubleshooting
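To give a first impression of what the readings above are about, here is a minimal sketch of a mixed-effects model with a random intercept and a random slope, fitted with lme4. It assumes that the lme4 package is installed and uses the sleepstudy data set that ships with it. Note that this data set has only one random factor (Subject); typical psycholinguistic designs with crossed random effects would add an item term, e.g. (1 | Item), which is not shown here.

```r
# Only run the example if lme4 is available (assumption: it has been
# installed beforehand with install.packages("lme4"))
if (requireNamespace("lme4", quietly = TRUE)) {
  data("sleepstudy", package = "lme4")

  # Reaction time as a function of days of sleep deprivation, with
  # by-subject random intercepts and by-subject random slopes for Days
  m <- lme4::lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

  print(summary(m))
}
```

The part in parentheses is the random-effects structure that Barr et al. (2013) and Matuschek et al. (2017) discuss.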
Rasch Analysis
“Classic” Texts:
- Fischer, G., & Molenaar, I. (eds.) (1995), Rasch models: Foundations, recent developments, and applications. Springer.
- Rasch, G. (1961). On general laws and the meaning of measurement in Psychology. University of California Press.
- Rost, J. (1996). Logistic mixture models. In: W. van der Linden & Hambleton (eds.), Handbook of modern item response theory. Springer.
Further Readings
- Boone, W. J., & Noltemeyer, A. (2017). Rasch analysis: A primer for school psychology researchers and practitioners. Cogent Education, 4(1), 1416898.
- Boone, W. J., Staver, J.R., & Yale, M. S. (2014). Rasch Analysis in the Human Sciences. Springer.
- Fan, J., Knoch, U., & Bond, T. (2019). Application of Rasch measurement theory in language assessment: Using measurement to enhance language assessment research and practice. Papers in Language Testing and Assessment, 8(2).
- Kean, J., Bisson, E. F., Brodke, D. S., Biber, J., & Gross, P. H. (2018). An introduction to item response theory and Rasch analysis: Application using the eating assessment tool (EAT-10). Brain Impairment, 19(1), 91-102.
- McNamara, T., & Knoch, U. (2012). The Rasch wars: The emergence of Rasch measurement in language testing. Language Testing, 29(4), 555-576.
Rasch Analysis in R
- Katz, D., Clairmont, A., & Wilton, M. (2021). Measuring what Matters: Introduction to Rasch Analysis in R. Online book (TAM package)
- Relevant R packages:
- TAM
- mixRasch
- ltm
- eRm
Rasch Analysis using Winsteps
- Winsteps: https://winsteps.com/index.htm
- short introductory video: https://www.youtube.com/watch?v=SIivnjxbHqs&list=PLdIfV_F-cZPAuC-FKNPgIaiqnhYvqjsfD&index=3
Vowel Analysis
- J. Stanley’s Tutorial: http://joeystanley.com/blog/making-vowel-plots-in-r-part-1
- Thomas Kettig and Bodo Winter’s Canadian vowel shift analysis:
paper (2017): https://www.cambridge.org/core/journals/language-variation-and-change/article/producing-and-perceiving-the-canadian-vowel-shift-evidence-from-a-montreal-community/A45A2F348CBC7AA652035F17177AFE30
materials and scripts: https://github.com/bodowinter/canadian_vowel_shift_analysis
- Kirby, J., & Sonderegger, M. (2018). Mixed-effects design analysis for experimental phonetics. Journal of Phonetics, 70, 70-85.
Random Forests
- The Random Forest algorithm
Online R and Statistics textbooks
See also my Statistics Reading List
- R vs. Python: Free books etc.: http://ucanalytics.com/blogs/r-vs-python-comparison-and-awsome-books-free-pdfs-to-learn-them/
- Baayen, R. H. (2008). Analyzing linguistic data. Cambridge, UK: Cambridge University Press. http://www.langtoninfo.com/web_content/9780521709187_frontmatter.pdf
- Teetor, P. (2011). R cookbook. Sebastopol, CA: O’Reilly Media, Inc. http://128.95.149.81/trilobite/sr320_labnotebook_060113.enex/Cookbook%20for%20R.resources/r_cookbook.pdf
- Vasishth, S. & Nicenboim, B. (2016). Statistical methods for linguistic research: Foundational Ideas – Part I. University of Potsdam
- Nicenboim, B. & Vasishth, S. (2016). Statistical methods for linguistic research: Foundational Ideas – Part II. University of Potsdam
- Crump, M. J. C. (2020). Reproducible statistics for psychologists with R: https://crumplab.github.io/rstatsforpsych/index.html
- R-Bloggers: a list of books about R, with a 2017 addition: https://www.r-bloggers.com/books-i-liked-in-2017-by-ellis2013nz/
You can also look at my Pinterest or teaching material list, but first – how about some cooking with R?