Below, you will find download info, tutorials, readings and links to other useful websites – but first: an analogy that I found helpful in my teaching, for instance when students forget to load a package after installing it, forget to check data sets for typos etc. before carrying out analyses, …
P-Values, Replication, Reproducibility, and Open Science
Currently, there is an ongoing debate about how we can avoid the misuse of statistics and make research as unbiased as possible. You can find some of this discussion in the recent articles below, and more references and links on another page of the Experimentalfieldlinguistics-Blog:
- Amrhein, V., Greenland, S., & McShane, B. (2019). Retire statistical significance. Nature, 567, 305–307. doi:10.1038/d41586-019-00857-9
- Colquhoun, D. (2017). The reproducibility of research and the misinterpretation of p-values. Royal Society Open Science, 4(12), 171085.
- Gigerenzer, G. (2018). Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science, 1(2), 198-218.
- The Open Science Page on this blog, with further readings and links to relevant webpages: https://experimentalfieldlinguistics.wordpress.com/open-science/
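One point running through this debate is that p-values vary a great deal between exact replications of the same experiment. The following minimal simulation illustrates this under assumed settings (two groups of 20, a true effect of 0.5 standard deviations, 1000 simulated replications); these numbers are illustrative choices, not taken from the articles above.

```r
# Simulate many exact replications of the same two-group experiment
# (assumed settings: n = 20 per group, true effect = 0.5 SD)
set.seed(42)
n_reps <- 1000
p_values <- replicate(n_reps, {
  x <- rnorm(20, mean = 0)
  y <- rnorm(20, mean = 0.5)
  t.test(x, y)$p.value   # Welch two-sample t-test
})

# Spread of p-values across replications of one and the same experiment
quantile(p_values, c(0.1, 0.25, 0.5, 0.75, 0.9))

# Proportion of replications that come out "significant" (an estimate of power)
mean(p_values < 0.05)
```

With these settings, only a minority of replications reach p < 0.05, even though the effect is real in every one of them.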
R(studio): Downloads and Info
- R: http://www.r-project.org/
- RStudio: http://www.rstudio.com/
- Wikipedia article about R: http://en.wikipedia.org/wiki/R_(programming_language)
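Once R and RStudio are running, remember the difference between installing a package (once per machine) and loading it (in every new session), which is a common stumbling block. A minimal sketch, assuming a hypothetical selection of packages from this page:

```r
# Hypothetical selection of packages mentioned on this page
pkgs <- c("languageR", "lme4", "stringr")

# Which of them are installed? requireNamespace() checks (and loads)
# a package without attaching it to the search path
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Installing is needed only once per machine (uncomment to run; needs internet):
# install.packages(pkgs[!installed])

# Loading (attaching) is needed in every new R session
invisible(lapply(pkgs[installed], library, character.only = TRUE))
```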
Resource Sites, Blogs, and Groups
- R Seek (targeted searches for R resources and functions): http://rseek.org/
- R-statistics Net: An educational resource for all things related to R language and its applications in advanced statistical computing and machine learning. http://rstatistics.net/
- Quick-R / Statmethods Net, a resource site aimed at people with some statistical background who want to learn R: http://www.statmethods.net/
- R-Bloggers: http://www.r-bloggers.com/
- Data Science Central, an online resource for big data practitioners: http://www.datasciencecentral.com/
- Shravan Vasishth’s website: http://www.ling.uni-potsdam.de/~vasishth/
- Statistics 545: http://stat545-ubc.github.io/index.html
- Data Sharkie blog: https://datasharkie.com
MOOCs, YouTube, Webinars
- YouTube channels and playlists:
  - Data Camp: https://www.youtube.com/channel/UC79Gv3mYp6zKiSwYemEik9A/featured
  - MarinStatsLectures: https://www.youtube.com/user/marinstatlectures/featured
  - R-Programming: https://www.youtube.com/user/pradeeppandu
  - LearnR: https://www.youtube.com/user/TheLearnR
  - The New Boston: https://www.youtube.com/playlist?list=PL6gx4Cwl9DGCzVMGCPi1kwvABu7eWv08P
  - Christoph Scherber
- Data Science Central Webinars: http://www.datasciencecentral.com/video/video/listFeatured
- Data Camp: https://www.datacamp.com/
- Searchable MOOC lists:
General Introduction, Cheat Sheets, and Overview
- RStudio Cheat Sheets: https://www.rstudio.com/resources/cheatsheets/
- Data science cheat sheets (including info about data formats, tools, tutorial links, etc.): https://www.datasciencecentral.com/profiles/blogs/20-cheat-sheets-python-ml-data-sciencet
- Intro Tutorial: http://rstatistics.net/r-tutorial-exercise-for-beginners/
- List of Tutorials:
- R-Bloggers: a list of books about R
- Absolute Beginners’ Guide to R: https://ajwills72.github.io/rminr/
Useful Packages for Linguistics & Psychology
- Tidyverse packages (Hadley Wickham): https://www.tidyverse.org/packages/
Note: the tidyverse is built around "tidy data" principles: each variable is a column, each observation is a row, and each type of observational unit is a table. Its core packages include, for instance:
- dplyr for working with data frames. Introduction to dplyr: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
- ggplot2 for graphics. Manual: https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf
- tidyr for tidying up and reshaping data (data wrangling): https://www.youtube.com/watch?v=RbUWwuJeUC8
- stringr for working with strings (very useful for corpus work and for analysing the results of production experiments; for more info, see below)
- List of useful packages:
- languageR (with psycholinguistic data sets): https://cran.r-project.org/web/packages/languageR/languageR.pdf
- PraatR, a package for controlling Praat: http://allthingslinguistic.com/post/103840914592/praatr-an-r-package-for-controlling-praat
- The childes-db project is an open database that stores data from the Child Language Data Exchange System (CHILDES) in an easily accessible, tabular format. Researchers can access CHILDES through interactive visualizations or the childesr R package: http://childes-db.stanford.edu/. For some worked examples, see this publication.
- lme4 for mixed effects regression models (for more info, see below)
- prettyR or gmodels (crosstabs, etc. for people with SPSS withdrawal symptoms)
- for importing data: foreign, readr.
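To make the tidy-data principle mentioned above concrete, here is a minimal base-R sketch with hypothetical reaction-time data: a "wide" table (one row per subject, one column per condition) is rearranged into a "long" table (one row per observation). In the tidyverse, `tidyr::pivot_longer(wide, cols = c(cond_A, cond_B), names_to = "condition", names_prefix = "cond_", values_to = "rt")` does the same job; base R is used here so the example runs without extra packages.

```r
# Hypothetical "wide" data: one row per subject, one column per condition
wide <- data.frame(subject = c("s1", "s2"),
                   cond_A  = c(512, 498),
                   cond_B  = c(634, 601),
                   stringsAsFactors = FALSE)

# "Long" (tidy) version: one row per observation, one column per variable
long <- data.frame(subject   = rep(wide$subject, times = 2),
                   condition = rep(c("A", "B"), each = nrow(wide)),
                   rt        = c(wide$cond_A, wide$cond_B),
                   stringsAsFactors = FALSE)
long

# Tidy data make grouped summaries straightforward, e.g. a mean per condition
cond_means <- aggregate(rt ~ condition, data = long, FUN = mean)
cond_means
```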
Working with Strings and Regular Expressions (RegEx)
- Regular Expression Tutorial: http://www.regular-expressions.info/tutorial.html
- Regular Expressions with The R Language: http://www.regular-expressions.info/rlanguage.html
- Handling and Processing Strings in R (Gaston Sanchez): http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf
- Introduction to String Matching and Modification Using R and Regular Expressions (Svetlana Eden): http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SvetlanaEdenRFiles/regExprTalk.pdf
- stringr for string location and manipulation
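As a small illustration of the kind of string work these tutorials cover, the sketch below cleans some hypothetical responses from a production experiment and searches them with regular expressions, using base R functions. stringr offers equivalents such as `str_detect()` and `str_extract_all()`. The "plural detector" is deliberately crude (it also matches words like "was").

```r
# Hypothetical responses from a production experiment (illustrative data only)
responses <- c("the cat sat", "The dogs ran!", "a cat  chased mice")

# Normalise: lower case, remove punctuation, collapse repeated spaces
clean <- tolower(responses)
clean <- gsub("[[:punct:]]", "", clean)
clean <- gsub("\\s+", " ", clean)
clean

# Which responses contain the word "cat"? \\b marks a word boundary
has_cat <- grepl("\\bcat\\b", clean)
has_cat

# Extract all words ending in -s (a crude plural detector)
plurals <- regmatches(clean, gregexpr("\\b[a-z]+s\\b", clean))
plurals
```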
Importing and Exploring Data
- See info about the tidyverse above
- This tutorial shows you how to import a data set in txt format and gives hints for other formats. It also demonstrates how you can use Notepad to create a txt file with data in columns that can easily be imported into R: https://www.r-bloggers.com/importing-data-into-r/
- This is a really comprehensive guide to data import (with a link to further tutorials, especially for xls):
- Comprehensive Guide for Data Exploration in R
- An introduction to sorting, merging, etc.: http://www.r-bloggers.com/working-with-the-data-frame-in-r/
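The import-explore-sort-merge steps described in these tutorials can be sketched in base R as follows. The file is written to a temporary location so that the example is self-contained; in practice you would point `read.delim()` at your own tab-separated txt file. All column names and values are hypothetical.

```r
# Create a small tab-separated txt file, as you might in a text editor
# (hypothetical columns: subject, condition, reaction time)
path <- tempfile(fileext = ".txt")
writeLines(c("subject\tcondition\trt",
             "s1\tA\t512",
             "s2\tB\t634",
             "s3\tA\t478"), path)

# read.delim() expects a header row and tab-separated columns
dat <- read.delim(path, stringsAsFactors = FALSE)

str(dat)      # column names and types
summary(dat)  # quick overview of each column

# Sort by reaction time
dat_sorted <- dat[order(dat$rt), ]

# Merge with a second (hypothetical) table via the shared "subject" column
ages <- data.frame(subject = c("s1", "s2", "s3"), age = c(23, 31, 27),
                   stringsAsFactors = FALSE)
merged <- merge(dat, ages, by = "subject")
merged
```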
Saving and Exporting
- In RStudio, you should create a project and save it when you leave RStudio (you will be asked whether you want to save). This will save your workspace and keep the objects that you have created (e.g. through data import). It will also save your history (your list of commands, which you can see in the “history” window). For more info about projects, see: https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects. This tutorial can be helpful: https://www.stat.ubc.ca/~jenny/STAT545A/block01_basicsWorkspaceWorkingDirProject.html#workspace-and-working-directory
- In R, you can save the console (your commands and their output) as a text file quite straightforwardly, using "save file".
- In RStudio, things are a bit more complicated:
- You can copy and paste the console content to a text editor and save it there.
- You can use the save option in your history window for your list of commands; and you can use “sink” for your outputs, e.g.
sink("sink-examp.txt")
3 + 4
sink()
This will create a text file with your output. For this example, this is a single line ([1] 7), but it could also be a list of lines, a table, etc.
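- You can use R Markdown to combine your code, its output, and your comments in a single document (e.g. HTML, PDF, or Word) and publish your analyses: https://rmarkdown.rstudio.com/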
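The `sink()` approach can be sketched a little more fully as follows. A temporary file stands in for a real file name such as "sink-examp.txt". `print()` is called explicitly because values are not printed automatically inside functions or scripts run with `source()`, so they would not end up in the file. `capture.output()` is shown as an alternative for saving the output of a single expression.

```r
# Temporary file standing in for e.g. "sink-examp.txt"
out_file <- tempfile(fileext = ".txt")

sink(out_file)               # divert console output to the file
print(3 + 4)                 # explicit print() so the output is written
print(summary(c(2, 4, 6)))   # even when run via source()
sink()                       # send output back to the console

readLines(out_file)          # check what was written

# Alternative: write the printed output of one expression directly to a file
out_file2 <- tempfile(fileext = ".txt")
capture.output(summary(c(2, 4, 6)), file = out_file2)
readLines(out_file2)
```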
Descriptive Statistics and Basic Test Statistics
- See info about tidyverse above
- Descriptive Statistics: http://www.statmethods.net/stats/descriptives.html
- Frequencies and Crosstabs: http://www.statmethods.net/stats/frequencies.html
- The “apply-family” of grouping functions:
- Basic Statistics: http://rstatistics.net/statistical-tests-with-r/
- Basic Statistics: http://www.statmethods.net/stats/
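A compact sketch of descriptive statistics, a crosstab, an apply-family function, and a basic test, using R's built-in `sleep` data set (extra hours of sleep for 10 subjects, each measured under two drugs). The data set is chosen here only for illustration; the same calls work on your own data frames.

```r
data(sleep)  # built-in: columns extra, group (1/2), ID (subject)

summary(sleep$extra)                               # descriptive statistics
means <- tapply(sleep$extra, sleep$group, mean)    # mean per group
means
aggregate(extra ~ group, data = sleep, FUN = sd)   # SD per group, as a data frame
sapply(split(sleep$extra, sleep$group), median)    # apply-family: median per group

table(group = sleep$group, improved = sleep$extra > 0)  # a simple crosstab

# Paired t-test: each subject was measured under both drugs,
# so first make sure the two vectors are aligned by subject ID
g1 <- sleep[sleep$group == "1", ]
g2 <- sleep[sleep$group == "2", ]
stopifnot(identical(g1$ID, g2$ID))
res <- t.test(g2$extra, g1$extra, paired = TRUE)
res
```

The vectors are passed to `t.test()` directly rather than via a formula, because recent R versions no longer accept `paired = TRUE` together with a formula.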
(Mixed Effects) Regression Models
- Bodo Winter's mixed-effects regression tutorials:
- Cunnings, I. (2012). An overview of mixed-effects statistical models for second language researchers. Second Language Research, 28(3), 369-382.
- Journal of Memory and Language, special issue, 2008
- Slopes: Barr, D.J., Levy, R., Scheepers, C., & Tily, H.J. (2013) Random effects structure for confirmatory hypothesis testing: keep it maximal. Journal of Memory and Language, 68(3), 255-278.
- Dale Barr’s Funfact: Functions for planning and analyzing factorial experiments in R https://github.com/dalejbarr/funfact
- J. Stanley’s Tutorial: http://joeystanley.com/blog/making-vowel-plots-in-r-part-1
- Thomas Kettig and Bodo Winter’s Canadian vowel shift analysis:
paper (2017): https://www.cambridge.org/core/journals/language-variation-and-change/article/producing-and-perceiving-the-canadian-vowel-shift-evidence-from-a-montreal-community/A45A2F348CBC7AA652035F17177AFE30
materials and scripts: https://github.com/bodowinter/canadian_vowel_shift_analysis
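For orientation, here is a minimal sketch of a mixed-effects model fitted with lme4 (listed under useful packages above), using simulated, hypothetical reaction-time data for 20 subjects and 10 items with a counterbalanced, sum-coded condition. The random-effects structure follows the "keep it maximal" advice of Barr et al. (2013). The model is only fitted if lme4 is installed, and with data this small, `lmer()` may report a singular fit.

```r
# Simulated (hypothetical) data: 20 subjects x 10 items; condition is
# counterbalanced so it varies within subjects and within items
set.seed(1)
n_subj <- 20
n_item <- 10
d <- expand.grid(subject = factor(1:n_subj), item = factor(1:n_item))
s <- as.integer(d$subject)
i <- as.integer(d$item)
d$cond <- ifelse((s + i) %% 2 == 0, -0.5, 0.5)   # sum-coded condition

# By-subject intercepts and slopes, by-item intercepts, residual noise
subj_int   <- rnorm(n_subj, 0, 50)
subj_slope <- rnorm(n_subj, 0, 20)
item_int   <- rnorm(n_item, 0, 30)
d$rt <- 600 + (40 + subj_slope[s]) * d$cond + subj_int[s] + item_int[i] +
  rnorm(nrow(d), 0, 80)

if (requireNamespace("lme4", quietly = TRUE)) {
  # Random intercepts and condition slopes for both subjects and items
  m <- lme4::lmer(rt ~ cond + (1 + cond | subject) + (1 + cond | item),
                  data = d)
  print(lme4::fixef(m))   # fixed effects: intercept and condition effect
}
```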
Online R and Statistics textbooks
See also my Statistics Reading List
R vs. Python: Free books etc.
- Baayen, R. H. (2008). Analyzing linguistic data. Cambridge, UK: Cambridge University Press
- Teetor, P. (2011). R cookbook. Sebastopol, CA: O’Reilly Media, Inc.
- Vasishth, S. & Nicenboim, B. (2016). Statistical methods for linguistic research: Foundational Ideas – Part I. University of Potsdam
- Nicenboim, B. & Vasishth, S. (2016). Statistical methods for linguistic research: Foundational Ideas – Part II. University of Potsdam
- R-Bloggers: a list of books about R, with a 2017 addition: https://www.r-bloggers.com/books-i-liked-in-2017-by-ellis2013nz/