Sorting columns and adding new variables using setdiff in R

I am a researcher and R novice working with a large dataset made up of many small Excel files that need to be read together. I have imported these into R using the read_excel function and have them all in one large table, but I am running into issues trying to format the data appropriately for analysis.

As a brief description of the dataset: I have different subject IDs who were tested on two different days and who were exposed to different terms in different conditions. Basically, I have a variable "Term", a variable "Condition", and a variable "Origin". The Origin variable is a result of using rbindlist(idcol = "Origin", fill=T), so each value is a (long) filepath, which I have redacted below except for the identifying information: the first part "SUBXX" represents my subject number and the second part "S0X" represents the day (either 1 or 2) they were tested on. See a small example dataset below:

    df <- data.frame(Origin = c(
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB01S01xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB01S01xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB01S01xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB01S02xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB01S02xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB01S02xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB02S01xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB02S01xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB02S01xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB02S02xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB02S02xxxx.xlsx",
      "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB02S02xxxx.xlsx"))
    df$Term <- c("Owl", "Dog", "Rat", "Fox", "Cat", "Cow", "Dog", "Bug", "Cow", "Mouse", "Bat", "Cat")
    df$Condition <- c("M", "L", "L", "L", "M", "M", "L", "L", "M", "M", "M", "M")

    > df
       Origin(shortened) Term  Condition
    1  SUB01_S01         Owl   M
    2  SUB01_S01         Dog   L
    3  SUB01_S01         Rat   L
    4  SUB01_S02         Fox   L
    5  SUB01_S02         Cat   M
    6  SUB01_S02         Cow   M
    7  SUB02_S01         Dog   L
    8  SUB02_S01         Bug   L
    9  SUB02_S01         Cow   M
    10 SUB02_S02         Mouse M
    11 SUB02_S02         Bat   M
    12 SUB02_S02         Cat   M
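For reference, the subject number and day can also be pulled out of a path like the ones above by pattern rather than by character position. This is only a minimal base R sketch: it assumes every file name starts with "SUB", some digits, "S", and more digits, and the names origin, file_name, par_num and day are made up for illustration.

```r
# Two Origin values copied from the example data above
origin <- c(
  "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB01S01xxxx.xlsx",
  "C:/Users/xxxxx/xxxxxxxx/xxxxxxx/xxxx/xxxxxxxxxxxx/xxxxxxxxxxxxxxx/SUB02S02xxxx.xlsx"
)

# Drop the directory part so only the file name remains
file_name <- basename(origin)

# Assumption: file names look like SUB<digits>S<digits><anything>
pattern <- "^SUB([0-9]+)S([0-9]+).*$"
par_num <- as.integer(sub(pattern, "\\1", file_name))  # first capture group
day     <- as.integer(sub(pattern, "\\2", file_name))  # second capture group

data.frame(ParNum = par_num, Day = day)
```

Matching on the pattern means the result does not depend on how long the folder path is, which may matter if the files live in folders of different depths.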

Lastly, I have a list of all possible terms:

    termList <- c("Cat", "Dog", "Cow", "Rat", "Fox", "Bug", "Owl", "Bat", "Mouse", "Bear")

What I want to do is to 1) order the dataframe so that the terms appear in the same order as the termList, and 2) add the terms that do not appear for each participant, noting their Day as 0 and their condition as "U". Additionally, I want to replace the "origin" column with two separate columns, one containing participant ID and the other containing day.

Desired result:

       ParNum Term  Condition Day
    1  1      Cat   M         2
    2  1      Dog   L         1
    3  1      Cow   M         2
    4  1      Rat   L         1
    5  1      Fox   L         2
    6  1      Bug   U         0
    7  1      Owl   M         1
    8  1      Bat   U         0
    9  1      Mouse U         0
    10 1      Bear  U         0
    11 2      Cat   M         2
    12 2      Dog   L         1
    13 2      Cow   M         1
    14 2      Rat   U         0
    15 2      Fox   U         0
    16 2      Bug   L         1
    17 2      Owl   U         0
    18 2      Bat   M         2
    19 2      Mouse M         2
    20 2      Bear  U         0
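One way to get a table of this shape, sketched here in base R only, is to build every participant/term combination first and then join the observed rows onto it. The sketch assumes ParNum and Day have already been extracted from Origin; the data frame below is a hand-typed stand-in for that, and the names grid and full are illustrative.

```r
termList <- c("Cat", "Dog", "Cow", "Rat", "Fox", "Bug", "Owl", "Bat", "Mouse", "Bear")

# Assumption: ParNum and Day are already separate columns
df <- data.frame(
  ParNum    = rep(1:2, each = 6),
  Day       = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2),
  Term      = c("Owl", "Dog", "Rat", "Fox", "Cat", "Cow",
                "Dog", "Bug", "Cow", "Mouse", "Bat", "Cat"),
  Condition = c("M", "L", "L", "L", "M", "M", "L", "L", "M", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# Every participant paired with every possible term
grid <- expand.grid(Term = termList, ParNum = unique(df$ParNum),
                    stringsAsFactors = FALSE)

# Left join: combinations with no observed row get NA in Day and Condition
full <- merge(grid, df, by = c("ParNum", "Term"), all.x = TRUE)

# Fill in the unseen terms as requested
full$Day[is.na(full$Day)] <- 0
full$Condition[is.na(full$Condition)] <- "U"

# Sort by participant, then by position of the term in termList
full <- full[order(full$ParNum, match(full$Term, termList)),
             c("ParNum", "Term", "Condition", "Day")]
rownames(full) <- NULL
full
```

Using match() against termList for the sort key avoids converting Term to a factor, although a factor with levels = termList would work just as well.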

I am not a CompSci person so I usually build my way up from little problems to the larger ones. Starting small, I tried to use R's inbuilt apply() functions, as well as setdiff, to find the concepts which don't appear for each SubID. The following code:

    df %>%
      group_by(Origin) %>%
      tapply(setdiff(termList, df$Term))

only returned a single 1, which is confusing. Shouldn't setdiff() return a character vector (i.e. whatever term is missing)? Trying the other options, lapply() and sapply(), both returned the message "object 'Bear' of mode 'function' was not found".
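A likely explanation for the error message: in that pipeline, setdiff(termList, df$Term) is evaluated once against the whole Term column, which gives "Bear" (the only term nobody saw), and that string is then handed to the apply function as its second argument. lapply() and sapply() expect a function there, so they go looking for a function called "Bear". Below is a minimal sketch of running setdiff() once per file instead; it uses shortened Origin labels rather than the full paths, and missing_by_file is an illustrative name.

```r
termList <- c("Cat", "Dog", "Cow", "Rat", "Fox", "Bug", "Owl", "Bat", "Mouse", "Bear")

# Assumption: shortened labels stand in for the full file paths
df <- data.frame(
  Origin = rep(c("SUB01S01", "SUB01S02", "SUB02S01", "SUB02S02"), each = 3),
  Term   = c("Owl", "Dog", "Rat", "Fox", "Cat", "Cow",
             "Dog", "Bug", "Cow", "Mouse", "Bat", "Cat"),
  stringsAsFactors = FALSE
)

# Evaluated over the whole column: only the term missing everywhere
setdiff(termList, df$Term)

# tapply(values, grouping, function): setdiff is now called once per Origin
missing_by_file <- tapply(df$Term, df$Origin,
                          function(x) setdiff(termList, x),
                          simplify = FALSE)

missing_by_file[["SUB01S01"]]
```

Note that group_by() only affects dplyr verbs such as summarise() and mutate(); base functions like tapply() ignore the grouping, so the grouping variable has to be passed explicitly as above.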

I also attempted a for loop, again by starting small and just trying to find the missing terms for each SubId. The following:

    mismatch <- character()
    for (i in df$Origin) {
      mismatch <- setdiff(termList, tbl$origin)
    }

Returned

     [1] "Cat"   "Dog"   "Cow"   "Rat"   "Fox"   "Bug"   "Owl"   "Bat"   "Mouse"
    [10] "Bear"

But I was expecting a subset of terms for each SubID. Could anyone give any advice?
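Looking at the loop again, it never uses i, it compares termList against an origin column rather than against Term, and it overwrites mismatch on every pass, so a single vector containing every term is what one would expect. A sketch of a loop that keeps one result per participant could look like the following; it assumes a ParNum column already exists, and the names id and seen are illustrative.

```r
termList <- c("Cat", "Dog", "Cow", "Rat", "Fox", "Bug", "Owl", "Bat", "Mouse", "Bear")

# Assumption: participant number is already available as its own column
df <- data.frame(
  ParNum = rep(1:2, each = 6),
  Term   = c("Owl", "Dog", "Rat", "Fox", "Cat", "Cow",
             "Dog", "Bug", "Cow", "Mouse", "Bat", "Cat"),
  stringsAsFactors = FALSE
)

# A list, so each participant gets its own slot instead of being overwritten
mismatch <- list()
for (id in unique(df$ParNum)) {
  seen <- df$Term[df$ParNum == id]              # terms this participant saw
  mismatch[[as.character(id)]] <- setdiff(termList, seen)
}

mismatch
```

For the example data this gives Bug, Bat, Mouse and Bear for participant 1, and Rat, Fox, Owl and Bear for participant 2.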

EDIT: I used the solution proposed by Edward below, namely:

    #3 replace the "origin" column with two separate columns, one
    # containing participant ID and the other containing day.
    separate_wider_position(df, Origin,
                            widths=c(69, ParNum=2, 1, Day=2, 9)) |>
      mutate(Term=factor(Term, levels=termList),
             Day=as.numeric(Day)) |>
      #2 add the terms that do not appear for each participant, noting
      # their Day as 0 and their condition as "U".
      complete(ParNum, Term, fill = list(Day=0, Condition="U")) |>
      #1 order the dataframe so that the terms appear in the same order
      # as the termList,
      arrange(ParNum, Term)

Which works. However, there is one other problem I forgot to mention: in my full dataset, each concept appears twice in the spreadsheet (same condition each time). So the sorted list using the above method doubles any concept which isn't in condition "U", like so:

       ParNum Term    Day Condition
       <chr>  <fct> <dbl> <chr>
     1 1      Cat       0 U
     2 1      Dog       1 L
     3 1      Dog       1 L
     4 1      Cow       0 U
     5 1      Rat       1 L
     6 1      Rat       1 L
     7 1      Fox       2 L
     8 1      Fox       2 L
     9 1      Bug       0 U
    10 1      Owl       1 M
    11 1      Owl       1 M
    12 1      Bat       0 U
    13 1      Mouse     0 U
    14 1      Bear      0 U

There is no reason for me to retain these doubles so I'd just like to get rid of them. Is such a thing possible?
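Since the doubled rows are identical in every column, dropping exact duplicates should be enough. Here is a minimal base R sketch on a hand-typed excerpt of the output above (res and deduped are illustrative names); in a dplyr pipeline, adding distinct() as a step should have the same effect.

```r
# Hand-typed excerpt of the doubled output
res <- data.frame(
  ParNum    = "1",
  Term      = c("Cat", "Dog", "Dog", "Cow", "Rat", "Rat"),
  Day       = c(0, 1, 1, 0, 1, 1),
  Condition = c("U", "L", "L", "U", "L", "L"),
  stringsAsFactors = FALSE
)

# duplicated() flags rows that repeat an earlier row in all columns;
# keeping the unflagged rows leaves the first copy of each
deduped <- res[!duplicated(res), ]
rownames(deduped) <- NULL
deduped
```

If the repeated rows could ever differ in a column that does not matter for the analysis, duplicated() can be applied to just the relevant columns, for example res[!duplicated(res[, c("ParNum", "Term")]), ].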
