Interview CSV Filter

This R script takes in a CSV file with M-protein test results, adds a column converting all concentrations to g/dL, and then marks which data is usable and which is unusable. It outputs two separate CSV files:

[filename]_flagged.csv contains every row, plus a column called error_flag which contains TRUE or FALSE values. A TRUE value in error_flag means the row is unusable.
[filename]_filtered.csv has all unusable rows completely removed from the file.
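
As a rough illustration of the naming scheme above, here is a minimal sketch of how the two output paths could be derived from the input path. The helper name output_paths is hypothetical and not necessarily how the script does it; it assumes the suffix is simply appended to the input name with its extension removed.

```r
# Hypothetical helper: derive the two output paths from an input CSV path.
# Assumes the suffix is appended to the file name without its extension.
output_paths <- function(input_path) {
  base <- tools::file_path_sans_ext(input_path)
  list(
    flagged  = paste0(base, "_flagged.csv"),
    filtered = paste0(base, "_filtered.csv")
  )
}

paths <- output_paths("results.csv")
# paths$flagged is "results_flagged.csv", paths$filtered is "results_filtered.csv"
```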

It is simple to run. Either download both files and run the script in RStudio, or run it directly from the command line with:

Rscript sort_valid_ehr_results.R

The filepath for the CSV file sent to me is hard-coded as the default. If you would like to use a different CSV file, just pass it in as the first argument from the command line like this:

Rscript sort_valid_ehr_results.R [your_file].csv
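
For reference, an optional first argument with a fallback can be handled in R roughly like this. This is only a sketch: resolve_input_path and the default path "input.csv" are placeholders I made up for illustration, not the names or path used in the script.

```r
# Placeholder default; the real script hard-codes its own path.
default_path <- "input.csv"

# Hypothetical helper: use the first trailing argument if one was given,
# otherwise fall back to the default path.
resolve_input_path <- function(args, default) {
  if (length(args) >= 1 && nzchar(args[1])) args[1] else default
}

# commandArgs(trailingOnly = TRUE) returns only the arguments after the script name.
input_path <- resolve_input_path(commandArgs(trailingOnly = TRUE), default_path)
```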

How it Works

There are a few steps outlined in comments in the code:

Read the CSV file into a data frame.
Add a column for the error flag.
Flag rows that have an empty value in necessary columns (which you can specify/change in the script).
Flag rows which have a negative concentration.
Add another column to store all concentrations in g/dL, then convert the values.
Flag any test result higher than the upper bound, the "absurd" cutoff (see Upper Bound).
Create the two output CSV files.
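
The flagging steps above can be sketched as follows. This is not the script itself: the column names (concentration, unit), the unit labels, and the function name flag_results are assumptions for illustration, and flagging rows with an unrecognized unit is a choice made in this sketch rather than something the script is known to do.

```r
# Assumed column names and unit labels; the real CSV may use different ones.
required_cols <- c("concentration", "unit")
to_g_per_dl   <- c("g/dL" = 1, "mg/dL" = 0.001, "g/L" = 0.1)
upper_bound   <- 24  # g/dL, the "absurd" cutoff

flag_results <- function(df) {
  # Start with every row marked usable.
  df$error_flag <- FALSE

  # Flag rows with a missing or empty value in any required column.
  for (col in required_cols) {
    value <- df[[col]]
    df$error_flag <- df$error_flag | is.na(value) | trimws(as.character(value)) == ""
  }

  # Flag negative concentrations.
  df$error_flag <- df$error_flag | (!is.na(df$concentration) & df$concentration < 0)

  # Convert to g/dL. Unrecognized units give NA, which this sketch also flags.
  multiplier <- unname(to_g_per_dl[as.character(df$unit)])
  df$concentration_g_dl <- df$concentration * multiplier
  df$error_flag <- df$error_flag | is.na(df$concentration_g_dl)

  # Flag values above the upper bound.
  too_high <- !is.na(df$concentration_g_dl) & df$concentration_g_dl > upper_bound
  df$error_flag <- df$error_flag | too_high
  df
}

# Made-up rows for illustration only.
example <- data.frame(
  concentration = c(1.2, -0.5, 3500, 250, NA),
  unit          = c("g/dL", "g/dL", "mg/dL", "g/L", "g/dL"),
  stringsAsFactors = FALSE
)
flagged  <- flag_results(example)          # rows 2, 4 and 5 are flagged
filtered <- flagged[!flagged$error_flag, ] # only usable rows remain
```

Each data frame could then be written out with write.csv(..., row.names = FALSE) to produce the two output files.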

Upper Bound

The upper bound is tricky. Ideally I would have a database of known M-protein tests to use as a reference, but in the absence of said database I did my best. I chose an upper bound that keeps all potential diagnoses of multiple myeloma while rejecting inaccurately high data.

The cutoff for a multiple myeloma (MM) diagnosis is 3 g/dL.
The total weight of blood is ~100 g/dL.

We can filter for just the data between those points and set the cutoff at the 90th percentile: 24 g/dL. This number also makes some intuitive sense: the combined weight of all protein in the plasma is 6-8 g/dL, so without knowing anything else I assume that cancer could potentially push that up 3-4x.
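
A percentile cutoff of this kind can be computed with base R's quantile(). The sketch below is only an illustration: percentile_between is a hypothetical helper, the bounds are treated as exclusive (an assumption), and the numbers are made up, so the result will not match the 24 g/dL found on the real data.

```r
# Hypothetical helper: percentile of the values strictly between two bounds.
percentile_between <- function(values, lower, upper, prob = 0.90) {
  in_range <- values[!is.na(values) & values > lower & values < upper]
  unname(quantile(in_range, probs = prob))
}

# Made-up concentrations in g/dL, for illustration only (not the real data).
toy <- c(0.4, 2.1, 3.5, 4.0, 5.2, 6.8, 9.9, 12.0, 18.5, 24.0, 31.0, 250, 1200)

# 3 g/dL is the diagnosis cutoff, 100 g/dL the approximate weight of blood.
cutoff <- percentile_between(toy, lower = 3, upper = 100)
```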

BeeTeeBeats/HealthTreeInterviewCoding

Folders and files

Latest commit

History

Repository files navigation

Interview CSV Filter

How it Works

Upper Bound

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages