class documentation
class Preprocessor(object):
Base class for preprocessing.
| Kind | Name | Summary |
|---|---|---|
| Static Method | check | Check whether all files in file_list exist. |
| Static Method | deduplicate | Find duplicate files under a subdirectory. |
| Static Method | load | Load the results from the given files. |
| Method | __init__ | No summary |
| Method | generate | Generate the file lists. |
| Method | parse | Parse a single gnd (xml) file. |
| Method | save | Save the files to the output directory. |
| Method | save | Save all the info and data to the pkl file (using pickle). |
| Instance Variable | age | Undocumented |
| Instance Variable | duplicates | Undocumented |
| Instance Variable | gender | Undocumented |
| Instance Variable | logger | Undocumented |
| Instance Variable | name | Undocumented |
| Instance Variable | output | Undocumented |
| Instance Variable | root | Undocumented |
| Instance Variable | total | Undocumented |
| Instance Variable | unique | Undocumented |
| Method | _is | Check whether the sample is good. |
| Method | _keep | Keep the latest audiograms. |
| Method | _parse | Parse the age of the person. |
| Method | _parse | Parse the time at which this examination was done. |
| Method | _parse | Parse the data of the person. |
| Method | _parse | Parse the gender of the person. |
| Method | _parse | Parse the name of the person. |
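The file check and the pickle-based save/load summarized in the table above can be sketched with the standard library. This is a minimal sketch under stated assumptions, not the class's actual code: check_files, save_pkl and load_results are hypothetical stand-ins for check, the pickle save and load, and the behaviour that load returns one unpickled object per given file is an assumption.

```python
import logging
import os
import pickle
from typing import Any, List

logger = logging.getLogger(__name__)


def check_files(file_list: List[str]) -> bool:
    """Hypothetical stand-in for `check`: True only if every path exists."""
    missing = [path for path in file_list if not os.path.exists(path)]
    for path in missing:
        logger.warning("missing file: %s", path)  # report each missing path
    return not missing


def save_pkl(results: Any, pkl_path: str) -> None:
    """Hypothetical stand-in for the pickle `save`: serialize results to pkl_path."""
    with open(pkl_path, "wb") as f:
        pickle.dump(results, f)


def load_results(pkl_paths: List[str]) -> List[Any]:
    """Hypothetical stand-in for `load`: unpickle each file, preserving order."""
    results = []
    for path in pkl_paths:
        with open(path, "rb") as f:
            results.append(pickle.load(f))
    return results
```

Only unpickle files you produced yourself: pickle can execute arbitrary code when loading untrusted data.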
Parse a single gnd(xml) file.
We use several _parse_xxx() helper functions to extract the following information:

1. name
2. gender
3. age
4. create_time, the time at which the examination was done
5. data, the report data
6. good_flag, True if this sample is a good sample
| Parameters | |
|---|---|
| path: str | path of the gnd (xml) file |

| Returns | |
|---|---|
| Tuple[str, str, float, datetime, dict, bool] | name, gender, age, create_time, data, good_flag |
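How the six fields might be pulled out of one file and combined into the returned tuple is sketched below. This is a minimal sketch, not the project's implementation: it swaps BeautifulSoup for the standard library's xml.etree.ElementTree, and the tag names (Name, Gender, Age, CreateTime, Point), the timestamp format and the good_flag rule (every field was found) are all hypothetical, since the gnd layout is not documented on this page.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Dict, Optional, Tuple

# Hypothetical gnd layout: the real tag names are not documented on this page.
SAMPLE_GND = """<Report>
  <Name>Alice</Name>
  <Gender>F</Gender>
  <Age>42.5</Age>
  <CreateTime>2021-03-04 09:15:00</CreateTime>
  <Data><Point freq="1000" level="25"/><Point freq="2000" level="30"/></Data>
</Report>"""


def _find_text(root: ET.Element, tag: str) -> Optional[str]:
    """Return the stripped text of the first matching child, or None if absent or empty."""
    node = root.find(tag)
    return node.text.strip() if node is not None and node.text else None


def parse_gnd(xml_text: str) -> Tuple[str, str, float, Optional[datetime], Dict[str, float], bool]:
    """Hypothetical stand-in for `parse`: name, gender, age, create_time, data, good_flag."""
    root = ET.fromstring(xml_text)
    name = _find_text(root, "Name") or ""
    gender = _find_text(root, "Gender") or ""
    age_text = _find_text(root, "Age")
    age = float(age_text) if age_text else float("nan")  # NaN marks a missing age
    time_text = _find_text(root, "CreateTime")
    create_time = datetime.strptime(time_text, "%Y-%m-%d %H:%M:%S") if time_text else None
    # Report data: frequency (as a string) -> threshold level.
    data = {p.get("freq", ""): float(p.get("level", "nan")) for p in root.iter("Point")}
    # Assumed rule: a sample is "good" only if every field was found.
    good_flag = bool(name and gender and age_text and create_time and data)
    return name, gender, age, create_time, data, good_flag
```

For example, parse_gnd(SAMPLE_GND) yields a good sample, while a file containing only a Name tag yields good_flag == False.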
Keep the latest audiograms.
Some patients may have several audiogram examinations recorded in one gnd file (because of misoperation, system errors, etc.). We only keep the audiograms of the latest examination.
| Parameters | |
|---|---|
| examinations: list | audiograms of all examinations |

| Returns | |
|---|---|
| list | audiograms of the latest examination |
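A minimal sketch of this filtering step, assuming (hypothetically) that examinations is a flat list of dicts whose "time" key holds the datetime of the examination each audiogram belongs to; the real element structure is not documented here, and keep_latest_audiograms is an illustrative name.

```python
from datetime import datetime
from typing import Any, Dict, List


def keep_latest_audiograms(examinations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for `_keep`: keep only the audiograms of the latest examination."""
    if not examinations:
        return []
    # Assumption: audiograms from the same examination share its timestamp.
    latest_time = max(audiogram["time"] for audiogram in examinations)
    # Keep every audiogram with that timestamp, in their original order.
    return [audiogram for audiogram in examinations if audiogram["time"] == latest_time]


# Illustrative data: a first attempt, then a repeated examination of both ears.
EXAMPLE = [
    {"ear": "left", "time": datetime(2020, 5, 1, 9, 0)},
    {"ear": "left", "time": datetime(2020, 5, 1, 9, 20)},
    {"ear": "right", "time": datetime(2020, 5, 1, 9, 20)},
]
```

Filtering on timestamp equality, rather than picking a single maximum element, keeps both ears of the repeated examination.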
Parse the time at which this examination was done.
| Parameters | |
|---|---|
| soup: BeautifulSoup | file content in BeautifulSoup format |

| Returns | |
|---|---|
| datetime | examination time |
| match_flag (bool) | whether the examination time is matched |
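Returning a (value, match_flag) pair lets the caller decide whether a sample is usable without catching exceptions. The sketch below is a minimal illustration under assumptions: it uses xml.etree.ElementTree instead of BeautifulSoup, the CreateTime tag and the list of candidate timestamp formats are hypothetical, and returning None as the time when nothing matches is a design choice of this sketch.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical formats: the real gnd timestamp format is not documented here.
TIME_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M")


def parse_create_time(root: ET.Element, tag: str = "CreateTime") -> Tuple[Optional[datetime], bool]:
    """Hypothetical stand-in for the time `_parse`: (examination time, match_flag)."""
    node = root.find(f".//{tag}")  # search the whole tree for the (assumed) tag
    if node is None or not node.text:
        return None, False
    text = node.text.strip()
    for fmt in TIME_FORMATS:  # try each candidate format in turn
        try:
            return datetime.strptime(text, fmt), True
        except ValueError:
            continue
    return None, False  # tag present, but no format matched
```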
Parse the data of the person.
| Parameters | |
|---|---|
| soup: BeautifulSoup | file content in BeautifulSoup format |

| Returns | |
|---|---|
| dict | person's data |
| match_flag (bool) | whether the data is matched |
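One way the report data could be collected into a dict together with its match_flag is sketched below. This is a minimal sketch under hypothetical assumptions: it uses xml.etree.ElementTree instead of BeautifulSoup, assumes each threshold is a Point element with ear, freq and level attributes, and treats the data as matched when at least one complete, numeric point was parsed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def parse_data(root: ET.Element) -> Tuple[Dict[str, Dict[int, float]], bool]:
    """Hypothetical stand-in for the data `_parse`: ({ear: {freq: level}}, match_flag)."""
    data: Dict[str, Dict[int, float]] = {}
    for point in root.iter("Point"):
        ear, freq, level = point.get("ear"), point.get("freq"), point.get("level")
        if ear is None or freq is None or level is None:
            continue  # incomplete point: skip rather than guess a value
        try:
            freq_hz, level_db = int(freq), float(level)
        except ValueError:
            continue  # non-numeric frequency or level
        data.setdefault(ear, {})[freq_hz] = level_db
    # Assumed rule: matched if at least one valid point was found.
    return data, bool(data)
```

Converting the attributes before calling setdefault ensures a malformed point never leaves an empty per-ear dict behind, which would otherwise make match_flag True without any valid data.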