py-src/data_formulator/agents/agent_data_clean.py
10 additions & 2 deletions
@@ -45,6 +45,8 @@
 - the csv table should have the same number of cells for each line, according to the title. If there are some rows with missing values, patch them with empty cells.
 - if the raw data has some rows that do not belong to the table, also remove them (e.g., subtitles in between rows)
 - if the header row misses some columns, add their corresponding column names. E.g., when the header doesn't have an index column, but every row has an index value, add the missing column header.
+* clean up messy column names:
+- if the column name contains special characters like "*", "?", "#", "." remove them.
 * clean up columns with messy information
 - if a column is number but some cells has annotations like "*" "?" or brackets, clean them up.
 - if a column is number but has units like ($, %, s), convert them to number (make sure unit conversion is correct when multiple units exist like minute and second) and include unit in the header.