Functions

Multi-threading support

The following functions will try to use multiple threads if possible when there are at least 2 columns and 1 million rows:

  • CleanTable constructor when copycols=true
  • All compact functions
  • delete_const_columns, delete_const_columns! and delete_const_columns_ROT
  • reinfer_schema, reinfer_schema! and reinfer_schema_ROT
  • get_all_repeated
  • category_distribution
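
Julia only runs code on multiple threads when the session itself was started with more than one thread, so the functions listed above can presumably only parallelize in that case. A minimal check using only Base Julia is sketched below; the thread count of 4 in the comment is an arbitrary example.

```julia
# Start Julia with several threads, for example:
#   julia --threads=4
# or set the JULIA_NUM_THREADS environment variable before launching.

n = Threads.nthreads()  # number of threads available to this session
if n == 1
    println("Running single-threaded; start Julia with --threads to enable more.")
else
    println("Running with ", n, " threads.")
end
```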

Summarize information

Cleaner.size - Function
size(table::CleanTable)

Returns a tuple containing the number of rows and columns of the given CleanTable.

Cleaner.get_all_repeated - Function
get_all_repeated(table, columns::Vector{Symbol})

Returns a CleanTable containing only the selected columns and only the rows that were repeated, together with their row indexes.

Cleaner.category_distribution - Function
category_distribution(table, columns::Vector{Symbol}; round_digits=1, bottom_prct=0, top_prct=0)

Returns a CleanTable that takes into account only the selected columns and contains their unique rows along with the percentage each represents out of the total rows. Percentages are rounded to up to round_digits digits. If bottom_prct is specified, the least represented categories, up to bottom_prct percent, are labeled Bottom_other. If top_prct is specified, the most represented categories, up to top_prct percent, are labeled Top_other.
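
As a rough illustration of the call shape, the sketch below passes a small table to category_distribution. The sample data is hypothetical, and it assumes the function accepts a NamedTuple of vectors as its table argument, which this page does not state explicitly.

```julia
using Cleaner

# Hypothetical sample data: a NamedTuple of vectors used as a column table.
data = (
    species = ["cat", "dog", "cat", "bird", "cat"],
    weight  = [4.0, 9.5, 3.8, 0.1, 4.2],
)

# Share of each distinct species, with percentages rounded to up to 2 digits.
dist = category_distribution(data, [:species]; round_digits=2)
```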

Cleaner.compare_table_columns - Function
compare_table_columns(tables...; dupe_sanitize=true)

Returns a CleanTable comparing all column names and column types from the tables passed. By default, duplicated column names found in the same table are sanitized; pass the keyword argument dupe_sanitize=false to opt out of this behavior.
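
A minimal sketch of comparing two tables follows. Both tables are hypothetical, and passing NamedTuples of vectors as tables is an assumption rather than something this page confirms.

```julia
using Cleaner

# Two hypothetical tables whose column names and element types differ.
t1 = (id = [1, 2], name = ["a", "b"])
t2 = (id = [1.0, 2.0], label = ["a", "b"])

# Build a comparison of the column names and types across both tables.
comparison = compare_table_columns(t1, t2)
```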

Working with column names

Cleaner.rename - Function
rename(table, names::Vector{Symbol})

Creates a CleanTable with copied columns and changes its column names to be names.

Cleaner.rename! - Function
rename!(ct::CleanTable, names::Vector{Symbol})

Changes in-place the column names of a CleanTable to be names.

Cleaner.rename_ROT - Function
rename_ROT(table, names::Vector{Symbol})

Returns a new table of the original table type where its column names have been changed to be names.
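
Most functions on this page come in the same three variants as rename: a copying one, an in-place one ending in !, and one ending in _ROT that returns the original table type. The sketch below shows the three side by side. The sample data is hypothetical, and using a NamedTuple of vectors as the input table is an assumption.

```julia
using Cleaner

# Hypothetical sample data: a NamedTuple of vectors used as a column table.
data = (a = [1, 2, 3], b = ["x", "y", "z"])

# Copying variant: returns a CleanTable built from copied columns.
ct = rename(data, [:id, :label])

# In-place variant: takes a CleanTable and changes its column names.
rename!(ct, [:key, :value])

# ROT variant: returns a table of the same type as the input table.
renamed = rename_ROT(data, [:id, :label])
```

The copying and in-place variants work with CleanTable, while the _ROT variant is the one to reach for when the result should stay in the source table's own type.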

Cleaner.generate_polished_names - Function
generate_polished_names(names; style::Symbol=:snake_case)

Returns a vector of symbols containing new names that are unique and formatted using the selected style.
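
The following sketch shows one way to call generate_polished_names. The input names are hypothetical, and passing them as a vector of strings is an assumption, since the signature above leaves the type of names open.

```julia
using Cleaner

# Hypothetical messy column names, including a duplicate.
raw_names = ["First Name", "Last Name", "First Name"]

# Generate unique names in the default snake_case style.
polished = generate_polished_names(raw_names)

# The same input, formatted in camelCase instead.
polished_camel = generate_polished_names(raw_names; style=:camelCase)
```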

Cleaner.polish_names - Function
polish_names(table; style=:snake_case)

Creates and returns a CleanTable with copied columns whose column names have been replaced so that they are unique and formatted using the selected style.

Styles

  • snake_case
  • camelCase

Cleaner.polish_names! - Function
polish_names!(table::CleanTable; style::Symbol=:snake_case)

Replaces in-place the column names of a CleanTable so that they are unique and formatted using the selected style, and returns the CleanTable.

Styles

  • snake_case
  • camelCase

Cleaner.polish_names_ROT - Function
polish_names_ROT(table; style::Symbol=:snake_case)

Returns a new table of the original table type where column names have been replaced so that they are unique and formatted using the selected style.

Styles

  • snake_case
  • camelCase

Cleaner.row_as_names - Function
row_as_names(table, i::Int; remove::Bool=true)

Creates a CleanTable with copied columns, renames it using row i as the new column names, and removes from it all the rows above row i if remove=true.

Default behavior is to remove rows above row i.

Cleaner.row_as_names! - Function
row_as_names!(table::CleanTable, i::Int; remove::Bool=true)

Renames the table using row i as new names and removes in-place all the rows above row i if remove=true.

Default behavior is to remove rows above row i.

Cleaner.row_as_names_ROT - Function
row_as_names_ROT(table, i::Int; remove::Bool=true)

Returns a new table of the original table type that has been renamed using row i as the new column names, with all the rows above row i removed if remove=true.
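
A typical use for the row_as_names family is a table whose real header sits a few rows down. The sketch below is a minimal illustration with hypothetical data; using a NamedTuple of vectors as the input table is an assumption.

```julia
using Cleaner

# Hypothetical table where the real header is stored in row 2.
data = (
    x1 = ["exported 2021", "name", "Ana", "Luis"],
    x2 = ["", "age", "31", "27"],
)

# Use row 2 as the column names; rows above it are removed by default.
ct = row_as_names(data, 2)

# Keep the rows above row 2 instead.
ct_keep = row_as_names(data, 2; remove=false)
```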

Row/Column removal

Cleaner.compact_columns - Function
compact_columns(table; empty_values::Vector=[])

Creates a CleanTable with copied columns and removes from it all columns filled entirely by missing and empty_values.

Cleaner.compact_columns! - Function
compact_columns!(table::CleanTable; empty_values::Vector=[])

Removes in-place from a CleanTable all columns filled entirely by missing and empty_values.

Cleaner.compact_columns_ROT - Function
compact_columns_ROT(table; empty_values::Vector=[])

Returns a new table of the original table type where all columns filled entirely by missing and empty_values have been removed.

Cleaner.compact_rows - Function
compact_rows(table; empty_values::Vector=[])

Creates a CleanTable with copied columns and removes from it all rows filled entirely by missing and empty_values.

Cleaner.compact_rows! - Function
compact_rows!(table::CleanTable; empty_values::Vector=[])

Removes in-place from a CleanTable all rows filled entirely by missing and empty_values.

Cleaner.compact_rows_ROT - Function
compact_rows_ROT(table; empty_values::Vector=[])

Returns a new table of the original table type where all rows filled entirely by missing and empty_values have been removed.

Cleaner.compact_table - Function
compact_table(table; empty_values::Vector=[])

Creates a CleanTable with copied columns and removes from it all rows and columns filled entirely by missing and empty_values.

Cleaner.compact_table! - Function
compact_table!(table::CleanTable; empty_values::Vector=[])

Removes in-place from a CleanTable all rows and columns filled entirely by missing and empty_values.

Cleaner.compact_table_ROT - Function
compact_table_ROT(table; empty_values::Vector=[])

Returns a new table of the original table type where all rows and columns filled entirely by missing and empty_values have been removed.
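
To illustrate the empty_values keyword shared by the compact functions, the sketch below treats the empty string as an empty value alongside missing. The sample data is hypothetical, and using a NamedTuple of vectors as the input table is an assumption.

```julia
using Cleaner

# Hypothetical table with one fully empty row (row 2) and one fully empty column (notes).
data = (
    name  = ["Ana", "", "Luis"],
    notes = [missing, missing, missing],
    city  = ["Lima", "", missing],
)

# Treat "" as empty in addition to missing, then drop empty rows and columns.
ct = compact_table(data; empty_values=[""])
```

Going by the descriptions above, the notes column and the second row would be expected to be dropped here, while the third row stays because its name is not empty.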

Cleaner.delete_const_columns - Function
delete_const_columns(table)

Creates a CleanTable with copied columns and removes each column filled with just a constant value.

Cleaner.delete_const_columns! - Function
delete_const_columns!(table::CleanTable)

Removes in-place from a CleanTable each column filled with just a constant value.

Cleaner.delete_const_columns_ROT - Function
delete_const_columns_ROT(table)

Returns a new table of the original table type where all columns filled with just a constant value have been removed.

Cleaner.drop_missing - Function
drop_missing(table; missing_values::Vector=[])

Creates a CleanTable with copied columns and removes from it all rows where missing or missing_values have been found.

Cleaner.drop_missing! - Function
drop_missing!(table::CleanTable; missing_values::Vector=[])

Removes in-place from a CleanTable all rows where missing or missing_values have been found.

Cleaner.drop_missing_ROT - Function
drop_missing_ROT(table; missing_values::Vector=[])

Returns a new table of the original table type from which all rows where missing or missing_values have been found were removed.
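
The missing_values keyword lets placeholder values count as missing. A minimal sketch with hypothetical data follows, where the string "NA" is such a placeholder; using a NamedTuple of vectors as the input table is an assumption.

```julia
using Cleaner

# Hypothetical table where "NA" is used as a placeholder for missing data.
data = (
    id     = [1, 2, 3],
    status = ["ok", "NA", missing],
)

# Drop every row that contains missing or the placeholder "NA".
ct = drop_missing(data; missing_values=["NA"])
```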

Modifying table schema

Cleaner.reinfer_schema - Function
reinfer_schema(table; max_types::Int=3)

Creates a CleanTable with copied columns and tries to minimize the number of element types for each column without making the column type Any.

To do this, it tries to make the column type a Union of up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, it leaves the column as-is.

Cleaner.reinfer_schema! - Function
reinfer_schema!(table::CleanTable; max_types::Int=3)

Tries to minimize the number of element types for each column without making the column type Any.

To do this, it tries to make the column type a Union of up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, it leaves the column as-is.

Cleaner.reinfer_schema_ROT - Function
reinfer_schema_ROT(table; max_types::Int=3)

Returns a new table of the original table type after trying to minimize the number of element types for each column without making the column type Any.

To do this, it tries to make the column type a Union of up to max_types types and internally uses Base.promote_typejoin on all numeric types. If that is not possible, it leaves the column as-is.
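
Base.promote_typejoin is an internal (non-exported) Julia function. The sketch below only illustrates how that function combines element types in plain Julia; it does not reproduce the package's own inference logic.

```julia
# Base.promote_typejoin returns a type able to hold values of both argument types.
numeric = Base.promote_typejoin(Int, Float64)       # two numeric types share a supertype
with_missing = Base.promote_typejoin(Int, Missing)  # Missing is kept as a small Union
mixed = Base.promote_typejoin(Int, String)          # unrelated types fall back to Any

println(numeric)
println(with_missing)
println(mixed)
```

The last case is the one the reinfer_schema functions aim to avoid: rather than letting unrelated types collapse to Any, they are described as building a Union of up to max_types types.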

Cleaner.add_index - Function
add_index(table)

Creates a CleanTable with copied columns and adds to it a new column holding the row index of the table passed.

Cleaner.add_index! - Function
add_index!(table::CleanTable)

Adds in-place a new column holding the row index to the CleanTable passed.

Cleaner.add_index_ROT - Function
add_index_ROT(table)

Returns a new table of the original table type to which a new column holding the row index of the table passed has been added.