For a better schema
Any type would be better than type Any
If you have worked with data in Julia
before, you might have ended up with a column of element type Any
more than once. This usually happens when your column has a mix of different types that can't be promoted or converted to one and other.
Now, the main problem is that element type Any
doesn't gives us any relevant information about what is being stored in our column.
julia> [1, 2.0]
2-element Vector{Float64}:
1.0
2.0
julia> ["1", 2.0]
2-element Vector{Any}:
"1"
2.0
To solve this problem we have the reinfer_schema
, reinfer_schema!
and reinfer_schema_ROT
functions that will try to make the column of type Union
with, by default, up to 3 types stored in Union
while also internally using Base.promote_typejoin
on numeric types to reduce the final amount of numeric types.
The optional keyword argument max_types
can be used to change the maximum amount of types in Union
as, if there would be more than max_types
on the final Union
, this functions just will give up and let the column stay with element type Any
.
julia> ct = CleanTable([:A, :B, :C], [[1, 2, 3, 4], [5, missing, "z", 2.0], ["6", "7", "8", "9"]])
┌─────┬─────────┬─────┐
│ A │ B │ C │
│ Any │ Any │ Any │
├─────┼─────────┼─────┤
│ 1 │ 5 │ 6 │
│ 2 │ missing │ 7 │
│ 3 │ z │ 8 │
│ 4 │ 2.0 │ 9 │
└─────┴─────────┴─────┘
julia> reinfer_schema(ct)
┌───────┬──────────────────┬────────┐
│ A │ B │ C │
│ Int64 │ U{Real, String}? │ String │
├───────┼──────────────────┼────────┤
│ 1 │ 5 │ 6 │
│ 2 │ missing │ 7 │
│ 3 │ z │ 8 │
│ 4 │ 2.0 │ 9 │
└───────┴──────────────────┴────────┘
julia> reinfer_schema(ct; max_types=2)
┌───────┬─────────┬────────┐
│ A │ B │ C │
│ Int64 │ Any │ String │
├───────┼─────────┼────────┤
│ 1 │ 5 │ 6 │
│ 2 │ missing │ 7 │
│ 3 │ z │ 8 │
│ 4 │ 2.0 │ 9 │
└───────┴─────────┴────────┘
Index preferred
For the cases when you might want to add a row index to your table, we have the add_index
, add_index!
and add_index_ROT
functions that will add a row index as the first column of your table.
julia> ct = CleanTable([:A, :B], [[:a, :b, :c], ["x", "y", "z"]])
┌────────┬────────┐
│ A │ B │
│ Symbol │ String │
├────────┼────────┤
│ a │ x │
│ b │ y │
│ c │ z │
└────────┴────────┘
julia> add_index(ct)
┌───────────┬────────┬────────┐
│ row_index │ A │ B │
│ Int64 │ Symbol │ String │
├───────────┼────────┼────────┤
│ 1 │ a │ x │
│ 2 │ b │ y │
│ 3 │ c │ z │
└───────────┴────────┴────────┘