| 1 | +# MS Excel compatible formats |
| 2 | + |
| 3 | +ROAPI supports loading several Microsoft Excel compatible formats: xls, xlsx, xlsb, and ods. |
| 4 | + |
| 5 | +## Configuration |
| 6 | +To load MS Excel compatible files, specify the table configuration as follows: |
| 7 | +```yaml |
| 8 | +tables: |
| 9 | +  - name: "<table name>" |
| 10 | +    uri: "<files path>" |
| 11 | +    option: |
| 12 | +      format: "<file format>" |
| 13 | +      sheet_name: "Sheet1" |
| 14 | +      rows_range_start: 2 |
| 15 | +      rows_range_end: 5 |
| 16 | +      columns_range_start: 1 |
| 17 | +      columns_range_end: 6 |
| 18 | +      schema_inference_lines: 3 |
| 19 | +``` |
| 20 | +* **format** - the name of the file format. Currently supported file formats: |
| 21 | +  * xls (Microsoft Excel 5.0/95 Workbook) |
| 22 | +  * xlsx (Excel Workbook) |
| 23 | +  * xlsb (Excel Binary Workbook) |
| 24 | +  * ods (OpenDocument Spreadsheet) |
| 25 | +* **sheet_name** - the name of the sheet that contains the table data. Most new workbooks name their first sheet `Sheet1`, so change `sheet_name` if your workbook uses a different name. |
| 26 | + |
| 27 | +If no `sheet_name` is specified, ROAPI uses the first sheet in the file. |
| 28 | +* **Table range options** |
| 29 | +  * **rows_range_start** - the first row of the table. This row contains the column names. By default, `rows_range_start` is 0 (the first row in the spreadsheet). |
| 30 | +  * **rows_range_end** - the last row of the table. By default, ROAPI reads all rows. |
| 31 | +  * **columns_range_start** - the first column of the table. By default, `columns_range_start` is 0 (the first column in the spreadsheet). |
| 32 | +  * **columns_range_end** - the last column of the table. By default, ROAPI reads all columns. |
| 33 | +For example, to take only selected data: |
| 34 | +  |
| 35 | + the config file looks like: |
| 36 | +```yaml |
| 37 | +tables: |
| 38 | +  - name: "<table name>" |
| 39 | +    uri: "<files path>" |
| 40 | +    option: |
| 41 | +      format: "<file format>" |
| 42 | +      sheet_name: "Sheet1" |
| 43 | +      rows_range_start: 1 |
| 44 | +      rows_range_end: 4 |
| 45 | +      columns_range_start: 1 |
| 46 | +      columns_range_end: 3 |
| 47 | +``` |
| 48 | +* **schema_inference_lines** - the number of rows (within the table range) to use for schema inference. This number includes the row with column names, so, for example, `schema_inference_lines: 3` means ROAPI uses the first row to infer column names and the next 2 rows to infer column types. If this option is not specified, ROAPI reads all rows to infer column data types. |
| 49 | + |
| 50 | +## Schema inference |
| 51 | +ROAPI can infer the schema of the data automatically. The first row of the data range is treated as the row with column names. After inferring column names, ROAPI infers data types by scanning either all remaining rows or the limited number of rows specified in the `schema_inference_lines` option. |
| 52 | +If a column contains more than one data type (for example, float and int), ROAPI uses the Utf8 data type for it. |
| 53 | + |
| 54 | +It is also possible to specify the schema in the configuration file. This skips schema inference from the data, so the table loads faster. |
| 55 | + |
| 56 | +```yaml |
| 57 | +tables: |
| 58 | +  - name: "excel_table" |
| 59 | +    uri: "path/to/file.xlsx" |
| 60 | +    option: |
| 61 | +      format: "xlsx" |
| 62 | +    schema: |
| 63 | +      columns: |
| 64 | +        - name: "int_column" |
| 65 | +          data_type: "Int64" |
| 66 | +          nullable: true |
| 67 | +        - name: "string_column" |
| 68 | +          data_type: "Utf8" |
| 69 | +          nullable: true |
| 70 | +        - name: "float_column" |
| 71 | +          data_type: "Float64" |
| 72 | +          nullable: true |
| 73 | +        - name: "datetime_column" |
| 74 | +          data_type: !Timestamp [Seconds, null] |
| 75 | +          nullable: true |
| 76 | +        - name: "duration_column" |
| 77 | +          data_type: !Duration Second |
| 78 | +          nullable: true |
| 79 | +        - name: "date32_column" |
| 80 | +          data_type: Date32 |
| 81 | +          nullable: true |
| 82 | +        - name: "date64_column" |
| 83 | +          data_type: Date64 |
| 84 | +          nullable: true |
| 85 | +        - name: "null_column" |
| 86 | +          data_type: Null |
| 87 | +          nullable: true |
| 88 | +``` |
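
When an explicit schema is not wanted but scanning every row is too slow, the table range and `schema_inference_lines` options described above can be combined. The following is only a sketch: the table name `sales`, the path `data/sales.xlsx`, and all numeric values are hypothetical, chosen to illustrate how the options relate to each other.

```yaml
tables:
  - name: "sales"                 # hypothetical table name
    uri: "data/sales.xlsx"        # hypothetical file path
    option:
      format: "xlsx"
      sheet_name: "Sheet1"
      rows_range_start: 2         # row with column names (0-based index)
      rows_range_end: 1001        # last row of the table
      columns_range_start: 0      # first column of the table
      columns_range_end: 4        # last column of the table
      schema_inference_lines: 11  # 1 row of column names + 10 rows sampled for types
```

With these values, column names would come from the row at index 2 and column types would be inferred from the 10 rows that follow it. A column whose sampled rows mix types (for example, float and int) would be inferred as Utf8, as noted above.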