History
=======

1.0.10 (2022-01-29)
-------------------
* Added encoding and delimiter detection to the uniq, select, frequency and headers commands; these functions were completely rewritten. If the encoding and delimiter options are set, they override the detected values; otherwise, the detected delimiter and encoding are used.
* Added support for converting to .parquet files, implemented in the simplest way using the pandas "to_parquet" function.
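The override rule described in the first item can be illustrated with a short Python sketch. It is not undatum's code: it assumes ``csv.Sniffer`` for delimiter detection and a hypothetical list of candidate encodings tried in order, and the helper names (``detect_encoding``, ``detect_delimiter``, ``resolve_options``) are illustrative only.

```python
import codecs
import csv

# Hypothetical candidate encodings, tried in order. latin-1 maps every byte,
# so it always succeeds and acts as the last-resort fallback.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1251", "latin-1")


def detect_encoding(raw: bytes) -> str:
    """Return the first candidate encoding that can decode the sample."""
    for encoding in CANDIDATE_ENCODINGS:
        # final=False tolerates a multi-byte character cut off at the end
        # of the sample instead of treating it as an error.
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            decoder.decode(raw, final=False)
            return encoding
        except UnicodeDecodeError:
            continue
    return CANDIDATE_ENCODINGS[-1]


def detect_delimiter(text: str) -> str:
    """Guess the CSV delimiter from a decoded text sample."""
    try:
        return csv.Sniffer().sniff(text, delimiters=",;\t|").delimiter
    except csv.Error:
        return ","  # assumed default when sniffing is inconclusive


def resolve_options(raw: bytes, delimiter=None, encoding=None):
    """Explicit options override detection; detection fills in the rest."""
    if encoding is None:
        encoding = detect_encoding(raw)
    if delimiter is None:
        text = codecs.getincrementaldecoder(encoding)(errors="replace").decode(raw)
        delimiter = detect_delimiter(text)
    return delimiter, encoding
```

Called with only a sample, ``resolve_options`` returns the detected delimiter and encoding; passing ``delimiter`` or ``encoding`` explicitly returns those values untouched, which is the behaviour the "-d" and "-e" options are described as having.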

1.0.9 (2022-01-18)
------------------
* Added CSV and BSON file support to the "stats" command

1.0.8 (2021-07-14)
------------------
* Replaced json with orjson for some operations. Performance changes are being monitored, and more json library calls are going to be replaced with orjson.

Field value frequency calculator. Returns a frequency table for a given field.
This command autodetects the delimiter and encoding of CSV files and the encoding of JSON lines files by default. You can override them by providing the "-d" (delimiter) and "-e" (encoding) parameters.
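A frequency table of this kind can be sketched in plain Python with ``collections.Counter``. This is a minimal sketch, not the command's implementation: it assumes already decoded JSON lines input, skips records that lack the field, and uses the hypothetical name ``field_frequency``.

```python
import json
from collections import Counter


def _hashable(value):
    """Lists and dicts are not hashable, so count their JSON text instead."""
    if isinstance(value, (list, dict)):
        return json.dumps(value, sort_keys=True, ensure_ascii=False)
    return value


def field_frequency(lines, field):
    """Return (value, count) pairs for `field`, most frequent first."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if field in record:  # records without the field are not counted
            counts[_hashable(record[field])] += 1
    return counts.most_common()
```

Because ``Counter.most_common`` keeps first-seen order among equal counts, ties appear in the order the values were first met in the file.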

Get frequencies of values for field *GovSystem* in the list of Russian federal government domains from `govdomains repository <https://github.com/infoculture/govdomains/tree/master/refined>`_

Uniq command
------------

Returns all unique values of the given field(s). Accepts the *fields* parameter with a comma-separated list of fields whose unique values are returned.
Provide a single field name to get the unique values of that field, or a list of fields to get their unique combinations.
This command autodetects the delimiter and encoding of CSV files and the encoding of JSON lines files by default. You can override them by providing the "-d" (delimiter) and "-e" (encoding) parameters.
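The difference between unique values of one field and unique combinations of several fields can be sketched as follows. The function ``unique_values`` is hypothetical and assumes JSON lines input; the *fields* argument is a comma-separated string as in the command, and a missing field is treated as ``None``.

```python
import json


def _hashable(value):
    """Lists and dicts cannot go into a set, so compare their JSON text."""
    if isinstance(value, (list, dict)):
        return json.dumps(value, sort_keys=True, ensure_ascii=False)
    return value


def unique_values(lines, fields):
    """Unique values for one field, unique tuples for several fields."""
    names = [name.strip() for name in fields.split(",")]
    seen = set()
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Missing fields become None so every key has the same length.
        key = tuple(_hashable(record.get(name)) for name in names)
        if key not in seen:
            seen.add(key)
            # Keep first-seen order; unwrap the tuple for a single field.
            result.append(key[0] if len(names) == 1 else key)
    return result
```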

Returns all unique values of field *regions* in selected JSONl file

Convert command
---------------

Converts data from one format to another. Supports the most common data file formats.
Supported conversions:

* XML to JSON lines
* CSV to BSON
* XLS to BSON
* JSON lines to CSV
* CSV to Parquet
* JSON lines to Parquet
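As an illustration of one listed conversion, JSON lines to CSV, here is a minimal standard-library sketch. It is an assumption-based example, not undatum's converter: the CSV header is taken as the union of keys in first-seen order, nested values are written as JSON text, and all records are held in memory to build that header. The Parquet targets are not covered, since they rely on pandas.

```python
import csv
import io
import json


def jsonl_to_csv(lines, delimiter=","):
    """Convert JSON lines records to CSV text (hypothetical helper)."""
    records = [json.loads(line) for line in lines if line.strip()]
    # Header: union of keys in first-seen order (dicts keep insertion order).
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=delimiter)
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells (DictWriter's default restval "").
        writer.writerow({
            key: json.dumps(value, ensure_ascii=False)
            if isinstance(value, (list, dict)) else value
            for key, value in record.items()
        })
    return out.getvalue()
```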

Conversion between XML and JSON lines requires the *tagname* flag with the name of the tag that should be converted into a single JSON record.
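The role of *tagname* can be sketched with ``xml.etree.ElementTree.iterparse``: every element with that tag becomes one JSON record, and the XML is read as a stream. The element-to-dict mapping below (attributes and child elements as keys, repeated children as lists, ``#text`` for mixed text) is an assumption for illustration; undatum's converter may map XML differently, and ``xml_to_jsonl`` is a hypothetical name.

```python
import io
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Assumed mapping: attributes, child elements and text into a dict."""
    result = dict(elem.attrib)
    for child in elem:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated child tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not result:
            return text  # leaf element holding only text
        result["#text"] = text
    return result


def xml_to_jsonl(source, tagname):
    """Yield one JSON line per <tagname> element, streaming the XML."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tagname:
            yield json.dumps(element_to_dict(elem), ensure_ascii=False)
            elem.clear()  # release memory held by the processed element


if __name__ == "__main__":
    sample = io.BytesIO(b"<items><item id='1'><name>Anna</name></item></items>")
    for line in xml_to_jsonl(sample, "item"):
        print(line)
```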

Returns the field names of the file. Supports CSV, JSON and BSON file types.
For CSV files it takes the first line of the file; for JSON lines and BSON files it processes the number of records given by the *limit* parameter (default value 10000).
This command autodetects the delimiter and encoding of CSV files and the encoding of JSON lines files by default. You can override them by providing the "-d" (delimiter) and "-e" (encoding) parameters.
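The rule above (first line for CSV, up to *limit* records for JSON lines) can be sketched in Python. ``get_headers`` is a hypothetical name; BSON input is left out, and in this sketch blank JSON lines are not counted toward the limit.

```python
import csv
import json


def get_headers(lines, filetype, limit=10000, delimiter=","):
    """Return field names of CSV or JSON lines input."""
    if filetype == "csv":
        # CSV: the header is the first row only.
        for row in csv.reader(lines, delimiter=delimiter):
            return row
        return []
    # JSON lines: union of keys over the first `limit` non-blank records,
    # in first-seen order (a dict is used as an ordered set).
    fieldnames = {}
    seen = 0
    for line in lines:
        if seen >= limit:
            break
        if not line.strip():
            continue
        seen += 1
        for key in json.loads(line):
            fieldnames[key] = None
    return list(fieldnames)
```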

Returns headers of a JSON lines file using the first 10 000 records (default value)

JSONl
-----

JSON lines is a replacement for CSV and JSON files, combining JSON's flexibility with the ability to process data line by line, without loading everything into memory.
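Line-by-line processing is what keeps memory use flat: only the current line is parsed at any time. Below is a minimal sketch of a JSON lines reader, assuming decoded text input; ``iter_jsonl`` is an illustrative name, not part of undatum.

```python
import json


def iter_jsonl(stream):
    """Yield one parsed record per non-blank line, reading lazily."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines carry no record
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the offending line instead of failing anonymously.
            raise ValueError(f"invalid JSON on line {line_number}: {exc.msg}") from exc
```

With a file opened via ``open(path, encoding="utf-8")``, a loop such as ``for record in iter_jsonl(f)`` handles arbitrarily large files one record at a time.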