Import contacts from CSV #350
Open
The big-box address-book providers (Outlook, Google, Apple) all support importing contacts from CSV files. Khard does not support this, nor to my knowledge does any console address-book software. Because contact data is often distributed in CSV form, it is useful to be able to import contacts from CSV, and it's practically indispensable if you need to manage a large number of contacts. For example, I'm a teacher at the university level, and my students' contact info (most importantly, their emails, to which I frequently need to send group messages) is given to me by the University in CSV format. Because I usually have fifty students at a time, it is not feasible to import that many contacts into khard without some kind of scripting solution. Since I was using khard's API anyway, I decided to have a go at implementing the feature. The feature was non-trivial to implement, so apologies in advance for the length of this PR. Thanks for reviewing it, and for your work on khard!
CLI
The new feature is implemented via a new subcommand, `import` (`csv` is also provided as an alias).
`khard import` is designed to be as consistent with `khard new` as possible. `khard import` takes the same options as `khard new`:
- `-a`, `--addressbook` specifies the address book into which the contacts should be imported. The user is asked to specify an address book if this option is not supplied.
- `-i`, `--input-file` specifies the CSV file from which the contacts should be imported (stdin by default).
- `--open-editor`, `--edit` gives the user the option to review/edit the contact after the successful creation of each contact (not unlike Apple Contacts, which asks you to review each imported contact).
- `--vcard-version` is the same as for `khard new`.

`khard import` takes one additional option, `-d` or `--delimiter`, which allows you to specify what field delimiter is used in your CSV file (',' by default).
Like `khard new`, if no input is supplied to `khard import`, the user's text editor is opened to edit a temporary file containing a template -- in this case, a CSV template. I don't think there's really a use case for this, since CSV files are not very user-friendly to edit in a text editor. But I thought it was important both for consistency with `khard new` and also with UNIX tools in general -- if you don't provide any input to `cat`, for example, it will just hang, because anything you type after executing the command still counts as stdin. So even though opening a CSV template when the user fails to supply any input is not very useful, it at least will be unsurprising to the user.
I modified `khard template` to be able to show the CSV template if `-c` or `--csv` is passed to it. It still shows the YAML template by default (or, superfluously, with the `-y` or `--yaml` option), so this change doesn't break anything. (This functionality could be assigned to a subcommand other than `template`, but I thought it was neater to fold it into `template`.)
Implementation
There are two main obstacles to implementing the ability to import contacts from CSV. The first is that CSV is a (very) simple data format, and that contacts are complex (evidenced by the fact that khard models them with YAML, which is a complex data format). The solution to this is to specify a clear standard for validly formatting CSV files. Fortunately, Google (https://support.google.com/contacts/answer/15147365?hl=en-GB&co=GENIE.Platform%3DDesktop#zippy=%2Cuse-a-template-spreadsheet-to-create-a-csv-file-to-import) and Outlook (https://support.microsoft.com/en-us/office/create-or-edit-csv-files-to-import-into-outlook-4518d70d-8fe9-46ad-94fa-1494247193c7) provide some clues as to how to specify a standard in a way that might be more or less familiar to people.
The details of this are discussed in a compressed and dry way in the API documentation, and will need to be spelled out in a more friendly way in user-facing documentation (which I am happy to write if you are interested in merging this PR). Here's a quick overview of how column headers need to be specified in order to get certain data structures:
1. To get something equivalent to the YAML structure `First name: Bruce`, the column header should be 'First name' (where 'Bruce' is a value, in that column, in one of the subsequent rows of the CSV file).
2. To get something equivalent to a YAML list of two organisations, 'Justice League' should be in a column named 'Organisation 1' and 'Wayne Enterprises' should be in a column named 'Organisation 2'.
3. To get something equivalent to a YAML structure with two emails of different types, 'work' should be in a column named 'Email 1 - type' and 'thebat@justice.org' in a column named 'Email 1 - value'; in the same row, 'home' should be in a column named 'Email 2 - type' and 'bruce@gmail.com' in a column named 'Email 2 - value'.
4. The same idea as in 3 applies to addresses. To get something equivalent to a YAML structure for a typed address, 'home' should be in a column named 'Address 1 - type', '1007 Mountain Drive' should be in a column named 'Address 1 - Street', 'Gotham City' should be in a column named 'Address 1 - City', and 'USA' should be in a column named 'Address 1 - Country'.
5. Finally, in structures like those in 3 and 4, lists are supported. To get something equivalent to a YAML structure with two emails listed under one type, 'work' should be in a column named 'Email 1 - type' and in a column named 'Email 2 - type'. 'thebat@justice.org' should be in a column named 'Email 1 - value', and 'bruce@wayne.com' should be in a column named 'Email 2 - value'.
Note that the numbers ('Email 1', 'Email 2') are more or less arbitrary; they are just meant to say that different values are associated with each other (which CSV is not capable of conveying on its own). But the numbers need to start with 1, and they need to be in a sequence. For example, if there's an 'Email 3', there must also be an 'Email 2' and 'Email 1'. (The order that they're presented in the CSV file doesn't matter, though.)
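The header convention above can be illustrated with a small, self-contained sketch. This is not the parser module from this PR: the names `HEADER` and `row_to_contact` are hypothetical, the regular expression is only one way to read the headers, and the assumption that a 'type'/'value' pair collapses to `{type: value}` while other subfields stay grouped under the type is inferred from the examples above.

```python
import csv
import io
import re
from collections import defaultdict

# Reads 'First name', 'Organisation 1', 'Email 1 - type', 'Address 1 - Street'.
HEADER = re.compile(r"^(?P<field>.+?)(?: (?P<index>\d+))?(?: - (?P<sub>.+))?$")


def row_to_contact(row):
    """Turn one CSV row (header -> cell) into a nested dict (illustrative only)."""
    contact = {}
    seen = defaultdict(set)  # field -> indices used in the headers
    numbered = defaultdict(lambda: defaultdict(dict))  # field -> index -> parts
    for header, cell in row.items():
        m = HEADER.match(header)
        field, index, sub = m["field"], m["index"], m["sub"]
        if index is not None:
            seen[field].add(int(index))
        if not cell:
            continue  # an empty cell carries no data for this contact
        if index is None:
            contact[field] = cell
        else:
            numbered[field][int(index)][sub] = cell
    # Numbers must start at 1 and form a sequence; column order is irrelevant.
    for field, indices in seen.items():
        if sorted(indices) != list(range(1, len(indices) + 1)):
            raise ValueError(f"{field!r} columns must be numbered 1..n without gaps")
    for field, groups in numbered.items():
        for index in sorted(groups):
            parts = dict(groups[index])
            if None in parts:  # e.g. 'Organisation 1': a plain list item
                contact.setdefault(field, []).append(parts[None])
                continue
            kind = parts.pop("type")
            # 'type' + 'value' collapses to a scalar; anything else stays a dict.
            payload = parts["value"] if set(parts) == {"value"} else parts
            typed = contact.setdefault(field, {})
            if kind not in typed:
                typed[kind] = payload
            elif isinstance(typed[kind], list):
                typed[kind].append(payload)
            else:  # second entry of the same type: promote to a list
                typed[kind] = [typed[kind], payload]
    return contact


CSV_TEXT = (
    "First name,Organisation 1,Organisation 2,"
    "Email 1 - type,Email 1 - value,Email 2 - type,Email 2 - value,"
    "Address 1 - type,Address 1 - Street,Address 1 - City,Address 1 - Country\n"
    "Bruce,Justice League,Wayne Enterprises,"
    "work,thebat@justice.org,work,bruce@wayne.com,"
    "home,1007 Mountain Drive,Gotham City,USA\n"
)
contacts = [row_to_contact(r) for r in csv.DictReader(io.StringIO(CSV_TEXT))]
```

Here the two 'work' emails end up as a list under one key, and the address subfields are grouped under 'home'. A header set containing 'Email 3' but no 'Email 2' raises `ValueError` in this sketch.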
Although the above standard is straightforward enough, processing the data into a form that `khard.YAMLEditable.update()` can read (to avoid any duplication of the contact-creation logic that's already performed by that method) can get messy fast. To keep things organized, I ended up implementing the CSV parsing code as a separate module. (The specifics are documented, in hopefully a clear way, within the module.)
The second obstacle is that khard's API (both public and private) doesn't readily support creating Contacts out of anything but YAML input. So, although in general I aimed for my PR to only add things, and not change things, there were a couple of places where I had to pry the API open a bit to allow Contacts to be created from the data returned by the CSV parser. The alternative would have been to convert the CSV parser data into YAML, just to be converted back into the same dict -- which seemed pretty silly. However, none of the changes I made were breaking, in the sense that no one who calls into the API will need to make any changes to their code on account of the changes I made.
In particular, the changes involve:
- The `khard.YAMLEditable._parse_yaml()` method, so that I could use the same validation logic when creating a new Contact from a dict.
- `khard.YAMLEditable.update()`, whose input can now be either a string or a dictionary. If it's a string, it's parsed as YAML input just like it was before. If it's a dictionary, the dictionary is validated, and then Contact creation proceeds as normal.
- `khard.Contact.from_dict()`.
- `khard.helpers.interactive.Editor.write_temp_file()`.

Again, these changes should not be noticed.
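The "string or dictionary" behaviour described for `update()` can be sketched in isolation as follows. This is only an illustration of the dispatch, not khard's code: `load_contact_data` and `validate_contact_data` are hypothetical names, the validation rules are invented placeholders, and `json.loads` stands in for the YAML parser so the example needs nothing outside the standard library.

```python
import json
from typing import Any, Callable, Dict, Union


def validate_contact_data(data: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for the validation step shared by both paths."""
    if not isinstance(data, dict):
        raise ValueError("contact data must be a mapping")
    bad_keys = [key for key in data if not isinstance(key, str)]
    if bad_keys:
        raise ValueError(f"field names must be strings, got {bad_keys}")
    return data


def load_contact_data(
    source: Union[str, Dict[str, Any]],
    parse_text: Callable[[str], Any] = json.loads,  # stand-in for a YAML parser
) -> Dict[str, Any]:
    """Accept text (parsed first) or an already-built dict (used directly)."""
    data = parse_text(source) if isinstance(source, str) else source
    # Both paths converge here, so the validation logic is written only once.
    return validate_contact_data(data)


from_text = load_contact_data('{"First name": "Bruce"}')
from_dict = load_contact_data({"First name": "Bruce"})
```

Because existing callers keep passing strings, a change of this shape is additive: only callers that already hold a dict, such as a CSV parser, take the new path.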
Tests
The PR includes tests verifying that YAML files and a CSV file containing the same data produce equivalent contacts. The tests also verify that the order of columns in the CSV file does not change the validity of the final result.
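As a rough idea of the shape of the column-order check, here is a minimal `unittest` sketch. It does not reproduce the tests in this PR: `parse` is a hypothetical stand-in that returns plain dicts rather than khard contacts, and the CSV strings are made up for the example.

```python
import csv
import io
import unittest


def parse(text, delimiter=","):
    """Hypothetical stand-in parser: one dict per row, empty cells dropped."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{key: cell for key, cell in row.items() if cell} for row in reader]


class ColumnOrderTest(unittest.TestCase):
    def test_column_order_is_irrelevant(self):
        first = (
            "First name,Email 1 - type,Email 1 - value\n"
            "Bruce,work,thebat@justice.org\n"
        )
        shuffled = (
            "Email 1 - value,First name,Email 1 - type\n"
            "thebat@justice.org,Bruce,work\n"
        )
        self.assertEqual(parse(first), parse(shuffled))

    def test_custom_delimiter(self):
        rows = parse("First name;Last name\nBruce;Wayne\n", delimiter=";")
        self.assertEqual(rows, [{"First name": "Bruce", "Last name": "Wayne"}])
```

Saved as a module, this can be run with `python -m unittest`.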
To-do
If you are interested in merging this, the remaining tasks I can think of are
- `import` subcommand.

I'm happy to do all of these, of course.
I hope this PR can be helpful. Thanks again for your work on khard!