@rhelder rhelder commented Jan 20, 2026

The big-box address-book providers (Outlook, Google, Apple) all support importing contacts from CSV files. Khard does not support this, nor to my knowledge does any console address-book software. Because contact data is often distributed in CSV form, it is useful to be able to import contacts from CSV, and it's practically indispensable if you need to manage a large number of contacts. For example, I'm a teacher at the university level, and my students' contact info (most importantly, their emails, to which I frequently need to send group messages) is given to me by the University in CSV format. Because I usually have fifty students at a time, it is not feasible to import that many contacts into khard without some kind of scripting solution. Since I was using khard's API anyway, I decided to have a go at implementing the feature. The feature was non-trivial to implement, so apologies in advance for the length of this PR. Thanks for reviewing it, and for your work on khard!

CLI

The new feature is implemented via a new subcommand, import (csv is also provided as an alias). khard import is designed to be as consistent with khard new as possible. khard import takes the same options as khard new:

  1. -a, --addressbook specifies the address book into which the contacts should be imported. The user is asked to specify an address book if this option is not supplied.
  2. -i, --input-file specifies the CSV file from which the contacts should be imported (stdin by default).
  3. --open-editor, --edit gives the user the option to review/edit the contact after the successful creation of each contact (not unlike Apple Contacts, which asks you to review each imported contact).
  4. --vcard-version is the same as for khard new.

khard import takes one additional option, -d or --delimiter, which allows you to specify what field delimiter is used in your CSV file (',' by default).
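To illustrate what the delimiter option changes, here is a minimal sketch using only Python's standard csv module. It is not the code in this PR; read_rows is a hypothetical helper name, and the sample data is made up for the example.

```python
import csv
import io


def read_rows(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by column header.

    Hypothetical helper for illustration; not part of khard.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


comma_text = "First name,Last name\nBruce,Wayne\n"
semicolon_text = "First name;Last name\nBruce;Wayne\n"

# The same data parses identically once the right delimiter is given.
rows_comma = read_rows(comma_text)
rows_semicolon = read_rows(semicolon_text, delimiter=";")
```

With the wrong delimiter, the whole header line would be read as a single column name, which is why the option matters for files exported with ';' or tabs.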

Like khard new, if no input is supplied to khard import, the user's text editor is opened to edit a temporary file containing a template -- in this case, a CSV template. I don't think there's really a use case for this, since CSV files are not very user-friendly to edit in a text editor. But I thought it was important both for consistency with khard new and also with UNIX tools in general -- if you don't provide any input to cat, for example, it will just hang, because anything you type after executing the command still counts as stdin. So even though opening a CSV template when the user fails to supply any input is not very useful, it at least will be unsurprising to the user.

I modified khard template to be able to show the CSV template if -c or --csv is passed to it. It still shows the YAML template by default (or, superfluously, with the -y or --yaml option), so this change doesn't break anything. (This functionality could be assigned to a subcommand other than template, but I thought it was neater to fold it into template.)

Implementation

There are two main obstacles to implementing the ability to import contacts from CSV. The first is that CSV is a (very) simple data format, and that contacts are complex (evidenced by the fact that khard models them with YAML, which is a complex data format). The solution to this is to specify a clear standard for validly formatting CSV files. Fortunately, Google (https://support.google.com/contacts/answer/15147365?hl=en-GB&co=GENIE.Platform%3DDesktop#zippy=%2Cuse-a-template-spreadsheet-to-create-a-csv-file-to-import) and Outlook (https://support.microsoft.com/en-us/office/create-or-edit-csv-files-to-import-into-outlook-4518d70d-8fe9-46ad-94fa-1494247193c7) provide some clues as to how to specify a standard in a way that might be more or less familiar to people.

The details of this are discussed in a compressed and dry way in the API documentation, and will need to be spelled out in a more friendly way in user-facing documentation (which I am happy to write if you are interested in merging this PR). Here's a quick overview of how column headers need to be specified in order to get certain data structures:

  1. To get something equivalent to the YAML structure First name: Bruce, the column header should be 'First name' (where 'Bruce' is a value, in that column, in one of the subsequent rows of the CSV file).

  2. To get something equivalent to the YAML structure

    Organisation:
        - Justice League
        - Wayne Enterprises
    

    'Justice League' should be in a column named 'Organisation 1' and 'Wayne Enterprises' should be in a column named 'Organisation 2'.

  3. To get something equivalent to the YAML structure

    Email:
        work: thebat@justice.org
        home: bruce@gmail.com
    

    'work' should be in a column named 'Email 1 - type', and 'thebat@justice.org' should be in a column named 'Email 1 - value'. In the same row, 'home' should be in a column named 'Email 2 - type', and 'bruce@gmail.com' should be in a column named 'Email 2 - value'.

  4. The same idea as in 3 applies to addresses. To get something equivalent to the YAML structure

    Address:
        home:
            Street: 1007 Mountain Drive
            City: Gotham City
            Country: USA
    

    'home' should be in a column named 'Address 1 - type', '1007 Mountain Drive' should be in a column named 'Address 1 - Street', 'Gotham City' should be in a column named 'Address 1 - City', and 'USA' should be in a column named 'Address 1 - Country'.

  5. Finally, in structures like those in 3 and 4, lists are supported. To get something equivalent to the YAML structure

    Email:
        work:
            - thebat@justice.org
            - bruce@wayne.com
    

    'work' should be in a column named 'Email 1 - type' and in a column named 'Email 2 - type'. 'thebat@justice.org' should be in a column named 'Email 1 - value', and 'bruce@wayne.com' should be in a column named 'Email 2 - value'.
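The five header conventions above can be sketched as a single function that turns one CSV row into a nested dict. This is only an illustration of the convention under my own assumptions, not the parser module in this PR: row_to_contact and HEADER are hypothetical names, empty cells are simply skipped, and error handling is minimal.

```python
import re
from collections import defaultdict

# "Organisation 2" -> field "Organisation", index 2, no sub-key.
# "Email 1 - type" -> field "Email", index 1, sub-key "type".
HEADER = re.compile(r"^(?P<field>.+?) (?P<index>\d+)(?: - (?P<sub>.+))?$")


def row_to_contact(row: dict[str, str]) -> dict:
    """Turn one CSV row into a nested dict (hypothetical sketch)."""
    contact: dict = {}
    lists = defaultdict(dict)  # field -> index -> value
    groups = defaultdict(lambda: defaultdict(dict))  # field -> index -> parts
    for header, value in row.items():
        if not value:
            continue  # assumption: empty cells carry no data
        match = HEADER.match(header)
        if match is None:
            contact[header] = value  # plain column, e.g. "First name"
        elif match["sub"] is None:
            lists[match["field"]][int(match["index"])] = value
        else:
            groups[match["field"]][int(match["index"])][match["sub"]] = value
    for field, items in lists.items():
        contact[field] = [items[i] for i in sorted(items)]
    for field, items in groups.items():
        typed: dict = {}
        for i in sorted(items):
            parts = dict(items[i])
            if "type" not in parts:
                raise ValueError(f"'{field} {i}' has no '- type' column")
            kind = parts.pop("type")
            # A lone "value" part is a scalar; anything else stays a mapping.
            entry = parts["value"] if set(parts) == {"value"} else parts
            if kind in typed:
                # Same type seen twice: collect the entries in a list.
                if not isinstance(typed[kind], list):
                    typed[kind] = [typed[kind]]
                typed[kind].append(entry)
            else:
                typed[kind] = entry
        contact[field] = typed
    return contact


example_row = {
    "First name": "Bruce",
    "Organisation 1": "Justice League",
    "Organisation 2": "Wayne Enterprises",
    "Email 1 - type": "work",
    "Email 1 - value": "thebat@justice.org",
    "Email 2 - type": "work",
    "Email 2 - value": "bruce@wayne.com",
    "Address 1 - type": "home",
    "Address 1 - Street": "1007 Mountain Drive",
    "Address 1 - City": "Gotham City",
    "Address 1 - Country": "USA",
}
example_contact = row_to_contact(example_row)
```

Here example_contact has the same shape as the YAML structures in items 1, 2, 4 and 5: a plain string, a list of organisations, a typed address mapping, and a list of two emails under 'work'.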

Note that the numbers ('Email 1', 'Email 2') are more or less arbitrary; they are just meant to say that different values are associated with each other (which CSV is not capable of conveying on its own). But the numbers need to start with 1, and they need to be in a sequence. For example, if there's an 'Email 3', there must also be an 'Email 2' and 'Email 1'. (The order that they're presented in the CSV file doesn't matter, though.)
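The numbering rule can be checked independently of the rest of the parsing. The following is a rough sketch of such a check, assuming the header format described above; check_numbering and NUMBERED are hypothetical names and the actual module in this PR may do this differently.

```python
import re
from collections import defaultdict

# Matches "Organisation 2" as well as "Email 1 - value".
NUMBERED = re.compile(r"^(?P<field>.+?) (?P<index>\d+)(?: - .+)?$")


def check_numbering(headers: list[str]) -> None:
    """Raise ValueError unless each field is numbered 1, 2, ..., n.

    The order of the headers themselves is irrelevant.
    """
    seen = defaultdict(set)  # field -> set of indices used
    for header in headers:
        match = NUMBERED.match(header)
        if match:
            seen[match["field"]].add(int(match["index"]))
    for field, indices in seen.items():
        expected = set(range(1, max(indices) + 1))
        if indices != expected:
            raise ValueError(
                f"{field!r} columns must be numbered consecutively "
                f"from 1; got {sorted(indices)}"
            )


# Out-of-order columns are fine as long as no number is skipped.
check_numbering(["Email 2 - value", "First name", "Email 1 - value"])
```

A header set containing 'Email 1 - value' and 'Email 3 - value' but no 'Email 2' column would raise ValueError under this sketch.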

Although the above standard is straightforward enough, processing the data into a form that khard.YAMLEditable.update() can read (to avoid any duplication of the contact-creation logic that's already performed by that method) can get messy fast. To keep things organized, I ended up implementing the CSV parsing code as a separate module. (The specifics are documented, in hopefully a clear way, within the module.)

The second obstacle is that khard's API (both public and private) doesn't readily support creating Contacts out of anything but YAML input. So, although in general I aimed for my PR to only add things, and not change things, there were a couple of places where I had to pry the API open a bit to allow Contacts to be created from the data returned by the CSV parser. The alternative would have been to convert the CSV parser data into YAML, just to be converted back into the same dict -- which seemed pretty silly. However, none of the changes I made were breaking, in the sense that no one who calls into the API will need to make any changes to their code on account of the changes I made.

In particular:

  1. I factored the validation of the data out of the khard.YAMLEditable._parse_yaml() method, so that I could use the same validation logic when creating a new Contact from a dict.
  2. I allowed the argument to khard.YAMLEditable.update() to be either a string or a dictionary. If it's a string, it's parsed as YAML input just like it was before. If it's a dictionary, the dictionary is validated, and then Contact creation proceeds as normal.
  3. I wrapped these changes up in a new public API method, khard.Contact.from_dict().
  4. Finally, I allowed (but did not require) the specification of a suffix other than '.yml' in khard.helpers.interactive.Editor.write_temp_file().
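The shape of changes 1 to 3 can be sketched roughly as follows. This is a simplified stand-in, not khard's code: EditableSketch, validate and parse_text are hypothetical names, and a toy 'key: value' line parser takes the place of the real YAML loader so that the example has no third-party dependencies.

```python
from typing import Union


def validate(data: dict) -> dict:
    """Stand-in for the validation step shared by both input paths."""
    if not all(isinstance(key, str) for key in data):
        raise ValueError("all keys must be strings")
    return data


def parse_text(text: str) -> dict:
    """Toy 'key: value' parser standing in for a real YAML loader."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        result[key.strip()] = value.strip()
    return result


class EditableSketch:
    """Hypothetical class; only mirrors the str-or-dict dispatch."""

    def __init__(self) -> None:
        self.data: dict = {}

    def update(self, source: Union[str, dict]) -> None:
        # Strings are parsed exactly as before; dicts skip the parsing
        # step but still go through the same validation.
        parsed = parse_text(source) if isinstance(source, str) else dict(source)
        self.data = validate(parsed)

    @classmethod
    def from_dict(cls, data: dict) -> "EditableSketch":
        obj = cls()
        obj.update(data)
        return obj


from_text = EditableSketch()
from_text.update("First name: Bruce\nLast name: Wayne")
from_mapping = EditableSketch.from_dict({"First name": "Bruce", "Last name": "Wayne"})
```

Because string input takes the same path it always did, callers that pass YAML text are unaffected; the dict branch is purely additive.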

Again, existing callers should not notice any of these changes.
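Change 4 stays invisible to existing callers because the new parameter has a default. A minimal sketch of that idea using the standard tempfile module, with a hypothetical function name and signature (the real method's signature is not reproduced here):

```python
import os
import tempfile


def write_temp_file_sketch(text: str = "", suffix: str = ".yml") -> str:
    """Write text to a new temporary file and return its path.

    Hypothetical illustration: callers that omit suffix keep getting
    a '.yml' file, so nothing changes for them.
    """
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=suffix, delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(text)
        return tmp.name


yaml_path = write_temp_file_sketch("First name: Bruce\n")
csv_path = write_temp_file_sketch("First name\nBruce\n", suffix=".csv")

with open(csv_path, encoding="utf-8") as handle:
    csv_content = handle.read()

# Clean up the files created by this example.
os.remove(yaml_path)
os.remove(csv_path)
```

A matching suffix matters mostly for the editor, which typically picks syntax highlighting from the file extension.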

Tests

The PR includes tests verifying that YAML files and a CSV file containing the same data produce equivalent contacts. The tests also verify that the order of columns in the CSV file does not change the validity of the final result.
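The column-order check can be pictured with a small self-contained example. This is only a sketch of the idea using the standard csv module and made-up data; the tests in the PR compare the resulting Contact objects rather than raw rows, and load_rows is a hypothetical helper.

```python
import csv
import io

ORDERED = (
    "First name,Last name,Email 1 - type,Email 1 - value\n"
    "Bruce,Wayne,work,thebat@justice.org\n"
)
# Same data, columns shuffled.
JUMBLED = (
    "Email 1 - value,Last name,Email 1 - type,First name\n"
    "thebat@justice.org,Wayne,work,Bruce\n"
)


def load_rows(text: str) -> list[dict[str, str]]:
    """Read CSV text into dicts keyed by header (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text)))


def test_column_order_is_irrelevant() -> None:
    # Dict equality ignores key order, so only header/value pairs count.
    assert load_rows(ORDERED) == load_rows(JUMBLED)


test_column_order_is_irrelevant()
```

Keying every value by its header, rather than by its position, is what makes the order of columns irrelevant.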

To-do

If you are interested in merging this, the remaining tasks I can think of are

  1. Write user-facing documentation,
  2. And implement command-line completion for the new import subcommand.

I'm happy to do both of these, of course.

I hope this PR can be helpful. Thanks again for your work on khard!

This will make it possible to create contacts from data formats other
than YAML without rendering the data as a YAML string.
This module reads CSV files and returns data that can be read by the
'khard.contacts' module. The module will be used to create new contacts
from CSV.
By default, 'khard template' will still print a YAML template. If the
user passes the '-c' or '--csv' option, though, print a CSV template to
be saved or just for the sake of example.
This is not particularly useful on its own, but it will be used as a
fallback when importing contacts from CSV, in the event that the user
doesn't supply any input (i.e., like 'khard new' does).
Check whether or not YAML files and CSV files containing the same data
produce equivalent Contact objects. Make one of the CSV files 'jumbled'
to show that column order doesn't matter to getting the right result.