"tutorial on differential privacy for people who learn best by doing" #930

@mccalluc

Feedback from Kaitlyn Webb:

I think both a more directed tutorial and more information about what the DP mechanisms actually do to the numbers would help. For the online tool's synthetic data, it would also be helpful if more columns were available (I collaborate a lot with social scientists, and there are often at least 10 columns of interest in their data, most of which contain sensitive information), along with more discussion about:

  • how choosing bounds that don't contain all the data can introduce bias
  • how the number of bins can affect the analysis (i.e., you are discretizing the data: while fewer bins means less noise, fewer bins can also complicate analysis of the synthetic data after the fact and lead to less statistical power)
  • what the weights do
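
To make the first two points concrete, here is a minimal sketch, not the wizard's actual implementation. It assumes a made-up lognormal "income" column, a plain Laplace mechanism with sensitivity 1 per histogram (add/remove one record), and hypothetical helper names (`clamp`, `histogram`, `add_laplace`). It shows that clamping to bounds that are too tight pulls the mean down, and that the total noise added to a histogram grows with the number of bins.

```python
import random
import statistics

def clamp(values, lower, upper):
    """Force every value into [lower, upper], as a DP release would before aggregating."""
    return [min(max(v, lower), upper) for v in values]

def histogram(values, lower, upper, n_bins):
    """Equal-width bin counts over [lower, upper]; out-of-range values are clamped first."""
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for v in clamp(values, lower, upper):
        counts[min(int((v - lower) / width), n_bins - 1)] += 1
    return counts

def add_laplace(counts, epsilon, rng):
    """Add Laplace(0, 1/epsilon) noise to each count (assumes sensitivity 1)."""
    scale = 1 / epsilon
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return [c + rng.expovariate(1 / scale) - rng.expovariate(1 / scale) for c in counts]

rng = random.Random(0)
# Made-up, right-skewed data standing in for a sensitive column.
incomes = [rng.lognormvariate(10.5, 0.8) for _ in range(5000)]

# Bounds that cut off the upper tail bias the mean downward.
true_mean = statistics.mean(incomes)
clamped_mean = statistics.mean(clamp(incomes, 0, 50_000))
print(f"true mean: {true_mean:.0f}, mean after clamping to [0, 50000]: {clamped_mean:.0f}")

# Same epsilon, different bin counts: each bin gets its own noise draw,
# so the expected total absolute error is roughly n_bins / epsilon.
for n_bins in (5, 50):
    counts = histogram(incomes, 0, 200_000, n_bins)
    noisy = add_laplace(counts, epsilon=1.0, rng=rng)
    total_error = sum(abs(n - c) for n, c in zip(noisy, counts))
    print(f"{n_bins} bins: total absolute noise = {total_error:.1f}")
```

The trade-off is that the 5-bin histogram, while less noisy overall, only says which 40,000-wide bucket a value falls in, which is the loss of resolution described above.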

I would also recommend an additional tab before downloading results, where the user can view the original and synthetic data. Some general tables and plots to demonstrate utility (for example, histograms comparing the original and synthetic data, and a summary statistics table for each column) would be helpful. If there is an easy way to interpret the privacy budget, that would also be useful. That way the user can see for themselves how changing the privacy budget, number of bins, weights, etc. affects utility.
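
As a rough illustration of what such a comparison tab could compute, here is a sketch using only the standard library. The two columns are made-up numbers (the "synthetic" values are simply bin midpoints), and `summarize`, `bin_shares`, and `total_variation` are hypothetical helpers, not part of the tool.

```python
import statistics

def summarize(values):
    """Per-column summary statistics for a side-by-side table."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

def bin_shares(values, lower, upper, n_bins):
    """Fraction of values in each equal-width bin over [lower, upper]."""
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for v in values:
        v = min(max(v, lower), upper)
        counts[min(int((v - lower) / width), n_bins - 1)] += 1
    return [c / len(values) for c in counts]

def total_variation(p, q):
    """Total variation distance between two binned distributions (0 = identical)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Made-up example column and a synthetic stand-in built from bin midpoints.
original = [23, 35, 41, 29, 52, 47, 38, 61, 33, 45]
synthetic = [25, 35, 45, 25, 55, 45, 35, 65, 35, 45]

for name, column in [("original", original), ("synthetic", synthetic)]:
    print(name, summarize(column))

tv = total_variation(
    bin_shares(original, 20, 70, 5),
    bin_shares(synthetic, 20, 70, 5),
)
print("total variation distance between histograms:", tv)
```

In this toy case the two histograms match exactly, yet the means differ (40.4 vs. 41.0), which is the kind of discretization effect a summary table would surface alongside the plots.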

Basically, if the online wizard could serve more as a walkthrough tutorial on differential privacy for people who learn best by doing, that would be an incredibly powerful tool.

If we follow up on this, split it into smaller issues.