- Add overview in package documentation.
- Add examples to all functions.
- Explicitly document return values of `ml_fit()`.
- Set default maximum number of iterations for HIPF to 2000.
- Convert documentation to Markdown.
- Add "Driven by" and "Related work" sections to the README.
- Use a sparse matrix for the flattened reference sample.
- Status messages with `verbose = TRUE` are prepended with a time stamp.
- Fail if `NA` group ID found.
- Reorganized and renamed internal datasets.
- Fitting result contains `iterations` and `tol` members (#28).
- Fixed model matrix of "separate" type if only grand totals are given.
- `ml_fit()` gains `tol` argument, which determines the success of a fitting operation.
- `ml_fit` objects have new members `success`, `rel_residuals`, and `flat_weighted_values` (#28).
- HIPF and IPU stop iterating if tolerance is reached.
- IPU and HIPF abort iteration when the weights do not change measurably between two iterations (#27).
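
The two stopping rules above (tolerance reached, or weights no longer changing) can be illustrated with a minimal sketch. This is a plain two-way IPF in base R, not the package's HIPF or IPU code. The function name `ipf_sketch` and its arguments are hypothetical. The returned members only borrow the names `iterations`, `tol` and `success` from the entries above, and the `maxiter` default simply reuses the value quoted for HIPF.

```r
# Illustrative two-way IPF with two stopping rules (hypothetical helper,
# not part of the package API).
ipf_sketch <- function(seed, row_targets, col_targets,
                       tol = 1e-6, maxiter = 2000) {
  w <- seed
  success <- FALSE
  iterations <- 0
  for (iter in seq_len(maxiter)) {
    iterations <- iter
    w_old <- w
    # Scale each row to match its target (the factor vector recycles down columns).
    w <- w * (row_targets / rowSums(w))
    # Scale each column to match its target.
    w <- sweep(w, 2, col_targets / colSums(w), `*`)
    # Rule 1: stop with success once the relative residuals are within tolerance.
    rel_residuals <- abs(rowSums(w) - row_targets) / row_targets
    if (max(rel_residuals) < tol) {
      success <- TRUE
      break
    }
    # Rule 2: abort when the weights no longer change measurably.
    if (max(abs(w - w_old)) < sqrt(.Machine$double.eps)) break
  }
  list(weights = w, iterations = iterations, tol = tol, success = success)
}

fit <- ipf_sketch(matrix(1, 2, 2), row_targets = c(30, 70), col_targets = c(40, 60))
fit$success
```

Reporting `success` separately from the loop exit lets a caller distinguish a fit that met the tolerance from one that merely stalled or ran out of iterations.
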
- Features
  - New algorithms: HIPF (#2) and IPU.
- Interface
  - New `as.flat_ml_fit_problem()` is used to coerce input for the `ml_fit_` functions.
  - `format()` and `print()` methods for classes `fitting_problem`, `flat_ml_fit_problem` and `ml_fit`.
  - Flattened reference sample now contains observations in rows, and controls in columns (#26).
  - `flatten_ml_fit_problem()` gains new `model_matrix_type` argument that allows selecting an alternative model matrix building method where all cross-classifications are allocated to a column, regardless of overlaps. Flattened problems store the type of model matrix used; it is also shown by the `format()` and `print()` methods.
- Improvements
  - Reference sample doesn't need to be ordered by group ID anymore.
  - Remove `individualsPerGroup` special variable.
  - Allow problems with individual-only controls.
  - Check for correspondence of levels between sample and controls.
  - Check for `NA` values in controls.
- Technical changes
  - Use `grake` package again for calibration, because the alternatives are worse: `sampling` uses too low a tolerance, `survey` forcibly loads `MASS`, and `laeken` could work but is unrelated (which is the reason `grake` was started in the first place).
  - Duplicate rows are kept in the reference sample.
  - Rename `control_totals` to `target_values`.
  - New `toy_example()` allows easier access to bundled examples, load with `readRDS()`.
  - Move legacy format (IPAF) and related functions to `data-raw` directory.
  - Use factors internally.
- Performance
  - Use `dplyr` functions instead of `aggregate()`.
- Tests
  - Specific test for households with the same signature.
- Documentation
  - Enhance example.
  - Include flat example problem (group size = 1 for all groups).
- Cleanup
  - Cleanup and split of `flatten_ml_fit_problem()`.
- New functions `compute_margins()` and `margins_to_df()` for validation
- Support specification of prior weights in construction of fitting problems
- Use `survey::grake()` instead of `grake::calibWeights()`.
- Adapt to change of undocumented behavior in base R.
- Don't alter column names of controls if they are of type `data.table` (explicitly convert to `data.frame`)
- Proper handling of corner cases (reference sample with one row, and grand total controls and dummy controls with only one category)
- Allow character variables (in addition to factors) as control variables
- Explicit error message if reference sample is not sorted
- If name of count column in controls is not specified, it is determined automatically (with a message in verbose mode)
- Expansion of weights loads `Matrix` package if necessary
- Clarify documentation
- Straighten out imports, use `importFrom` instead of `::`
- new functions `fitting_problem`, `is.fitting_problem`, `special_field_names`
- all fitting functions now expect an object of class `fitting_problem` (as returned by the `fitting_problem` and `import_IPAF_problem` functions); former calls like `ml_fit(ref_sample, controls, field_names)` now need to be written as `ml_fit(fitting_problem(...))`
- use `grake` package instead of `laeken`
- new argument `ginv` to `ml_fit_dss`, passed down to `calibWeights`
- fix example for `ml_fit_dss`
- new function `ml_fit_dss` with an implementation very close to the paper by Deville et al. (1993); implementation in the `laeken` package
- normalize weights to get rid of precision problems
- allow partly uncontrolled attributes and controls without observations in the reference sample (with a warning, #24)
- better error reporting for non-factor controls and existence of group ID column
- improve warning and progress messages
- return correct weights -- regression introduced in v0.0.9
- rewrite transformation of weights using sparse matrices and a home-grown Moore-Penrose inverse for our (very special) transformation matrix (#17)
- warn on missing observations for nonzero controls (#20)
- `ml_fit_entropy_o` also returns flat weights
- allow arbitrary order in control total tables (#19)
- remove observations that correspond to zero-valued control totals, with warning; don't warn if no corresponding observations need to be removed (#16)
- support multiple controls at individual or group level, also detect conflicting control totals
- support fitting one-dimensional problems (where only group-level controls are given)
- new function `flatten_ml_fit_problem`: transform representation as returned by `import_IPAF_result` into a matrix, a control vector and a weights vector
- function `ml_fit_entropy_o`: use `BB::dfsane` instead of `BB::BBsolve` for solving the optimization problem; rename argument `BBsolve_args` to `dfsane_args`
- function `ml_fit`: new parameter `verbose`
- aggregate identical household types, implement prior weights (so far only internally)
- Add example for `ml_fit` (#11)
- allow additional arguments for the algorithms; `ml_fit_entropy_o` now accepts a named list `BBsolve_args` that contains additional arguments to `BB::BBsolve`
- Faster internal data preparation for `ml_fit_entropy_o`
- Fix dependency issues (#13, #14)
- Add example for `ml_fit_entropy_o` (#11)
- Print more helpful error message if control totals and reference sample categories do not overlap (#11)
- `import_IPAF_results` now returns an object of class `IPAF_results`
- New functions `ml_ipf` and `ml_ipf_entropy_o`, implementation does not yet return the same weights as the Python code
- Convert control columns to factors
- Fix importing configuration files with more than one control of any type and with comments in the control definition
- New parameter `config_name` to `import`, defaults to `config.xml`
- Parameter `all_weights` to `import` that also allows importing intermediate weights. The output format of `import` has changed: the weights for each algorithm are now always a list of weight vectors, even in the default case `all_weights == FALSE` (#5).
- Import results of old Python code (#1).
- Initial setup