Commit c76be9e

cli: add a find-resources command
This seems like a useful debugging tool to help triage issues with our custom resource scanner. The output could be improved, but this seems like a good enough start.
1 parent 8132806 commit c76be9e

5 files changed

Lines changed: 272 additions & 4 deletions

docs/history.rst

Lines changed: 4 additions & 0 deletions
@@ -48,6 +48,10 @@ New Features
 * A ``print(*args)`` function is now exposed to Starlark. This function is
   documented as a Starlark built-in but isn't provided by the Rust Starlark
   implementation by default. So we've implemented it ourselves. (#292)
+* The new ``pyoxidizer find-resources`` command can be used to invoke
+  PyOxidizer's code for scanning files for resources. This command can be
+  used to debug and triage bugs related to PyOxidizer's custom code for
+  finding and handling resources.

 Bug Fixes
 ^^^^^^^^^

docs/managing_projects.rst

Lines changed: 51 additions & 0 deletions
@@ -189,3 +189,54 @@ are governed by the X11, and GPL-3.0 licenses::
 can be wrong. They do not constitute a legal promise. Paranoid
 individuals may want to double check the license annotations by
 verifying with source code distributions, for example.
+
+.. _cli_find_resources:
+
+Debugging Resource Scanning and Identification with ``find-resources``
+======================================================================
+
+The ``pyoxidizer find-resources`` command can be used to scan for
+resources in a given source and then print information on what's found.
+
+PyOxidizer's packaging functionality scans directories and files and
+classifies them as Python resources which can be operated on. See
+:ref:`packaging_resource_types`. PyOxidizer's run-time importer/loader
+(:ref:`oxidized_importer`) works by reading a pre-built index of known
+resources. This differs from how Python typically works, which is to
+put files in directories and let the built-in importer/loader figure
+out what is there by dynamically probing for various files.
+
+Because PyOxidizer introduces structure that doesn't exist in Python,
+and because there are many subtle nuances in how files are
+classified, there can be bugs in PyOxidizer's resource scanning
+code.
+
+The ``pyoxidizer find-resources`` command exists to facilitate
+debugging PyOxidizer's resource scanning code.
+
+Simply give the command a path to a directory or Python wheel archive
+and it will tell you what it discovers. For example::
+
+    $ pyoxidizer find-resources dist/oxidized_importer-0.1-cp38-cp38-manylinux1_x86_64.whl
+    parsing dist/oxidized_importer-0.1-cp38-cp38-manylinux1_x86_64.whl as a wheel archive
+    PythonExtensionModule { name: oxidized_importer }
+    PythonPackageDistributionResource { package: oxidized-importer, version: 0.1, name: LICENSE }
+    PythonPackageDistributionResource { package: oxidized-importer, version: 0.1, name: WHEEL }
+    PythonPackageDistributionResource { package: oxidized-importer, version: 0.1, name: top_level.txt }
+    PythonPackageDistributionResource { package: oxidized-importer, version: 0.1, name: METADATA }
+    PythonPackageDistributionResource { package: oxidized-importer, version: 0.1, name: RECORD }
+
+Or give it the path to a ``site-packages`` directory::
+
+    $ pyoxidizer find-resources ~/.pyenv/versions/3.8.6/lib/python3.8/site-packages
+    ...
+
+This command needs a Python distribution so it knows which file
+extensions correspond to Python extension modules, etc. By default, it
+will download one of the
+:ref:`built-in distributions <packaging_python_distributions>` that is
+compatible with the current machine and use that. You can pass
+``--distributions-dir`` to choose a directory for caching downloaded distributions::
+
+    $ pyoxidizer find-resources --distributions-dir distributions /usr/lib/python3.8
+    ...
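For a rough picture of what "classifying files as Python resources" means, here is a deliberately simplified sketch. It is not PyOxidizer's scanner: the `Kind` enum and the `classify` and `dotted` functions are hypothetical, and the real code (`find_python_resources()` in the `python_packaging` crate) also tracks `is_package`/`is_stdlib`/`is_test`, optimized bytecode, eggs and more. It does illustrate why the command needs a Python distribution: the bytecode cache tag and the extension-module suffixes are inputs, which `find_resources()` obtains from `dist.cache_tag()` and `dist.python_module_suffixes()`.

```rust
use std::path::Path;

/// Hypothetical, simplified resource categories; PyOxidizer's real
/// `PythonResource` variants carry much more information.
#[derive(Debug, PartialEq)]
enum Kind {
    ModuleSource(String),
    ModuleBytecode(String),
    ExtensionModule(String),
    DistributionResource { dist_info: String, name: String },
    PackageResource(String),
}

/// Joins package directories and a module stem into a dotted module name,
/// treating `pkg/__init__` as the package `pkg` itself.
fn dotted(dirs: &[String], stem: &str) -> String {
    let mut parts: Vec<&str> = dirs.iter().map(String::as_str).collect();
    if stem != "__init__" {
        parts.push(stem);
    }
    parts.join(".")
}

/// Classifies a file path relative to the scanned root (e.g. `site-packages`).
fn classify(relative: &Path, cache_tag: &str, ext_suffixes: &[&str]) -> Kind {
    let parts: Vec<String> = relative
        .components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect();
    let (file_name, dirs) = match parts.split_last() {
        Some((last, rest)) => (last.as_str(), rest),
        None => return Kind::PackageResource(String::new()),
    };

    // Files under `<name>-<version>.dist-info/` are distribution metadata.
    if let Some(first) = dirs.first() {
        if first.ends_with(".dist-info") {
            let mut rest: Vec<&str> = dirs[1..].iter().map(String::as_str).collect();
            rest.push(file_name);
            return Kind::DistributionResource {
                dist_info: first.clone(),
                name: rest.join("/"),
            };
        }
    }

    // Extension-module suffixes are distribution-specific and are tried in
    // the order given, so callers should list longer suffixes first.
    for &suffix in ext_suffixes {
        if let Some(stem) = file_name.strip_suffix(suffix) {
            return Kind::ExtensionModule(dotted(dirs, stem));
        }
    }

    // Bytecode lives at `<pkg>/__pycache__/<stem>.<cache_tag>.pyc`.
    let pyc_suffix = format!(".{}.pyc", cache_tag);
    if dirs.last().map_or(false, |d| d.as_str() == "__pycache__") {
        if let Some(stem) = file_name.strip_suffix(pyc_suffix.as_str()) {
            return Kind::ModuleBytecode(dotted(&dirs[..dirs.len() - 1], stem));
        }
    }

    if let Some(stem) = file_name.strip_suffix(".py") {
        return Kind::ModuleSource(dotted(dirs, stem));
    }

    // Everything else is treated as package data.
    Kind::PackageResource(parts.join("/"))
}
```

With the illustrative suffixes `[".cpython-38-x86_64-linux-gnu.so", ".abi3.so", ".so"]` and cache tag `cpython-38`, `foo/__pycache__/bar.cpython-38.pyc` classifies as `ModuleBytecode("foo.bar")`. Listing a bare `.so` first would instead yield the bogus module name `oxidized_importer.cpython-38-x86_64-linux-gnu`, which is the kind of subtle misclassification `find-resources` is meant to surface.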

docs/packaging_pitfalls.rst

Lines changed: 18 additions & 0 deletions
@@ -41,3 +41,21 @@ like so::

     if getattr(sys, 'oxidized', False):
         print('running in PyOxidizer!')
+
+.. _pitfall_incorrect_resource_identification:
+
+Incorrect Resource Identification
+=================================
+
+PyOxidizer has custom code for scanning for and indexing files as specific
+Python resource types. This code is somewhat complex and nuanced, and there
+are known bugs that will cause PyOxidizer to fail to identify or classify a
+file appropriately.
+
+To help debug problems with this code, the ``pyoxidizer find-resources``
+command can be employed. See :ref:`cli_find_resources` for more.
+
+.. important::
+
+   Please `file a bug <https://github.com/indygreg/PyOxidizer/issues/new>`_
+   to report problems!

pyoxidizer/src/cli.rs

Lines changed: 74 additions & 0 deletions
@@ -72,6 +72,25 @@ This command executes the functionality to derive various artifacts and
 emits special lines that tell the Rust build system how to consume them.
 ";

+const RESOURCES_SCAN_ABOUT: &str = "\
+Scan a directory or file for Python resources.
+
+This command invokes the logic used by various PyOxidizer functionality
+walking a directory tree or parsing a file and categorizing seen files.
+
+The directory walking functionality is used by
+`oxidized_importer.find_resources_in_path()` and Starlark methods like
+`PythonExecutable.pip_install()` and
+`PythonExecutable.read_package_root()`.
+
+The file parsing logic is used for parsing the contents of wheels.
+
+This command can be used to debug failures with PyOxidizer's code
+for converting files/directories into strongly typed objects. This
+conversion is critical for properly packaging Python applications and
+bugs can result in incorrect install layouts, missing resources, etc.
+";
+
 pub fn run_cli() -> Result<()> {
     let env = crate::environment::resolve_environment()?;

@@ -121,6 +140,34 @@ pub fn run_cli() -> Result<()> {
                         .help("The config file target to resolve"),
                 ),
         )
+        .subcommand(
+            SubCommand::with_name("find-resources")
+                .about("Find resources in a file or directory")
+                .long_about(RESOURCES_SCAN_ABOUT)
+                .setting(AppSettings::ArgRequiredElseHelp)
+                .arg(
+                    Arg::with_name("distributions_dir")
+                        .long("distributions-dir")
+                        .takes_value(true)
+                        .value_name("PATH")
+                        .help("Directory to extract downloaded Python distributions into"),
+                )
+                .arg(
+                    Arg::with_name("scan_distribution")
+                        .long("--scan-distribution")
+                        .help("Scan the Python distribution instead of a path"),
+                )
+                .arg(
+                    Arg::with_name("target_triple")
+                        .long("target-triple")
+                        .takes_value(true)
+                        .default_value(env!("HOST"))
+                        .help("Target triple of Python distribution to use"),
+                )
+                .arg(Arg::with_name("path").value_name("PATH").help(
+                    "Filesystem path to scan for resources. Must be a directory or Python wheel",
+                )),
+        )
         .subcommand(
             SubCommand::with_name("init-config-file")
                 .setting(AppSettings::ArgRequiredElseHelp)
@@ -313,6 +360,33 @@ pub fn run_cli() -> Result<()> {
             )
         }

+        ("find-resources", Some(args)) => {
+            let path = if let Some(value) = args.value_of("path") {
+                Some(Path::new(value))
+            } else {
+                None
+            };
+            let distributions_dir = if let Some(value) = args.value_of("distributions_dir") {
+                Some(Path::new(value))
+            } else {
+                None
+            };
+            let scan_distribution = args.is_present("scan_distribution");
+            let target_triple = args.value_of("target_triple").unwrap();
+
+            if path.is_none() && !scan_distribution {
+                Err(anyhow!("must specify a path or --scan-distribution"))
+            } else {
+                projectmgmt::find_resources(
+                    &logger_context.logger,
+                    path,
+                    distributions_dir,
+                    scan_distribution,
+                    target_triple,
+                )
+            }
+        }
+
         ("init-config-file", Some(args)) => {
             let code = args.value_of("python-code");
             let pip_install = if args.is_present("pip-install") {
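The `find-resources` arm converts each optional clap value with `if let Some(value) = ... { Some(Path::new(value)) } else { None }`, which is exactly what `Option::map` does. Below is a minimal std-only sketch of the same argument handling and validation. The `HashMap` of values and the slice of flag names are hypothetical stand-ins for clap's `ArgMatches`, `FindResourcesRequest` and `parse_find_resources` are invented names, and the `default_triple` parameter plays the role of `default_value(env!("HOST"))`.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical bundle of the values the `find-resources` arm forwards to
/// `projectmgmt::find_resources()`.
#[derive(Debug, PartialEq)]
struct FindResourcesRequest<'a> {
    path: Option<&'a Path>,
    distributions_dir: Option<&'a Path>,
    scan_distribution: bool,
    target_triple: &'a str,
}

/// `values` (options that take a value) and `flags` (switches present on the
/// command line) stand in for clap's `ArgMatches`.
fn parse_find_resources<'a>(
    values: &HashMap<&str, &'a str>,
    flags: &[&str],
    default_triple: &'a str,
) -> Result<FindResourcesRequest<'a>, String> {
    // `Option::map` replaces each `if let Some(v) = .. { Some(..) } else { None }`.
    let path = values.get("path").map(|v| Path::new(*v));
    let distributions_dir = values.get("distributions_dir").map(|v| Path::new(*v));
    let scan_distribution = flags.contains(&"scan_distribution");
    let target_triple = values.get("target_triple").copied().unwrap_or(default_triple);

    // Same check as the CLI arm: something must be scanned.
    if path.is_none() && !scan_distribution {
        return Err("must specify a path or --scan-distribution".to_string());
    }

    Ok(FindResourcesRequest {
        path,
        distributions_dir,
        scan_distribution,
        target_triple,
    })
}
```

Like the CLI arm, this only rejects the case where neither a path nor `--scan-distribution` is given. When both are supplied, `projectmgmt::find_resources()` checks `scan_distribution` first, so the path is ignored.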

pyoxidizer/src/projectmgmt.rs

Lines changed: 125 additions & 4 deletions
@@ -7,12 +7,22 @@
 use {
     crate::project_building::find_pyoxidizer_config_file_env,
     crate::project_layout::{initialize_project, write_new_pyoxidizer_config_file},
-    crate::py_packaging::standalone_distribution::StandaloneDistribution,
+    crate::py_packaging::{
+        distribution::{default_distribution_location, resolve_distribution, DistributionFlavor},
+        standalone_distribution::StandaloneDistribution,
+    },
     crate::starlark::eval::{eval_starlark_config_file, EvalResult},
     anyhow::{anyhow, Result},
-    std::fs::create_dir_all,
-    std::io::{Cursor, Read},
-    std::path::Path,
+    python_packaging::{
+        filesystem_scanning::find_python_resources,
+        resource::{DataLocation, PythonResource},
+        wheel::WheelArchive,
+    },
+    std::{
+        fs::create_dir_all,
+        io::{Cursor, Read},
+        path::Path,
+    },
 };

 /// Attempt to resolve the default Rust target for a build.
@@ -146,6 +156,117 @@ pub fn run(
     res.context.run_target(target)
 }

+/// Find resources given a source path.
+pub fn find_resources(
+    logger: &slog::Logger,
+    path: Option<&Path>,
+    distributions_dir: Option<&Path>,
+    scan_distribution: bool,
+    target_triple: &str,
+) -> Result<()> {
+    let distribution_location =
+        default_distribution_location(&DistributionFlavor::Standalone, target_triple, None)?;
+
+    let mut temp_dir = None;
+
+    let extract_path = if let Some(path) = distributions_dir {
+        path
+    } else {
+        temp_dir.replace(tempdir::TempDir::new("python-distribution")?);
+        temp_dir.as_ref().unwrap().path()
+    };
+
+    let dist = resolve_distribution(logger, &distribution_location, extract_path)?;
+
+    if scan_distribution {
+        println!("scanning distribution");
+
+        for ext in dist.iter_extension_modules() {
+            print_resource(&PythonResource::from(ext));
+        }
+        for source in dist.source_modules()? {
+            print_resource(&PythonResource::from(source));
+        }
+        for data in dist.resource_datas()? {
+            print_resource(&PythonResource::from(data));
+        }
+    } else if let Some(path) = path {
+        if path.is_dir() {
+            println!("scanning directory {}", path.display());
+            for resource in
+                find_python_resources(path, dist.cache_tag(), &dist.python_module_suffixes()?)
+            {
+                print_resource(&resource?);
+            }
+        } else if path.is_file() {
+            if let Some(extension) = path.extension() {
+                if extension.to_string_lossy() == "whl" {
+                    println!("parsing {} as a wheel archive", path.display());
+                    let wheel = WheelArchive::from_path(path)?;
+
+                    for resource in
+                        wheel.python_resources(dist.cache_tag(), &dist.python_module_suffixes()?)?
+                    {
+                        print_resource(&resource)
+                    }
+
+                    return Ok(());
+                }
+            }
+
+            println!("do not know how to find resources in {}", path.display());
+        } else {
+            println!("do not know how to find resources in {}", path.display());
+        }
+    } else {
+        println!("do not know what to scan");
+    }
+
+    Ok(())
+}
+
+fn print_resource(r: &PythonResource) {
+    match r {
+        PythonResource::ModuleSource(m) => println!(
+            "PythonModuleSource {{ name: {}, is_package: {}, is_stdlib: {}, is_test: {} }}",
+            m.name, m.is_package, m.is_stdlib, m.is_test
+        ),
+        PythonResource::ModuleBytecode(m) => println!(
+            "PythonModuleBytecode {{ name: {}, is_package: {}, is_stdlib: {}, is_test: {}, bytecode_level: {} }}",
+            m.name, m.is_package, m.is_stdlib, m.is_test, i32::from(m.optimize_level)
+        ),
+        PythonResource::ModuleBytecodeRequest(_) => println!(
+            "PythonModuleBytecodeRequest {{ you should never see this }}"
+        ),
+        PythonResource::PackageResource(r) => println!(
+            "PythonPackageResource {{ package: {}, name: {}, is_stdlib: {}, is_test: {} }}", r.leaf_package, r.relative_name, r.is_stdlib, r.is_test
+        ),
+        PythonResource::PackageDistributionResource(r) => println!(
+            "PythonPackageDistributionResource {{ package: {}, version: {}, name: {} }}", r.package, r.version, r.name
+        ),
+        PythonResource::ExtensionModule(em) => {
+            println!(
+                "PythonExtensionModule {{"
+            );
+            println!(" name: {}", em.name);
+            println!(" is_builtin: {}", em.builtin_default);
+            println!(" has_shared_library: {}", em.shared_library.is_some());
+            println!(" has_object_files: {}", !em.object_file_data.is_empty());
+            println!(" link_libraries: {:?}", em.link_libraries);
+            println!("}}");
+        },
+        PythonResource::EggFile(e) => println!(
+            "PythonEggFile {{ path: {} }}", match &e.data {
+                DataLocation::Path(p) => p.display().to_string(),
+                DataLocation::Memory(_) => "memory".to_string(),
+            }
+        ),
+        PythonResource::PathExtension(_pe) => println!(
+            "PythonPathExtension",
+        ),
+    }
+}
+
 /// Initialize a PyOxidizer configuration file in a given directory.
 pub fn init_config_file(
     project_dir: &Path,
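`find_resources()` declares `temp_dir` before the `if` so that the `tempdir::TempDir` created in the `else` branch outlives the `extract_path` borrow taken from it. If the `TempDir` were a local inside the branch, the borrow would be rejected, because `TempDir` deletes its directory when dropped at the end of that branch. A std-only sketch of the same pattern follows. `ScratchDir` is a hypothetical, minimal stand-in for `tempdir::TempDir` (naive PID-based naming, best-effort removal on drop), and `extract_dir` is an invented helper.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, minimal stand-in for `tempdir::TempDir`: a directory that is
/// removed when the value is dropped.
struct ScratchDir(PathBuf);

impl ScratchDir {
    fn new(prefix: &str) -> io::Result<Self> {
        // Naive naming based on the process ID; the real crate picks a random name.
        let path = std::env::temp_dir().join(format!("{}-{}", prefix, std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(ScratchDir(path))
    }

    fn path(&self) -> &Path {
        &self.0
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; a destructor cannot return the error.
        let _ = fs::remove_dir_all(&self.0);
    }
}

/// Returns `explicit` if given; otherwise creates a scratch directory and stores
/// it in `holder`, which the caller owns, so the directory outlives the borrow.
fn extract_dir<'a>(
    explicit: Option<&'a Path>,
    holder: &'a mut Option<ScratchDir>,
) -> io::Result<&'a Path> {
    match explicit {
        Some(path) => Ok(path),
        None => {
            *holder = Some(ScratchDir::new("python-distribution")?);
            Ok(holder.as_ref().unwrap().path())
        }
    }
}
```

The scratch directory lives exactly as long as the caller's `holder`, which mirrors how the real `temp_dir` keeps the extracted distribution on disk until `find_resources()` returns.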
