ottowhite · charlielidbury · May 10, 2026
diff --git a/.gitignore b/.gitignore
@@ -35,3 +35,6 @@ logs_runtime/
 
 # Scraping
 src/superscraper/tools/.cache/
+
+# Local API response cache (PL harvester etc.)
+.cache/
diff --git a/Makefile b/Makefile
@@ -53,5 +53,11 @@ format/check:
 typecheck:
 	uv run ty check src/
 
-oversight/sync:
-	uv run python -m oversight.ArXivRepository --sync
+oversight/sync: oversight/sync/arxiv oversight/sync/pl
+
+oversight/sync/arxiv:
+	uv run python -m oversight.ArXivRepository --sync
+
+oversight/sync/pl:
+	uv run python -m oversight.PLConferenceHarvester --skip-existing-doi
+	uv run oversight consume data/pl_conferences/ --format scraped
diff --git a/data/pl_conferences/cc/1988.json b/data/pl_conferences/cc/1988.json
@@ -0,0 +1,20 @@
+[
+  {
+    "paper_id": "10.1007/3-540-51364-7_6",
+    "title": "Generators for High-Speed Front-Ends",
+    "abstract": "High-speed compilers can be constructed automatically. We present some existing tools for the generation of fast front-ends. Rex (Regular EXpression tool) is a scanner generator whose specifications are based on regular expressions and arbitrary semantic actions written in one of the target languages C or Modula-2. As scanners sometimes have to consider the context to unambiguously recognize a token the right context can be specified by an additional regular expression and the left context can be handled by so-called start states. The generated scanners automatically compute the line and column position of the tokens and offer an efficient mechanism to normalize identifiers and keywords to upper or lower case letters. The scanners are table-driven and run at a speed of 180,000 to 195,000 lines per minute on a MC 68020 processor. Lalr is a LALR(1) parser generator accepting grammars written in extended BNT notation which may be augmented by semantic actions expressed by statements of the target language. The generator provides a mechanism for S-attribution, that is synthesized attributes can be computed during parsing. In case of LR-conflicts, unlike other tools, Lalr provides not only information about an internal state consisting of a set of items but it prints a derivation tree which is much more useful to analyze the problem. Conflicts can be resolved by specifying precedence and associativity of operators and productions. The generated parsers include automatic error reporting, error recovery, and error repair. The parsers are table-driven and run at a speed of 400,000 lines per minute. Currently parsers can be generated in the target languages C and Modula-2. Ell is a LL(1) parser generator accepting the same specification language as Lalr except that the grammars must obey the LL(1) property. The generated parsers include automatic error reporting, recovery, and repair like Lalr. The parsers are implemented following the recursive descent method and reach a speed of 450,000 lines per minute. The possible target languages are again C and Modula-2 A comparison of the above tools with the corresponding UNIX tools shows that significant improvements have been achieved thus allowing the generation of high-speed compilers.",
+    "date": "1989-01-01",
+    "link": "https://doi.org/10.1007/3-540-51364-7_6",
+    "conference_name": "CC",
+    "authors": [
+      {
+        "first_name": "Josef",
+        "last_name": "Grosch",
+        "institution": "Karlsruhe Institute of Technology"
+      }
+    ],
+    "dblp_key": "conf/cc/Grosch88",
+    "venue": "cc",
+    "year": 1988
+  }
+]
diff --git a/data/pl_conferences/cc/1996.json b/data/pl_conferences/cc/1996.json
@@ -0,0 +1,20 @@
+[
+  {
+    "paper_id": "10.1007/3-540-61053-7_71",
+    "title": "Delegating Compiler Objects: An Object-Oriented Approach to Crafting Compilers",
+    "abstract": "Conventional compilers often are large entities that are highly complex, difficult to maintain and hard to reuse. In this article it is argued that this is due to the inherently functional approach to compiler construction. An alternative approach to compiler construction is proposed, based on object-oriented principles, which solves (or at least lessens) the problems of compiler construction. The approach is based on delegating compiler objects (Dcos) that provide a structural decomposition of compilers in addition to the conventional functional decomposition. The DCO approach makes use of the parser delegation and lexer delegation techniques, that provide reuse and modularisation of syntactical, respectively, lexical specifications.",
+    "date": "1996-01-01",
+    "link": "https://doi.org/10.1007/3-540-61053-7_71",
+    "conference_name": "CC",
+    "authors": [
+      {
+        "first_name": "Jan",
+        "last_name": "Bosch",
+        "institution": ""
+      }
+    ],
+    "dblp_key": "conf/cc/Bosch96",
+    "venue": "cc",
+    "year": 1996
+  }
+]
diff --git a/data/pl_conferences/cc/1998.json b/data/pl_conferences/cc/1998.json
@@ -0,0 +1,25 @@
+[
+  {
+    "paper_id": "10.1007/BFb0026420",
+    "title": "Generalised Recursive Descent parsing and Follow-Determinism",
+    "abstract": "This paper presents a construct for mapping arbitrary non-left recursive context-free grammars into recursive descent parsers that: handle ambiguous grammars correctly; perform with LL(1) efficiency on LL(1) grammars; allow straightforward implementation of both inherited and synthesized attributes; and allow semantic actions to be added at any point in the grammar. We describe both the basic algorithm and a tool, GRDP, which generates parsers which use this technique. Modifications of the basic algorithm to improve efficiency lead to a discussion of follow-determinism, a fundamental property that gives insights into the behaviour of both LL and LR parsers.",
+    "date": "1998-01-01",
+    "link": "https://doi.org/10.1007/BFb0026420",
+    "conference_name": "CC",
+    "authors": [
+      {
+        "first_name": "Adrian",
+        "last_name": "Johnstone",
+        "institution": "Royal Holloway University of London"
+      },
+      {
+        "first_name": "Elizabeth",
+        "last_name": "Scott",
+        "institution": "Universidad de Londres"
+      }
+    ],
+    "dblp_key": "conf/cc/JohnstoneS98",
+    "venue": "cc",
+    "year": 1998
+  }
+]
diff --git a/data/pl_conferences/cc/1999.json b/data/pl_conferences/cc/1999.json
@@ -0,0 +1,25 @@
+[
+  {
+    "paper_id": "10.1007/978-3-540-49051-7_3",
+    "title": "Faster Generalized LR Parsing",
+    "abstract": "Tomita devised a method of generalized LR (GLR) parsing to parse ambiguous grammars efficiently. A GLR parser uses linear-time LR parsing techniques as long as possible, falling back on more expensive general techniques when necessary.Much research has addressed speeding up LR parsers. However, we argue that this previous work is not transferable to GLR parsers. Instead, we speed up LR parsers by building larger pushdown automata, trading space for time. A variant of the GLR algorithm then incorporates our faster LR parsers.Our timings show that our new method for GLR parsing can parse highly ambiguous grammars significantly faster than a standard GLR parser.",
+    "date": "1999-01-01",
+    "link": "https://doi.org/10.1007/978-3-540-49051-7_3",
+    "conference_name": "CC",
+    "authors": [
+      {
+        "first_name": "John",
+        "last_name": "Aycock",
+        "institution": "University of Victoria"
+      },
+      {
+        "first_name": "Nigel",
+        "last_name": "Horspool",
+        "institution": "University of Victoria"
+      }
+    ],
+    "dblp_key": "conf/cc/AycockH99",
+    "venue": "cc",
+    "year": 1999
+  }
+]
diff --git a/data/pl_conferences/cc/2001.json b/data/pl_conferences/cc/2001.json
@@ -0,0 +1,20 @@
+[
+  {
+    "paper_id": "10.1007/3-540-45306-7_1",
+    "title": "Virtual Classes and Their Implementation",
+    "abstract": "One of the characteristics of BETA [4] is the unification of abstraction mechanisms such as class, procedure, process type, generic class, interface, etc. into one abstraction mechanism: the pattern. In addition to keeping the language small, the unification has given a systematic treatment of all abstraction mechanisms and leads to a number of new possibilities. One of the interesting results of the unification is the notion of virtual class [[7],[8], which is the BETA mechanism for expressing genericity. A class may define an attribute in the form of a virtual class just as a class may define an attribute in the form of a virtual procedure. A subclass may then refine the definition of the virtual class attribute into a more specialized class. This is very much in the same way as a virtual procedure can be refined - resulting in a more specialized procedure. Virtual classes can be seen as an object-oriented version of generics. Other attempts to provide genericity for OO languages has been based on various forms of parametric polymorphism and function application rather than inheritance. Virtual classes have been used for more than 15 years in the BETA community and they have demonstrated their usefulness as a powerful abstraction mechanism. There has recently been an increasing interest in virtual classes and a number of proposals for adding virtual classes to other languages, extending virtual classes, and unifying virtual classes and parameterized classes have been made [[1],[2],[3],[13],[14],[15],[16],[17]. Another distinguishing feature of BETA is the notion of nested class [6]. The nested class construct originates already with Simula and is supported in a more general form in BETA. Nested classes have thus been available to the OO community for almost 4 decades, and the mechanism has found many uses in particular to structure large systems. Despite the usefulness, mainstream OO languages have not included general nesting mechanisms although C++ has a restricted form of nested classes, only working as a scoping mechanism. Recently nested classes has been added to the Java language. From a semantic analysis point of view the combination of inheritance, and general nesting adds some complexity to the semantic analysis, since the search space for names becomes two-dimensional. With virtual classes, the analysis becomes even more complicated — for details see ref. [10]. The unification of class and procedure has also lead to an inheritance mechanism for procedures [5] where method-combination is based on the inner-construct known from Simula. In BETA, patterns are first-class values, which implies that procedures as well as classes are first-class values. BETA also supports the notion of class-less objects, which has been adapted in the form of anonymous classes in Java. Finally, it might be mentioned that BETA supports coroutines as well as concurrent active objects. For further details about BETA, see [6,9,11]. The Mjølner System is a program development environment for BETA and may be obtained from ref. [12].",
+    "date": "2001-01-01",
+    "link": "https://doi.org/10.1007/3-540-45306-7_1",
+    "conference_name": "CC",
+    "authors": [
+      {
+        "first_name": "Ole Lehrmann",
+        "last_name": "Madsen",
+        "institution": "Aarhus University"
+      }
+    ],
+    "dblp_key": "conf/cc/Madsen01",
+    "venue": "cc",
+    "year": 2001
+  }
+]
diff --git a/data/pl_conferences/cc/2004.json b/data/pl_conferences/cc/2004.json
@@ -0,0 +1,30 @@
+[
+  {
+    "paper_id": "10.1007/978-3-540-24723-4_7",
+    "title": "Generalised Parsing: Some Costs",
+    "abstract": "We discuss generalisations of bottom up parsing, emphasising the relative costs for real programming languages. Our goal is to provide a roadmap of the available approaches in terms of their space and time performance for programming language applications, focusing mainly on GLR style algorithms. It is well known that the original Tomita GLR algorithm fails to terminate on hidden left recursion: here we analyse two approaches to correct GLR parsing (i) the modification due to Farshi that is incorporated into Visser’s work and (ii) our own right-nullable GLR (RNGLR) algorithm, showing that Farshi’s approach can be expensive. We also present results from our new Binary RNGLR algorithm which is asymptotically the fastest parser in this family and show that the recently reported reduction incorporated parsers can require automata that are too large to be practical on current machines.",
+    "date": "2004-01-01",
+    "link": "https://doi.org/10.1007/978-3-540-24723-4_7",
+    "conference_name": "CC",
+    "authors": [
+      {
+        "first_name": "Adrian",
+        "last_name": "Johnstone",
+        "institution": "Royal Holloway University of London"
+      },
+      {
+        "first_name": "Elizabeth",
+        "last_name": "Scott",
+        "institution": "Royal Holloway University of London"
+      },
+      {
+        "first_name": "Giorgios",
+        "last_name": "Economopoulos",
+        "institution": "Royal Holloway University of London"
+      }
+    ],
+    "dblp_key": "conf/cc/JohnstoneSE04",
+    "venue": "cc",
+    "year": 2004
+  }
+]
diff --git a/data/pl_conferences/cc/2008.json b/data/pl_conferences/cc/2008.json
@@ -0,0 +1,25 @@
+[
+  {
+    "paper_id": "10.1007/978-3-540-78791-4_11",
+    "title": "Compiler-Guaranteed Safety in Code-Copying Virtual Machines",
+    "abstract": "Virtual Machine authors face a difficult choice between low performance, cheap interpreters, or specialized and costly compilers. A method able to bridge this wide gap is the existing code-copying technique that reuses chunks of the VM’s binary code to create a simple JIT. This technique is not reliable without a compiler guaranteeing that copied chunks are still functionally equivalent despite aggressive optimizations. We present a proof-of-concept, minimal-impact modification of a highly optimizing compiler, GCC. A VM programmer marks chunks of VM source code as copyable. The chunks of native code resulting from compilation of the marked source become addressable and self-contained. Chunks can be safely copied at VM runtime, concatenated and executed together. This allows code-copying VMs to safely achieve speedup up to 3 times, 1.67 on average, over the direct interpretation. This maintainable enhancement makes the code-copying technique reliable and thus practically usable.",
+    "date": "2008-04-01",
+    "link": "https://doi.org/10.1007/978-3-540-78791-4_11",
+    "conference_name": "CC",
+    "authors": [
+      {
+        "first_name": "Gregory B.",
+        "last_name": "Prokopski",
+        "institution": "McGill University"
+      },
+      {
+        "first_name": "Clark",
+        "last_name": "Verbrugge",
+        "institution": "McGill University"
+      }
+    ],
+    "dblp_key": "conf/cc/ProkopskiV08",
+    "venue": "cc",
+    "year": 2008
+  }
+]
diff --git a/data/pl_conferences/cc/2013.json b/data/pl_conferences/cc/2013.json
@@ -0,0 +1,45 @@
+[
+  {
+    "paper_id": "10.1007/978-3-642-37051-9_6",
+    "title": "Simple and Efficient Construction of Static Single Assignment Form",
+    "abstract": "We present a simple SSA construction algorithm, which allows direct translation from an abstract syntax tree or bytecode into an SSA-based intermediate representation. The algorithm requires no prior analysis and ensures that even during construction the intermediate representation is in SSA form. This allows the application of SSA-based optimizations during construction. After completion, the intermediate representation is in minimal and pruned SSA form. In spite of its simplicity, the runtime of our algorithm is on par with Cytron et al.’s algorithm.",
+    "date": "2013-01-01",
+    "link": "https://doi.org/10.1007/978-3-642-37051-9_6",
+    "conference_name": "CC",
+    "authors": [
+      {
+        "first_name": "Matthias",
+        "last_name": "Braun",
+        "institution": "Karlsruhe Institute of Technology"
+      },
+      {
+        "first_name": "Sebastian",
+        "last_name": "Buchwald",
+        "institution": "Karlsruhe Institute of Technology"
+      },
+      {
+        "first_name": "Sebastian",
+        "last_name": "Hack",
+        "institution": "Saarland University"
+      },
+      {
+        "first_name": "Roland",
+        "last_name": "Leißa",
+        "institution": "Saarland University"
+      },
+      {
+        "first_name": "Christoph",
+        "last_name": "Mallon",
+        "institution": "Saarland University"
+      },
+      {
+        "first_name": "Andreas",
+        "last_name": "Zwinkau",
+        "institution": "Karlsruhe Institute of Technology"
+      }
+    ],
+    "dblp_key": "conf/cc/BraunBHLMZ13",
+    "venue": "cc",
+    "year": 2013
+  }
+]
diff --git a/data/pl_conferences/cc/2014.json b/data/pl_conferences/cc/2014.json
@@ -0,0 +1,48 @@
+[
+  {
+    "paper_id": "10.1007/978-3-642-54807-9_12",
+    "title": "String Analysis for Dynamic Field Access",
+    "abstract": "In JavaScript, and scripting languages in general, dynamic field access is a commonly used feature. Unfortunately, current static analysis tools either completely ignore dynamic field access or use overly conservative approximations that lead to poor precision and scalability. We present new string domains to reason about dynamic field access in a static analysis tool. A key feature of the domains is that the equal, concatenate and join operations take $\\mathcal{O}$ (1) time. Experimental evaluation on four common JavaScript libraries, including jQuery and Prototype, shows that traditional string domains are insufficient. For instance, the commonly used constant string domain can only ensure that at most 21% dynamic field accesses are without false positives. In contrast, our string domain $\\mathcal{H}$ ensures no false positives for up to 90% of all dynamic field accesses. We demonstrate that a dataflow analysis equipped with the $\\mathcal{H}$ domain gains significant precision resulting in an analysis speedup of more than 1.5x for 7 out of 10 benchmark programs.",
+    "date": "2014-01-01",
+    "link": "https://doi.org/10.1007/978-3-642-54807-9_12",
+    "conference_name": "CC",
+    "authors": [
+      {
+        "first_name": "Magnus",
+        "last_name": "Madsen",
+        "institution": "Aarhus University"
+      },
+      {
+        "first_name": "Esben",
+        "last_name": "Andreasen",
+        "institution": "Aarhus University"
+      }
+    ],
+    "dblp_key": "conf/cc/MadsenA14",
+    "venue": "cc",
+    "year": 2014
+  },
+  {
+    "paper_id": "10.1007/978-3-642-54807-9_8",
+    "title": "Taming Control Divergence in GPUs through Control Flow Linearization",
+    "abstract": "Branch divergence is a very commonly occurring performance problem in GPGPU in which the execution of diverging branches is serialized to execute only one control flow path at a time. Existing hardware mechanism to reconverge threads using a stack causes duplicate execution of code for unstructured control flow graphs. Also the stack mechanism cannot effectively utilize the available parallelism among diverging branches. Further, the amount of nested divergence allowed is also limited by depth of the branch divergence stack. In this paper we propose a simple and elegant transformation to handle all of the above mentioned problems. The transformation converts an unstructured CFG to a structured CFG without duplicating user code. It incurs only a linear increase in the number of basic blocks and also the number of instructions. Our solution linearizes the CFG using a predicate variable. This mechanism reconverges the divergent threads as early as possible. It also reduces the depth of the reconvergence stack. The available parallelism in nested branches can be effectively extracted by scheduling the basic blocks to reduce the effect of stalls due to memory accesses. It can also increase execution efficiency of nested loops with different trip counts for different threads. We implemented the proposed transformation at PTX level using the Ocelot compiler infrastructure. We evaluated the technique using various benchmarks to show that it can be effective in handling the performance problem due to divergence in unstructured CFGs.",
+    "date": "2014-01-01",
+    "link": "https://doi.org/10.1007/978-3-642-54807-9_8",
+    "conference_name": "CC",
+    "authors": [
+      {
+        "first_name": "Jayvant",
+        "last_name": "Anantpur",
+        "institution": "Indian Institute of Science Bangalore"
+      },
+      {
+        "first_name": "R.",
+        "last_name": "Govindarajan",
+        "institution": "Indian Institute of Science Bangalore"
+      }
+    ],
+    "dblp_key": "conf/cc/AnantpurG14",
+    "venue": "cc",
+    "year": 2014
+  }
+]