slide_index

A utility for generating an index for PowerPoint slide handouts.

IMPORTANT: This utility requires miniz (https://code.google.com/archive/p/miniz/) as well as SQLite (which is usually installed on any Mac/Linux machine). My code needs miniz.h, and because that header is embedded in miniz.c rather than shipped as a separate file, I have extracted it and added it to this repository.

I wrote this utility for the benefit of my students while teaching at Kansas State University. Course notes for my classes consist of handouts, four slides per page, of the slides used during lectures. Slides for handouts are modified to make animations legible in a PDF file, and they contain added annotations: I avoid having too much text on the slides I present (other than programs, as I teach CS), and I want to make sure that my students have all the important information. I don't doubt their note-taking abilities, but I prefer them to listen and watch rather than try to write down everything I say.

As I average one slide per minute, that's a lot of slides by the end of the term, and remembering in which lecture an important topic was discussed can be challenging when reviewing before exams. The idea of this tool is to provide something far superior to full-text search. I have only tested it on my Mac; it should run without any problem on any Linux system.

There are different ways to use this tool:

The preferred way is to 'tag' slides. This is what I use. It requires a bit of preparation but lets you generate a "quality" index. Tags must be inserted in the slide notes between square brackets:

[tag1;tag2; ...;tagn]

Each tag will become an index entry. Note that a tag isn't necessarily a single word; it can be a full expression. Tags simply mustn't contain the tag separator, which is ';' by default but can be changed. I have recently added support for \ to escape characters, but it's untested.
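To make the tag syntax concrete, here is a minimal Python sketch of how a bracketed group in the slide notes could be split into tags. It is not the tool's actual code (which is written in C): the function names `split_tags` and `tags_from_notes` are hypothetical, and trimming whitespace around tags and ignoring empty tags are assumptions.

```python
import re

def split_tags(body, sep=";"):
    """Split the text found between [ and ] into tags, honouring backslash escapes."""
    tags, current, escaped = [], [], False
    for ch in body:
        if escaped:
            current.append(ch)        # an escaped character is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True            # the next character loses its special meaning
        elif ch == sep:
            tags.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tags.append("".join(current).strip())
    return [t for t in tags if t]     # assumption: empty tags are ignored

def tags_from_notes(notes, sep=";"):
    """Collect the tags of every [ ... ] group in a slide's notes.

    Assumption: a closing bracket cannot be escaped inside a group.
    """
    found = []
    for body in re.findall(r"\[([^\]]*)\]", notes):
        found.extend(split_tags(body, sep))
    return found
```

With the default separator, `split_tags(r"a\;b;c")` yields the two tags `a;b` and `c`, and passing `sep="|"` mimics changing the separator.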

As I have more than once goofed with tags (typos, using the plural in one place and the singular in another, inconsistent case), I have added the '-w' option to analyze tags and report what looks suspicious: tags that are identical except for case (sometimes legitimate, as you may want to index a word and a computer language keyword separately), long entries that contain something that looks like the separator but isn't, and entries that differ by only one character (using the Levenshtein distance). There may be some noise, but it should normally tell you what should be corrected and where.

Example:

[Bonaparte, Napoleon;Napoleon;French Revolution]

This slide will be indexed under

  Bonaparte, Napoleon
  French Revolution

and

  Napoleon
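Two of the checks that the '-w' option is described as performing (tags that differ only by case, and tags at a Levenshtein distance of one) can be sketched in Python as follows. This only illustrates the idea and is not the code in levenshtein.c or similar.c; the function names and the report format are hypothetical, and the check for separator look-alikes in long entries is not covered.

```python
from itertools import combinations

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def suspicious_pairs(tags):
    """Return (tag, other, reason) for pairs that look like the same entry."""
    report = []
    for a, b in combinations(sorted(set(tags)), 2):
        if a.lower() == b.lower():
            report.append((a, b, "case"))
        elif levenshtein(a.lower(), b.lower()) == 1:
            report.append((a, b, "one character"))
    return report
```

For example, `suspicious_pairs(["Napoleon", "napoleon", "Napoleons"])` flags Napoleon/napoleon as a case difference and Napoleon/Napoleons as a one-character difference.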

The other option is to read the words (not expressions) to index from an external text file and to compare them to words found either in the slide or in the notes. Each line in the file of words to index must contain either a single word, or a set of words separated by the same separator as the tag separator above. When the line contains several words, they are understood as variants (singular/plural for instance) and only the first one will appear in the index. For instance, if you have this line:

Napoleon;Bonaparte

Bonaparte is understood as a variant of Napoleon, which will be the index entry. There will be no index entry for Bonaparte, but all slides containing Bonaparte will be indexed under Napoleon.

Slide decks are searched for the words to index in slide text, in slide notes, and also in the names of the image files included in the decks. Note that the search isn't case sensitive, but index entries follow the capitalization found in the file of words to index.
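The handling of variants and the case-insensitive search can be illustrated with a short Python sketch. It assumes the file of words to index has already been read into a list of lines; `load_variants` and `entries_for_text` are hypothetical names, and splitting the text on non-word characters is an assumption about how words are recognized.

```python
import re

def load_variants(lines, sep=";"):
    """Map every lower-cased variant to the index entry (first word on its line)."""
    mapping = {}
    for line in lines:
        words = [w.strip() for w in line.split(sep) if w.strip()]
        if not words:
            continue                      # skip blank lines
        entry = words[0]                  # keeps the capitalization found in the file
        for word in words:
            mapping[word.lower()] = entry
    return mapping

def entries_for_text(text, mapping):
    """Return the index entries whose word or variant occurs in the text."""
    words = re.findall(r"\w+", text.lower())
    return sorted({mapping[w] for w in words if w in mapping})
```

With the line `Napoleon;Bonaparte` loaded, a slide containing "BONAPARTE" is reported under the entry `Napoleon`.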

To help with building a list of words likely to appear in an index, the program can also generate a list of words on demand by analyzing the slides. It's possible to specify a list of "stop words" (a list of English stop words found on the internet is provided). Words that appear too often are eliminated. This automatically generated list should of course be edited before generating the final index.
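As a rough illustration of this word-list generation, the Python sketch below drops stop words and words present on too many slides. The actual cutoff used by the tool isn't stated here, so the `max_share` parameter (the largest share of slides a word may appear on) is purely an assumption, as are the function name and the rule that only alphabetic words are kept.

```python
import re
from collections import Counter

def candidate_words(slide_texts, stop_words=(), max_share=0.5):
    """List words worth reviewing: not stop words, and not present on too many slides."""
    stop = {w.lower() for w in stop_words}
    slide_count = Counter()
    for text in slide_texts:
        # Count each word once per slide so the cutoff is a share of slides.
        slide_count.update(set(re.findall(r"[A-Za-z]+", text.lower())))
    limit = max_share * len(slide_texts)   # hypothetical "too often" threshold
    return sorted(w for w, n in slide_count.items()
                  if w not in stop and n <= limit)
```

The result is only a starting point: as noted above, the list still has to be edited by hand.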

When generating the index, there are also several options:

By default, index entries are composed of the name of the slide deck followed by a list of slide numbers. Personally, I provide handouts with 4 slides per page. The -p option, followed by the number of slides per page, generates an index that gives the resulting page number rather than the slide number (but remember that what is indexed is a set of .pptx files, not .pdf files).

By default again, what is generated is a text file. The -r option generates a Rich Text Format (.rtf) file instead. Rich Text Format makes it possible to distinguish keywords (which are important in my discipline) from other words. By default (it can be disabled), a word that is all in lower case is understood to be a keyword, and its index entry is set in a bold monospace font. Alternatively, you can define a special character (for instance @) to mark what must be set in bold monospace; any word prefixed by that character will appear as such in the index. In Rich Text Format, index entries are laid out in two columns and kept with the next line so that page breaks don't mess everything up. Open the file with MS Word, save it as a PDF, and you are done.
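The two rules described above can be written down in a few lines of Python. This is a sketch under assumptions rather than the tool's implementation: it assumes slides are numbered from 1 and fill handout pages in order, and the names `page_number` and `is_keyword` are hypothetical.

```python
def page_number(slide_number, slides_per_page):
    """Handout page holding a given slide (slides numbered from 1)."""
    return (slide_number - 1) // slides_per_page + 1

def is_keyword(entry, prefix=None, auto_keyword=True):
    """Decide whether an index entry would be set in bold monospace."""
    if prefix and entry.startswith(prefix):
        return True                        # explicitly marked, e.g. with '@'
    # Auto-keyword mode: an all-lowercase entry is treated as a keyword.
    return auto_keyword and entry.islower()
```

With 4 slides per page, slides 1 to 4 land on page 1 and slide 5 on page 2; `is_keyword("select")` is true in auto-keyword mode while `is_keyword("Napoleon")` is not.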

The program writes its output to the standard output, which must be redirected. It also writes some informative messages (the number of slides in each .pptx file processed, for example) to the standard error.

USAGE

slide_index [flags] <pptx file> [<pptx file> ...]

Flags:

-w : Don't generate an index but generate warnings about possible typos in indexed words.

-d : Disable the "auto-keyword" mode. By default, all-lowercase words are understood as keywords and displayed differently in an RTF index.

-f : Enable the "auto-function" mode. Entries ending in () will be understood as function names and displayed differently in an RTF index.

-I <filename> : Read words to index from <filename>. Each line can contain several variants of the same word, separated by a character that defaults to a semi-colon. The first entry on the line is the one that appears in the index.

-k <char> : Prefix for keywords (displayed differently in an RTF index); must be a single character.

-p <num> : By default the slide number appears in the index. If handouts contain multiple slides per page, the -p flag followed by the number of slides per page will list the page number in the index.

-r : Generate an RTF (Rich-Text-Format) index instead of a plain-text one.

-s <char> : Change the separator from ';' to <char>

-S <filename> : Read words that are NOT to be indexed (stop words) from <filename>. There must be only one word per line, and no stemming is performed (all variants must be explicitly listed in the file).

-t : Exclusively index from tags in slide notes.

EXAMPLES

Generate lecture notes index as an RTF file from tagged slides. Words starting with @ in tags will be displayed as keywords:

slide_index -r -t -k '@' $HOME/CIS209/HANDOUTS/*.pptx > LectureIndex.rtf

Generate lecture notes index as a text file using a file of words that should appear in the index:

slide_index -I words_to_index.txt $HOME/CIS209/HANDOUTS/*.pptx > LectureIndex.txt

Generate a list of words (to edit) that could be used to generate the index. Ignore words in stop_words.txt:

slide_index -S stop_words.txt $HOME/CIS209/HANDOUTS/*.pptx > words_to_index.txt
