Of course, once discovered, it had to be put into use. But, how do fingerprints work under Linux?
A daemon, fprintd, enrolls and verifies fingerprints.
The lsusb command gives a long list of USB devices. None of them looked like a fingerprint reader, but upon closer inspection we have:
Bus 003 Device 006: ID 27c6:609c Shenzhen Goodix Technology Co.,Ltd. [unknown]
Indeed, it is included in the fprint list of supported devices!
On Fedora, this was straightforward. Just remember the PAM module as well, which we'll use later:
sudo dnf install fprintd fprintd-pam
I registered two fingers:
sudo fprintd-enroll stefan -l left-index-finger
sudo fprintd-enroll stefan -l right-index-finger
Note the username as the first argument, otherwise all your fingerprints are belong to root.
My first attempt at enabling fingerprint authentication was:
sudo authselect current
sudo authselect enable-feature with-fingerprint
sudo authselect apply-changes
However, this results in both a password and a fingerprint being required. And sudo first gives you the option of taking a fingerprint (this can be bypassed with Ctrl-C, and also does not appear when using SSH).
I had no desire to use fingerprints for logging in; I just needed an easy way to unlock my screen locker, swaylock.
Fortunately, swaylock has built-in PAM support, and the same concept shown here works for all apps that support PAM, including login.
Following the ArchWiki fprintd instructions, I added a PAM profile for swaylock, in /etc/pam.d/swaylock:
auth sufficient pam_unix.so try_first_pass likeauth nullok
auth sufficient pam_fprintd.so
auth required pam_deny.so
account required pam_unix.so
By default, swaylock will send empty passwords through to PAM for authentication, which is what we want. But if you have a configuration file in, e.g., ~/.swaylock/config, you may need to comment out ignore-empty-password.
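For reference, the offending option looks like this (shown here already commented out):

```
# ~/.swaylock/config
#ignore-empty-password
```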
And, voilà, either password or fingerprint is accepted for unlocking! If you need both, you can just modify the pam.d profile from sufficient to required.
To have fingerprint and password accepted simultaneously, you'd need pam-fprint-grosshack or similar, but I'm happy to press Enter before using the fingerprint.
P.S. This is the first blog post I've written in org-syntax. Hugo supports it seamlessly, and since I keep work journal entries in org-mode anyway, it was a lot easier to copy content this way.
While I could not replicate that entire pipeline, specifically the voice part, I at least found a way to capture tasks on Android and transfer them to org mode.
I chose Google Tasks since (a) it has an API (unlike Keep) and (b) there is a strong possibility that we'll have voice capture for tasks in the near future (fingers crossed).
First, install Google Tasks on your phone, add a new TODO list, and add a few tasks.
Next, we are going to set up an Apps Script service that will convert our tasks to org format, and expose the result via a URL.
To set this up:

1. Create a new Google Sheet (named, say, Google Tasks to Org).
2. Open Extensions → Apps Script; the editor opens. Name the app (say, tasks-to-org).
3. Under Services, enable the Tasks API.
4. Replace the editor contents with the script below.

// Once this code is deployed as a web app, you can call it via curl:
//
// URL="..."
// TOKEN="..."
// curl -s -S -L -d "$TOKEN" "$URL?clear=1" >> output.org
//
// Remember to customize `token` and `taskList` below.
function doPost(e) {
  // A token (your password) can be anything; you can generate it using, e.g., Python:
  //
  //   python -c "import base64, os; print(base64.b64encode(os.urandom(50)).decode('ascii'))"
  //
  const token = "abcdefg12345";

  // The name of the task list in Google Tasks.
  // This name is case sensitive.
  const taskList = "org";

  var response = "Invalid token";
  const clear = (e.parameter['clear'] === '1');
  if (e.postData.contents === token) {
    response = tasksToOrg(taskList, clear);
  }
  return ContentService.createTextOutput(response);
}
function doGet(e) {
  return ContentService.createTextOutput("OK");
}
function taskToOrg(task) {
  // See https://developers.google.com/tasks/reference/rest/v1/tasks#Task
  const days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
  const status = (task.status == "completed" ? "DONE" : "TODO");
  var deadline = '';
  var notes = '';

  if (task.due) {
    const due = new Date(task.due);
    const year = due.getFullYear().toString();
    const month = (due.getMonth() + 1).toString().padStart(2, '0');
    const day = due.getDate().toString().padStart(2, '0');
    const wkday = days[due.getDay()];
    deadline = `\n  DEADLINE: <${year}-${month}-${day} ${wkday}>`;
  }

  if (task.notes) {
    notes = task.notes.split("\n");
    for (let i = 0; i < notes.length; i++) {
      notes[i] = "\n  " + notes[i];
    }
    notes = notes.join("");
  }

  return `* ${status} ${task.title}${deadline}${notes}`;
}
function tasksToOrg(taskListTitle, clear) {
  var org_file = [];

  const taskLists = Tasks.Tasklists.list({maxResults: 100, showDeleted: true, showCompleted: true, showHidden: true}).items;
  const taskList = taskLists.filter(tl => tl.title === taskListTitle)[0];
  const taskListId = taskList.id;

  // `items` is undefined when the task list is empty
  const items = Tasks.Tasks.list(taskListId).items || [];

  for (let i = 0; i < items.length; i++) {
    org_file.push(taskToOrg(items[i]));
  }

  if (clear) {
    for (let i = 0; i < items.length; i++) {
      Tasks.Tasks.remove(taskListId, items[i].id);
    }
  }

  return org_file.join("\n");
}
We want to expose this script at a public URL, so go to Deploy → New Deployment (Deploy is in the upper right-hand corner).
Set Execute as to your email and Who has access to anyone. It needs to run as yourself, to access your tasks, and we need anyone to have access, since you will be calling the URL anonymously from the terminal (not signed in from a browser).
Click deploy.
Google will ask you to authorize the app, and warn you that the app is not verified. Click away the warnings and proceed (you trust yourself, don't you?).
Now, you should be presented with a URL which, when accessed, runs the app.
Set the variable token to any long, hard-to-guess string of your choice. You can generate one with Python:

python -c "import base64, os; print(base64.b64encode(os.urandom(50)).decode('ascii'))"
Set the variable taskList to the same name as the list in Google Tasks.
We can now access our app via a POST request, e.g. using curl, sending your token value as the request body:

curl -s -S -L -d "abcdefg12345" https://script.google.com/macros/s/abc123-app-id-generated-by-google/exec
This should yield something like:
* TODO First task
* TODO A task for today
  DEADLINE: <2023-01-11 Wed>
* TODO Another task
  This time, the task has a description.
To clear the tasks after downloading, add a clear=1 argument:

curl -s -S -L -d "abcdefg12345" "https://script.google.com/macros/s/abc123-app-id-generated-by-google/exec?clear=1"
Here’s the general script I use:
#!/bin/bash
TASKS_TO_ORG_URL="https://script.google.com/macros/s/.../exec"
ORG_INBOX="${HOME}/org/tasks-inbox.org"
TOKEN="abcdefg12345"
curl -s -S -L -d "$TOKEN" "$TASKS_TO_ORG_URL?clear=1" >> "$ORG_INBOX"
In my daily org planner, I then put:
- [ ] ([[shell:~/scripts/google-tasks-to-org.sh][fetch]]) [[file:~/org/tasks-inbox.org][Google Tasks]]
I call the script ls_hashtag. Without further ado, some Python code which implements the above. It caches secrets in ~/.config/mastodon-tags.yaml.

I suspect tokens expire after a while, but the script doesn't yet take that into account.
#!/usr/bin/env python
import requests
import yaml
import os
import webbrowser
import sys
SERVER = 'https://your.mastodon.server'
APP_NAME = 'ls_hashtag'
CONFIG = os.path.expandvars('$HOME/.config/mastodon-tags.yaml')
def get_config():
    if os.path.isfile(CONFIG):
        config = yaml.load(open(CONFIG, 'r'), Loader=yaml.SafeLoader) or {}
    else:
        config = {}
    return config
def update_config(update_dict):
    cfg = get_config()
    # Existing (cached) values take precedence over updates
    cfg = {**update_dict, **cfg}
    with open(CONFIG, 'w') as f:
        yaml.dump(cfg, f)
    return cfg
def post(url, **kwargs):
    data = requests.post(url, **kwargs).json()
    if 'error' in data:
        print(f'POST {url}')
        print()
        print(data['error'])
        sys.exit(1)
    return data
cfg = get_config()
if 'client_id' not in cfg:
    data = post(
        f'{SERVER}/api/v1/apps',
        json={
            'client_name': APP_NAME,
            'redirect_uris': 'urn:ietf:wg:oauth:2.0:oob',
            'scopes': 'read'
        }
    )
    cfg = update_config({
        'client_id': data['client_id'],
        'client_secret': data['client_secret']
    })
if 'authorization_code' not in cfg:
    oauth_url = f"{SERVER}/oauth/authorize?client_id={cfg['client_id']}&client_secret={cfg['client_secret']}&response_type=code&redirect_uri=urn:ietf:wg:oauth:2.0:oob"
    webbrowser.open(oauth_url)
    auth_code = input("Enter token from browser window: ")
    cfg = update_config({
        'authorization_code': auth_code
    })
# Obtain OAuth access token
if 'access_token' not in cfg:
    data = post(
        f'{SERVER}/oauth/token',
        json={
            "grant_type": "authorization_code",
            "code": cfg['authorization_code'],
            "client_id": cfg['client_id'],
            "client_secret": cfg['client_secret'],
            "redirect_uri": "urn:ietf:wg:oauth:2.0:oob"
        }
    )
    cfg = update_config({
        'access_token': data['access_token']
    })
data = requests.get(
    f'{SERVER}/api/v1/followed_tags',
    headers={'Authorization': f"Bearer {cfg['access_token']}"}
).json()

N = max(len(tag['name']) for tag in data)
for hashtag in data:
    print(f'#{hashtag["name"]:{N}} {hashtag["url"]}')
In this post, I will define scientific software, describe how it is different from other types of software, and discuss the social, mechanical, and scholarly disciplines required for making it the best it can be.
Upfront, I should note that these are all rough guidelines, but that the context in which software is developed and used will affect the extent to which they apply. I make some generalizations to get my point across.
Scientific software is typically computational in nature, and aids us in our understanding of the world, often by processing observational measurements. Because scientific software aims to uncover truths, accuracy is of paramount importance. Of course, accuracy is important in many industrial applications too—you do not want a parcel delivered to the wrong address, or an email sent to the wrong recipient—but in science errors invariably invalidate the work.
Here, I consider a subset of scientific software: that which is written to aid in reproducibility. This software has to be reusable [1]—particularly by those who did not write the software. Even if reproducibility is not explicitly targeted, eventually most scientific codes become associated with research outputs, such as published papers. At this point, ideally, the software should be open for inspection, so that the science can be verified. The vast majority of research code has not historically fallen into this category [2], but I would argue that there are systematic changes underway that will place more and more emphasis on openness.
We write software for different purposes, and that changes the amount of rigor applied. Just as you may make some initial calculations on a napkin, code is sometimes used to improve your understanding of a problem, without any intent of using that code in future scenarios.
Apart from this type of “scratch” code, in increasing levels of maturity we have: experimental code for private use, experimental code that will be published, and algorithmic or infrastructure code with an eye towards reusability.
I would argue that the first type—experimental code that never leaves the room—is to be avoided at all cost. This code cannot be verified, and is in some sense no better than a black box that takes some arguments and spits out an answer. Is it a good answer that reflects an underlying truth? There is no way for anyone on the outside to tell.
Experimental code should be tracked, just as you would fastidiously keep a lab notebook in a chemical laboratory.
Now, let us examine the mechanisms we can use to improve the quality of our software. I want to start with a higher-level concept: that of the social environment in which you develop software, and then move on to more specific technical mechanisms.
One day, while I was a postdoc at Berkeley’s Helen Wills Neuroscience Institute, I walked into my office to find the DiPy—that is, Diffusion Imaging in Python—team, busy finalizing their new API design after an entire day’s work.
I happened to be implementing identical functionality for my research, not realizing that their library covered similar ground. So, curious—and excited about the prospect of being able to collaborate and leverage their work—I started asking them about their new design, with my own use case in mind.
It soon became clear that the software would not yet be able to address my needs. To the team’s credit, despite some tired and exasperated looks, they worked with me for the rest of the weekend to refine their design until we had a general enough interface that addressed all of our needs.
For the next two years, I worked very closely with this team, helping them build out the package. That software—which serves a niche purpose in MRI—now has more than 450 citations.
The takeaway for me was that, here, an extra pair of eyes made a difference to the general utility of that software, and because of the newly discovered synergy the development team grew, which in turn led to many further hours of productive collaboration.
There is a large and ever-growing volume of research that shows the advantages of group work. For example, Patrick Laughlin of the University of Illinois, author of an April 2006 article in the Journal of Personality and Social Psychology, says:
“We found that groups of size three, four, and five outperformed the best individuals and attribute this performance to the ability of people to work together to generate and adopt correct responses, reject erroneous responses, and effectively process information.”
This emphasizes the quality of work a team can do. Other similar experiments compare, e.g., teams of students against their professors.
For a team to work well, however, it has to have certain qualities, such as the right size and culture.
Jeff Sutherland, the co-inventor of Scrum and co-author of the Agile Manifesto, said at the Global Scrum Gathering 2017 that he believed the optimal team size is five. But he also observed something remarkable [3]: while individual developers differ in productivity by a ratio of as much as 1:10, the difference can be orders of magnitude larger when it comes to teams.
In 2012, Google launched Project Aristotle to determine what makes a good team successful. One of their key findings was that group norms are a key to success. In other words, it makes a difference when team members feel comfortable enough to take risks, make mistakes, and voice their opinions.
There are also further straightforward connections between teams and better work.
In summary: find good and varied collaborators; your software will be better because of it.
We now turn to more straightforward technical mechanisms, techniques, and disciplines for improving the quality of your software.
While working on a large open source project, we had a collaborator who did not “believe” in testing. That person also happened to be a very talented programmer; still, I made a point of writing a test for each piece of functionality they contributed. Almost every single contribution performed incorrectly on some corner case of the data.
I knew at this point already that almost all the code I wrote was flawed in some way, and that it only improved through iterative refinement. But it was an eye-opener to see that the best programmers still made mistakes—and frequently.
Just like when you drive a car, you have certain blind spots while programming. And just like you check that blind spot on the road, there are techniques we can use to make software development safer.
Let me highlight a few of those mechanisms:
At a very minimum, your software has to be run, frequently. This does not guarantee correct results, but at least allows you to go back and re-run earlier steps of an analysis, as needed. You'd be surprised how many researchers are stuck in a situation where they cannot explore new ideas, because parts of their pipelines have fallen into an unrunnable state. This is bad news: bad for your research, bad for reproducibility, and bad for any papers that rely on these stagnant results.
Ideally, however, your code will be tested thoroughly. While, or immediately after, writing code, you will examine corner cases, generate (by hand if needed) test data, and compare output against pre-calculated results. The formula is simple: if I execute f(x), do I get y as expected?
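As a minimal sketch of this pattern, using pytest (both the function and the expected values are invented for illustration):

```python
import numpy as np

def center(x):
    """Subtract the mean, so the result has zero mean."""
    return x - np.mean(x)

def test_center():
    # The f(x) == y pattern: known input, pre-calculated expected output
    np.testing.assert_allclose(center(np.array([1.0, 2.0, 3.0])),
                               [-1.0, 0.0, 1.0])
    # Corner case: a single-element array centers to zero
    np.testing.assert_allclose(center(np.array([5.0])), [0.0])
```

Running pytest on this file executes every test_* function and reports any mismatch.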
Testing is my primary concern around Jupyter notebooks for serious science. They can be tested, but it is hard, and the mechanisms are not widely available. They also encourage the use of global state, instead of functions. As such, I strongly discourage my students from using them other than for scratch experiments, or as a by-product of research to allow others to explore data and results.
To automate your tests, you may make use of continuous integration systems such as Travis-CI, Azure Pipelines, et cetera. (See, e.g., PR #3584 from NumPy, and its test results on Travis-CI.)
Using a revision control system is a must, and I recommend Git. This lets you track the history of your software as it evolves, reduces the need to keep hundreds of backup copies, and allows other developers to take part in development. See revision control as the equivalent of your lab notebook, but for your software. You’ll never need to worry about lost code again, and your stress levels the day before publishing will thank you for it.
In “The Toyota Way: 14 Principles From The World's Greatest Manufacturer”, Jeffrey Liker describes Toyota's 5S program for effective and productive workspaces. Three of these are: Sort, Straighten, and Shine. That is, sort out unneeded items, have a place for everything, and keep your workspace clean. Git provides the software tools to do this.
Your code may never see another pair of eyes, but even for your own benefit: document your code. I suggest identifying a documentation template, such as the one we use for the NumPy project. Here, we decide beforehand the types of things that should be documented, so that each function has the same information. For example, you can describe: the intent of the function, input parameters, return values, usage examples, and notes and references.
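As a sketch, a function documented in a numpydoc-like style might look as follows (the function itself is made up for illustration):

```python
import numpy as np

def moving_average(x, window):
    """Compute the moving average of a 1-D signal.

    Parameters
    ----------
    x : array_like
        Input signal.
    window : int
        Number of samples to average over.

    Returns
    -------
    out : ndarray
        The averaged signal, of length ``len(x) - window + 1``.

    Examples
    --------
    >>> moving_average([1, 2, 3, 4], window=2)
    array([1.5, 2.5, 3.5])
    """
    return np.convolve(x, np.ones(window) / window, mode='valid')
```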
We all know that moment. Huzzah! Ten lines of code condensed into a one-liner. It’s a divine piece of art, beautifully compact.
And, yet, so utterly incomprehensible to future you.
I don’t know where the following idea originated, but Robert Martin states it well in “Clean Code”. He writes: “Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. …[Therefore,] making it easy to read makes it easier to write.”
In addition, Martin Fowler wrote: “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.”
Making code shorter does not necessarily mean making it simpler to understand. Strive for clarity of expression, instead.
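A toy Python illustration of the trade-off; both versions compute the same thing:

```python
data = [("a", 3), ("b", 1), ("c", 2)]

# Compact, but opaque:
top = sorted(data, key=lambda t: -t[1])[0][0]

# Longer, but the intent is readable at a glance:
def label_with_highest_count(pairs):
    best_label, _ = max(pairs, key=lambda pair: pair[1])
    return best_label

assert top == label_with_highest_count(data) == "a"
```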
I’ll also note here that object orientation is a great distraction to new programmers, when functional design would suffice for the majority of cases.
We all know the Donald Knuth quote: “Premature optimization is the root of all evil”.
I experienced this first-hand while writing a library to calculate discrete pulse transforms of images. I spent weeks on this project, utilizing every trick in the book: I wrote a library in C, cached every intermediate calculation, and invented a new sparse data-structure. I felt good about this code; it was clever and fast.
One day, an old professor walked into my office. “Oh, by the way,” he said, “I wrote an implementation of the discrete pulse transform this weekend.” I ran his code: it executed correctly, and was at least as fast as mine. Confused, I examined his work more closely: 80 lines of pure Python. No particular optimizations. Just: the right idea, aided by the appropriate built-in data structures (priority queues, in this case).
It turns out the professor was the one who invented the algorithm, and he knew a thing or two about it. This experience taught me a valuable lesson: get the execution order right. You can write your code in assembler, and your $\mathcal{O}(N^2)$ algorithm still won’t beat $\mathcal{O}(N)$ runtime on a large enough dataset. Worry about implementing the correct algorithm, instead of optimizing the wrong one.
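A tiny, self-contained Python illustration of the principle: no micro-optimization of a linear scan will beat simply switching to a hash-based lookup.

```python
import time

items = list(range(10_000))
item_set = set(items)          # same data, different data structure
queries = range(0, 20_000, 2)

t0 = time.perf_counter()
hits_list = sum(1 for q in queries if q in items)     # O(N) scan per query
t1 = time.perf_counter()
hits_set = sum(1 for q in queries if q in item_set)   # O(1) lookup per query
t2 = time.perf_counter()

assert hits_list == hits_set
print(f"list: {t1 - t0:.3f}s  set: {t2 - t1:.3f}s")   # the set wins handily
```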
As far as possible, automate all processes: building of your software, testing, producing figures for papers, the works. When you automate a process, you have one place to go to make a fix that affects all of your work. Otherwise, you will be very tempted to only fix the outputs that urgently require attention.
The Japanese have the practice of Kaizen (改善), or “continual improvement”. Small changes, applied many times over, can result in great progress.
As an example of a fully automated publication, see Elegant SciPy.
There is no single best way to structure and format code. Pick a style you like, and stick to it. Structure your entire code base the same way, and ask that collaborators use the same standards. In the Python world, we have PEP8; but use whatever your community recommends.
Finally, your scientific code will likely be of better quality if it gets made public. Why? Because if you know your code is going to be published, you will take better care in its construction, documentation, and testing. No one wants to have others look at their poor quality work!
There are several ways to make code public. The easiest is to publish it in a public git repository on, e.g., GitHub or GitLab. This also allows external developers to submit changes for your consideration.
If that is not an option, a copy of the software can be uploaded to a public repository, where it will receive a Digital Object Identifier, or DOI—essentially a snapshot, that shows the state of the software at a specific point in time.
A third option is to simply upload the software to your website. It’s the way we used to do it, but we now have better mechanisms.
Whichever medium you use, remember to add a license to your software. Without a license, your software cannot be reused by others. It is not simply public domain without a license; quite the contrary.
Scientific software has the potential to change our understanding of the world. Discoveries, however, depend on accurate results, which are more likely to arise from high quality code. We suggest you write your code in teams—it is not only more fun, but likely to be more widely usable. At a minimum, ensure that your code is tested and documented. Aim for reusability. Then, publish your work: it will make science more transparent, and it will make your code better.
A friend reviewing my talk pointed me to this article by Patrick Beukema that covers similar ground. Looks like we’re in agreement!
[1] Gaël Varoquaux, “Beyond Computational Reproducibility, Let Us Aim for Reusability”, IEEE CIS TC Cognitive and Developmental Systems, vol. 32, no. 2, 2016.
[2] John P. A. Ioannidis, “Why Most Published Research Findings Are False”, PLoS Medicine, 2005. https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124
[3] Jeff Sutherland, “Scrum: The Art of Doing Twice the Work in Half the Time”, Chapter 3.
I often want to capture tasks on the go, in a hurry. When there's no time to fire up organice or Orgzly, being able to dictate tasks comes in really handy.

In this post, I show how, on Android phones, you can hook up Google Assistant with org-mode, so that you can speak notes and have them appear as TODO items in a buffer.
First, we need to teach Google Assistant a new keyword, and tell it to store transcribed notes in an accessible location. We do this via the free If This Then That service. Add the “Log notes in a Google Drive spreadsheet” applet, and configure it as follows:
- Phrase: Add a task to $
- Alternative phrase: new task $
- Another alternative: task $
- Drive folder path: Google Assistant
This would allow you to say “task <description>” and have Google Assistant log that to a spreadsheet in the Google Assistant folder of your Drive.
Save the applet and try it out: launch Google Assistant and say “task test out capture system”. Then, locate and open the new spreadsheet in your Google drive. The URL should be of the form:
https://docs.google.com/spreadsheets/d/8B...ZFk/edit#gid=0
Note down that long string after /d/: this is your spreadsheet ID.
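If you like, you can also extract the ID programmatically; a small sketch, using the example URL above:

```python
import re

url = "https://docs.google.com/spreadsheets/d/8B...ZFk/edit#gid=0"
spreadsheet_id = re.search(r"/d/([^/]+)", url).group(1)
print(spreadsheet_id)  # -> 8B...ZFk
```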
Go to Tools → Script Editor, and include the script provided at https://github.com/stefanv/org-assistant.
You have to customize two variables: the spreadsheet ID, and a random “token” (a password to make it harder for others to abuse the service).
Now, publish the script to the web: Publish → Deploy as web app.... Set Who has access to the app to Anyone, even anonymous, and note down the published URL.
I have the following script that downloads TODOs and appends them to an org file:
#!/bin/bash
ASSISTANT_TO_ORG_URL="url-to-the-web-app"
ORG_INBOX="${HOME}/org/assistant-inbox.org"
TOKEN='token-value'
curl -s -S -L -d "$TOKEN" "$ASSISTANT_TO_ORG_URL?clear=1" >> "$ORG_INBOX"
I then have the following in my daily org checklist:
[[shell:~/scripts/assistant-tasks.sh][fetch tasks]] : [[file:~/org/assistant-inbox.org][tasks]]
The first link launches the script that fetches the latest tasks, and the second opens the tasks file.
Having a quick, hands-free way to capture tasks has been tremendously helpful to me. I hope you find it useful too!
A few members of the audience familiar with scientific Python told me they had learned something, so I’ll highlight the few topics that I think may have qualified.
The first official release of SciPy was in 2001, and a mere 16 years later we reached 1.0. This says a lot about the developer community, and how careful they are to label their own work as “mature”! To celebrate this project milestone, we published a preprint on arXiv that outlines the project history and its current status. It mentions, among other achievements, that SciPy was instrumental in the first gravitational wave detection, as well as the recent imaging of the black hole in Messier 87.
The __array_function__ protocol

The 1.17 release of NumPy (2019-07-26) has support for a new array function protocol, which allows external libraries to pass their array-like objects through NumPy without them being horribly mangled. E.g., you may call NumPy's sum on a CuPy array: the computation will happen on the GPU, and the resulting array will still be a CuPy array.
Here is an example:
In [24]: import cupy as cp
In [25]: x = cp.random.random([10, 10])
In [26]: y = x.sum(axis=0)
In [27]: type(y), y.shape
Out[27]: (cupy.core.core.ndarray, (10,))
In [28]: import numpy as np
In [29]: z = np.sum(x, axis=0)
In [30]: type(z), z.shape
Out[30]: (cupy.core.core.ndarray, (10,))
Note how the result is the same, whether you use CuPy's or NumPy's sum.
Whereas NumPy used to be the reference implementation for array computation in Python, it is fast evolving into a standard API, implemented by multiple libraries.
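To see how a library hooks into the protocol, here is a toy sketch (the class and its behavior are invented for illustration; real libraries such as CuPy implement the full NumPy API this way):

```python
import numpy as np

class MyArray:
    """A minimal array-like object that intercepts np.sum."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.sum:
            # Unwrap, compute, and re-wrap, so the result stays a MyArray
            return MyArray(np.sum(self.data, *args[1:], **kwargs))
        return NotImplemented  # NumPy raises TypeError for anything else

x = MyArray([[1, 2], [3, 4]])
y = np.sum(x, axis=0)  # dispatches to MyArray.__array_function__
print(type(y))         # <class '__main__.MyArray'>
```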
Images in scientific Python (scikit-image, opencv, etc.) are represented as NumPy arrays. It is trivial to pass these arrays into deep learning libraries such as TensorFlow:
import matplotlib.pyplot as plt

from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, preprocess_input, decode_predictions
)
from skimage import transform
from skimage.util import img_as_float

net = InceptionV3()

def inception_predict(image):
    # Rescale image to 299x299, as required by InceptionV3
    image_prep = transform.resize(image, (299, 299, 3), mode='reflect')

    # Scale image values to [-1, 1], as required by InceptionV3
    image_prep = (img_as_float(image_prep) - 0.5) * 2

    predictions = decode_predictions(
        net.predict(image_prep[None, ...])
    )

    plt.imshow(image, cmap='gray')

    for pred in predictions[0]:
        (n, klass, prob) = pred
        print(f'{klass:>15} ({prob:.3f})')
For example, when running inception_predict on skimage.data.chelsea(), I get:
Egyptian_cat (0.904)
tabby (0.054)
tiger_cat (0.035)
lynx (0.000)
plastic_bag (0.000)
Looks about right!
Philipp Hanslovsky, at SciPy2019, demonstrated his Python–Java bridge called imglyb. In contrast to many previous efforts, this library allows you to share memory between Python and Java, avoiding costly (and, depending on memory constraints, potentially fatal) reallocations. E.g., he showed how to manipulate volumes of data (3-D arrays) in Python, and to then view those using ImageJ's impressive BigDataViewer, which can rapidly slice through the volume at an arbitrary plane.
dask
This is a trick I borrowed from Matt Rocklin’s blog post.
When you have a number of large images that, together, form a stack (3-D volume), it may not be possible to load the entire stack into memory. Instead, you can use dask to lazily access parts of the volume on an as-needed basis.
This is achieved in four steps:
1. Convert skimage.io.imread into a delayed function, i.e. instead of returning the image itself, it returns a dask Delayed object (similar to a Future or a Promise) that can fetch the image when needed.
2. Use this function to load all images. The operation is instantaneous, returning a list of Delayed objects.
3. Convert each Delayed object to a dask Array.
4. Stack all of these dask Arrays to form the volume.

Note that each one of these steps should execute almost instantaneously; no image files are accessed on disk: that only happens once we start operating on the dask Array volume.
Here is the code:
from glob import glob

import numpy as np
from dask import delayed
import dask.array as da
from skimage import io

# Read one image to get dimensions
image = io.imread('samples/Test_TIRR_0_1p5_B0p2_01000.tiff')

# Turn imread into a delayed function, so that it does not immediately
# load an image file from disk
imread = delayed(io.imread, pure=True)

# Create a list of all our samples; since a delayed version of `imread`
# is used, no work is done immediately
samples = [imread(f) for f in sorted(glob('samples/*.tiff'))]

# Convert each "delayed" object in the list above into a dask array
sample_arrays = [da.from_delayed(sample, shape=image.shape, dtype=np.uint8)
                 for sample in samples]

# Stack all these arrays into a volume
vol = da.stack(sample_arrays)
I have 101 slices of 2048x2048 each, so the resulting dask Array volume (at this stage fully virtual, without any data inside) has shape (101, 2048, 2048).
We can do numerous operations on this array, such as summing it with vol.sum(axis=0), although this still yields an uncomputed dask Array. To get actual values, we need to call:
vol.sum(axis=0).compute()
To visualize a volume like the one above, I could have sliced into it and displayed the result using matplotlib. However, I used this opportunity to play around with a brand new open source image viewer called Napari.
Napari allows you to visualize layers interactively, similarly to GIMP or Photoshop. In Napari’s case, these layers can be images, labels, points, and a few others.
While this isn’t explicitly documented (Napari is still in alpha!), I had some insider knowledge (thanks, J!) that Napari supports both dask and Zarr arrays. So, we can pass in our volume from the example above as follows:
import napari
with napari.gui_qt():
viewer = napari.view(vol, clim_range=(0, 255))
(Instead of the context manager, you may also use %gui qt in Jupyter or IPython.)
I also happened to have ground truth labels available, so I loaded those up the same way I did the volume, and added them to the visualization:
viewer.add_labels(labels, name='Labels')
If you’d like to play with Napari yourself, I have a 3D cell segmentation example available online.
Toward the conclusion of my talk, I emphasized the role of community in building healthy scientific software ecosystems. In the end, it is all about people. I briefly highlight two community groups:
Pangeo, which I think sets a great example of how to organize field-specific interest around existing open source tools, and how to build scalable online analysis platforms without reinventing the wheel.
OME, the Open Microscopy Environment, which is leading the charge on open data exchange formats for microscopy. Interestingly, it looks like Zarr (the chunked, compressed array container) may well be part of the next open standard they recommend.
Thank you to the organizers of M&M 2019 for inviting me to speak; I very much enjoyed our session, and look forward to working with this community on making scientific Python an even better platform for microscopy analysis!
IA will also remain quite essential, because for the foreseeable future, computers will not be able to match humans in their ability to reason abstractly about real-world situations. We will need well-thought-out interactions of humans and computers to solve our most pressing problems. And we will want computers to trigger new levels of human creativity, not replace human creativity (whatever that might mean).
A week ago, I attended a family event where I got pulled into a surprisingly animated argument around Google Maps: is it accurate, is it helpful, and what are we losing by relying on it. The argument took all the predictable turns (yes, kids nowadays cannot use maps any more, and if Google decided to summon us all to the Mountain View HQ as an April Fools' joke, the results would be comically sad). But an interesting outcome was the question: “How can we better design systems to help humans make good decisions?”
E.g., Google Maps tells you where to drive; it may even give you one or two route options. But the overarching goal is to get you to your destination in the shortest amount of time. What if you felt like taking a scenic drive, or wanted to explore a bit? In that case, a map that showed a compass and traffic for all nearby roads would be much more helpful. How many times do we drive past a national monument such as Bodie, or a street festival on the next block over, without realizing it? Maps could certainly alert us to these.
At the Berkeley Institute for Data Science, I build a lot of open source research software. I’ve learned that systems that work with humans are often both simpler to develop and ultimately more effective than fully automated systems. When we wrote Inselect with the Natural History Museum, it would have been very hard to do a 100% accurate segmentation of insect specimens (especially since many of the photos this would be applied to contained insects unseen during training). But if you can provide reasonable accuracy, humans can easily adjust for minor discrepancies and still save a lot of time.
With this blog post, I encourage software designers to:
Think about how to best empower your users, rather than to prescribe their behavior implicitly through design decisions. Be mindful that users may have experience to contribute and a desire to execute their own plan; augment their ability to do so effectively.
Circling back to Jordan’s article, I encourage you to read the various commentaries (for now, the easiest way to find them is to scroll down to the article on the journal front-page). I enjoyed, e.g., David Donoho’s, where he discusses the requirements for “true intelligence” in AI, although I think he may have misinterpreted Jordan’s intent with the term “augmented intelligence”.
I’ll end with a quote from Greg Crane’s commentary:
For the humanist, intelligence augmentation must now and forever be our goal. Machine learning only matters insofar as it makes us fundamentally more intelligent and deepens our understanding.
Update: This post now stores links using the message:// URL scheme. See “Other Systems” below.
org-mode is, to me, one of the most valuable parts of the emacs ecosystem. I use it to take notes, plan projects, manage tasks, and write & publish documents.
Nowadays, a lot of work arrives via email, and so it is helpful to be able to refer to messages directly from my notes or lists of tasks.
The simplest option might be to store URLs pointing to an online inbox such as Fastmail or GMail, but I wanted a solution that was both future proof (i.e., what if I moved my emails to a different provider?) and worked with my terminal-based mail client of choice, neomutt.
I started with a solution provided by Stefano Zacchiroli, and simplified it for my specific use-case.
The solution has two parts: sending email links from neomutt to Emacs, and later opening those links from Emacs by invoking neomutt. The first is achieved via org-protocol, the latter via launching neomutt and then simulating keypresses.
When launching neomutt, we have to tell it in which directory the message lives. We therefore use notmuch to find the message file first, based on its Message-ID. maildir-utils would be another way of doing so. Please note that you have to have notmuch or maildir-utils set up already for this scheme to work.
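For example, at the shell (the Message-ID below is made up), locating a message file looks like this:

```
notmuch search --output=files id:some-message-id@mail.gmail.com
```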
I initially avoided the org-protocol package, because installation looked complicated. That, it turns out, is only the case if you care about web browser integration, which we don't.
First, we have a Python script that can parse an e-mail and share the Message-ID and Subject with emacs. I call it mutt-save-org-link.py, and make it executable using chmod +x mutt-save-org-link.py.
#!/usr/bin/env python3
import sys
import email
import subprocess
import urllib.parse
# Parse the email from standard input
message_bytes = sys.stdin.buffer.read()
message = email.message_from_bytes(message_bytes)
# Grab the relevant message headers
message_id = urllib.parse.quote(message['message-id'][1:-1])
subject = message['subject']
# Ask emacsclient to save a link to the message
subprocess.Popen([
    'emacsclient',
    f'org-protocol://store-link?url=message://{message_id}&title={subject}'
])
We then configure neomutt (typically in ~/.muttrc) to call the script with a shortcut. I chose Esc-L (the same as Alt-L).
macro index,pager \el "|~/scripts/mutt-save-org-link.py\n"
Using org-protocol, we instruct emacsclient to intercept URLs with the org-protocol:// scheme, as used by our mutt-save-org-link.py script. We also tell org-mode how to handle special URLs of the form message://message-id+goes_here@mail.gmail.com. Neomutt needs to know which Maildir folder to open, so we ask notmuch to tell us where the message is located.
In my ~/.emacs file I have:
; Make sure org-protocol is loaded.
; Now, org-protocol:// schemas are intercepted.
(require 'org-protocol)

; Call this function, which spawns neomutt, whenever org-mode
; tries to open a link of the form message://message-id+goes_here@mail.gmail.com
(defun stefanv/mutt-open-message (message-id)
  "In neomutt, open the email with the given Message-ID"
  (let*
      ((message-id (replace-regexp-in-string "^/*" "" message-id))
       (mail-file
        (replace-regexp-in-string
         "\n$" "" (shell-command-to-string
                   (format "notmuch search --output=files id:%s" message-id))))
       (mail-dir (replace-regexp-in-string "/\\(cur\\|new\\|tmp\\)/$" ""
                                           (file-name-directory mail-file)))
       (message-id-escaped (regexp-quote message-id))
       (mutt-keystrokes
        (format "l~i %s\n\n" (shell-quote-argument message-id-escaped))))
    (message "Launching neomutt for message %s" message-id)
    ; Note the extra nil for call-process's DISPLAY argument, so that
    ; "-f" is passed to setsid instead of being swallowed by call-process
    (call-process "setsid" nil nil nil
                  "-f" "gnome-terminal" "--window" "--"
                  "neomutt" "-R" "-f" mail-dir
                  "-e" (format "push '%s'" mutt-keystrokes))))

; Whenever org-mode sees a link starting with `message://`, it
; calls our `mutt-open-message` function
(org-add-link-type "message" 'stefanv/mutt-open-message)
There are a few caveats: if you use maildir-utils, the search command is mu find -f l i:%s instead of notmuch; and if you are not on Linux, then setsid (which we use to launch a detached background process) is not going to work, and you will want to use a different terminal emulator.
Charl Botha mentioned in the comments that, on MacOS, org-mac-link lets you grab hyperlinks from a wide variety of apps. Email messages, specifically, are stored as message://message-id URLs, which MacOS knows how to open. This post has been updated to use the same link schema.
That’s it! I’ve added the code to https://github.com/stefanv/org-neomutt. Please file issues and PRs there, or tell me about your use cases in the comments below.
Since an org-mode buffer can be searched just like any other, I can simply invoke forward search with C-s, but this will match all occurrences of the text, instead of limiting the search to headings only.
This makes it hard to search for a phrase like “Travel”, for which I have a top-level heading, but also often occurs elsewhere in my notes.
I have a solution of the following form: pre-seed an interactive regular-expression search with ^* , so that only headings are matched.

First, define a custom search function. It puts the keys ^* (followed by a space) in the “unread command events” list (i.e., a list of events waiting to be seen by emacs), and then launches interactive forward regular-expression search.
(defun stefan/isearch-heading ()
  (interactive)
  (setq unread-command-events (listify-key-sequence "^* "))
  (isearch-mode t t nil t))
Next, we add a keybinding for org-mode:
(defun org-mode-keys ()
  (interactive)
  (local-set-key (kbd "C-c g") 'stefan/isearch-heading))

(add-hook 'org-mode-hook 'org-mode-keys)
And that’s it! Pressing C-c g (for “go”) in org-mode will now present you with a search prompt. Typing a heading name will take you there directly, at which point you can choose to expand it with the TAB key.
There are plenty of potential use cases, but consider, e.g., that you want to verify a credit card number submitted by your user. Traditionally, you’d submit the number, and then poll the backend repeatedly from the browser. Not very elegant :/
But with a WebSocket connection, you submit the credit card number and then forget about it. The server will let you know when it’s done by pushing a message to the frontend.
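As a minimal sketch of that push model (using the third-party websockets library, not the Flow API shown later in this post; names and the fake two-second “validation” are invented):

```python
import asyncio
import websockets

async def handler(websocket):
    job = await websocket.recv()        # e.g., the submitted card number
    await asyncio.sleep(2)              # stand-in for slow backend validation
    await websocket.send(f"{job}: ok")  # push the result; no polling needed

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()          # run forever

asyncio.run(main())
```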
Not only does this solve the annoying polling problem, but it opens up the door to an entirely new universe of tools, such as Dan Abramov’s fantastic Redux. Many of these Javascript libraries rely on the server being able to notify the frontend when it needs to update itself.
Let’s talk a bit about Redux. The principles behind it are simple and elegant:

- The whole state of your app lives in a single store.
- The state is read-only; the only way to change it is to dispatch an action.
- State changes are made by pure functions (reducers).

That centralization in turn enables other features such as logging, hot reloading, time travel, etc.
One of the great joys of Redux lies in moving away from the traditional Model-View-Controller pattern. With MVC, you are never quite sure how changes propagate through the system. With Redux, it is highly predictable. Say your app has a toggle button, and associated state {toggle: true}. An action (e.g. “the red button was clicked”) is submitted to the central dispatcher, which then calculates the new state of the app:

new_state = reduce(current_state, action)
The implementation could look something like this:
function reduce(current_state, action) {
  switch (action.type) {
    case 'toggle_button':
      return {toggle: !current_state.toggle};
    default:
      return current_state;
  }
}
The toggle button monitors the app state and, when state['toggle'] is updated, re-renders itself.
By vastly simplifying the flow of information, by disentangling mutation and asynchronicity, and by getting rid of jQuery and hidden state stored somewhere in the bowels of the DOM, Redux has, for me, returned the joy of web development.
But, I’m getting distracted. WebSockets—in Python!
Pushing messages from your Python web server to the user’s browser can now be as simple as this:
from Flow import flow
self.flow.push('my_user@domain.com', 'message to the user',
               {'data': 'to ship along'})
Please take a look at the more detailed technical description (with code!) on the Cesium blog.
To work around the issue, modify tensorflow/workspace.bzl and change the re2 description to:
native.git_repository(
    name = "com_googlesource_code_re2",
    remote = "https://github.com/stefanv/re2.git",
    commit = "86503cb89d82b723ae0bce35e1e09524910cd319",
)
The re2 library is now downloaded from my fork, which applies a one line patch.
Compile the TensorFlow Python package as usual with:
bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
After installing the pip wheel using
pip install /tmp/tensorflow_pkg/*.whl
you should have a working installation. If importing fails with
ImportError: cannot import name 'pywrap_tensorflow'
switch out of the TensorFlow source directory and try again.
I bought (or rather, rented) more than 120 Kindle books, and to my shame only realized the invasive nature of these terms now.
Since your agreement with Amazon is simply a license to view material, they have the ability to withdraw that right whenever they choose. Which they have done, in an ironic twist, with George Orwell’s 1984.
As you may imagine, this whole situation riles me up quite badly, and spoils many a dinner-time conversation (sorry, friends). I am particularly irate because I just bought myself a beautiful Kobo Glo HD, in a slow but measured move to lessen my dependence on Amazon. Now, lo and behold, I cannot access any of the 120 books which, if I gave my money to other ebook stores, I would have owned!
So, caveat emptor, dear reader. Do not support Amazon’s great Kindle Swindle.
If you buy books from Kobo, you will be able to export your books to your Kindle devices, albeit encumbered by Digital Rights Management (DRM).
While this post is about Amazon Kindle, it is worth mentioning Digital Rights Management (DRM). DRM is a mechanism that distributors use to lock their books so that you may not easily copy them illegally. Of course, and very predictably, the DRM is not difficult to circumvent, and those who want to steal electronic books do so with impunity. Still, it is probably illegal to remove DRM in the United States, and if you're a law-abiding citizen, you cannot convert a protected ebook to another format such as PDF, even if a PDF would be more convenient for you.
Some distributors are worse than others, though, and unsurprisingly Amazon is one of the worst. Not only do they lock their books (making it difficult to copy to other devices), but they use a proprietary format (.mobi) that only works on their readers. If you want to copy their books to another device, you have to first remove the DRM which, as their terms state, is not allowed. You might start to wonder whether such business practices aren’t anti-competitive and perhaps even illegal.
There is a perfectly usable format for ebooks already in existence (.epub), which is also supported by many different models of readers. So, while locking books with some kind of DRM is far from ideal, it is still preferable to a format that is proprietary to boot.
For those buying from the Kobo store, some of the books on their shelves are available DRM-free, and are labeled as such in the store.
Defective by Design maintains a list of DRM-free publishers and stores.
Please support the Electronic Frontier Foundation, who is currently challenging DMCA Section 1201 (which makes it illegal to circumvent DRM). There is no good reason why you should not be allowed to read material you bought on any device of your choosing. For more on the DRM, refer to author Cory Doctorow’s articles in the Guardian.
Below follows an excerpt, mentioned above, from the Amazon Kindle Store’s Terms of Use.
1. Kindle Content
Use of Kindle Content. Upon your download of Kindle Content and payment of any applicable fees (including applicable taxes), the Content Provider grants you a non-exclusive right to view, use, and display such Kindle Content an unlimited number of times, solely through a Reading Application or as otherwise permitted as part of the Service, solely on the number of Supported Devices specified in the Kindle Store, and solely for your personal, non-commercial use. Kindle Content is licensed, not sold, to you by the Content Provider. The Content Provider may include additional terms for use within its Kindle Content. Those terms will also apply, but this Agreement will govern in the event of a conflict. Some Kindle Content, such as interactive or highly formatted content, may not be available to you on all Reading Applications.
Limitations. Unless specifically indicated otherwise, you may not sell, rent, lease, distribute, broadcast, sublicense, or otherwise assign any rights to the Kindle Content or any portion of it to any third party, and you may not remove or modify any proprietary notices or labels on the Kindle Content. In addition, you may not attempt to bypass, modify, defeat, or otherwise circumvent any digital rights management system or other content protection or features used as part of the Service.
brew install python3
pyvenv -v ~/envs/py3
source ~/envs/py3/bin/activate
pip install matplotlib
Pros/cons:
conda create -n py3 python=3.5 matplotlib
source activate py3
Pros/cons:
Some members of the community maintain their own channels, but there are still some issues to be aware of when mixing those channels and the official ones.
I needed a way of blocking any access to the internet, unless it was leaving through the VPN (since I was sure I’d notice that pretty quickly).
I’d have preferred to use firewalld, which is neatly integrated into Ubuntu, but as of 01/24/2015 it doesn’t allow filtering outbound traffic. What follows, then, is a simple approach implementing the following rule:
Block wifi traffic unless it goes to either the local network or the VPN.
Create a script (I called it ~/scripts/fw-up) that sets up the firewall:
#!/bin/bash
# Clear any existing rules
iptables -F
# Allow outbound DNS
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT
iptables -A INPUT -p udp --sport 53 -j ACCEPT
# Allow TCP access to the work VPN
# Replace X.Y below with your VPN address range
iptables -A OUTPUT -p tcp -d X.Y.0.0/16 -o wlan1 -j ACCEPT
# Allow any traffic destined for the vpn to go out
iptables -A OUTPUT -o vpn0 -j ACCEPT
# Allow local traffic
iptables -A OUTPUT -p tcp -o wlan1 -d 10.0.0.0/8 -j ACCEPT
iptables -A OUTPUT -p tcp -o wlan1 -d 172.16.0.0/12 -j ACCEPT
iptables -A OUTPUT -p tcp -o wlan1 -d 192.168.0.0/16 -j ACCEPT
# Drop everything else on the wifi
iptables -A OUTPUT -p tcp -o wlan1 -j DROP
Make sure the script is set to executable (chmod +x fw-up).
Add a symlink to if-up.d to ensure that the firewall gets built whenever the network is reconfigured:
sudo ln -s ~/scripts/fw-up /etc/network/if-up.d/iptables
Now, whenever you connect to a wifi hotspot, internet traffic will be blocked until you fire up your VPN. If, on occasion, you need to work without the VPN, simply raze the firewall:
sudo iptables -F
Update: Typically, you wouldn’t want the firewall to go up when connecting to the wireless at work, so I added the following conditional in the fw-up script:
iptables -F
if [[ `iwgetid -r` != 'WorkSSID' ]]; then
# Firewall rules go here
...
fi
In version 1.2.0 of Docker, the image dependency tree is available via the docker images --tree command:
$ docker images --tree
Warning: '--tree' is deprecated, it will be removed soon. See usage.
├─511136ea3c5a Virtual Size: 0 B
│ ├─5bc37dc2dfba Virtual Size: 192.5 MB
│ │ ├─61cb619d86bc Virtual Size: 192.7 MB
│ │ ├─3f45ca85fedc Virtual Size: 192.7 MB
│ │ ├─78e82ee876a2 Virtual Size: 192.7 MB
│ │ ├─dc07507cef42 Virtual Size: 192.7 MB
│ │ ├─86ce37374f40 Virtual Size: 192.7 MB
│ │ ├─d76983dc2ebd Virtual Size: 213.3 MB
│ │ ├─04a01662a6a8 Virtual Size: 214.5 MB
│ │ ├─7769c00dfefe Virtual Size: 525.9 MB
│ │ ├─6ac8d6e449b1 Virtual Size: 525.9 MB
│ │ ├─e3a84ca24205 Virtual Size: 525.9 MB
│ │ └─26f10d07659d Virtual Size: 525.9 MB
│ ├─e12c576ad8a1 Virtual Size: 198.9 MB
│ │ ├─102eb2a101b8 Virtual Size: 199.1 MB
│ │ ├─530dbbae98a0 Virtual Size: 199.1 MB
│ │ ├─37dde56c3a42 Virtual Size: 199.1 MB
│ │ ├─8f118367086c Virtual Size: 228.5 MB
│ │ └─277eb4304907 Virtual Size: 228.5 MB Tags: ubuntu:utopic, ubuntu:14.10
...
However, the Docker team is trying to streamline its client, and has scheduled this feature for deprecation. How, then, do we replicate its behavior?
Enter dockviz. Grab a binary from gobuild.io and place it somewhere on your path.
The Docker server can be queried via its public API. It is typically available either on http://localhost:4243 or /var/run/docker.sock.
One of the following two calls should therefore extract the image status:
curl -s http://localhost:4243/images/json?all=1
echo -e "GET /images/json?all=1 HTTP/1.0\r\n" | nc -U /var/run/docker.sock
On my machine, the second query returns:
HTTP/1.0 200 OK
Content-Type: application/json
Date: Sun, 18 Jan 2015 17:41:34 GMT
[{"Created":1421528518,"Id":"d6244a9e8b5ff885579c8c7d203e4da703e3e80621449dbbd58c365dba5c83b1","ParentId":"b68521997660ae8a6916037696cf716ca415bba0766487bfa5b79cda4adfb62c","RepoTags":["datascience-base:latest"],"Size":0,"VirtualSize":2041562468}
,{"Created":1421528517,"Id":"b68521997660ae8a6916037696cf716ca415bba0766487bfa5b79cda4adfb62c","ParentId":"d3cb571e5e16fce16a59c16c87e01ea4051d7cae016dba90688c9e4a53a921c4","RepoTags":["\u003cnone\u003e:\u003cnone\u003e"],"Size":0,"VirtualSize":2041562468}
...
DockViz parses this JSON and outputs a formatted tree:
$ cat ~/scripts/docktree
echo -e "GET /images/json?all=1 HTTP/1.0\r\n" | nc -U /var/run/docker.sock | tail -n +5 | dockviz images --tree
$ docktree
├─511136ea3c5a Virtual Size: 0.0 B
│ ├─5bc37dc2dfba Virtual Size: 192.5 MB
│ │ ├─61cb619d86bc Virtual Size: 192.7 MB
│ │ ├─3f45ca85fedc Virtual Size: 192.7 MB
│ │ ├─78e82ee876a2 Virtual Size: 192.7 MB
│ │ ├─dc07507cef42 Virtual Size: 192.7 MB
│ │ ├─86ce37374f40 Virtual Size: 192.7 MB
│ │ ├─d76983dc2ebd Virtual Size: 213.3 MB
│ │ ├─04a01662a6a8 Virtual Size: 214.5 MB
│ │ ├─7769c00dfefe Virtual Size: 525.9 MB
│ │ ├─6ac8d6e449b1 Virtual Size: 525.9 MB
│ │ ├─e3a84ca24205 Virtual Size: 525.9 MB
│ │ └─26f10d07659d Virtual Size: 525.9 MB
│ ├─e12c576ad8a1 Virtual Size: 198.9 MB
│ │ ├─102eb2a101b8 Virtual Size: 199.1 MB
│ │ ├─530dbbae98a0 Virtual Size: 199.1 MB
│ │ ├─37dde56c3a42 Virtual Size: 199.1 MB
│ │ ├─8f118367086c Virtual Size: 228.5 MB
│ │ └─277eb4304907 Virtual Size: 228.5 MB Tags: ubuntu:14.10, ubuntu:utopic
Note that, on my system, the first branch of the tree is dangling, i.e. not associated with a tagged image–I must have removed a tagged image earlier, and these are its remaining dependencies.
Built and downloaded Docker images quickly gobble up a lot of space:
$ sudo du -hcs /var/lib/docker/
10G /var/lib/docker/
10G total
The docker images command allows us to list dangling images:
docker images --filter dangling=true --quiet
And we obtain a list of containers (images that were fired up and modified) using:
docker ps -aq
I remove both of these with the following script (WARNING: This will delete ALL containers and any unused, downloaded images, so use with caution!):
#!/bin/bash
CONTAINERS=$(docker ps -aq)
IMAGES=$(docker images --filter dangling=true --quiet)
if [[ $CONTAINERS ]]; then
docker rm $CONTAINERS
else
echo "No containers to remove"
fi
if [[ $IMAGES ]]; then
docker rmi $IMAGES
else
echo "No dangling images to remove"
fi
Then:
$ docker-clean
$ sudo du -hcs /var/lib/docker/
6.6G /var/lib/docker/
6.6G total
$ docktree
├─511136ea3c5a Virtual Size: 0.0 B
│ ├─e12c576ad8a1 Virtual Size: 198.9 MB
│ │ ├─102eb2a101b8 Virtual Size: 199.1 MB
│ │ ├─530dbbae98a0 Virtual Size: 199.1 MB
│ │ ├─37dde56c3a42 Virtual Size: 199.1 MB
│ │ ├─8f118367086c Virtual Size: 228.5 MB
│ │ └─277eb4304907 Virtual Size: 228.5 MB Tags: ubuntu:utopic, ubuntu:14.10
│ ├─d497ad3926c8 Virtual Size: 192.5 MB
│ │ ├─ccb62158e970 Virtual Size: 192.7 MB
│ │ └─e791be0477f2 Virtual Size: 192.7 MB
...
Note that, now, all branches of the tree are associated with tagged images. If I remove ubuntu:utopic, I can again run the pruning process to get rid of its left-over dependencies.
This may come as a surprise to some, since in the past we have been unable to publish the proceedings in a timely manner. So, what changed?
For 2013 we followed a very light-weight review process, via comments on GitHub pull-requests. This change has an important consequence: in contrast to the traditional review process, where reviewers critically pull apart papers, the process now changes into a constructive conversation–the reviewer becomes an ally to the author, helping them to get their paper signed off on.
In addition, this is a very familiar process to most members of our community, who regularly contribute to open source projects. Most such projects nowadays follow a similar methodology for discussing and integrating patches.
Since we can’t expect reviewers to check out and build the papers themselves, a paper build bot is provided to generate PDFs from pull-requests, which contain papers in plain-text ReStructuredText format (see the proceedings repository for examples, and all papers starting 2010).
For authors, tools are provided to convert the ReStructuredText papers to PDFs in IEEE Computer Society paper style.
We welcome your feedback on the proceedings! If you spot a mistake, please submit a pull request on GitHub.
Finally, a big shout-out to the amazing team of people who organized this year’s conference, and to the wonderfully inclusive and talented Scientific Python community, of which I am proud to be part.
Here, then, is a summary of the simple but effective super-resolution algorithm described therein:
http://arxiv.org/abs/1210.3404
I also submitted this work to NIPS: the reviewers liked the paper, but they were not convinced of its novelty. Having spent a lot of time studying the existing literature, all I can say in response is that, while solving the problem as a sparse linear system was well known at the time, phrasing Drizzle as a linear operator and using it for super-resolution was not.
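In outline, the approach models each low-resolution frame as a sparse linear operator applied to the unknown high-resolution image, and recovers that image by regularized least squares. A minimal sketch, with a random matrix standing in for the actual Drizzle operator:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

hi, lo = 64 * 64, 32 * 32        # toy high-/low-resolution pixel counts
A = sparse_random(lo, hi, density=0.01, random_state=0)  # stand-in operator
x_true = np.random.rand(hi)      # unknown high-resolution image (flattened)
b = A @ x_true                   # simulated low-resolution observation

x_est = lsqr(A, b, damp=0.1)[0]  # damped least-squares reconstruction
```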
But the proof of the pudding is in the eating! Have a look at the results and published code – you can download it all (including a sample data-set) and play with the different reconstruction parameters. Quite a bit of the code has since graduated into scikit-image.
Scikits-image is an image processing toolbox for SciPy that includes algorithms for segmentation, geometric transformations, color space manipulation, analysis, filtering, morphology, feature detection, and more.
For more information, examples, and documentation, please visit our website.
It’s been only 3 months since scikits-image 0.6 was released, but in that short time, we’ve managed to add plenty of new features and enhancements.
Plus, this release adds a number of bug fixes, new examples, and performance enhancements.
This release was only possible due to the efforts of many contributors, both new and old.
To install on Linux:

1. Download the fonts and copy them to ~/.fonts.
2. Run fc-cache -f -v.
The font should now be available for selection in apps such as Firefox, Gnome Terminal, etc. To make it the default font in Emacs:
(set-default-font "Source Code Pro")
Here’s a comparison of Consolas (left) and Source Code Pro (right):
Comments also on Google+.
(Update: see also use-package.)
I recently tried to install MuMaMo as one of the dependencies for Takafumi Arakaki’s Emacs-based IPython notebook. The instructions on the MuMaMo webpage were as clear as mud and aimed primarily at Windows users. Enter apt-get for Emacs!
My Emacs setup is shared across multiple machines: a synchronized elisp folder, containing *.el files, along with my .emacs configuration. el-get allows you to share your package installation folder in a similar fashion. Here are some relevant configuration snippets:
; Everything gets installed into ~/elisp, a folder
; I sync across all my machines
(setq el-get-dir "~/elisp/el-get")
(setq el-get-install-dir "~/elisp/el-get/el-get")
(add-to-list 'load-path el-get-install-dir)

; If el-get is missing, install it automatically
(unless (require 'el-get nil t)
  (url-retrieve
   "https://raw.github.com/dimitri/el-get/master/el-get-install.el"
   (lambda (s)
     (goto-char (point-max))
     (eval-print-last-sexp))))
; Install these packages, and call the specified configuration snippets
; after each load
(setq el-get-sources
      '((:name ethan-wspace
         :after (progn
                  (global-ethan-wspace-mode 1)
                  (set-face-background 'ethan-wspace-face "gray95")))
        (:name column-marker
         :after (add-hook 'font-lock-mode-hook
                          (lambda () (interactive) (column-marker-1 80))))))

; Also install these packages, no configuration required
(setq my-packages
      (append
       '(el-get maxframe markdown-mode ein python)
       (mapcar 'el-get-source-name el-get-sources)))

; Check packages and install any that are missing
(el-get 'sync my-packages)
There are two ways to specify packages to be installed: either include them in the my-packages list, or add them to el-get-sources, which in addition allows further customization upon successful loading of the package.
What’s in your stack? Here’s my list of Emacs packages:
Org Mode, Ethan's wspace, Tab Bar, Column Marker, Max Frame, EIN, Python, JS2
Do you know of any other useful packages? Let me know!