# arXiv Paper Daily: Wed, 12 Jul


## Neural and Evolutionary Computing

Comments: 33 pages, 9 tables, 3 figures. This paper covers some of the topics of the talk “When the gray box was opened, model-based evolutionary algorithms were already there” presented in the Model-Based Evolutionary Algorithms workshop on July 20, 2016, in Denver

**Subjects**: Neural and Evolutionary Computing (cs.NE)

The concept of gray-box optimization, in juxtaposition to black-box optimization, revolves around the idea of exploiting the problem structure to implement more efficient evolutionary algorithms (EAs). Work on factorized distribution algorithms (FDAs), whose factorizations are directly derived from the problem structure, has also contributed to show how exploiting the problem structure produces important gains in the efficiency of EAs. In this paper we analyze the general question of using problem structure in EAs, focusing on confronting work done in gray-box optimization with related research accomplished in FDAs. This contrasted analysis helps us to identify, in current studies on the use of problem structure in EAs, two distinct analytical characterizations of how these algorithms work. Moreover, we claim that these two characterizations collide and compete when it comes to providing a coherent framework to investigate this type of algorithm. To illustrate this claim, we present a contrasted analysis of the formalisms, questions, and results produced in FDAs and gray-box optimization. Common underlying principles in the two approaches, which are usually overlooked, are identified and discussed. In addition, an extensive review of previous research related to different uses of the problem structure in EAs is presented. The paper also elaborates on some of the questions that arise when extending the use of problem structure in EAs, such as evolvability, high cardinality of the variables and large definition sets, constrained and multi-objective problems, etc. Finally, emergent approaches that exploit neural models to capture the problem structure are covered.


**Subjects**: Neural and Evolutionary Computing (cs.NE)

Artificial neural networks (ANNs) trained using backpropagation are powerful learning architectures that have achieved state-of-the-art performance in various benchmarks. Significant effort has been devoted to developing custom silicon devices to accelerate inference in ANNs. Accelerating the training phase, however, has attracted relatively little attention. In this paper, we describe a hardware-efficient on-line learning technique for feedforward multi-layer ANNs that is based on pipelined backpropagation. Learning is performed in parallel with inference in the forward pass, removing the need for an explicit backward pass and requiring no extra weight lookup. This saves 50% of the weight lookups needed by a standard online implementation of backpropagation. By using binary state variables in the feedforward network and ternary errors in truncated-error backpropagation, the need for any multiplications in the forward and backward passes is removed, and memory requirements for the pipelining are drastically reduced. For proof-of-concept validation, we demonstrate on-line learning of MNIST handwritten digit classification on a Spartan 6 FPGA interfacing with an external 1Gb DDR2 DRAM, which shows small degradation in test error performance compared to an equivalently sized ANN trained off-line using standard backpropagation and exact errors. Our results highlight an attractive synergy between pipelined backpropagation and binary-state networks in substantially reducing computation and memory requirements, making pipelined on-line learning practical in deep networks.
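The multiplier-free property described above comes from quantizing the backward-pass errors to three levels. A minimal sketch of such a ternarization step (the threshold is an illustrative parameter, not a value from the paper):

```python
import numpy as np

def ternarize(err, threshold):
    """Quantize backprop errors to {-1, 0, +1} (truncated-error backprop).

    Replacing full-precision errors with a sign/zero code turns each
    weight update into an add, a subtract, or a skip, removing the
    multiplications from the backward pass.
    """
    t = np.zeros_like(err)
    t[err > threshold] = 1.0
    t[err < -threshold] = -1.0
    return t
```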

Comments: Submitted to NIPS 2017, Long Beach. YuXuan Liu and Abhishek Gupta had equal contribution

**Subjects**: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. However, standard imitation learning methods assume that the agent receives examples of observation-action tuples that could be provided, for instance, to a supervised learning algorithm. This stands in contrast to how humans and animals imitate: we observe another person performing some behavior and then figure out which actions will realize that behavior, compensating for changes in viewpoint, surroundings, and embodiment. We term this kind of imitation learning imitation-from-observation and propose an imitation learning method based on video prediction with context translation and deep reinforcement learning. This lifts the assumption in imitation learning that the demonstration should consist of observations and actions in the same environment, and enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use. Our experimental results show that our approach can perform imitation-from-observation for a variety of real-world robotic tasks modeled on common household chores, acquiring skills such as sweeping from videos of a human demonstrator. Videos can be found at


**Subjects**: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Deep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task. Recent work in meta-learning seeks to overcome this shortcoming by training a meta-learner on a distribution of similar tasks; the goal is for the meta-learner to generalize to novel but related tasks by learning a high-level strategy that captures the essence of the problem it is asked to solve. However, most recent approaches to meta-learning are extensively hand-designed, either using architectures that are specialized to a particular application, or hard-coding algorithmic components that tell the meta-learner how to solve the task. We propose a class of simple and generic meta-learner architectures, based on temporal convolutions, that is domain-agnostic and has no particular strategy or algorithm encoded into it. We validate our temporal-convolution-based meta-learner (TCML) through experiments pertaining to both supervised and reinforcement learning, and demonstrate that it outperforms state-of-the-art methods that are less general and more complex.

## Computer Vision and Pattern Recognition


**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

What defines a visual style? Fashion styles emerge organically from how people assemble outfits of clothing, making them difficult to pin down with a computational model. Low-level visual similarity can be too specific to detect stylistically similar images, while manually crafted style categories can be too abstract to capture subtle style differences. We propose an unsupervised approach to learn a style-coherent representation. Our method leverages probabilistic polylingual topic models based on visual attributes to discover a set of latent style factors. Given a collection of unlabeled fashion images, our approach mines for the latent styles, then summarizes outfits by how they mix those styles. Our approach can organize galleries of outfits by style without requiring any style labels. Experiments on over 100K images demonstrate its promise for retrieving, mixing, and summarizing fashion images by their style.

Comments: Accepted as Classification Challenge Track paper in CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces the system we developed for the YouTube-8M Video Understanding Challenge, in which a large-scale benchmark dataset was used for multi-label video classification. The proposed framework contains a hierarchical deep architecture, including a frame-level sequence modeling part and a video-level classification part. In the frame-level sequence modeling part, we explore a set of methods including Pooling-LSTM (PLSTM), Hierarchical-LSTM (HLSTM), and Random-LSTM (RLSTM) in order to address the problem of the large number of frames in a video. We also introduce two attention pooling methods, single attention pooling (ATT) and multiply attention pooling (Multi-ATT), so that we can pay more attention to the informative frames in a video and ignore the useless frames. In the video-level classification part, two methods are proposed to increase the classification performance: Hierarchical-Mixture-of-Experts (HMoE) and Classifier Chains (CC). Our final submission is an ensemble consisting of 18 sub-models. In terms of the official evaluation metric, Global Average Precision (GAP) at 20, our best submission achieves 0.84346 on the public 50% of the test dataset and 0.84333 on the private 50% of the test data.
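Of the two video-level methods, Classifier Chains are easy to sketch: each label's classifier is given the predictions for the earlier labels in the chain as extra input features, so label correlations are modeled. The following is a minimal illustration with a least-squares linear scorer as a stand-in base learner (the submission's actual models are neural networks):

```python
import numpy as np

class ClassifierChain:
    """Minimal classifier-chain sketch for multi-label prediction.

    Label j's classifier sees the input features plus the labels/predictions
    for labels 0..j-1. The base learner here is a least-squares linear
    scorer thresholded at 0.5 -- an illustrative stand-in.
    """
    def __init__(self, n_labels):
        self.n_labels = n_labels
        self.weights = []

    def fit(self, X, Y):
        Xc = X.copy()
        for j in range(self.n_labels):
            # fit a linear scorer on features augmented with earlier labels
            A = np.c_[Xc, np.ones(len(Xc))]
            w, *_ = np.linalg.lstsq(A, Y[:, j], rcond=None)
            self.weights.append(w)
            Xc = np.c_[Xc, Y[:, j]]  # teacher forcing during training
        return self

    def predict(self, X):
        Xc = X.copy()
        preds = []
        for w in self.weights:
            p = (np.c_[Xc, np.ones(len(Xc))] @ w > 0.5).astype(float)
            preds.append(p)
            Xc = np.c_[Xc, p]  # feed predictions to later classifiers
        return np.stack(preds, axis=1)
```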

Comments: 5 Pages

**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

This paper presents a novel autonomous quality metric to quantify the rehabilitation progress of subjects with knee/hip operations. The presented method supports digital analysis of human gait patterns using smartphones. The algorithm related to the autonomous metric utilizes calibrated acceleration, gyroscope, and magnetometer signals from seven Inertial Measurement Units attached to the lower body in order to classify and generate the grading-system values. The developed Android application connects the seven Inertial Measurement Units via Bluetooth and performs the data acquisition and processing in real time. In total, nine features per acceleration direction and lower-body joint angle are calculated and extracted in real time to provide fast feedback to the user. We compare the classification accuracy and quantification capabilities of Linear Discriminant Analysis, Principal Component Analysis, and Naive Bayes algorithms. The presented system is able to classify patients and control subjects with an accuracy of up to 100%. The outcomes can be saved on the device or transmitted to treating physicians for later control of the subject's improvements and the efficiency of physiotherapy treatments in motor rehabilitation. The proposed autonomous quality metric solution bears great potential to be used and deployed to support digital healthcare and therapy.

Comments: 5 pages, 4 figures

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

This article provides a next step towards solving the speed bottleneck of any system that intensively uses convolution operations (e.g., CNNs). The method described in the article is applied to the deformable part models (DPM) algorithm. It is based on multidimensional tensors and provides an efficient tradeoff between DPM performance and accuracy. Experiments on various databases, including Pascal VOC, show that the proposed method decreases the number of convolutions by up to 4.5 times compared with DPM v.5 while maintaining similar accuracy. If insignificant accuracy degradation is allowable, a higher computational gain can be achieved. The method consists of filter tensor decomposition and convolution shortening using the decomposed filter. A mathematical overview of the proposed method as well as simulation results are provided.
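As a minimal illustration of the filter-decomposition idea (the paper's multidimensional tensor decomposition is more general), a rank-1 decomposition of a separable 2D filter replaces one K×K convolution with two K-tap 1D convolutions:

```python
import numpy as np

def rank1_decompose(filt):
    """Rank-1 decomposition of a 2D filter via SVD.

    For a separable KxK filter the reconstruction is exact, so a KxK
    convolution can be replaced by a K-tap column convolution followed
    by a K-tap row convolution (2K multiplies per pixel instead of K^2).
    """
    U, s, Vt = np.linalg.svd(filt)
    col = U[:, 0] * np.sqrt(s[0])
    row = Vt[0] * np.sqrt(s[0])
    return col, row
```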

Comments: 25 pages, 7 figures

Journal-ref: Computers & Geosciences, 99, 100-106

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

Conventional manual surveys of rock mass fractures usually require large amounts of time and labor; yet, they provide a relatively small set of data that cannot be considered representative of the study region. Terrestrial laser scanners are increasingly used for fracture surveys because they can efficiently acquire large-area, high-resolution, three-dimensional (3D) point clouds from outcrops. However, extracting fractures and other planar surfaces from 3D outcrop point clouds is still a challenging task. No method has been reported that can be used to automatically extract the full extent of every individual fracture from a 3D outcrop point cloud. In this study, we propose a method using a region-growing approach to address this problem; the method also estimates the orientation of each fracture. In this method, criteria based on the local surface normal and curvature of the point cloud are used to initiate and control the growth of the fracture region. In tests using outcrop point cloud data, the proposed method identified and extracted the full extent of individual fractures with high accuracy. Compared with manually acquired field survey data, our method obtained better-quality fracture data, thereby demonstrating the high potential utility of the proposed method.
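A generic region-growing step of the kind described above can be sketched as follows; the neighbor graph, normals, curvature values, and thresholds are illustrative inputs, not the paper's specific criteria:

```python
import numpy as np
from collections import deque

def grow_region(neighbors, normals, curvature, seed,
                angle_thresh_deg=10.0, curv_thresh=0.05):
    """Region-growing sketch on a point cloud.

    neighbors[i] lists the point indices adjacent to point i (e.g. from a
    k-d tree). A neighbor joins the region when its normal is within
    `angle_thresh_deg` of the current point's normal; only low-curvature
    (smooth) points keep seeding further growth.
    """
    region, visited = [seed], {seed}
    queue = deque([seed])
    cos_thresh = np.cos(np.deg2rad(angle_thresh_deg))
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if j in visited:
                continue
            visited.add(j)
            if abs(np.dot(normals[i], normals[j])) >= cos_thresh:
                region.append(j)
                if curvature[j] < curv_thresh:
                    queue.append(j)
    return region
```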


**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

Deep-learning has proved in recent years to be a powerful tool for image analysis and is now widely used to segment both 2D and 3D medical images. Deep-learning segmentation frameworks rely not only on the choice of network architecture but also on the choice of loss function. When the segmentation process targets rare observations, a severe class imbalance is likely to occur between candidate labels, thus resulting in sub-optimal performance. In order to mitigate this issue, strategies such as the weighted cross-entropy function, the sensitivity function, or the Dice loss function have been proposed. In this work, we investigate the behavior of these loss functions and their sensitivity to learning rate tuning in the presence of different rates of label imbalance across 2D and 3D segmentation tasks. We also propose to use the class re-balancing properties of the Generalized Dice overlap, a known metric for segmentation assessment, as a robust and accurate deep-learning loss function for unbalanced tasks.
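The Generalized Dice overlap weights each class by its inverse squared volume, which is the re-balancing property exploited here. A minimal sketch (using `1 - score` as the training loss):

```python
import numpy as np

def generalized_dice(pred, target, eps=1e-6):
    """Generalized Dice overlap for a soft segmentation.

    pred, target: arrays of shape (n_voxels, n_classes); target is one-hot.
    Class weights are the inverse squared label volume, so rare classes
    contribute as much as frequent ones.
    """
    w = 1.0 / (target.sum(axis=0) ** 2 + eps)
    intersect = (w * (pred * target).sum(axis=0)).sum()
    union = (w * (pred + target).sum(axis=0)).sum()
    return 2.0 * intersect / (union + eps)
```

During training one would minimize `1 - generalized_dice(pred, target)`.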

Comments: MICCAI 2017 Workshop on Deep Learning in Medical Image Analysis

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

Convolutional neural networks (CNNs) have been applied to various automatic image segmentation tasks in medical image analysis, including brain MRI segmentation. Generative adversarial networks have recently gained popularity because of their power in generating images that are difficult to distinguish from real images.

In this study we use an adversarial training approach to improve CNN-based brain MRI segmentation. To this end, we include an additional loss function that motivates the network to generate segmentations that are difficult to distinguish from manual segmentations. During training, this loss function is optimised together with the conventional average per-voxel cross entropy loss. The results show improved segmentation performance using this adversarial training procedure for segmentation of two different sets of images and using two different network architectures, both visually and in terms of Dice coefficients.
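The combined training objective can be sketched as the average per-voxel cross entropy plus an adversarial term that rewards segmentations the discriminator cannot tell from manual ones; the weight `lam` is an illustrative choice, not a value from the paper:

```python
import numpy as np

def segmentation_loss(probs, onehot, d_out, lam=0.5):
    """Sketch of a combined segmentation objective.

    probs: (n_voxels, n_classes) softmax outputs; onehot: manual labels;
    d_out: discriminator's probability that the segmentation is manual.
    The adversarial term -log(d_out) is small when the discriminator
    is fooled.
    """
    eps = 1e-12
    ce = -np.mean(np.sum(onehot * np.log(probs + eps), axis=-1))
    adv = -np.log(d_out + eps)
    return ce + lam * adv
```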

Comments: published in IEEE Intelligent Vehicles Symposium, 2017

**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

In this paper, we present RegNet, the first deep convolutional neural network (CNN) to infer a 6 degrees of freedom (DOF) extrinsic calibration between multimodal sensors, exemplified using a scanning LiDAR and a monocular camera. Compared to existing approaches, RegNet casts all three conventional calibration steps (feature extraction, feature matching, and global regression) into a single real-time capable CNN. Our method does not require any human interaction and bridges the gap between classical offline and target-less online calibration approaches, as it provides both a stable initial estimation and a continuous online correction of the extrinsic parameters. During training we randomly decalibrate our system in order to train RegNet to infer the correspondence between projected depth measurements and the RGB image and finally regress the extrinsic calibration. Additionally, with an iterative execution of multiple CNNs that are trained on different magnitudes of decalibration, our approach compares favorably to state-of-the-art methods, achieving a mean calibration error of 0.28 degrees for the rotational and 6 cm for the translation components even for large decalibrations of up to 1.5 m and 20 degrees.

Comments: IEEE International Conference on Image Processing, 2017

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

Foreground detection has been widely studied for decades due to its importance in many practical applications. Most of the existing methods assume that foreground and background show visually distinct characteristics, so that the foreground can be detected once a good background model is obtained. However, there are many situations where this is not the case. Of particular interest in video surveillance is the camouflage case: for example, an active attacker camouflages by intentionally wearing clothes that are visually similar to the background. In such cases, even given a decent background model, it is not trivial to detect foreground objects. This paper proposes a texture guided weighted voting (TGWV) method which can efficiently detect foreground objects in camouflaged scenes. The proposed method employs the stationary wavelet transform to decompose the image into frequency bands. We show that the small and hardly noticeable differences between foreground and background in the image domain can be effectively captured in certain wavelet frequency bands. To make the final foreground decision, a weighted voting scheme is developed based on the intensity and texture of all the wavelet bands, with carefully designed weights. Experimental results demonstrate that the proposed method achieves superior performance compared to the current state-of-the-art results.
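The final voting step can be sketched as a weighted combination of per-band foreground votes; the weights below are placeholders for the carefully designed weights the paper describes:

```python
import numpy as np

def weighted_vote(band_masks, weights):
    """Fuse per-band binary foreground votes by weighted voting.

    band_masks: list of binary masks, one per wavelet band (plus intensity);
    weights: per-band reliabilities. A pixel is labeled foreground when the
    weighted vote exceeds half the total weight.
    """
    weights = np.asarray(weights, dtype=float)
    votes = sum(w * m for w, m in zip(weights, band_masks))
    return votes > 0.5 * weights.sum()
```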


**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)

Single-pixel imaging (SPI) is a novel technique for capturing 2D images using a photodiode instead of conventional 2D array sensors. SPI offers a high signal-to-noise ratio, a wide spectrum range, and low cost. Various algorithms have been proposed for SPI reconstruction, including linear correlation methods that consider measurements as the correlations between scenes and modulation patterns, alternating projection methods treating measurements as zero-frequency coefficients of light fields in Fourier space, and compressive sensing based methods introducing priors of natural images. However, there is no comprehensive review discussing their respective advantages, which is important for SPI's further applications and development. In this paper, we review and compare these algorithms in a unified reconstruction framework. In addition, we propose two other SPI algorithms: a conjugate gradient descent based method aiming to fit the measurement formation, and a Poisson maximum likelihood based method utilizing photons' Poisson statistics. Experimental results on both simulated and real captured data validate the following conclusions: to obtain comparable reconstruction accuracy, the compressive sensing based total variation regularization method requires the fewest measurements and consumes the least running time for small-scale reconstruction; the conjugate gradient descent method and the alternating projection method run fastest in large-scale cases; and the alternating projection method is the most robust to measurement noise. In short, there are trade-offs between capture efficiency, computational complexity, and noise robustness among the different SPI reconstruction algorithms. We have released our source code for non-commercial use.
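Of the reviewed families, the linear correlation approach is the simplest to sketch: the scene estimate is the measurement-weighted average of the mean-subtracted modulation patterns. This is a generic differential-correlation sketch, not the authors' released implementation:

```python
import numpy as np

def correlate_reconstruct(patterns, measurements):
    """Linear-correlation SPI reconstruction.

    patterns: (M, H, W) modulation patterns; measurements: (M,) photodiode
    readings. The estimate is <(y - <y>) P>, i.e. each pattern weighted by
    its mean-subtracted measurement.
    """
    y = measurements - measurements.mean()
    return np.tensordot(y, patterns, axes=1) / len(measurements)
```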

Comments: 13 pages, 12 figures, SPIE conference on Wavelets and Sparsity 17. I’ve also included a compiled version

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we apply the scattering transform (ST), a nonlinear map structured like a convolutional neural network (CNN), to the classification of underwater objects using sonar signals. The ST formalizes the observation that the filters learned by a CNN have wavelet-like structure. We achieve effective binary classification both on a real dataset of Unexploded Ordnance (UXO) and on synthetically generated examples. We also explore the effects on the waveforms with respect to changes in the object domain (e.g., translation, rotation, and acoustic impedance), and examine the consequences of theoretical results for the scattering transform. We show that the scattering transform is capable of excellent classification on both the synthetic and real problems, thanks to quasi-invariance properties that are well-suited to translation and rotation of the object.

Comments: The 6th international conference on analysis of images, social networks, and texts (AIST 2017), 27-29 July, 2017, Moscow, Russia

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

This paper deals with impulse noise removal from color images. The proposed noise removal algorithm employs a novel approach with morphological filtering for color image denoising: detection of corrupted pixels followed by removal of the detected noise by means of morphological filtering. With the help of computer simulation we show that the proposed algorithm can effectively remove impulse noise. The performance of the proposed algorithm is compared in terms of image restoration metrics and processing speed with that of common successful algorithms.

Comments: 24 pages

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

Generative Adversarial Networks (GANs) have attracted much research attention recently, leading to impressive results for natural image generation. However, to date little success has been observed in using GAN-generated images for improving classification tasks. Here we explore, in the context of car license plate recognition, whether it is possible to generate synthetic training data using a GAN to improve recognition accuracy. With a carefully designed pipeline, we show that the answer is affirmative. First, a large-scale image set is generated using the generator of the GAN, without manual annotation. Then, these images are fed to a deep convolutional neural network (DCNN) followed by a bidirectional recurrent neural network (BRNN) with long short-term memory (LSTM), which performs the feature learning and sequence labelling. Finally, the pre-trained model is fine-tuned on real images. Our experimental results on a few datasets demonstrate the effectiveness of using GAN images: an improvement of 7.5% over a strong baseline when moderate-sized real data are available. We show that the proposed framework achieves competitive recognition accuracy on challenging test datasets. We also leverage depthwise separable convolution to construct a lightweight convolutional RNN, which is about half the size and 2x faster on CPU. Combining this framework and the proposed pipeline, we make progress in performing accurate recognition on mobile and embedded devices.

Comments: Presented at the Salient360!: Visual attention modeling for 360° Images Grand Challenge of the IEEE International Conference on Multimedia and Expo (ICME) 2017

**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

We introduce a deep neural network for scanpath prediction trained on 360-degree images, and a temporal-aware novel representation of saliency information named the saliency volume. The first part of the network consists of a model trained to generate saliency volumes, whose parameters are learned by back-propagation computed from a binary cross entropy (BCE) loss over downsampled versions of the saliency volumes. Sampling strategies over these volumes are used to generate scanpaths over the 360-degree images. Our experiments show the advantages of using saliency volumes, and how they can be used for related tasks. Our source code and trained models are available at


**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

The increase of vehicles on highways, as on normal roadways, may cause traffic congestion; predicting the traffic flow on highways in particular is needed to solve this congestion problem. Predictions on time-series multivariate data, such as the traffic flow dataset, have been largely accomplished through various approaches. The approach with conventional prediction algorithms, such as Support Vector Machine (SVM), is only capable of accommodating predictions that are independent in each time unit. Hence, the sequential relationships in this time series data are hardly explored. Continuous Conditional Random Field (CCRF) is a Probabilistic Graphical Model (PGM) algorithm which can accommodate this problem. The neighboring aspects of sequential data, such as in time series data, can be expressed by CCRF so that its predictions are more reliable. In this article, a novel approach called DM-CCRF is adopted by modifying the CCRF prediction algorithm to strengthen the probability of the predictions made by the baseline regressor. The results show that DM-CCRF is superior in performance to CCRF: it decreases the baseline error by up to 9%, roughly twice the improvement of standard CCRF, which decreases the baseline error by at most 4.582%.

Comments: in Russian

Journal-ref: ITHEA, Information Content and Processing, 2014, 1 (3) , 262-268

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

The article describes a developed information technology for real-time recognition of handwritten mathematical expressions, based on proposed approaches to handwritten symbol recognition and structural analysis.

Comments: To appear in CVPR 2017; data available on

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

There is more to images than their objective physical content: for example, advertisements are created to persuade a viewer to take a certain action. We propose the novel problem of automatic advertisement understanding. To enable research on this problem, we create two datasets: an image dataset of 64,832 image ads, and a video dataset of 3,477 ads. Our data contains rich annotations encompassing the topic and sentiment of the ads, questions and answers describing what actions the viewer is prompted to take and the reasoning that the ad presents to persuade the viewer ("What should I do according to this ad, and why should I do it?"), and symbolic references ads make (e.g. a dove symbolizes peace). We also analyze the most common persuasive strategies ads use, and the capabilities that computer vision systems should have to understand these strategies. We present baseline classification results for several prediction tasks, including automatically answering questions about the messages of the ads.


**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

Whole slide imaging (WSI) has recently been cleared for primary diagnosis in the US. A critical challenge of WSI is to perform accurate focusing at high speed. Traditional systems create a focus map prior to scanning. For each focus point on the map, the sample needs to be static in the x-y plane, and axial scanning is needed to maximize the contrast. Here we report a novel focus map surveying method for WSI. The reported method requires no axial scanning, no additional camera and lens, works for stained and transparent samples, and allows continuous sample motion in the surveying process. It can be used for both brightfield and fluorescence WSI. Using a 20X, 0.75 NA objective lens, we demonstrate a mean focusing error of ~0.08 microns in the static mode and ~0.17 microns in the continuous motion mode. The reported method may provide a turnkey solution for most existing WSI systems thanks to its simplicity, robustness, accuracy, and high speed. It may also standardize the imaging performance of WSI systems for digital pathology and find other applications in high-content microscopy such as DNA sequencing and time-lapse live-cell imaging.


**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Achieving artificial visual reasoning – the ability to answer image-related questions which require a multi-step, high-level process – is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) which uses data augmentation and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.
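Conditional Batch Normalization itself is a small change to ordinary batch norm: the per-channel scale and shift are predicted from the question embedding rather than being free parameters. A numpy sketch of the normalization step (the network that predicts `gamma` and `beta` from the question is omitted):

```python
import numpy as np

def conditional_batchnorm(x, gamma, beta, eps=1e-5):
    """Conditional Batch Normalization sketch.

    x: (N, C) batch of features; gamma, beta: (C,) scale/shift that are
    *predicted from the conditioning input* (here, the question embedding).
    Only the source of gamma/beta differs from ordinary batch norm.
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```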

Journal-ref: Signal & Image Processing: An International Journal (SIPIJ), Vol. 8, No. 3, June 2017

**Subjects**: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a device, algorithm, and graphical user interface to obtain anthropometric measurements of the foot. The presented device facilitates obtaining the image scale and simplifies image processing by capturing one image of the side of the foot and the underfoot simultaneously. The introduced image-processing algorithm minimizes a noise criterion, which is suitable for object detection in single-object images and outperforms well-known image thresholding methods when lighting conditions are poor. The performance of the image-based method is compared to the manual method. Image-based measurements of the underfoot were on average 4 mm less than the actual measures. The mean absolute error of the underfoot length was 1.6 mm, while the length obtained from the side view had a mean absolute error of 4.4 mm. Furthermore, based on t-test and f-test results, no significant difference between manual and image-based anthropometry was observed. To maintain the performance of the anthropometry process in different situations, the user interface is designed to handle changes in lighting conditions and to alter the speed of the algorithm.
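For context, Otsu's method is one of the classic global thresholding baselines that single-object segmentation methods like the one above are compared against. This is a sketch of that baseline, not the paper's noise-criterion algorithm:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's global threshold for a uint8 grayscale image.

    Picks the threshold that maximizes the between-class variance
    w0 * w1 * (mu0 - mu1)^2 of the two resulting pixel classes.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```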

Comments: 5 pages, 3 figures, submitted to the non-archival track of the 1st Conference on Robot Learning (CoRL2017), Mountain View, California

**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Unlike classification labels, position labels cannot be assigned manually by humans. For this reason, generating supervision for precise object localization is a hard task. This paper details a method to create large datasets for 3D object localization, with real-world images, using an industrial robot to generate position labels. Through knowledge of the geometry of the robot, we are able to automatically synchronize the images of the two cameras and the object's 3D position. We applied the method to generate a screw-driver localization dataset with stereo images, using a KUKA LBR iiwa robot. This dataset can then be used to train a CNN regressor to learn end-to-end stereo object localization from a set of two standard uncalibrated cameras.

Comments: Rejected by ICDAR 2017

**Subjects**:

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG)

Handwritten string recognition is still a challenging task, even though powerful deep learning tools have been introduced. In this paper, based on TAO-FCN, we propose an end-to-end system for handwritten string recognition. Compared with conventional methods, no preprocessing or manually designed rules are employed. With enough labelled data, it is easy to apply the proposed method to different applications. Although the performance of the proposed method may not be comparable with state-of-the-art approaches, its usability and robustness are more meaningful for practical applications.


**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR)

Convolutional neural networks (CNNs) offer significant accuracy in image detection. To implement image detection using CNNs in internet of things (IoT) devices, a streaming hardware accelerator is proposed. The proposed accelerator optimizes energy efficiency by avoiding unnecessary data movement. With a unique filter decomposition technique, the accelerator can support an arbitrary convolution window size. In addition, the max pooling function can be computed in parallel with convolution by using a separate pooling unit,

thus achieving throughput improvement. A prototype accelerator was implemented

in TSMC 65nm technology with a core size of 5mm2. The accelerator can support

major CNNs and achieve 152GOPS peak throughput and 434GOPS/W energy efficiency

at 350mW, making it a promising hardware accelerator for intelligent IoT

devices.


**Subjects**: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)

Procedural terrain generation for video games has traditionally been done with smartly designed but handcrafted algorithms that generate heightmaps.

We propose a first step toward the learning and synthesis of these using recent

advances in deep generative modelling with openly available satellite imagery

from NASA.

Comments: Submitted to NIPS 2017, Long Beach. YuXuan Liu and Abhishek Gupta had equal contribution

**Subjects**:

Learning (cs.LG)

; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

Imitation learning is an effective approach for autonomous systems to acquire

control policies when an explicit reward function is unavailable, using

supervision provided as demonstrations from an expert, typically a human

operator. However, standard imitation learning methods assume that the agent

receives examples of observation-action tuples that could be provided, for

instance, to a supervised learning algorithm. This stands in contrast to how

humans and animals imitate: we observe another person performing some behavior

and then figure out which actions will realize that behavior, compensating for

changes in viewpoint, surroundings, and embodiment. We term this kind of

imitation learning as imitation-from-observation and propose an imitation

learning method based on video prediction with context translation and deep

reinforcement learning. This lifts the assumption in imitation learning that

the demonstration should consist of observations and actions in the same

environment, and enables a variety of interesting applications, including

learning robotic skills that involve tool use simply by observing videos of

human tool use. Our experimental results show that our approach can perform

imitation-from-observation for a variety of real-world robotic tasks modeled on

common household chores, acquiring skills such as sweeping from videos of a

human demonstrator. Videos can be found at


**Subjects**: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

Sleep stage classification constitutes an important preliminary exam in the

diagnosis of sleep disorders and is traditionally performed by a sleep expert

who assigns to each 30s of signal a sleep stage, based on the visual inspection

of signals such as electroencephalograms (EEG), electrooculograms (EOG),

electrocardiograms (ECG) and electromyograms (EMG). In this paper, we introduce

the first end-to-end deep learning approach that performs automatic temporal

sleep stage classification from multivariate and multimodal Polysomnography

(PSG) signals. We build a general deep architecture which can extract

information from EEG, EOG and EMG channels and pools the learnt representations

into a final softmax classifier. The architecture is light enough to be

distributed in time in order to learn from the temporal context of each sample,

namely previous and following data segments. Our model, which is unique in its

ability to learn a feature representation from multiple modalities, is compared

to alternative automatic approaches based on convolutional networks or

decision trees. Results obtained on 61 publicly available PSG records with up

to 20 EEG channels demonstrate that our network architecture yields

state-of-the-art performance. Our study reveals a number of insights on the

spatio-temporal distribution of the signal of interest: a good trade-off for

optimal classification performance measured with balanced accuracy is to use 6

EEG with some EOG and EMG channels. Also exploiting one minute of data before

and after each data segment to be classified offers the strongest improvement

when a limited number of channels is available. Our approach aims to improve a

key step in the study of sleep disorders. Like sleep experts, our system exploits

the multivariate and multimodal character of PSG signals to deliver

state-of-the-art classification performance at a very low complexity cost.


**Subjects**: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

This paper provides a set of sensitivity analysis and activity identification

results for a class of convex functions with a strong geometric structure, that

we coined “mirror-stratifiable”. These functions are such that there is a

bijection between a primal and a dual stratification of the space into

partitioning sets, called strata. This pairing is crucial to track the strata

that are identifiable by solutions of parametrized optimization problems or by

iterates of optimization algorithms. This class of functions encompasses all

regularizers routinely used in signal and image processing, machine learning,

and statistics. We show that this “mirror-stratifiable” structure enjoys a nice

sensitivity theory, allowing us to study stability of solutions of optimization

problems to small perturbations, as well as activity identification of

first-order proximal splitting-type algorithms. Existing results in the

literature typically assume that, under a non-degeneracy condition, the active

set associated to a minimizer is stable to small perturbations and is

identified in finite time by optimization schemes. In contrast, our results do

not require any non-degeneracy assumption: in consequence, the optimal active

set is not necessarily stable anymore, but we are able to track precisely the

set of identifiable strata. We show that these results have crucial implications

when solving challenging ill-posed inverse problems via regularization, a

typical scenario where the non-degeneracy condition is not fulfilled. Our

theoretical results, illustrated by numerical simulations, allow us to

characterize the instability behaviour of the regularized solutions, by

locating the set of all low-dimensional strata that can be potentially

identified by these solutions.
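As a concrete, simplified instance of the activity-identification behaviour described above: the l1 norm is a standard mirror-stratifiable regularizer, and for it the "identifiable strata" are supports. A minimal proximal gradient (ISTA) sketch, whose iterates' support eventually stabilizes, is (this is a generic illustration, not the paper's analysis):

```python
import numpy as np

def lasso_proximal_gradient(A, y, lam, step, iters=500):
    """Proximal gradient (ISTA) for min_x 0.5*||Ax - y||^2 + lam*||x||_1.
    The l1 norm is a textbook mirror-stratifiable regularizer: after enough
    iterations the support (active stratum) of the iterates stabilizes,
    which is the identification behaviour the paper studies."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)                              # smooth part
        z = x - step * grad                                   # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return x
```

With `A` the identity, one soft-thresholding step already gives the exact solution, and small entries of `y` are zeroed out, i.e. the low-dimensional stratum is identified.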

## Artificial Intelligence


**Subjects**: Artificial Intelligence (cs.AI); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG)

We introduce the Deep Symbolic Network (DSN) model, which aims at becoming

the white-box version of Deep Neural Networks (DNN). The DSN model provides a

simple, universal yet powerful structure, similar to DNN, to represent any

knowledge of the world, which is transparent to humans. The conjecture behind

the DSN model is that any type of real world objects sharing enough common

features are mapped into human brains as a symbol. Those symbols are connected

by links, representing the composition, correlation, causality, or other

relationships between them, forming a deep, hierarchical symbolic network

structure. Powered by such a structure, the DSN model is expected to learn like

humans, because of its unique characteristics. First, it is universal, using

the same structure to store any knowledge. Second, it can learn symbols from

the world and construct the deep symbolic networks automatically, by utilizing

the fact that real world objects have been naturally separated by

singularities. Third, it is symbolic, with the capacity of performing causal

deduction and generalization. Fourth, the symbols and the links between them

are transparent to us, and thus we will know what it has learned or not – which

is the key for the security of an AI system. Fifth, its transparency enables it

to learn with relatively small data. Sixth, its knowledge can be accumulated.

Last but not least, it is more friendly to unsupervised learning than DNN. We

present the details of the model, the algorithm powering its automatic learning

ability, and describe its usefulness in different use cases. The purpose of

this paper is to generate broad interest to develop it within an open source

project centered on the Deep Symbolic Network (DSN) model towards the

development of general AI.

Comments: 7 pages, 2 figures. Accepted for IJCAI 2017

**Subjects**:

Artificial Intelligence (cs.AI)

We propose and evaluate a new technique for learning hybrid automata

automatically by observing the runtime behavior of a dynamical system. Working

from a sequence of continuous state values and predicates about the

environment, CHARDA recovers the distinct dynamic modes, learns a model for

each mode from a given set of templates, and postulates causal guard conditions

which trigger transitions between modes. Our main contribution is the use of

information-theoretic measures (1)~as a cost function for data segmentation and

model selection to penalize over-fitting and (2)~to determine the likely causes

of each transition. CHARDA is easily extended with different classes of model

templates, fitting methods, or predicates. In our experiments on a complex

videogame character, CHARDA successfully discovers a reasonable

over-approximation of the character’s true behaviors. Our results also compare

favorably against recent work in automatically learning probabilistic timed

automata in an aircraft domain: CHARDA exactly learns the modes of these

simpler automata.

Comments: 8 pages, 2 figures. Accepted for CIG 2017

**Subjects**:

Artificial Intelligence (cs.AI)

While general game playing is an active field of research, the learning of

game design has tended to be either a secondary goal of such research or it has

been solely the domain of humans. We propose a field of research, Automated

Game Design Learning (AGDL), with the direct purpose of learning game designs

directly through interaction with games in the mode that most people experience

games: via play. We detail existing work that touches the edges of this field,

describe current successful projects in AGDL and the theoretical foundations

that enable them, point to promising applications enabled by AGDL, and discuss

next steps for this exciting area of study. The key moves of AGDL are to use

game programs as the ultimate source of truth about their own design, and to

make these design properties available to other systems and avenues of inquiry.

Comments: Published in SampTA 2017

**Subjects**:

Artificial Intelligence (cs.AI)

; Learning (cs.LG)

This paper provides a new similarity detection algorithm. Given an input set

of multi-dimensional data points, where each data point is assumed to be

multi-dimensional, and an additional reference data point for similarity

finding, the algorithm uses kernel method that embeds the data points into a

low dimensional manifold. Unlike other kernel methods, which consider the

entire data for the embedding, our method selects a specific set of kernel

eigenvectors. The eigenvectors are chosen to separate between the data points

and the reference data point so that similar data points can be easily

identified as being distinct from most of the members in the dataset.
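The abstract does not give the exact selection rule, so the following is a hypothetical sketch of the general idea only: embed the data plus the reference point via a kernel eigendecomposition, then keep the eigenvectors whose entry for the reference point deviates most from the bulk of the data (all names and parameters are illustrative):

```python
import numpy as np

def reference_embedding(X, ref, k=2, gamma=1.0):
    """Illustrative reference-guided kernel embedding: build an RBF kernel
    over the data plus the reference point, eigendecompose it, and keep the
    k eigenvectors whose entry for the reference point deviates most from
    the average entry over the data points."""
    pts = np.vstack([X, ref])                       # reference is the last row
    sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                         # RBF kernel matrix
    vals, vecs = np.linalg.eigh(K)                  # eigenvectors as columns
    # score each eigenvector by |entry at ref - mean entry over data|
    scores = np.abs(vecs[-1, :] - vecs[:-1, :].mean(axis=0))
    chosen = np.argsort(scores)[-k:]
    return vecs[:-1, chosen], vecs[-1, chosen]      # data coords, ref coords
```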


**Subjects**: Artificial Intelligence (cs.AI)

This paper introduces the Intentional Unintentional (IU) agent. This agent

endows the deep deterministic policy gradients (DDPG) agent for continuous

control with the ability to solve several tasks simultaneously. Learning to

solve many tasks simultaneously has been a long-standing, core goal of

artificial intelligence, inspired by infant development and motivated by the

desire to build flexible robot manipulators capable of many diverse behaviours.

We show that the IU agent not only learns to solve many tasks simultaneously

but it also learns faster than agents that target a single task at-a-time. In

some cases, where the single task DDPG method completely fails, the IU agent

successfully solves the task. To demonstrate this, we build a playroom

environment using the MuJoCo physics engine, and introduce a grounded formal

language to automatically generate tasks.

Comments: AGI 2017

**Subjects**:

Artificial Intelligence (cs.AI)

Representing knowledge as high-dimensional vectors in a continuous semantic

vector space can help overcome the brittleness and incompleteness of

traditional knowledge bases. We present a method for performing deductive

reasoning directly in such a vector space, combining analogy, association, and

deduction in a straightforward way at each step in a chain of reasoning,

drawing on knowledge from diverse sources and ontologies.
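As a toy illustration of a single reasoning step in a semantic vector space (not the paper's actual method), analogy can be computed with vector arithmetic followed by a nearest-neighbour search under cosine similarity:

```python
from math import sqrt

def analogy(emb, a, b, c):
    """Toy vector-space analogy: answer 'a is to b as c is to ?' by taking
    the vector emb[b] - emb[a] + emb[c] and returning the nearest other
    word under cosine similarity. emb maps words to vectors."""
    target = [bb - aa + cc for aa, bb, cc in zip(emb[a], emb[b], emb[c])]

    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(x * x for x in v)))

    return max((w for w in emb if w not in (a, b, c)),
               key=lambda w: cos(emb[w], target))
```

Chaining such steps, mixing in association (nearest neighbours) and deduction over relations, is the kind of pipeline the abstract describes.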

Comments: 3 pages, Benelearn 2017 conference, Eindhoven

**Subjects**:

Artificial Intelligence (cs.AI)

; Learning (cs.LG)

We provide preliminary details and formulation of an optimization strategy

under current development that is able to automatically tune the parameters of

a Support Vector Machine over new datasets. The optimization strategy is a

heuristic based on Iterated Local Search, a modification of classic hill

climbing which iterates calls to a local search routine.
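A generic sketch of Iterated Local Search as described above: hill-climb to a local optimum, perturb ("kick") the incumbent, and restart the local search, keeping the best point seen. The objective here stands in for cross-validated SVM accuracy over (log C, log gamma); all names and step sizes are illustrative, not the paper's settings:

```python
import random

def iterated_local_search(score, start, step=0.5, kick=2.0,
                          iters=20, ls_steps=30, seed=0):
    """Maximize `score` over a parameter vector via Iterated Local Search:
    random-step hill climbing to a local optimum, then a larger random
    perturbation (the 'kick') followed by another local search."""
    rng = random.Random(seed)

    def local_search(x):
        x = list(x)
        for _ in range(ls_steps):
            cand = [xi + rng.uniform(-step, step) for xi in x]
            if score(cand) > score(x):   # accept only improving moves
                x = cand
        return x

    best = local_search(start)
    for _ in range(iters):
        kicked = [xi + rng.uniform(-kick, kick) for xi in best]
        cand = local_search(kicked)
        if score(cand) > score(best):
            best = cand
    return best
```

In the SVM-tuning setting, `score` would train and cross-validate an SVM at the candidate (C, gamma) and return the mean accuracy.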


**Subjects**: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Learning (cs.LG)

Machine learning based systems are increasingly being used for sensitive tasks such as security surveillance, guiding autonomous vehicles, making investment decisions, and detecting and blocking network intrusions and malware. However, recent research has shown that machine learning models are vulnerable to attacks by adversaries at all phases of machine learning (e.g., training data collection, training, operation). All model classes of machine learning systems can be misled by carefully crafted inputs, causing them to classify inputs wrongly. Maliciously created input samples can affect the learning process of an ML system by slowing down the learning process, degrading the performance of the learned model, or causing the system to make errors only in the attacker’s planned scenario. Because of these developments, understanding the security of machine learning algorithms and systems is emerging as an important research area among computer security and machine learning researchers and practitioners. We present a survey of this emerging area in machine learning.


**Subjects**: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Deep neural networks excel in regimes with large amounts of data, but tend to

struggle when data is scarce or when they need to adapt quickly to changes in

the task. Recent work in meta-learning seeks to overcome this shortcoming by

training a meta-learner on a distribution of similar tasks; the goal is for the

meta-learner to generalize to novel but related tasks by learning a high-level

strategy that captures the essence of the problem it is asked to solve.

However, most recent approaches to meta-learning are extensively hand-designed,

either using architectures that are specialized to a particular application, or

hard-coding algorithmic components that tell the meta-learner how to solve the

task. We propose a class of simple and generic meta-learner architectures,

based on temporal convolutions, that is domain-agnostic and has no particular

strategy or algorithm encoded into it. We validate our

temporal-convolution-based meta-learner (TCML) through experiments pertaining

to both supervised and reinforcement learning, and demonstrate that it

outperforms state-of-the-art methods that are less general and more complex.

Comments: 15 pages, 7 figures

**Subjects**:

Artificial Intelligence (cs.AI)

A number of intriguing decision scenarios revolve around partitioning a

collection of objects to optimize some application specific objective function.

This problem is generally referred to as the Object Partitioning Problem (OPP)

and is known to be NP-hard. We here consider a particularly challenging version

of OPP, namely, the Stochastic On-line Equi-Partitioning Problem (SO-EPP). In

SO-EPP, the target partitioning is unknown and has to be inferred purely from

observing an on-line sequence of object pairs. The paired objects belong to the

same partition with probability (p) and to different partitions with

probability (1-p), with (p) also being unknown. As an additional complication,

the partitions are required to be of equal cardinality. Previously, only

sub-optimal solution strategies have been proposed for SO-EPP. In this paper,

we propose the first optimal solution strategy. In brief, the scheme that we

propose, BN-EPP, is founded on a Bayesian network representation of SO-EPP

problems. Based on probabilistic reasoning, we are not only able to infer the

underlying object partitioning with optimal accuracy. We are also able to

simultaneously infer (p), allowing us to accelerate learning as object pairs

arrive. Furthermore, our scheme is the first to support arbitrary constraints

on the partitioning (Constrained SO-EPP). Being optimal, BN-EPP provides

superior performance compared to existing solution schemes. We additionally

introduce Walk-BN-EPP, a novel WalkSAT inspired algorithm for solving large

scale BN-EPP problems. Finally, we provide a BN-EPP based solution to the

problem of order picking, a representative real-life application of BN-EPP.

Comments: 27 pages

**Subjects**:

Artificial Intelligence (cs.AI)

We investigate a generalisation of the coherent choice functions considered

by Seidenfeld et al. (2010), by sticking to the convexity axiom but imposing no

Archimedeanity condition. We define our choice functions on vector spaces of

options, which allows us to incorporate as special cases both Seidenfeld et

al.’s (2010) choice functions on horse lotteries and sets of desirable gambles

(Quaeghebeur, 2014), and to investigate their connections. We show that choice

functions based on sets of desirable options (gambles) satisfy Seidenfeld’s

convexity axiom only for very particular types of sets of desirable options,

which are in a one-to-one relationship with the lexicographic probabilities. We

call them lexicographic choice functions. Finally, we prove that these choice

functions can be used to determine the most conservative convex choice function

associated with a given binary relation.

Comments: Submitted to NIPS 2017, Long Beach. YuXuan Liu and Abhishek Gupta had equal contribution

**Subjects**:

Learning (cs.LG)

; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

Imitation learning is an effective approach for autonomous systems to acquire

control policies when an explicit reward function is unavailable, using

supervision provided as demonstrations from an expert, typically a human

operator. However, standard imitation learning methods assume that the agent

receives examples of observation-action tuples that could be provided, for

instance, to a supervised learning algorithm. This stands in contrast to how

humans and animals imitate: we observe another person performing some behavior

and then figure out which actions will realize that behavior, compensating for

changes in viewpoint, surroundings, and embodiment. We term this kind of

imitation learning as imitation-from-observation and propose an imitation

learning method based on video prediction with context translation and deep

reinforcement learning. This lifts the assumption in imitation learning that

the demonstration should consist of observations and actions in the same

environment, and enables a variety of interesting applications, including

learning robotic skills that involve tool use simply by observing videos of

human tool use. Our experimental results show that our approach can perform

imitation-from-observation for a variety of real-world robotic tasks modeled on

common household chores, acquiring skills such as sweeping from videos of a

human demonstrator. Videos can be found at

Comments: published in IEEE Intelligent Vehicles Symposium, 2017

**Subjects**:

Computer Vision and Pattern Recognition (cs.CV)

; Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

In this paper, we present RegNet, the first deep convolutional neural network

(CNN) to infer a 6 degrees of freedom (DOF) extrinsic calibration between

multimodal sensors, exemplified using a scanning LiDAR and a monocular camera.

Compared to existing approaches, RegNet casts all three conventional

calibration steps (feature extraction, feature matching and global regression)

into a single real-time capable CNN. Our method does not require any human

interaction and bridges the gap between classical offline and target-less

online calibration approaches as it provides both a stable initial estimation

as well as a continuous online correction of the extrinsic parameters. During

training we randomly decalibrate our system in order to train RegNet to infer

the correspondence between projected depth measurements and RGB image and

finally regress the extrinsic calibration. Additionally, with an iterative

execution of multiple CNNs, that are trained on different magnitudes of

decalibration, our approach compares favorably to state-of-the-art methods in

terms of a mean calibration error of 0.28 degrees for the rotational and 6 cm

for the translation components even for large decalibrations up to 1.5 m and 20

degrees.

Comments: 14 pages

**Subjects**:

Robotics (cs.RO)

; Artificial Intelligence (cs.AI); Learning (cs.LG)

Robotic motion planning problems are typically solved by constructing a

search tree of valid maneuvers from a start to a goal configuration. Limited

onboard computation and real-time planning constraints impose a limit on how

large this search tree can grow. Heuristics play a crucial role in such

situations by guiding the search towards potentially good directions and

consequently minimizing search effort. Moreover, it must infer such directions

in an efficient manner using only the information uncovered by the search up

until that time. However, state of the art methods do not address the problem

of computing a heuristic that explicitly minimizes search effort. In this

paper, we do so by training a heuristic policy that maps the partial

information from the search to decide which node of the search tree to expand.

Unfortunately, naively training such policies leads to slow convergence and

poor local minima. We present SaIL, an efficient algorithm that trains

heuristic policies by imitating “clairvoyant oracles” – oracles that have full

information about the world and demonstrate decisions that minimize search

effort. We leverage the fact that such oracles can be efficiently computed

using dynamic programming and derive performance guarantees for the learnt

heuristic. We validate the approach on a spectrum of environments which show

that SaIL consistently outperforms state of the art algorithms. Our approach

paves the way forward for learning heuristics that demonstrate an anytime

nature – finding feasible solutions quickly and incrementally refining it over

time.


**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Achieving artificial visual reasoning – the ability to answer image-related

questions which require a multi-step, high-level process – is an important step

towards artificial general intelligence. This multi-modal task requires

learning a question-dependent, structured reasoning process over images from

language. Standard deep learning approaches tend to exploit biases in the data

rather than learn this underlying structure, while leading methods learn to

visually reason successfully but are hand-crafted for reasoning. We show that a

general-purpose, Conditional Batch Normalization approach achieves

state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%

error rate. We outperform the next best end-to-end method (4.5%) which uses

data augmentation and even methods that use extra supervision (3.1%). We probe

our model to shed light on how it reasons, showing it has learned a

question-dependent, multi-step process. Previous work has operated under the

assumption that visual reasoning calls for a specialized architecture, but we

show that a general architecture with proper conditioning can learn to visually

reason effectively.
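A minimal sketch of Conditional Batch Normalization, the key ingredient named above: feature maps are normalized as usual, but the per-channel scale and shift are predicted from the question embedding rather than learned as free parameters. Shapes and names here are assumptions for illustration, not the authors' code:

```python
import numpy as np

def conditional_batch_norm(x, question_emb, W_gamma, W_beta, eps=1e-5):
    """Conditional Batch Normalization sketch.
    x: feature maps (batch, channels, H, W); question_emb: (batch, d);
    W_gamma, W_beta: (d, channels) linear maps predicting the per-channel
    scale and shift from the question embedding."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)         # standard batch norm
    gamma = 1.0 + question_emb @ W_gamma            # question-conditioned scale
    beta = question_emb @ W_beta                    # question-conditioned shift
    return gamma[:, :, None, None] * x_hat + beta[:, :, None, None]
```

Because gamma and beta depend on the question, the same convolutional features are modulated differently per question, which is how a general architecture acquires question-dependent behaviour.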


**Subjects**: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Human interactions are characterized by explicit as well as implicit channels

of communication. While the explicit channel transmits overt messages, the

implicit ones transmit hidden messages about the communicator (e.g., his/her

intentions and attitudes). There is a growing consensus that providing a

computer with the ability to manipulate implicit affective cues should allow

for a more meaningful and natural way of studying particular non-verbal signals

of human-human communications by human-computer interactions. In this pilot

study, we created a non-dynamic human-computer interaction while manipulating

three specific non-verbal channels of communication: gaze pattern, facial

expression, and gesture. Participants rated the virtual agent on affective

dimensional scales (pleasure, arousal, and dominance) while their physiological

signal (electrodermal activity, EDA) was captured during the interaction.

Assessment of the behavioral data revealed a significant and complex three-way

interaction between gaze, gesture, and facial configuration on the dimension of

pleasure, as well as a main effect of gesture on the dimension of dominance.

These results suggest a complex relationship between different non-verbal cues

and the social context in which they are interpreted. Qualifying considerations

as well as possible next steps are further discussed in light of these

exploratory findings.

## Information Retrieval

Comments: 12 pages, 2 figures

**Subjects**:

Information Retrieval (cs.IR)

In the e-commerce world, tracking prices on product detail web pages is of great interest, for example in order to buy a product when its price falls below some threshold. Instead of bookmarking the pages and revisiting them, in this paper we propose a novel web data extraction system for this task, called Wextractor. It consists of an extraction method and a web app for listing the retrieved prices. For the final user, the main feature of Wextractor is usability, because (s)he only has to signal the pages of interest and our system automatically extracts the price from the page.


**Subjects**: Information Retrieval (cs.IR); Machine Learning (stat.ML)

Recommender systems are widely used to predict personalized preferences of

goods or services using users’ past activities, such as item ratings or

purchase histories. If collections of such personal activities were made

publicly available, they could be used to personalize a diverse range of

services, including targeted advertisement or recommendations. However, there

would be an accompanying risk of privacy violations. The pioneering work of

Narayanan et al. demonstrated that even if the identifiers are eliminated, the

public release of user ratings can allow for the identification of users by

those who have only a small amount of data on the users’ past ratings.

In this paper, we assume the following setting. A collector collects user

ratings, then anonymizes and distributes them. A recommender constructs a

recommender system based on the anonymized ratings provided by the collector.

Based on this setting, we exhaustively list the models of recommender systems

that use anonymized ratings. For each model, we then present an item-based

collaborative filtering algorithm for making recommendations based on

anonymized ratings. Our experimental results show that an item-based

collaborative filtering based on anonymized ratings can perform better than

collaborative filtering based on 5–10 non-anonymized ratings. This surprising

result indicates that, in some settings, privacy protection does not

necessarily reduce the usefulness of recommendations. From the experimental

analysis of this counterintuitive result, we observed that the sparsity of the

ratings can be reduced by anonymization and the variance of the prediction can

be reduced if (k), the anonymization parameter, is appropriately tuned. In this

way, the predictive performance of recommendations based on anonymized ratings

can be improved in some settings.
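For reference, a minimal version of the item-based collaborative filtering step used in such pipelines (the anonymization stage is omitted; this is a generic sketch, not the paper's algorithm): predict a user's rating for an item as the similarity-weighted average of that user's other ratings, with cosine similarity computed between item columns over co-rated users.

```python
import numpy as np

def item_based_predict(R, user, item):
    """Item-based collaborative filtering prediction.
    R: (users, items) rating matrix with 0 marking missing entries."""
    target = R[:, item]
    num = den = 0.0
    for j in range(R.shape[1]):
        if j == item or R[user, j] == 0:
            continue
        col = R[:, j]
        mask = (target > 0) & (col > 0)        # users who rated both items
        if not mask.any():
            continue
        a, b = target[mask], col[mask]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        num += sim * R[user, j]                # weight the user's own rating
        den += abs(sim)
    return num / den if den else 0.0
```

The paper's observation is that running this kind of predictor on k-anonymized ratings can even help, because anonymization densifies the rating matrix and reduces prediction variance.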

Comments: Proceedings of Terminology and Knowledge Engineering 2014 (TKE’14), Berlin

**Subjects**:

Information Retrieval (cs.IR)

This paper presents a procedure to retrieve subsets of relevant documents

from large text collections for Content Analysis, e.g. in social sciences.

Document retrieval for this purpose needs to take account of the fact that

analysts often cannot describe their research objective with a small set of key

terms, especially when dealing with theoretical or rather abstract research

interests. Instead, it is much easier to define a set of paradigmatic documents

which reflect topics of interest as well as targeted manner of speech. Thus, in

contrast to classic information retrieval tasks we employ manually compiled

collections of reference documents to compose large queries of several hundred

key terms, called dictionaries. We extract dictionaries via Topic Models and

also use co-occurrence data from reference collections. Evaluations show that

the procedure improves retrieval results for this purpose compared to

alternative methods of key term extraction as well as neglecting co-occurrence

data.

## Computation and Language

Comments: 6 pages, 1 figure, 3 tables

**Subjects**:

Computation and Language (cs.CL)

Identifying public misinformation is a complicated and challenging task.

Stance detection, i.e. determining the relative perspective a news source takes

towards a specific claim, is an important part of evaluating the veracity of

the assertion. Automating the process of stance detection would arguably

benefit human fact checkers. In this paper, we present our stance detection

model which claimed third place in the first stage of the Fake News Challenge.

Despite our straightforward approach, our model performs at a competitive level

with the complex ensembles of the top two winning teams. We therefore propose

our model as the ‘simple but tough-to-beat baseline’ for the Fake News

Challenge stance detection task.

Comments: Proceedings of the 12th International conference on Terminology and Knowledge Engineering (TKE 2016)

**Subjects**:

Computation and Language (cs.CL)

In terminology work, natural language processing, and digital humanities,

several studies address the analysis of variations in context and meaning of

terms in order to detect semantic change and the evolution of terms. We

distinguish three different approaches to describe contextual variations:

methods based on the analysis of patterns and linguistic clues, methods

exploring the latent semantic space of single words, and methods for the

analysis of topic membership. The paper presents the notion of context

volatility as a new measure for detecting semantic change and applies it to key

term extraction in a political science case study. The measure quantifies the

dynamics of a term’s contextual variation within a diachronic corpus to

identify periods of time that are characterised by intense controversial

debates or substantial semantic transformations.
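A toy version of such a volatility measure might track how the ranking of a term's co-occurrents is reshuffled across time slices. The rank-variance formula below is an illustrative assumption, not the paper's exact definition of context volatility:

```python
def context_volatility(cooc_by_period, term):
    """Toy context-volatility score: rank the co-occurrents of `term` in each
    time slice, then average the variance of each co-occurrent's rank across
    slices. The rank-variance formula is an illustrative assumption."""
    vocab = set()
    for period in cooc_by_period:
        vocab.update(period.get(term, {}))
    if not vocab:
        return 0.0
    rank_series = {w: [] for w in vocab}
    for period in cooc_by_period:
        ctx = period.get(term, {})
        ranked = sorted(ctx, key=ctx.get, reverse=True)  # strongest first
        ranks = {w: i + 1 for i, w in enumerate(ranked)}
        absent = len(ranked) + 1  # rank assigned to words missing this slice
        for w in vocab:
            rank_series[w].append(ranks.get(w, absent))

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # high score = the term's context keeps being reshuffled over time
    return sum(variance(r) for r in rank_series.values()) / len(vocab)
```

A term whose co-occurrence profile is stable across the diachronic corpus scores zero; a term whose context is reshuffled between periods, as in a controversial debate, scores higher.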

Comments: Proceedings of Terminology and Knowledge Engineering 2014 (TKE’14), Berlin

**Subjects**:

Computation and Language (cs.CL)

This paper presents the “Leipzig Corpus Miner”, a technical infrastructure

for supporting qualitative and quantitative content analysis. The

infrastructure aims at the integration of ‘close reading’ procedures on

individual documents with procedures of ‘distant reading’, e.g. lexical

characteristics of large document collections. Therefore information retrieval

systems, lexicometric statistics and machine learning procedures are combined

in a coherent framework which enables qualitative data analysts to make use of

state-of-the-art Natural Language Processing techniques on very large document

collections. Applicability of the framework ranges from social sciences to

media studies and market research. As an example we introduce the usage of the

framework in a political science study on post-democracy and neoliberalism.

Comments: 12 pages, 2 figures, 5 tables

Journal-ref: In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing

from Raw Text to Universal Dependencies, pages 152-162, Vancouver, Canada,

2017

**Subjects**:

Computation and Language (cs.CL)

The LyS-FASTPARSE team presents BIST-COVINGTON, a neural implementation of

the Covington (2001) algorithm for non-projective dependency parsing. The

bidirectional LSTM approach by Kiperwasser and Goldberg (2016) is used to

train a greedy parser with a dynamic oracle to mitigate error propagation. The

model participated in the CoNLL 2017 UD Shared Task. In spite of not using any

ensemble methods and using the baseline segmentation and PoS tagging, the

parser obtained good results on both macro-average LAS and UAS in the big

treebanks category (55 languages), ranking 7th out of 33 teams. In the all

treebanks category (LAS and UAS) we ranked 16th and 12th. The gap between the

all and big categories is mainly due to the poor performance on four parallel

PUD treebanks, suggesting that some ‘suffixed’ treebanks (e.g. Spanish-AnCora)

perform poorly on cross-treebank settings, which does not occur with the

corresponding ‘unsuffixed’ treebank (e.g. Spanish). By changing that, we obtain

the 11th best LAS among all runs (official and unofficial). The code is made

available at

Comments: 13 pages, 2 figures

**Subjects**:

Computation and Language (cs.CL)

Progress in natural language interfaces to databases (NLIDB) has been slow

mainly due to linguistic issues (such as language ambiguity) and domain

portability. Moreover, the lack of a large corpus to be used as a standard

benchmark has made data-driven approaches difficult to develop and compare. In

this paper, we revisit the problem of NLIDBs and recast it as a sequence

translation problem. To this end, we introduce a large dataset extracted from

the Stack Exchange Data Explorer website, which can be used for training neural

natural language interfaces for databases. We also report encouraging baseline

results on a smaller manually annotated test corpus, obtained using an

attention-based sequence-to-sequence neural network.

**Subjects**: Computation and Language (cs.CL)

In this paper we present the model used by the team Rivercorners for the 2017

Comments: 14 pages, 6 pages of supplementary, 10 figures

**Subjects**:

Computation and Language (cs.CL)

; Digital Libraries (cs.DL); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

Quantitative methods to measure the participation to parliamentary debate and

discourse of elected Members of Parliament and the parties they belong to are

lacking. This is an exploratory study in which we propose the development of a

new approach for a quantitative analysis of such participation. We utilize the

New Zealand government’s Hansard database to construct a topic model of

parliamentary speeches consisting of nearly 40 million words in the period 2003

to 2016. A Latent Dirichlet Allocation topic model is implemented in order to

reveal the thematic structure of our set of documents. This enables the

detection of major themes or topics that are publicly discussed in the New

Zealand parliament, as well as permitting their classification by MP. We

observe patterns arising from time-series analysis of topic frequencies which

can be related to specific social, economic and legislative events.
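The time-series analysis of topic frequencies can be sketched as a post-processing step over a fitted LDA document-topic matrix. The aggregation below is a hypothetical illustration of how such a per-year topic-frequency series might be computed, not the study's actual code:

```python
from collections import defaultdict

def topic_frequencies_by_year(doc_topics, doc_years):
    """Average per-document topic proportions (e.g. rows of a fitted LDA
    document-topic matrix) into a per-year topic-frequency series.
    A hypothetical post-processing step, not the study's actual code."""
    sums = defaultdict(list)
    counts = defaultdict(int)
    for topics, year in zip(doc_topics, doc_years):
        if not sums[year]:
            sums[year] = [0.0] * len(topics)
        sums[year] = [s + t for s, t in zip(sums[year], topics)]
        counts[year] += 1
    return {year: [s / counts[year] for s in total]
            for year, total in sums.items()}
```

Peaks in a topic's yearly frequency can then be lined up against specific social, economic and legislative events, as the abstract describes.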

Comments: ACL 2017. The first two authors contributed equally

**Subjects**:

Computation and Language (cs.CL)

Recent work has proposed several generative neural models for constituency

parsing that achieve state-of-the-art results. Since direct search in these

generative models is difficult, they have primarily been used to rescore

candidate outputs from base parsers in which decoding is more straightforward.

We first present an algorithm for direct search in these generative models. We

then demonstrate that the rescoring results are at least partly due to implicit

model combination rather than reranking effects. Finally, we show that explicit

model combination can improve performance even further, resulting in new

state-of-the-art numbers on the PTB of 94.25 F1 when training only on gold data

and 94.66 F1 when using external data.

**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Achieving artificial visual reasoning – the ability to answer image-related

questions which require a multi-step, high-level process – is an important step

towards artificial general intelligence. This multi-modal task requires

learning a question-dependent, structured reasoning process over images from

language. Standard deep learning approaches tend to exploit biases in the data

rather than learn this underlying structure, while leading methods learn to

visually reason successfully but are hand-crafted for reasoning. We show that a

general-purpose, Conditional Batch Normalization approach achieves

state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%

error rate. We outperform the next best end-to-end method (4.5%) which uses

data augmentation and even methods that use extra supervision (3.1%). We probe

our model to shed light on how it reasons, showing it has learned a

question-dependent, multi-step process. Previous work has operated under the

assumption that visual reasoning calls for a specialized architecture, but we

show that a general architecture with proper conditioning can learn to visually

reason effectively.

## Distributed, Parallel, and Cluster Computing

**Subjects**: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Social and Information Networks (cs.SI)

The rapid growth of movement data sources such as GPS traces, traffic

networks and social media have provided analysts with the opportunity to

explore collective patterns of geographical movements in a nearly real-time

fashion. A fast and interactive visualization framework can help analysts to

understand these massive and dynamically changing datasets. However, previous

studies on movement visualization either ignore the unique properties of

geographical movement or are unable to handle today’s massive data. In this

paper, we develop MovePattern, a novel framework to 1) efficiently construct a

concise multi-level view of movements using a scalable and spatially-aware

MapReduce-based approach and 2) present a fast and highly interactive web-based

environment which engages vector-based visualization to include on-the-fly

customization and the ability to enhance analytical functions by storing

metadata for both places and movements. We evaluate the framework using the

movements of Twitter users captured from geo-tagged tweets. The experiments

confirmed that our framework is able to aggregate close to 180 million

movements in a few minutes. In addition, we run a series of stress tests on the

front-end of the framework to ensure that simultaneous user queries do not lead

to long latency in the user response.

Comments: 8 pages, 7 figures, Proceedings for the 22nd International Conference on Computing in High Energy and Nuclear Physics

**Subjects**:

Distributed, Parallel, and Cluster Computing (cs.DC)

; High Energy Physics – Experiment (hep-ex)

Grid-control is a lightweight and highly portable open source submission tool

that supports virtually all workflows in high energy physics (HEP). Since 2007

it has been used by a sizeable number of HEP analyses to process tasks that

sometimes consist of up to 100k jobs. grid-control is built around a powerful

plugin and configuration system, that allows users to easily specify all

aspects of the desired workflow. Job submission to a wide range of local or

remote batch systems or grid middleware is supported. Tasks can be conveniently

specified through the parameter space that will be processed, which can consist

of any number of variables and data sources with complex dependencies on each

other. Dataset information is processed through a configurable pipeline of

dataset filters, partition plugins and partition filters. The partition plugins

can take the number of files, size of the work units, metadata or combinations

thereof into account. All changes to the input datasets or variables are

propagated through the processing pipeline and can transparently trigger

adjustments to the parameter space and the job submission. While the core

functionality is completely experiment independent, integration with the CMS

computing environment is provided by a small set of plugins.

## Learning

Comments: Submitted to NIPS 2017, Long Beach. YuXuan Liu and Abhishek Gupta had equal contribution

**Subjects**:

Learning (cs.LG)

Imitation learning is an effective approach for autonomous systems to acquire

control policies when an explicit reward function is unavailable, using

supervision provided as demonstrations from an expert, typically a human

operator. However, standard imitation learning methods assume that the agent

receives examples of observation-action tuples that could be provided, for

instance, to a supervised learning algorithm. This stands in contrast to how

humans and animals imitate: we observe another person performing some behavior

and then figure out which actions will realize that behavior, compensating for

changes in viewpoint, surroundings, and embodiment. We term this kind of

imitation learning imitation-from-observation and propose an imitation

learning method based on video prediction with context translation and deep

reinforcement learning. This lifts the assumption in imitation learning that

the demonstration should consist of observations and actions in the same

environment, and enables a variety of interesting applications, including

learning robotic skills that involve tool use simply by observing videos of

human tool use. Our experimental results show that our approach can perform

imitation-from-observation for a variety of real-world robotic tasks modeled on

common household chores, acquiring skills such as sweeping from videos of a

human demonstrator. Videos can be found at

Comments: In UAI proceedings

**Subjects**:

Learning (cs.LG)

; Machine Learning (stat.ML)

Inference in log-linear models scales linearly in the size of the output space in

the worst-case. This is often a bottleneck in natural language processing and

computer vision tasks when the output space is feasibly enumerable but very

large. We propose a method to perform inference in log-linear models with

sublinear amortized cost. Our idea hinges on using Gumbel random variable

perturbations and a pre-computed Maximum Inner Product Search data structure to

access the most-likely elements in sublinear amortized time. Our method yields

provable runtime and accuracy guarantees. Further, we present empirical

experiments on ImageNet and Word Embeddings showing significant speedups for

sampling, inference, and learning in log-linear models.
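The Gumbel-perturbation idea can be sketched in a few lines: adding independent Gumbel(0, 1) noise to the logits and taking the argmax yields an exact sample from the softmax distribution, and it is this argmax over perturbed scores that the precomputed Maximum Inner Product Search structure answers in sublinear amortized time. A minimal sketch, with a plain argmax standing in for the MIPS index:

```python
import math
import random

def gumbel_max_sample(logits):
    """Exact softmax sampling via the Gumbel-max trick: argmax_i(logit_i + G_i)
    with G_i ~ Gumbel(0, 1). In the paper's setting this argmax is answered by
    a precomputed Maximum Inner Product Search index; here a plain argmax
    stands in for that MIPS lookup."""
    perturbed = [l - math.log(-math.log(random.random())) for l in logits]
    return max(range(len(logits)), key=perturbed.__getitem__)

random.seed(0)
counts = [0, 0, 0]
for _ in range(20000):
    counts[gumbel_max_sample([math.log(0.7), math.log(0.2), math.log(0.1)])] += 1
# empirical frequencies approach the softmax probabilities (0.7, 0.2, 0.1)
```

Because the Gumbel noise can be folded into the query vector, the expensive part of each sample reduces to a maximum inner product lookup rather than a full pass over the output space.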

**Subjects**: Learning (cs.LG)

In this paper, we consider the temporal pattern in traffic flow time series,

and implement a deep learning model for traffic flow prediction. Detrending

based methods decompose original flow series into trend and residual series, in

which trend describes the fixed temporal pattern in traffic flow and residual

series is used for prediction. Inspired by the detrending method, we propose

DeepTrend, a deep hierarchical neural network used for traffic flow prediction

which considers and extracts the time-variant trend. DeepTrend has two stacked

layers: extraction layer and prediction layer. Extraction layer, a fully

connected layer, is used to extract the time-variant trend in traffic flow by

feeding the original flow series concatenated with corresponding simple average

trend series. Prediction layer, an LSTM layer, is used to make flow prediction

by feeding the obtained trend from the output of extraction layer and

calculated residual series. To make the model more effective, DeepTrend is

first pre-trained layer-by-layer and then fine-tuned as an entire network.

Experiments show that DeepTrend can noticeably boost the prediction performance

compared with some traditional prediction models and LSTM with detrending based

methods.
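The detrending step that DeepTrend builds on can be sketched as follows. Taking the "simple average trend" to be the average flow at each time-of-day slot is an illustrative choice, not the paper's exact recipe; the residual is what the prediction layer consumes:

```python
def detrend(flow, period):
    """Split a traffic-flow series into a simple average trend and a residual,
    mirroring the inputs DeepTrend's extraction layer concatenates with the
    original series. The trend at time t is taken as the average flow at slot
    (t mod period) — an illustrative choice, not the paper's exact recipe.
    Assumes len(flow) >= period."""
    slots = [[] for _ in range(period)]
    for t, x in enumerate(flow):
        slots[t % period].append(x)
    slot_avg = [sum(s) / len(s) for s in slots]
    trend = [slot_avg[t % period] for t in range(len(flow))]
    residual = [x - m for x, m in zip(flow, trend)]
    return trend, residual

trend, residual = detrend([1, 2, 1, 2, 1, 2], period=2)
# trend + residual reconstructs the original series exactly
```

DeepTrend's point is that this fixed average trend is only a starting signal: the extraction layer learns a time-variant trend from it, and the LSTM prediction layer models what remains.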

Comments: 16 pages, 5 figures, Appears in Proceedings of the 31th AAAI Conference on Artificial Intelligence (AAAI), San Francisco, California, USA, pp. 2287–2293, 2017

**Subjects**:

Learning (cs.LG)

; Machine Learning (stat.ML)

Recently, many variance reduced stochastic alternating direction method of

multipliers (ADMM) methods (e.g. SAG-ADMM, SDCA-ADMM and SVRG-ADMM) have made

exciting progress such as linear convergence rates for strongly convex

problems. However, the best known convergence rate for general convex problems

is O(1/T) as opposed to O(1/T^2) of accelerated batch algorithms, where (T) is

the number of iterations. Thus, there still remains a gap in convergence rates

between existing stochastic ADMM and batch algorithms. To bridge this gap, we

introduce the momentum acceleration trick for batch optimization into the

stochastic variance reduced gradient based ADMM (SVRG-ADMM), which leads to an

accelerated (ASVRG-ADMM) method. Then we design two different momentum term

update rules for strongly convex and general convex cases. We prove that

ASVRG-ADMM converges linearly for strongly convex problems. Besides having a

low per-iteration complexity as existing stochastic ADMM methods, ASVRG-ADMM

improves the convergence rate on general convex problems from O(1/T) to

O(1/T^2). Our experimental results show the effectiveness of ASVRG-ADMM.

**Subjects**: Learning (cs.LG); Machine Learning (stat.ML)

In this paper we propose a mixture model, SparseMix, for clustering of sparse

high dimensional binary data, which connects model-based with centroid-based

clustering. Every group is described by a representative and a probability

distribution modeling dispersion from this representative. In contrast to

classical mixture models based on EM algorithm, SparseMix:

-is especially designed for the processing of sparse data,

-can be efficiently realized by an on-line Hartigan optimization algorithm,

-is able to automatically reduce unnecessary clusters.

We perform extensive experimental studies on various types of data, which

confirm that SparseMix builds partitions with higher compatibility with

reference grouping than related methods. Moreover, constructed representatives

often better reveal the internal structure of data.

**Subjects**: Learning (cs.LG)

TAPAS is a novel adaptive sampling method for the softmax model. It uses a

two pass sampling strategy where the examples used to approximate the gradient

of the partition function are first sampled according to a squashed population

distribution and then resampled adaptively using the context and current model.

We describe an efficient distributed implementation of TAPAS. We show, on both

synthetic data and a large real dataset, that TAPAS has low computational

overhead and works well for minimizing the rank loss for multi-class

classification problems with a very large label space.
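The two-pass strategy can be sketched as follows; the squashing exponent, pool size, and all function names here are illustrative assumptions, not the paper's API:

```python
import random

def two_pass_sample(label_freq, model_score, alpha=0.5, pool_size=100, k=10):
    """Two-pass sampling in the spirit of TAPAS: first draw a candidate pool
    from a squashed popularity distribution p_i ∝ freq_i ** alpha, then
    resample from that pool with weights given by the current model's scores.
    All names and default values are illustrative, not the paper's API."""
    labels = range(len(label_freq))
    squashed = [f ** alpha for f in label_freq]
    pool = random.choices(labels, weights=squashed, k=pool_size)   # pass 1
    pool_weights = [model_score(i) for i in pool]
    return random.choices(pool, weights=pool_weights, k=k)         # pass 2
```

The first pass is context-independent and cheap; only the small pool is rescored with the context and current model, which keeps the per-example cost low even for a very large label space.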

**Subjects**: Machine Learning (stat.ML); Learning (cs.LG)

The natural world is infinitely diverse, yet this diversity arises from a

relatively small set of coherent properties and rules, such as the laws of

physics or chemistry. We conjecture that biological intelligent systems are

able to survive within their diverse environments by discovering the

regularities that arise from these rules primarily through unsupervised

experiences, and representing this knowledge as abstract concepts. Such

representations possess useful properties of compositionality and hierarchical

organisation, which allow intelligent agents to recombine a finite set of

conceptual building blocks into an exponentially large set of useful new

concepts. This paper describes SCAN (Symbol-Concept Association Network), a new

framework for learning such concepts in the visual domain. We first use the

previously published beta-VAE (Higgins et al., 2017a) architecture to learn a

disentangled representation of the latent structure of the visual world, before

training SCAN to extract abstract concepts grounded in such disentangled visual

primitives through fast symbol association. Our approach requires very few

pairings between symbols and images and makes no assumptions about the choice

of symbol representations. Once trained, SCAN is capable of multimodal

bi-directional inference, generating a diverse set of image samples from

symbolic descriptions and vice versa. It also allows for traversal and

manipulation of the implicit hierarchy of compositional visual concepts through

symbolic instructions and learnt logical recombination operations. Such

manipulations enable SCAN to invent and learn novel visual concepts through

recombination of the few learnt concepts.

**Subjects**: Machine Learning (stat.ML); Learning (cs.LG)

In this paper we develop a novel computational sensing framework for sensing

and recovering structured signals. When trained on a set of representative

signals, our framework learns to take undersampled measurements and recover

signals from them using a deep convolutional neural network. In other words, it

learns a transformation from the original signals to a near-optimal number of

undersampled measurements and the inverse transformation from measurements to

signals. This is in contrast to traditional compressive sensing (CS) systems

that use random linear measurements and convex optimization or iterative

algorithms for signal recovery. We compare our new framework with

(\ell_1)-minimization from the phase transition point of view and demonstrate

that it outperforms (\ell_1)-minimization in the regions of the phase transition

plot where (\ell_1)-minimization cannot recover the exact solution. In

addition, we experimentally demonstrate how learning measurements enhances the

overall recovery performance, speeds up training of recovery framework, and

leads to having fewer parameters to learn.

**Subjects**: Artificial Intelligence (cs.AI); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG)

We introduce the Deep Symbolic Network (DSN) model, which aims at becoming

the white-box version of Deep Neural Networks (DNN). The DSN model provides a

simple, universal yet powerful structure, similar to DNN, to represent any

knowledge of the world, which is transparent to humans. The conjecture behind

the DSN model is that any type of real world objects sharing enough common

features are mapped into human brains as a symbol. Those symbols are connected

by links, representing the composition, correlation, causality, or other

relationships between them, forming a deep, hierarchical symbolic network

structure. Powered by such a structure, the DSN model is expected to learn like

humans, because of its unique characteristics. First, it is universal, using

the same structure to store any knowledge. Second, it can learn symbols from

the world and construct the deep symbolic networks automatically, by utilizing

the fact that real world objects have been naturally separated by

singularities. Third, it is symbolic, with the capacity of performing causal

deduction and generalization. Fourth, the symbols and the links between them

are transparent to us, and thus we will know what it has learned or not – which

is the key for the security of an AI system. Fifth, its transparency enables it

to learn with relatively small data. Sixth, its knowledge can be accumulated.

Last but not least, it is more friendly to unsupervised learning than DNN. We

present the details of the model, the algorithm powering its automatic learning

ability, and describe its usefulness in different use cases. The purpose of

this paper is to generate broad interest to develop it within an open source

project centered on the Deep Symbolic Network (DSN) model towards the

development of general AI.

Comments: 4 pages, 6 figures, NOLTA, 2017

**Subjects**:

Numerical Analysis (math.NA)

; Learning (cs.LG); Machine Learning (stat.ML)

Accurate real time crime prediction is a fundamental issue for public safety,

but remains a challenging problem for the scientific community. Crime

occurrences depend on many complex factors. Compared to many predictable

events, crime is sparse. At different spatio-temporal scales, crime

distributions display dramatically different patterns. These distributions are

of very low regularity in both space and time. In this work, we adapt the

state-of-the-art deep learning spatio-temporal predictor, ST-ResNet [Zhang et

al., AAAI, 2017], to collectively predict crime distribution over the Los

Angeles area. Our models are two staged. First, we preprocess the raw crime

data. This includes regularization in both space and time to enhance

predictable signals. Second, we adapt hierarchical structures of residual

convolutional units to train multi-factor crime prediction models. Experiments

over a half year period in Los Angeles reveal highly accurate predictive power

of our models.

**Subjects**: Optimization and Control (math.OC); Computational Complexity (cs.CC); Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we consider multi-stage stochastic optimization problems with

convex objectives and conic constraints at each stage. We present a new

stochastic first-order method, namely the dynamic stochastic approximation

(DSA) algorithm, for solving these types of stochastic optimization problems.

We show that DSA can achieve an optimal (\mathcal{O}(1/\epsilon^4)) rate of

convergence in terms of the total number of required scenarios when applied to

a three-stage stochastic optimization problem. We further show that this rate

of convergence can be improved to (\mathcal{O}(1/\epsilon^2)) when the objective

function is strongly convex. We also discuss variants of DSA for solving more

general multi-stage stochastic optimization problems with the number of stages

(T > 3). The developed DSA algorithms only need to go through the scenario tree

once in order to compute an (\epsilon)-solution of the multi-stage stochastic

optimization problem. To the best of our knowledge, this is the first time that

stochastic approximation type methods are generalized for multi-stage

stochastic optimization with (T \ge 3).

Comments: Published in SampTA 2017

**Subjects**:

Artificial Intelligence (cs.AI)

; Learning (cs.LG)

This paper provides a new similarity detection algorithm. Given an input set

of multi-dimensional data points, and an additional reference data point for

similarity finding, the algorithm uses a kernel method that embeds the data points into a

low dimensional manifold. Unlike other kernel methods, which consider the

entire data for the embedding, our method selects a specific set of kernel

eigenvectors. The eigenvectors are chosen to separate the data points

from the reference data point so that similar data points can be easily

identified as being distinct from most of the members in the dataset.
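The eigenvector-selection idea might be sketched as follows, using a Gaussian kernel and an "extreme coordinate" selection rule as stand-ins for the paper's actual criterion — both are assumptions made for illustration:

```python
import numpy as np

def select_separating_eigenvectors(X, ref, n_select=2, gamma=1.0):
    """Sketch of the embedding idea: keep only the kernel eigenvectors along
    which the reference point lies far from the bulk of the data. The Gaussian
    kernel and the 'extreme coordinate' selection rule are assumptions made
    for illustration, not the paper's exact criterion."""
    Z = np.vstack([X, ref])                          # reference point last
    sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq_dists)                    # Gaussian kernel matrix
    _, vecs = np.linalg.eigh(K)
    # score each eigenvector by how many standard deviations the reference
    # coordinate sits away from the mean coordinate of the data points
    scores = [abs(v[-1] - v[:-1].mean()) / (v[:-1].std() + 1e-12)
              for v in vecs.T]
    top = np.argsort(scores)[-n_select:]
    return vecs[:, top]                              # low-dimensional embedding

rng = np.random.RandomState(0)
embedding = select_separating_eigenvectors(rng.randn(20, 2), np.array([5.0, 5.0]))
# embedding has one row per point: 20 data points plus the reference point
```

Unlike embeddings that keep the top eigenvectors by eigenvalue, the selection here is driven by the reference point, so the retained coordinates are exactly the ones in which similar points stand apart from the rest of the dataset.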

Comments: 3 pages, Benelearn 2017 conference, Eindhoven

**Subjects**:

Artificial Intelligence (cs.AI)

; Learning (cs.LG)

We provide preliminary details and formulation of an optimization strategy

under current development that is able to automatically tune the parameters of

a Support Vector Machine over new datasets. The optimization strategy is a

heuristic based on Iterated Local Search, a modification of classic hill

climbing which iterates calls to a local search routine.
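Iterated Local Search itself can be sketched compactly. In the intended use, `evaluate` would be cross-validated SVM accuracy over a parameter pair such as (log C, log gamma); the surrogate objective below stands in for that, and all names and step sizes are illustrative:

```python
import random

def iterated_local_search(evaluate, init, step=0.5, n_restarts=5, n_local=20, seed=0):
    """Iterated Local Search sketch: hill climbing with Gaussian moves, plus
    larger 'kick' perturbations between local-search runs. For SVM tuning,
    `evaluate` would be cross-validated accuracy over (log C, log gamma); the
    surrogate used below stands in for that and is purely illustrative."""
    rng = random.Random(seed)

    def local_search(x):
        best, best_f = list(x), evaluate(x)
        for _ in range(n_local):
            cand = [xi + rng.gauss(0, step) for xi in best]
            f = evaluate(cand)
            if f > best_f:
                best, best_f = cand, f
        return best, best_f

    best, best_f = local_search(init)
    for _ in range(n_restarts):
        kicked = [xi + rng.gauss(0, 4 * step) for xi in best]  # perturbation
        cand, f = local_search(kicked)
        if f > best_f:
            best, best_f = cand, f
    return best, best_f

# surrogate objective peaking at (log C, log gamma) = (1, -2)
surrogate = lambda p: -((p[0] - 1) ** 2 + (p[1] + 2) ** 2)
params, score = iterated_local_search(surrogate, [0.0, 0.0])
```

The kick between runs is what distinguishes ILS from plain hill climbing: it lets the search escape the basin of the current local optimum without restarting from scratch.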

**Subjects**: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Learning (cs.LG)

Machine learning based systems are increasingly being used for sensitive tasks

such as security surveillance, guiding autonomous vehicles, making investment

decisions, and detecting and blocking network intrusions and malware. However,

recent research has shown that machine learning models are vulnerable to attacks

by adversaries at all phases of machine learning (e.g., training data collection,

training, operation). All model classes of machine learning systems can be

misled by providing carefully crafted inputs making them wrongly classify

inputs. Maliciously created input samples can affect the learning process of a

ML system by either slowing down the learning process, or affecting the

performance of the learned model, or causing the system to make errors only in

attacker’s planned scenario. Because of these developments, understanding

security of machine learning algorithms and systems is emerging as an important

research area among computer security and machine learning researchers and

practitioners. We present a survey of this emerging area in machine learning.

Comments: published in IEEE Intelligent Vehicles Symposium, 2017

**Subjects**:

Computer Vision and Pattern Recognition (cs.CV)

; Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

In this paper, we present RegNet, the first deep convolutional neural network

(CNN) to infer a 6 degrees of freedom (DOF) extrinsic calibration between

multimodal sensors, exemplified using a scanning LiDAR and a monocular camera.

Compared to existing approaches, RegNet casts all three conventional

calibration steps (feature extraction, feature matching and global regression)

into a single real-time capable CNN. Our method does not require any human

interaction and bridges the gap between classical offline and target-less

online calibration approaches as it provides both a stable initial estimation

as well as a continuous online correction of the extrinsic parameters. During

training we randomly decalibrate our system in order to train RegNet to infer

the correspondence between projected depth measurements and RGB image and

finally regress the extrinsic calibration. Additionally, with an iterative

execution of multiple CNNs, that are trained on different magnitudes of

decalibration, our approach compares favorably to state-of-the-art methods in

terms of a mean calibration error of 0.28 degrees for the rotational and 6 cm

for the translation components even for large decalibrations up to 1.5 m and 20

degrees.

**Subjects**: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Deep neural networks excel in regimes with large amounts of data, but tend to

struggle when data is scarce or when they need to adapt quickly to changes in

the task. Recent work in meta-learning seeks to overcome this shortcoming by

training a meta-learner on a distribution of similar tasks; the goal is for the

meta-learner to generalize to novel but related tasks by learning a high-level

strategy that captures the essence of the problem it is asked to solve.

However, most recent approaches to meta-learning are extensively hand-designed,

either using architectures that are specialized to a particular application, or

hard-coding algorithmic components that tell the meta-learner how to solve the

task. We propose a class of simple and generic meta-learner architectures,

based on temporal convolutions, that is domain-agnostic and has no particular

strategy or algorithm encoded into it. We validate our

temporal-convolution-based meta-learner (TCML) through experiments pertaining

to both supervised and reinforcement learning, and demonstrate that it

outperforms state-of-the-art methods that are less general and more complex.

**Subjects**: Machine Learning (stat.ML); Learning (cs.LG)

In recent years Variational Autoencoders have become one of the most popular

approaches to unsupervised learning of complicated distributions. The

Variational Autoencoder (VAE) provides more efficient reconstructive

performance than a traditional autoencoder and makes better approximations

than MCMC. The VAE defines a generative process in terms of ancestral sampling

through a cascade of hidden stochastic layers; it is a directed graphical

model trained to maximise the variational lower bound. This objective

maximises the data likelihood while keeping the approximate posterior close to

the true one, essentially trading off the data log-likelihood against the KL

divergence from the true posterior. This paper describes the scenario in which

we wish to find a point estimate of the parameters (\theta) of some parametric

model in which each observation is generated by first sampling a local latent

variable and then sampling the associated observation. Here we use a

least-squares loss function with regularization for the reconstruction of the

image; this loss function was found to give better reconstructed images and a

faster training time.
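The trade-off described here is the standard variational lower bound; with a fixed-variance Gaussian decoder, the reconstruction term reduces to a least-squares loss. Written out per observation (a standard identity, not a result of this paper):

```latex
% Per-observation VAE objective: least-squares reconstruction plus the
% KL divergence that pulls the approximate posterior toward the prior.
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \lVert x - f_\theta(z) \rVert_2^2 \right]
  + D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\Vert\, p(z) \right)
```

Minimising the first term fits the data while the second acts as the regularizer, which is the trade-off between data log-likelihood and KL divergence the abstract refers to.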

**Subjects**: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

The increase of vehicles on highways, as on normal roadways, may cause

traffic congestion. Predicting highway traffic flow in particular is

needed to solve this congestion problem. Predictions on time-series

multivariate data, such as in the traffic flow dataset, have been largely

accomplished through various approaches. The approach with conventional

prediction algorithms, such as with Support Vector Machine (SVM), is only

capable of accommodating predictions that are independent in each time unit.

Hence, the sequential relationships in this time series data is hardly

explored. Continuous Conditional Random Field (CCRF) is one of Probabilistic

Graphical Model (PGM) algorithms which can accommodate this problem. The

neighboring aspects of sequential data such as in the time series data can be

expressed by CCRF so that its predictions are more reliable. In this article, a

novel approach called DM-CCRF is adopted by modifying the CCRF prediction

algorithm to strengthen the probability of the predictions made by the baseline

regressor. The result shows that DM-CCRF is superior in performance compared to

CCRF. This is validated by the error decrease of the baseline up to 9%

significance. This is twice the standard CCRF performance which can only

decrease baseline error by 4.582% at most.

Comments: 3 pages, 6 figures, In Robotics: Science and Systems (RSS) 2017 Workshop of “POMDPs in Robotics: State of The Art, Challenges, and Opportunities”

**Subjects**:

Systems and Control (cs.SY)

; Learning (cs.LG); Robotics (cs.RO)

This paper studies the partially observed stochastic optimal control problem

for systems with state dynamics governed by Partial Differential Equations

(PDEs), which leads to an extremely large-scale problem. First, an open-loop

deterministic trajectory optimization problem is solved using a black box

simulation model of the dynamical system. Next, a Linear Quadratic Gaussian

(LQG) controller is designed for the nominal trajectory-dependent linearized

system, which is identified using input-output experimental data consisting of

the impulse responses of the optimized nominal system. A computational

nonlinear heat example is used to illustrate the performance of the approach.

Comments: 14 pages

**Subjects**:

Robotics (cs.RO)

; Artificial Intelligence (cs.AI); Learning (cs.LG)

Robotic motion planning problems are typically solved by constructing a

search tree of valid maneuvers from a start to a goal configuration. Limited

onboard computation and real-time planning constraints impose a limit on how

large this search tree can grow. Heuristics play a crucial role in such

situations by guiding the search towards potentially good directions and

consequently minimizing search effort. Moreover, a heuristic must infer such directions

in an efficient manner using only the information uncovered by the search up

until that time. However, state-of-the-art methods do not address the problem

of computing a heuristic that explicitly minimizes search effort. In this

paper, we do so by training a heuristic policy that maps the partial

information from the search to decide which node of the search tree to expand.

Unfortunately, naively training such policies leads to slow convergence and

poor local minima. We present SaIL, an efficient algorithm that trains

heuristic policies by imitating “clairvoyant oracles” – oracles that have full

information about the world and demonstrate decisions that minimize search

effort. We leverage the fact that such oracles can be efficiently computed

using dynamic programming and derive performance guarantees for the learnt

heuristic. We validate the approach on a spectrum of environments which show

that SaIL consistently outperforms state-of-the-art algorithms. Our approach

paves the way forward for learning heuristics that demonstrate an anytime

nature – finding feasible solutions quickly and incrementally refining them over

time.
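To make the heuristic's role concrete, here is a minimal greedy best-first search in which the heuristic alone decides which frontier node to expand, and therefore determines search effort; SaIL would replace the hand-coded Manhattan heuristic below with a learned policy (the toy grid world is our own illustration, not from the paper):

```python
import heapq

def best_first_search(start, goal, neighbors, heuristic):
    """Greedy best-first search: the heuristic alone decides which
    frontier node to expand next, so its quality directly controls
    search effort (number of expansions)."""
    frontier = [(heuristic(start), start)]
    came_from = {start: None}
    expansions = 0
    while frontier:
        _, node = heapq.heappop(frontier)
        expansions += 1
        if node == goal:                  # reconstruct the path
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1], expansions
        for nxt in neighbors(node):
            if nxt not in came_from:
                came_from[nxt] = node
                heapq.heappush(frontier, (heuristic(nxt), nxt))
    return None, expansions

# Toy 10x10 obstacle-free grid with 4-connected moves.
def neighbors(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 10 and 0 <= y + dy < 10]

goal = (9, 9)
manhattan = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
path, n_expanded = best_first_search((0, 0), goal, neighbors, manhattan)
```

A poor heuristic inflates `n_expanded`; training the scoring function to imitate a clairvoyant oracle, as SaIL does, drives that count down.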

Comments: Rejected by ICDAR 2017

**Subjects**:

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG)

Handwritten string recognition remains a challenging task, even though powerful deep learning tools have been introduced. In this paper, based on TAO-FCN, we propose an end-to-end system for handwritten string recognition. Compared with conventional methods, no preprocessing or manually designed rules are employed. With enough labelled data, the proposed method is easy to apply to different applications. Although its performance may not be comparable with state-of-the-art approaches, its usability and robustness are more meaningful for practical applications.

## Information Theory


**Subjects**: Information Theory (cs.IT); Optimization and Control (math.OC)

This paper studies sensor calibration in spectral estimation where the true

frequencies are located on a continuous domain. We consider a uniform array of

sensors that collects measurements whose spectrum is composed of a finite

number of frequencies, where each sensor has an unknown calibration parameter.

Our goal is to recover the spectrum and the calibration parameters

simultaneously from multiple snapshots of the measurements. In the noiseless

case, we prove, via an algebraic method, that the problem has a unique solution up to certain trivial and inevitable ambiguities, given infinitely many snapshots and more sensors than frequencies. We then analyze the

sensitivity of this approach with respect to the number of snapshots and noise.

We next propose an optimization approach that makes full use of the

measurements and consider a non-convex objective over all calibration

parameters and Toeplitz matrices. This objective is non-negative and

continuously differentiable. We prove that, in the case of infinite snapshots

of noiseless measurements, the objective vanishes only at the equivalent

solutions to the true calibration parameters and the measurement covariance

matrix (with all calibration parameters equal to 1), which exhibits a Toeplitz

structure. The objective is minimized using Wirtinger gradient descent which we

prove converges to a critical point. We show empirically that this critical

point provides a good approximation of the true calibration parameters and the

underlying frequencies.


**Subjects**: Information Theory (cs.IT)

The bound that arises out of sparse recovery analysis in compressed sensing

involves input signal sparsity and some property of the sensing matrix. An

effort has therefore been made in the literature to optimize sensing matrices

for optimal recovery using this property. We discover, in the specific case of

optimizing codes for the CACTI camera, that the popular method of mutual

coherence minimization does not produce optimal results: codes designed to

optimize effective dictionary coherence often perform worse than random codes

in terms of mean squared reconstruction error.

This surprising phenomenon leads us to investigate the reliability of the

coherence bound for matrix optimization, in terms of its looseness. We examine,

on simulated data, the looseness of the bound as it propagates across various

steps of the inequalities in a derivation leading to the final bound. We then

similarly examine an alternate bound derived by Tang, G. et al., based on the (\ell_1/\ell_\infty) notion of sparsity, which is a compromise between

coherence and the restricted isometry constant (RIC). Moreover, we also perform

a bound looseness analysis for the RIC as derived by Cai, T. et al. The

conclusion of these efforts is that coherence optimization is problematic not

only because of the coherence bound on the RIC, but also the RIC bound itself.

These negative results imply that despite the success of previous work in

designing sensing matrices based on optimization of a matrix quality factor,

one needs to exercise caution in using them for practical sensing matrix

design.

We then introduce a paradigm for optimizing sensing matrices that overcomes

the looseness of compressed sensing upper bounds using an average case error

approach. We show a proof-of-concept design using this paradigm that performs

convincingly better than coherence-based design in the CACTI case, and no worse

for general matrices.
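For reference, the mutual coherence that such designs minimize is cheap to compute; a small self-contained sketch (our own illustration, not tied to the CACTI codes):

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct, l2-normalized
    columns of the sensing matrix A."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(A.T @ A)        # Gram matrix of normalized columns
    np.fill_diagonal(G, 0.0)   # ignore self-correlations
    return float(G.max())

rng = np.random.default_rng(0)
mu_orth = mutual_coherence(np.eye(4))                    # orthonormal columns
mu_rand = mutual_coherence(rng.standard_normal((4, 8)))  # random Gaussian matrix
```

Orthonormal columns achieve coherence 0, while any matrix with more columns than rows necessarily has strictly positive coherence; the paper's point is that minimizing this scalar need not minimize reconstruction error.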

Comments: 5 pages, 4 figures. arXiv admin note: substantial text overlap with

**Subjects**:

Information Theory (cs.IT)

Intersections are critical areas of the transportation infrastructure

associated with 47% of all road accidents. Vehicle-to-vehicle (V2V)

communication has the potential of preventing up to 35% of such serious road

collisions. In fact, under the 5G/LTE Rel.15+ standardization, V2V is a

critical use-case not only for the purpose of enhancing road safety, but also

for enabling traffic efficiency in modern smart cities. Under this anticipated

5G definition, high reliability of 0.99999 is expected for semi-autonomous

vehicles (i.e., driver-in-the-loop). As a consequence, there is a need to

assess the reliability, especially for accident-prone areas, such as

intersections. We unpack traditional average V2V reliability in order to

quantify its related fine-grained V2V reliability. Contrary to existing work on

infinitely large roads, when we consider finite road segments of significance

to practical real-world deployment, fine-grained reliability exhibits bimodal

behavior. Performance for a certain vehicular traffic scenario is either very

reliable or extremely unreliable, but nowhere in relative proximity to the

average performance.


**Subjects**: Information Theory (cs.IT)

A Z2Z4-additive code C is called cyclic if the set of coordinates can be

partitioned into two subsets, the set of Z_2 and the set of Z_4 coordinates,

such that any cyclic shift of the coordinates of both subsets leaves the code

invariant. We study the binary images of Z2Z4-additive cyclic codes. We

determine all Z2Z4-additive cyclic codes with odd beta whose Gray images are

linear binary codes.

Comments: 26 pages

**Subjects**:

Information Theory (cs.IT)

The problem of accurate nonparametric estimation of distributional

functionals (integral functionals of one or more probability distributions) has

received recent interest due to their wide applicability in signal processing,

information theory, machine learning, and statistics. In particular,

(k)-nearest neighbor (nn) based methods have received a lot of attention due to

their adaptive nature and their relatively low computational complexity. We

derive the mean squared error (MSE) convergence rates of leave-one-out (k)-nn

plug-in density estimators of a large class of distributional functionals

without boundary correction. We then apply the theory of optimally weighted

ensemble estimation to obtain weighted ensemble estimators that achieve the

parametric MSE rate under assumptions that are competitive with the state of

the art. The asymptotic distributions of these estimators, which are unknown

for all other (k)-nn based distributional functional estimators, are also

presented which enables us to perform hypothesis testing.
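A minimal sketch of the leave-one-out (k)-nn plug-in density estimator underlying these results (our own illustration, without the boundary correction or the ensemble weighting the paper develops):

```python
import numpy as np
from math import gamma, pi

def loo_knn_density(X, k):
    """Leave-one-out k-nn plug-in density estimate at each sample:
    f_hat(x_i) = k / ((n - 1) * V_d * r_k(x_i)^d), where r_k is the
    distance to the k-th nearest *other* sample and V_d is the volume
    of the d-dimensional unit ball."""
    n, d = X.shape
    V_d = pi ** (d / 2.0) / gamma(d / 2.0 + 1.0)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # leave-one-out: exclude the point itself
    r_k = np.sort(D, axis=1)[:, k - 1]   # k-th nearest-neighbor distance
    return k / ((n - 1) * V_d * r_k ** d)

# On Uniform[0, 1] samples the true density is 1 everywhere, so the
# estimates should concentrate around 1 (no boundary correction applied).
rng = np.random.default_rng(1)
f_hat = loo_knn_density(rng.uniform(size=(500, 1)), k=10)
```

Plugging `f_hat` into a functional such as `-np.mean(np.log(f_hat))` gives the kind of distributional-functional (here, entropy) estimator whose MSE rates the paper analyzes.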

Comments: 32 pages, 8 figures. Submitted to IEEE Transactions on Wireless Communications

**Subjects**:

Information Theory (cs.IT)

This paper considers pilot design to mitigate pilot contamination and provide

good service for everyone in multi-cell Massive MIMO (multiple input multiple

output) systems. Instead of modeling the pilot design as a combinatorial

assignment problem, as in prior works, we express the pilot signals using a

pilot basis and treat the associated power coefficients as continuous

optimization variables. We compute a lower bound on the uplink (UL) capacity

for Rayleigh fading channels with maximum ratio (MR) detection that can be

applied with arbitrary pilot signals. We further formulate the max-min fairness

problem under power budget constraints, with the pilot signals and data powers

as optimization variables. Although this optimization problem is

non-deterministic polynomial-time hard (NP-hard) due to signomial constraints,

we demonstrate how to obtain the globally optimal solution. We then propose an

efficient algorithm to obtain a local optimum with polynomial complexity. Our

framework serves as a benchmark for pilot design in scenarios with either ideal

or non-ideal hardware. Numerical results show that the proposed

optimization algorithms are nearly optimal and the new pilot structure and

optimization bring large gains over the state-of-the-art suboptimal pilot

design.

Comments: IEEE Wireless Communications Letters (accepted for publication)

**Subjects**:

Information Theory (cs.IT)

Wireless powered backscatter communications is an attractive technology for

next-generation low-powered sensor networks such as the Internet of Things.

However, backscattering suffers from collisions due to multiple simultaneous

transmissions and a dyadic backscatter channel, which greatly attenuate the

received signal at the reader. This letter deals with backscatter

communications in sensor networks from a large-scale point-of-view and

considers various collision resolution techniques: directional antennas,

ultra-narrow band transmissions and successive interference cancellation. We

derive analytical expressions for the decoding probability and our results show

the significant gains, which can be achieved from the aforementioned

techniques.


**Subjects**: Machine Learning (stat.ML); Information Theory (cs.IT)

End-to-end learning of communications systems is a fascinating novel concept

that has so far only been validated by simulations for block-based

transmissions. It allows learning of transmitter and receiver implementations

as deep neural networks (NNs) that are optimized for an arbitrary

differentiable end-to-end performance metric, e.g., block error rate (BLER). In

this paper, we demonstrate that over-the-air transmissions are possible: We

build, train, and run a complete communications system solely composed of NNs

using unsynchronized off-the-shelf software-defined radios (SDRs) and

open-source deep learning (DL) software libraries. We extend the existing ideas

towards continuous data transmission, which eases their current restriction to

short block lengths but also entails the issue of receiver synchronization. We

overcome this problem by introducing a frame synchronization module based on

another NN. A comparison of the BLER performance of the “learned” system with that of a practical baseline shows competitive performance, within about 1 dB, even

without extensive hyperparameter tuning. We identify several practical

challenges of training such a system over actual channels, in particular the

missing channel gradient, and propose a two-step learning procedure based on

the idea of transfer learning that circumvents this issue.
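The frame-synchronization problem the paper solves with a neural network is classically handled by correlating the received samples against a known preamble; a minimal sketch of that classical baseline (all signal parameters below are invented for illustration):

```python
import numpy as np

def sync_offset(rx, preamble):
    """Estimate the frame start by sliding a matched filter (correlation
    with the conjugated preamble) over the received samples."""
    L = len(preamble)
    scores = [abs(np.dot(rx[i:i + L], np.conj(preamble)))
              for i in range(len(rx) - L + 1)]
    return int(np.argmax(scores))

rng = np.random.default_rng(2)
preamble = np.exp(2j * np.pi * rng.random(32))   # unit-modulus preamble symbols
payload = rng.standard_normal(64) + 1j * rng.standard_normal(64)
true_offset = 23
rx = np.concatenate([np.zeros(true_offset), preamble, payload])
rx = rx + 0.05 * (rng.standard_normal(rx.size) + 1j * rng.standard_normal(rx.size))
found = sync_offset(rx, preamble)
```

The correlation peaks where the preamble aligns; the paper's NN-based module learns this alignment function instead of assuming a fixed preamble structure.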

Comments: arXiv admin note: text overlap with

**Subjects**:

Commutative Algebra (math.AC)

; Information Theory (cs.IT)

We study the (r)-th generalized minimum distance function (gmd function for

short) and the corresponding generalized footprint function of a graded ideal

in a polynomial ring over a field. If (\mathbb{X}) is a set of projective points over a finite field and (I(\mathbb{X})) is its vanishing ideal, we show that the gmd function and the Vasconcelos function of (I(\mathbb{X})) are equal to the (r)-th generalized Hamming weight of the corresponding Reed-Muller-type code (C_{\mathbb{X}}(d)). We show that the (r)-th generalized footprint function of (I(\mathbb{X})) is a lower bound for the (r)-th generalized Hamming weight of (C_{\mathbb{X}}(d)). As an application to coding theory we show an explicit

formula and a combinatorial formula for the second generalized Hamming weight

of an affine cartesian code.

Comments: 5 Pages

**Subjects**:

Computer Vision and Pattern Recognition (cs.CV)

; Information Theory (cs.IT)

This paper presents a novel autonomous quality metric to quantify the

rehabilitation progress of subjects with knee/hip operations. The presented

method supports digital analysis of human gait patterns using smartphones. The

algorithm related to the autonomous metric utilizes calibrated acceleration,

gyroscope, and magnetometer signals from seven Inertial Measurement Units attached to the lower body in order to classify and generate the grading system

values. The developed Android application connects the seven Inertial

Measurement Units via Bluetooth and performs the data acquisition and

processing in real time. In total, nine features per acceleration direction and lower-body joint angle are calculated and extracted in real time to provide fast feedback to the user. We compare the classification accuracy and

quantification capabilities of Linear Discriminant Analysis, Principal

Component Analysis and Naive Bayes algorithms. The presented system is able to

classify patients and control subjects with an accuracy of up to 100%. The

outcomes can be saved on the device or transmitted to treating physicians for

later monitoring of the subject’s improvement and the efficiency of physiotherapy

treatments in motor rehabilitation. The proposed autonomous quality metric

solution bears great potential to be used and deployed to support digital

healthcare and therapy.
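As a sketch of the classification step, here is a minimal Gaussian Naive Bayes, one of the three algorithms the abstract compares, run on hypothetical stand-in features (the data, class separation, and 9-feature layout are invented for illustration):

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian Naive Bayes: features are treated as
    conditionally independent given the class label."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.log_prior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # log p(c|x) = log p(c) + sum_f log N(x_f; mu_cf, var_cf) + const
        ll = -0.5 * (np.log(2.0 * np.pi * self.var[None]) +
                     (X[:, None, :] - self.mu[None]) ** 2 / self.var[None]).sum(-1)
        return self.classes[np.argmax(ll + self.log_prior[None], axis=1)]

# Well-separated synthetic "control" vs "patient" feature vectors.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (100, 9)), rng.normal(3.0, 1.0, (100, 9))])
y = np.array([0] * 100 + [1] * 100)
accuracy = float((GaussianNB().fit(X, y).predict(X) == y).mean())
```

With features this well separated, near-perfect accuracy is expected, which mirrors (but does not reproduce) the up-to-100% figure reported for the real gait data.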


**Subjects**: Methodology (stat.ME); Information Theory (cs.IT)

In this work, we propose a method for determining a non-uniform sampling

scheme for multi-dimensional signals by solving a convex optimization problem

reminiscent of the sensor selection problem. The resulting sampling scheme

minimizes the sum of the Cramér-Rao lower bounds for the parameters of

interest, given a desired number of sampling points. The proposed framework

allows for selecting an arbitrary subset of the parameters detailing the model,

as well as weighing the importance of the different parameters. Also presented

is a scheme for incorporating any imprecise a priori knowledge of the locations

of the parameters, as well as defining estimation performance bounds for the

parameters of interest. Numerical examples illustrate the efficiency of the

proposed scheme.

Comments: 6 pages, 2 figures

**Subjects**:

Quantum Physics (quant-ph)

; Computational Complexity (cs.CC); Information Theory (cs.IT)

A device-independent dimension test for a Bell experiment aims to estimate

the underlying Hilbert space dimension that is required to produce given

measurement statistical data without any other assumptions concerning the

quantum apparatus. Previous work mostly deals with the two-party version of

this problem. In this paper, we propose a very general and robust approach to

test the dimension of any subsystem in a multiparty Bell experiment. Our

dimension test stems from the study of a new multiparty scenario which we call

prepare-and-distribute. This is like the prepare-and-measure scenario, but the

quantum state is sent to multiple, non-communicating parties. Through specific

examples, we show that our test results can be tight. Furthermore, we compare

the performance of our test to results based on known bipartite tests, and

observe a remarkable advantage, which indicates that our test is of a truly

multiparty nature. We conclude by pointing out that with some partial

information about the quantum states involved in the experiment, it is possible

to learn other interesting properties beyond dimension.


Original article: https://www.52ml.net/21996.html
